Model Deployments
Deploy models from HuggingFace Hub, R2, or any URL as persistent services. The platform handles weight pulling, process management, health checking, and cost metering.
Deployment Targets
Section titled “Deployment Targets”| Target | Description | Cost |
|---|---|---|
prism_node | Your machine via prism node up | Free (your hardware) |
runpod | RunPod cloud GPU | Per-hour GPU pricing |
lambda | Lambda cloud GPU | Per-hour GPU pricing |
Create a Deployment
Section titled “Create a Deployment”POST /api/v1/compute/deployments{ "name": "mace-mp-0", "image": "marc27/mace:latest", "target": "prism_node", "gpu_type": "A100-80GB", "deploy_config": { "port": 8080, "health_path": "/health" }, "budget_max_usd": 50.0}Or deploy from a marketplace resource:
{ "name": "mace-prod", "resource_slug": "mace-mp-0", "target": "prism_node"}Generic Model Deploy (Runtime)
Section titled “Generic Model Deploy (Runtime)”The runtime has a universal model deployment endpoint that pulls weights from any source:
POST /deploy{ "deployment_id": "my-model", "weights_source": "hf://sentence-transformers/paraphrase-MiniLM-L3-v2", "framework": "auto", "port": 9000, "gpu": false, "command": ["python", "serve.py", "--port", "9000"]}Weight sources:
hf://org/model— HuggingFace Hub (supports@revision)r2://path/to/weights— R2/S3 object storagehttps://url/to/model.bin— Direct URL/local/path— Local filesystem
Lifecycle
Section titled “Lifecycle”provisioning → pulling → starting → running → stopping → stopped ↘ unhealthy (3 consecutive failures) ↘ failed (crash/budget exceeded)Cost Metering
Section titled “Cost Metering”A background tick runs every 60 seconds:
- Increments
cost_accrued_usdfor all running deployments - Deployments exceeding
budget_max_usdare auto-stopped
WebSocket Protocol (Nodes)
Section titled “WebSocket Protocol (Nodes)”When a PRISM node receives a DeployModel message:
{"type": "deploy_model", "deployment_id": "...", "image": "...", "env_vars": {...}, "deploy_config": {...}}It should:
- Pull the image/weights
- Start the container with GPU passthrough
- Send
DeploymentReadywith endpoint URL - Periodically send
DeploymentHealthUpdate - Send
DeploymentStoppedon shutdown
API Endpoints
Section titled “API Endpoints”POST /api/v1/compute/deployments — CreateGET /api/v1/compute/deployments — List (with ?status= filter)GET /api/v1/compute/deployments/{id} — DetailDELETE /api/v1/compute/deployments/{id} — StopGET /api/v1/compute/deployments/{id}/health — Force health checkPRISM CLI
Section titled “PRISM CLI”prism deploy create --name mace --image marc27/mace:latest --gpu A100-80GBprism deploy create --name my-model --weights hf://org/model --target localprism deploy listprism deploy status <id>prism deploy stop <id>