API reference

Resource-oriented endpoints for the marketplace. All paths are prefixed with /v1. Timestamps are RFC3339; amounts are in minor units where applicable.

Conventions

Send an Idempotency-Key header on mutating requests so that workers can retry them safely. Pagination is cursor-based, using the limit and cursor parameters.
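The two conventions above can be sketched in a few lines of Python. This is a minimal illustration, not a client for this API: `fetch_page` is a hypothetical callable standing in for your HTTP layer, the use of a UUID as the key value is an assumption, and the sketch assumes every list endpoint returns the `data` / `next_cursor` envelope shown for GET /v1/services.

```python
import uuid
from typing import Callable, Dict, Iterator, Optional


def new_idempotency_headers(key: Optional[str] = None) -> Dict[str, str]:
    """Build the header for a mutating request.

    Pass the same key again when retrying the same logical request.
    Using a random UUID as the key is an assumption of this sketch.
    """
    return {"Idempotency-Key": key or str(uuid.uuid4())}


def iter_pages(fetch_page: Callable[[dict], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every item from a cursor-paginated list endpoint.

    fetch_page (hypothetical) takes query parameters and returns the parsed
    JSON body, assumed to contain "data" and "next_cursor".
    """
    cursor = None
    while True:
        params = {"limit": limit}
        if cursor is not None:
            params["cursor"] = cursor
        page = fetch_page(params)
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        if cursor is None:  # assumed to mean there are no further pages
            break
```

Generating the key once per logical operation, outside the retry loop, is what makes a retry safe: a fresh key on each attempt would look like a new request.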

GET /v1/services

List services available to your workspace with capability tags.

POST /v1/complete

Run a text completion on a selected LLM service.

POST /v1/embeddings

Create embeddings for passages or batched inputs.

GET /v1/usage

Retrieve usage aggregates and per-service spend.

Services

Each service has a stable service_id, pricing tier, and capability manifest (modalities, max context, region hints).

GET /v1/services — 200 response (truncated)

{
  "data": [
    {
      "service_id": "llm.general.v1",
      "provider": "acme-ml",
      "capabilities": ["chat", "json_mode"],
      "max_context_tokens": 128000,
      "regions": ["us-east-1", "eu-west-1"]
    }
  ],
  "next_cursor": null
}
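One way a client might use the capability manifest is to pick services that satisfy a request's needs before calling them. The sketch below works on the sample response above; `find_services` and its parameters are hypothetical names chosen for illustration, not part of the API.

```python
import json
from typing import List

# The sample 200 response shown above, used here as offline test data.
SAMPLE = """
{
  "data": [
    {
      "service_id": "llm.general.v1",
      "provider": "acme-ml",
      "capabilities": ["chat", "json_mode"],
      "max_context_tokens": 128000,
      "regions": ["us-east-1", "eu-west-1"]
    }
  ],
  "next_cursor": null
}
"""


def find_services(payload: dict, capability: str, min_context: int = 0) -> List[str]:
    """Return the service_id of each service with the given capability tag
    and at least min_context tokens of context."""
    return [
        svc["service_id"]
        for svc in payload.get("data", [])
        if capability in svc.get("capabilities", [])
        and svc.get("max_context_tokens", 0) >= min_context
    ]


payload = json.loads(SAMPLE)
json_capable = find_services(payload, "json_mode", min_context=100_000)
```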

POST /v1/complete

Primary inference endpoint for chat-style models. Optional response_format enforces JSON when the service supports structured output.

Request body

{
  "service_id": "llm.general.v1",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Draft a release note." }
  ],
  "temperature": 0.2,
  "max_tokens": 512,
  "response_format": { "type": "json_object" }
}

Response body

{
  "id": "cmp_01h2xz3k9q8y9z0",
  "model": "general-v1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"title\":\"v2.4\",\"highlights\":[\"Faster cold start\"]}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 42, "completion_tokens": 118, "total_tokens": 160 }
}
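When response_format requests a JSON object, the assistant message content arrives as a JSON-encoded string and needs a second decode. A minimal sketch, assuming the response shape shown above; `extract_json_content` is a hypothetical helper, and treating any finish_reason other than "stop" as unsafe to parse is an assumption of this sketch, since only "stop" appears in the sample.

```python
import json


def extract_json_content(response: dict, choice_index: int = 0) -> dict:
    """Decode the structured payload carried in choices[i].message.content."""
    choice = response["choices"][choice_index]
    # Assumption: a finish_reason other than "stop" may mean the output was
    # cut off, so the content might not be complete JSON.
    if choice.get("finish_reason") != "stop":
        raise ValueError(f"unexpected finish_reason: {choice.get('finish_reason')!r}")
    return json.loads(choice["message"]["content"])


# The sample response body from above, trimmed to the fields used here.
sample_response = {
    "id": "cmp_01h2xz3k9q8y9z0",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "{\"title\":\"v2.4\",\"highlights\":[\"Faster cold start\"]}",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 42, "completion_tokens": 118, "total_tokens": 160},
}

release_note = extract_json_content(sample_response)
```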

Errors & limits

Errors use a unified envelope. Retry on 429 and 503 with exponential backoff; respect the Retry-After header when present.

Error object

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Too many requests for this workspace.",
    "param": null,
    "code": "rate_limit"
  }
}
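The retry guidance above could be implemented along these lines. This is a sketch under stated assumptions: the function names, the base and cap values, and the use of full jitter are illustrative choices, not documented behaviour of this API. Only the delta-seconds form of Retry-After is handled; an HTTP-date value falls back to backoff.

```python
import random
from typing import Mapping, Optional

RETRYABLE_STATUSES = {429, 503}  # the two statuses named above


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before the next attempt, or None if the
    status should not be retried. attempt starts at 0.

    headers is looked up by the exact name "Retry-After"; use a
    case-insensitive mapping if your HTTP library does not normalise names.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch
    # Exponential backoff with full jitter (base and cap are assumed values).
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def error_code(body: dict) -> Optional[str]:
    """Read error.code from the unified envelope, if present."""
    return (body.get("error") or {}).get("code")
```

A caller would loop over attempts, sleep for the returned delay, and stop when `retry_delay` returns None or a maximum attempt count is reached.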