API reference
Resource-oriented endpoints for the marketplace. All paths are prefixed with /v1. Timestamps use RFC 3339 format; amounts are in minor units where applicable.
Conventions
Send an Idempotency-Key with each mutating request so that workers can retry safely. Pagination is cursor-based, using the limit and cursor parameters.
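The two conventions above can be sketched in Python as follows. This is a minimal illustration, not a client for the actual API: `fetch_page` is a hypothetical transport callable you would supply, sending Idempotency-Key as an HTTP header is an assumption, and treating a null `next_cursor` as the end of the listing is inferred from the list response example rather than stated by the reference.

```python
import uuid
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: receives query parameters, returns the decoded JSON body.
FetchPage = Callable[[Dict[str, Any]], Dict[str, Any]]


def new_idempotency_headers() -> Dict[str, str]:
    """Create one key per logical operation; reuse the same dict on every retry.

    Assumption: the key travels as an HTTP header named Idempotency-Key.
    """
    return {"Idempotency-Key": str(uuid.uuid4())}


def iter_items(fetch_page: FetchPage, limit: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield items page by page until next_cursor is null (assumed end marker)."""
    cursor: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"limit": limit}
        if cursor is not None:
            params["cursor"] = cursor
        page = fetch_page(params)
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Generating the key once and reusing it across retries is what makes a retried mutation safe; a fresh key per attempt would defeat the purpose.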
/v1/services - List services available to your workspace with capability tags.
/v1/complete - Run a text completion on a selected LLM service.
/v1/embeddings - Create embeddings for passages or batched inputs.
/v1/usage - Retrieve usage aggregates and per-service spend.
Services
Each service has a stable service_id, pricing tier, and capability manifest (modalities, max context, region hints).
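As an illustration of how a capability manifest might be used, the sketch below filters an already-decoded list response by capability, context size, and region. The function name `find_services` and its parameters are hypothetical; the field names are taken from the example response in this section.

```python
from typing import Any, Dict, List, Optional


def find_services(
    listing: Dict[str, Any],
    capability: str,
    min_context: int = 0,
    region: Optional[str] = None,
) -> List[str]:
    """Return the service_id of every service matching all given criteria."""
    matches: List[str] = []
    for svc in listing.get("data", []):
        if capability not in svc.get("capabilities", []):
            continue
        if svc.get("max_context_tokens", 0) < min_context:
            continue
        if region is not None and region not in svc.get("regions", []):
            continue
        matches.append(svc["service_id"])
    return matches


# Sample data copied from the example response in this section.
listing = {
    "data": [
        {
            "service_id": "llm.general.v1",
            "provider": "acme-ml",
            "capabilities": ["chat", "json_mode"],
            "max_context_tokens": 128000,
            "regions": ["us-east-1", "eu-west-1"],
        }
    ],
    "next_cursor": None,
}

json_capable = find_services(listing, "json_mode", min_context=100000, region="eu-west-1")
```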
{
  "data": [
    {
      "service_id": "llm.general.v1",
      "provider": "acme-ml",
      "capabilities": ["chat", "json_mode"],
      "max_context_tokens": 128000,
      "regions": ["us-east-1", "eu-west-1"]
    }
  ],
  "next_cursor": null
}
POST /complete
Primary inference endpoint for chat-style models. The optional response_format field enforces JSON output when the service supports structured output.
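A rough sketch of building a request body and decoding the reply, using the field names from the examples in this section. The helper names `build_completion_request` and `parse_first_choice` are hypothetical, and the sketch assumes that when JSON mode is requested the `content` field holds a JSON document encoded as a string, as the example response suggests.

```python
import json
from typing import Any, Dict, List


def build_completion_request(
    service_id: str,
    messages: List[Dict[str, str]],
    *,
    json_mode: bool = False,
    temperature: float = 0.2,
    max_tokens: int = 512,
) -> Dict[str, Any]:
    """Assemble a request body; response_format is added only when asked for."""
    body: Dict[str, Any] = {
        "service_id": service_id,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    if json_mode:
        body["response_format"] = {"type": "json_object"}
    return body


def parse_first_choice(response: Dict[str, Any], expect_json: bool = False) -> Any:
    """Return the first choice's content, decoded from JSON when expected."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content) if expect_json else content
```

Checking that the selected service advertises a structured-output capability before setting `json_mode=True` is left to the caller.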
{
  "service_id": "llm.general.v1",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Draft a release note." }
  ],
  "temperature": 0.2,
  "max_tokens": 512,
  "response_format": { "type": "json_object" }
}
{
  "id": "cmp_01h2xz3k9q8y9z0",
  "model": "general-v1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"title\":\"v2.4\",\"highlights\":[\"Faster cold start\"]}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 42, "completion_tokens": 118, "total_tokens": 160 }
}
Errors & limits
Errors use a unified envelope. Retry on 429 and 503 with exponential backoff; respect the Retry-After header when present.
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Too many requests for this workspace.",
    "param": null,
    "code": "rate_limit"
  }
}
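The retry guidance in this section might be implemented along these lines. This is a sketch under stated assumptions: `send` is a hypothetical callable that performs one request and returns a `(status, headers, body)` tuple, the header mapping uses the exact key `Retry-After`, and only the seconds form of that header is handled. The base delay, cap, and attempt count are illustrative values, not limits from the reference.

```python
import time
from typing import Any, Callable, Dict, Mapping, Tuple

RETRYABLE_STATUSES = {429, 503}

# Hypothetical response shape: (HTTP status, headers, decoded JSON body).
Response = Tuple[int, Mapping[str, str], Dict[str, Any]]


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 0.5, cap: float = 30.0) -> float:
    """Prefer Retry-After (seconds); otherwise use capped exponential backoff."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    return min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        is_last = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last:
            return status, headers, body
        sleep(retry_delay(attempt, headers))
    raise AssertionError("unreachable")
```

When the final attempt still fails, the last response is returned so the caller can inspect the error envelope. Adding random jitter to the computed delay is a common refinement that is omitted here to keep the example deterministic.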