the gateway · to private · local models
Your prompt enters a hardware-sealed enclave, the model runs, the answer leaves. Private by physics, not by policy.
Every model on Faraday runs inside a Trusted Execution Environment (TEE), a hardware-isolated region where the operating system, hypervisor, and your hosting provider are cryptographically locked out of memory.
There is no non-TEE lane. Inference never leaves the enclave in plaintext. Your prompts are not our business — and by design, they cannot be.
We are not asking you to trust a privacy policy. The silicon itself enforces the contract. Verify the attestation receipt. Trust the math.
    # completions
    curl https://api.faraday.space/v1/chat/completions \
      -H "Authorization: Bearer fara-XKQJ-NV7R-M4TL" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "qwen/qwen3-235b-a22b",
        "messages": [{"role":"user","content":"Hello"}],
        "max_tokens": 512
      }'
    # drop-in replacement for OpenAI
    import openai

    client = openai.OpenAI(
        base_url="https://api.faraday.space/v1",
        api_key="fara-XKQJ-NV7R-M4TL",
    )

    response = client.chat.completions.create(
        model="qwen/qwen3-235b-a22b",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=512,
    )
    print(response.choices[0].message.content)
    // drop-in replacement for OpenAI
    import OpenAI from "openai";

    const client = new OpenAI({
      baseURL: "https://api.faraday.space/v1",
      apiKey: "fara-XKQJ-NV7R-M4TL",
    });

    const response = await client.chat.completions.create({
      model: "qwen/qwen3-235b-a22b",
      messages: [{ role: "user", content: "Hello" }],
      max_tokens: 512,
    });
    console.log(response.choices[0].message.content);
    # streaming: works exactly like OpenAI
    import openai

    client = openai.OpenAI(
        base_url="https://api.faraday.space/v1",
        api_key="fara-XKQJ-NV7R-M4TL",
    )

    stream = client.chat.completions.create(
        model="deepseek/deepseek-r1",
        messages=[{"role": "user", "content": "Explain TEEs"}],
        stream=True,
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")
| model slug | enclave | ctx | input (τ / M) | output (τ / M) | status |
|---|---|---|---|---|---|
| qwen/qwen3-235b-a22b | NVIDIA H100 CC | 128k | 0.32 | 1.14 | live |
| qwen/qwen3-32b | NVIDIA H100 CC | 128k | 0.12 | 0.38 | live |
| qwen/qwen3-8b | AMD SEV-SNP | 128k | 0.04 | 0.14 | live |
| google/gemma-3-27b-it | Intel TDX | 128k | 0.09 | 0.28 | live |
| google/gemma-3-12b-it | Intel TDX | 128k | 0.04 | 0.13 | live |
| deepseek/deepseek-r1 | NVIDIA H100 CC | 64k | 0.41 | 1.60 | live |
| deepseek/deepseek-v3 | NVIDIA H100 CC | 64k | 0.22 | 0.88 | live |
| meta/llama-4-scout | AMD SEV-SNP | 256k | 0.08 | 0.24 | live |
| meta/llama-3.3-70b | AMD SEV-SNP | 128k | 0.18 | 0.52 | live |
| mistral/mistral-small-3.1 | Intel TDX | 128k | 0.06 | 0.18 | live |
| microsoft/phi-4 | Intel TDX | 16k | 0.03 | 0.09 | live |
| cohere/command-r-plus | NVIDIA H100 CC | 128k | 0.28 | 0.96 | live |
A Trusted Execution Environment is a hardware-isolated region inside a CPU or GPU where memory is encrypted in silicon. The host OS, hypervisor, and cloud provider cannot read it — not with root access, not with physical access.
Before inference begins, the silicon vendor signs an attestation receipt: a cryptographic proof that the correct, unmodified enclave firmware is running on genuine hardware. You can verify that proof yourself, any time, for free.
This is not a promise in a terms-of-service document. It is a property of physics and mathematics.
Your identity is a 16-character base32 code generated entirely client-side. We store only its Argon2id hash — your raw code never touches our servers.
There are no accounts, no email addresses, no passwords to phish, no OAuth flows to compromise. The code is your key. Treat it like a private key — back it up offline.
Lost your code? There is no recovery path. This is intentional. Recovery mechanisms are attack surfaces. We removed the attack surface entirely.
Your derived Bittensor address is deterministic from your code. TAO balance is always readable on-chain — it belongs to you, not Faraday.
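As a rough illustration of what client-side generation of a 16-character base32 code can look like, the sketch below draws 80 bits from the operating system's CSPRNG and encodes them. The function name and the choice of the RFC 4648 alphabet are assumptions, not Faraday's published derivation logic. The Argon2id hashing step is left out because it needs a third-party library.

```python
import base64
import secrets

# Assumption: "base32" means the RFC 4648 alphabet (A-Z, 2-7).
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def generate_identity_code() -> str:
    """Return a random 16-character base32 code (80 bits of entropy)."""
    # 10 random bytes = 80 bits = exactly 16 base32 characters, no padding.
    raw = secrets.token_bytes(10)
    return base64.b32encode(raw).decode("ascii")


if __name__ == "__main__":
    # Hypothetical usage: generate once, then store the code offline.
    print(generate_identity_code())
```

Nothing in this sketch touches the network, which is the property the text describes: the code exists only on the machine that generated it.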
We apply a 15% flat markup over the underlying compute cost on every model, at every scale. No tiers. No minimums. No maximums. No volume discounts you have to qualify for. No subscription. No seat fees. Pay for exactly what you use in TAO — Bittensor's native settlement token.
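To make the arithmetic concrete, here is a small sketch that estimates the cost of one request from the prices in the model table. It assumes the table columns are TAO per million tokens and that listed prices already include the 15% markup; the page does not state the second point explicitly. Function names are illustrative.

```python
from decimal import Decimal

MARKUP = Decimal("0.15")              # flat 15% over compute cost
TOKENS_PER_UNIT = Decimal(1_000_000)  # assumption: prices are per million tokens


def with_markup(compute_cost: Decimal) -> Decimal:
    """Apply the flat markup to an underlying compute cost."""
    return compute_cost * (1 + MARKUP)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: Decimal, output_price: Decimal) -> Decimal:
    """Cost in TAO of one request, given per-million-token prices."""
    total = (Decimal(input_tokens) * input_price
             + Decimal(output_tokens) * output_price)
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # qwen/qwen3-235b-a22b: 0.32 input, 1.14 output (from the table)
    print(request_cost(2000, 512, Decimal("0.32"), Decimal("1.14")))
```

Under those assumptions, 2,000 input tokens and 512 output tokens on qwen/qwen3-235b-a22b come to 0.00122368 τ. `Decimal` is used instead of `float` so that small amounts do not pick up binary rounding noise.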
The attestation receipt is returned in the X-Faraday-Attestation response header (base64-encoded). Our open-source fara verify CLI downloads the receipt, checks the signature against Intel/AMD/NVIDIA root certificates, and prints a pass/fail result with the full certificate chain. Run it before any sensitive session.
Set "stream": true in your request body. The response uses standard server-sent events (SSE), identical to the OpenAI streaming format. All 12 models support streaming. Token billing accrues at the end of the stream, not per chunk. There is no latency penalty for streaming vs. non-streaming.
The fara verify attestation CLI and the token derivation logic are fully open source on GitHub. The billing layer, orchestration, and router are currently closed-source. We intend to open more of the stack over time: verifiability is the point of everything we build.
All API requests use HTTPS. There is no HTTP fallback.
https://api.faraday.space/v1
All requests require a bearer token in the Authorization header.
Tokens follow the format fara-XXXX-XXXX-XXXX where each segment
is 4 base32 characters. Generate yours from the dashboard — it is created
client-side and never transmitted in plaintext.
Authorization: Bearer fara-XKQJ-NV7R-M4TL
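A quick client-side format check can catch copy-paste mistakes before a request is sent. The sketch below assumes "base32" means the uppercase RFC 4648 alphabet (A-Z, 2-7); the helper names are hypothetical and the check says nothing about whether a token is actually valid on the server.

```python
import re

# Assumption: segments use the RFC 4648 base32 alphabet, uppercase only.
TOKEN_PATTERN = re.compile(r"fara-[A-Z2-7]{4}-[A-Z2-7]{4}-[A-Z2-7]{4}")


def is_well_formed_token(token: str) -> bool:
    """True if the token matches fara-XXXX-XXXX-XXXX exactly."""
    return TOKEN_PATTERN.fullmatch(token) is not None


def authorization_header(token: str) -> dict[str, str]:
    """Build the Authorization header, rejecting malformed tokens early."""
    if not is_well_formed_token(token):
        raise ValueError("token does not match fara-XXXX-XXXX-XXXX")
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    print(authorization_header("fara-XKQJ-NV7R-M4TL"))
```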
Send your first request in under 60 seconds. Generate a token at faraday.space/dashboard, then run:

    curl https://api.faraday.space/v1/chat/completions \
      -H "Authorization: Bearer fara-XKQJ-NV7R-M4TL" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "microsoft/phi-4",
        "messages": [{"role":"user","content":"Hello from the enclave!"}],
        "max_tokens": 64
      }'
The API is a strict superset of the OpenAI Chat Completions API.
POST /v1/chat/completions accepts standard OpenAI fields.
Faraday-specific extensions are prefixed with x_fara_.
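The sketch below shows how an extension field can sit next to standard OpenAI fields in one request body. It only builds the request with the standard library and does not send it. The helper name and the prefix check are illustrative assumptions; `x_fara_attest` is the only extension this page documents.

```python
import json
import urllib.request

BASE_URL = "https://api.faraday.space/v1"


def build_chat_request(token, model, messages, *, max_tokens=None, **extensions):
    """Build (but do not send) a POST /v1/chat/completions request."""
    # Guard against typos: every extra field must carry the x_fara_ prefix.
    for name in extensions:
        if not name.startswith("x_fara_"):
            raise ValueError(f"not a Faraday extension field: {name}")

    body = {"model": model, "messages": messages, **extensions}
    if max_tokens is not None:
        body["max_tokens"] = max_tokens

    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        "fara-XKQJ-NV7R-M4TL",
        "microsoft/phi-4",
        [{"role": "user", "content": "Hello"}],
        max_tokens=64,
        x_fara_attest=True,
    )
    print(req.data.decode("utf-8"))
```

Passing the built request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs offline.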
Set "stream": true to receive a server-sent events (SSE) stream.
Each event is a data: line containing a JSON delta chunk, identical
to the OpenAI format. The stream ends with data: [DONE].
Token billing is calculated at the end of the full stream. Abandoned streams (client disconnect) are billed for tokens generated up to the point of disconnect.
    # SSE response format
    data: {"id":"cmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}

    data: {"id":"cmpl-abc","choices":[{"delta":{"content":" from"},"index":0}]}

    data: {"id":"cmpl-abc","choices":[{"delta":{"content":" the"},"index":0}]}

    data: {"id":"cmpl-abc","choices":[{"delta":{"content":" enclave"},"index":0}]}

    data: [DONE]
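If you are not using the OpenAI SDK, the stream format above is simple enough to parse by hand. The following is a minimal sketch of such a parser, assuming each event fits on a single data: line as in the example; the function name is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a stream of SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    sample = [
        'data: {"id":"cmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}',
        "",
        'data: {"id":"cmpl-abc","choices":[{"delta":{"content":" from"},"index":0}]}',
        "",
        "data: [DONE]",
    ]
    print("".join(iter_sse_content(sample)))
```

Any iterable of text lines works as input, for example the decoded lines of an HTTP response body.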
Errors follow the OpenAI error envelope:
{"error":{"code":"...","message":"...","type":"..."}}
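One way to handle this envelope on the client is to turn it into a typed exception. The sketch below assumes only the three fields shown above; the `ApiError` class and `raise_for_error` helper are hypothetical names, and the fallback for non-JSON bodies is a defensive choice rather than documented behavior.

```python
import json


class ApiError(Exception):
    """Raised for any response carrying an error status."""

    def __init__(self, status: int, code: str, message: str, type_: str):
        super().__init__(f"HTTP {status} [{code}] {message}")
        self.status = status
        self.code = code
        self.message = message
        self.type = type_


def raise_for_error(status: int, body: str) -> None:
    """Raise ApiError for 4xx/5xx responses; return None otherwise."""
    if status < 400:
        return
    try:
        envelope = json.loads(body)
    except ValueError:
        envelope = None  # body was not JSON at all
    error = envelope.get("error") if isinstance(envelope, dict) else None
    if not isinstance(error, dict):
        # Unexpected shape: keep the raw body so nothing is lost.
        raise ApiError(status, "unknown", body, "unknown")
    raise ApiError(
        status,
        str(error.get("code")),
        str(error.get("message")),
        str(error.get("type")),
    )
```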
Rate limits are applied per API token, not per IP. Current defaults:
When a limit is hit, the response is 429 rate_limit_exceeded.
The Retry-After header contains the number of seconds to wait.
If you need higher limits, contact us — we can raise them on request.
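A client can honor Retry-After with a small retry loop. The sketch below is transport-agnostic: `send` is a hypothetical callable that performs one request and returns a `(status, headers, body)` tuple, and the header lookup assumes the exact casing `Retry-After`. The attempt cap and fallback delay are arbitrary choices, not Faraday defaults.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as a number of seconds, falling back to a default."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default  # header missing or not a plain number


def send_with_retry(send: Callable[[], Response],
                    max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), waiting and retrying while the server answers 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break
        sleep(retry_after_seconds(response[1]))
        response = send()
    return response
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real waiting.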
Install the open-source fara CLI (Go, Linux/macOS):
    # install
    go install github.com/faraday-space/fara@latest

    # verify the enclave serving qwen3-235b right now
    fara verify --model qwen/qwen3-235b-a22b

    # output (example)
    ✓ enclave firmware hash 0xd4e9...f2a1 matches published digest
    ✓ AMD SEV-SNP VCEK cert chained to AMD root CA
    ✓ TLS public key matches cert inside attestation
    ✓ all checks passed — this enclave is genuine and unmodified
Pass "x_fara_attest": true in any request (or set
X-Fara-Attest: 1 as a request header) to receive the
full attestation receipt in the response:
    # response headers (excerpt)
    X-Faraday-Attestation: <base64-encoded receipt>
    X-Faraday-Enclave-ID: amd-sev-snp::prod::0xd4e9f2a1
    X-Faraday-Fw-Digest: sha256:d4e9...f2a1

    # decode and verify manually
    echo "$RECEIPT" | base64 -d | fara verify --stdin
The X-Faraday-Attest-Expires header contains the Unix timestamp of the next attestation refresh.
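On the client side, the two attestation headers can be read with the standard library alone. The sketch below assumes standard (not URL-safe) base64 and an integer timestamp in seconds, and it treats the decoded receipt as opaque bytes: checking the signature is still the job of fara verify. Function names are illustrative.

```python
import base64
import time
from typing import Mapping, Optional


def decode_receipt(headers: Mapping[str, str]) -> bytes:
    """Return the raw receipt bytes from X-Faraday-Attestation."""
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(headers["X-Faraday-Attestation"], validate=True)


def seconds_until_refresh(headers: Mapping[str, str],
                          now: Optional[float] = None) -> float:
    """Seconds left before the attestation is refreshed (negative if past)."""
    expires = int(headers["X-Faraday-Attest-Expires"])
    current = time.time() if now is None else now
    return expires - current
```

A caller could re-request the receipt once `seconds_until_refresh` drops to zero or below, then pipe the decoded bytes to `fara verify --stdin` as shown earlier.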