live · 12 TEE-attested models

Sealed-room
LLM inference

the gateway · to private · local models

Your prompt enters a hardware-sealed enclave, the model runs, the answer leaves. Private by physics, not by policy.


§ I — what

A doorway to
private inference

Every model on Faraday runs inside a Trusted Execution Environment, a hardware-isolated region where the operating system, hypervisor, and hosting provider are cryptographically locked out of memory.

There is no non-TEE lane. Inference never leaves the enclave in plaintext. Your prompts are not our business — and by design, they cannot be.

We are not asking you to trust a privacy policy. The silicon itself enforces the contract. Verify the attestation receipt. Trust the math.


§ II — how

Three steps.
That's it.

01 — generate
Get your token
Open the dashboard. Your 16-character bearer token is generated client-side, hashed with Argon2id, and never transmitted in plaintext.
fara-XKQJ-NV7R-M4TL
02 — fund
Deposit TAO
Send TAO to your derived Bittensor address. No email, no credit card, no KYC. Balance reflects on-chain state within one block.
03 — call
Use the API
Standard OpenAI-compatible endpoint. Drop in your bearer token. Works with any existing OpenAI SDK, LangChain, or LlamaIndex setup.
# completions
curl https://api.faraday.space/v1/chat/completions \
  -H "Authorization: Bearer fara-XKQJ-NV7R-M4TL" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [{"role":"user","content":"Hello"}],
    "max_tokens": 512
  }'

# Python: drop-in replacement for OpenAI
import openai

client = openai.OpenAI(
    base_url="https://api.faraday.space/v1",
    api_key="fara-XKQJ-NV7R-M4TL",
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=512,
)
print(response.choices[0].message.content)

// JavaScript: drop-in replacement for OpenAI
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.faraday.space/v1",
  apiKey:  "fara-XKQJ-NV7R-M4TL",
});

const response = await client.chat.completions.create({
  model:     "qwen/qwen3-235b-a22b",
  messages:  [{ role: "user", content: "Hello" }],
  max_tokens: 512,
});
console.log(response.choices[0].message.content);

# Python: streaming, works exactly like OpenAI
import openai

client = openai.OpenAI(
    base_url="https://api.faraday.space/v1",
    api_key="fara-XKQJ-NV7R-M4TL",
)

stream = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Explain TEEs"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

§ III — models

12 attested models

15% flat markup · no tiers · no minimums
model slug                  enclave          ctx    input (τ / M)   output (τ / M)   status
qwen/qwen3-235b-a22b        NVIDIA H100 CC   128k   0.32            1.14             live
qwen/qwen3-32b              NVIDIA H100 CC   128k   0.12            0.38             live
qwen/qwen3-8b               AMD SEV-SNP      128k   0.04            0.14             live
google/gemma-3-27b-it       Intel TDX        128k   0.09            0.28             live
google/gemma-3-12b-it       Intel TDX        128k   0.04            0.13             live
deepseek/deepseek-r1        NVIDIA H100 CC   64k    0.41            1.60             live
deepseek/deepseek-v3        NVIDIA H100 CC   64k    0.22            0.88             live
meta/llama-4-scout          AMD SEV-SNP      256k   0.08            0.24             live
meta/llama-3.3-70b          AMD SEV-SNP      128k   0.18            0.52             live
mistral/mistral-small-3.1   Intel TDX        128k   0.06            0.18             live
microsoft/phi-4             Intel TDX        16k    0.03            0.09             live
cohere/command-r-plus       NVIDIA H100 CC   128k   0.28            0.96             live
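
The cost of a single request follows from the table by simple arithmetic. The sketch below assumes that "τ / M" means TAO per million tokens and that input and output tokens are charged at their listed rates; PRICES and estimate_cost are illustrative names, not part of the Faraday API.

```python
from decimal import Decimal

# Prices copied from the table above, in TAO per million tokens (assumed unit).
# Each entry is (input price, output price).
PRICES = {
    "qwen/qwen3-235b-a22b": (Decimal("0.32"), Decimal("1.14")),
    "deepseek/deepseek-r1": (Decimal("0.41"), Decimal("1.60")),
    "microsoft/phi-4": (Decimal("0.03"), Decimal("0.09")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost of one request in TAO."""
    input_price, output_price = PRICES[model]
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


cost = estimate_cost("qwen/qwen3-235b-a22b", input_tokens=1_000, output_tokens=500)
print(cost, "TAO")
```

Decimal is used instead of float so that small per-token amounts do not pick up binary rounding error.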

§ IV — why TEEs

Hardware cannot
lie to you

A Trusted Execution Environment is a hardware-isolated region inside a CPU or GPU where memory is encrypted in silicon. The host OS, hypervisor, and cloud provider cannot read it — not with root access, not with physical access.

Before inference begins, the hardware produces an attestation receipt signed with a key that chains to the silicon vendor's root certificate: a cryptographic proof that the correct, unmodified enclave firmware is running on genuine hardware. You can verify that proof yourself, any time, for free.

This is not a promise in a terms-of-service document. It is a property of physics and mathematics.

Intel TDX
Trust Domain Extensions. Hardware-level VM isolation with encrypted memory pages and TDREPORT-based remote attestation baked into 4th-gen Xeon silicon.
AMD SEV-SNP
Secure Encrypted Virtualization with Secure Nested Paging. Memory integrity enforced at page granularity; VCEK-signed attestation reports chained to the AMD root.
NVIDIA H100 CC
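Confidential Computing mode on Hopper. GPU memory is isolated from the host and CPU-GPU transfers are encrypted; attestation reports are checked against NVIDIA Reference Integrity Manifests (RIMs), with certificate revocation checked via NVIDIA OCSP. First production GPU-side TEE at AI scale.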

§ V — identity

No passwords.
No email. Nothing.

your account · base32 · 16 chars
XKQJNV7RM4TLPW2F
generated in your browser
hashed with Argon2id before use
we cannot recover it — by design

Your identity is a 16-character base32 code generated entirely client-side. We store only its Argon2id hash — your raw code never touches our servers.

There are no accounts, no email addresses, no passwords to phish, no OAuth flows to compromise. The code is your key. Treat it like a private key — back it up offline.

Lost your code? There is no recovery path. This is intentional. Recovery mechanisms are attack surfaces. We removed the attack surface entirely.

Your derived Bittensor address is deterministic from your code. TAO balance is always readable on-chain — it belongs to you, not Faraday.
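
As a rough illustration of what "16-character base32 code" means, the sketch below draws one from a cryptographically secure random source. It assumes the RFC 4648 alphabet (A-Z and 2-7); the real generator runs in your browser, and the Argon2id hashing step is left out because it needs a third-party library. generate_code is a hypothetical name.

```python
import secrets

# RFC 4648 base32 alphabet (assumed to be the one Faraday uses).
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def generate_code(length: int = 16) -> str:
    """Draw `length` characters uniformly from the base32 alphabet using a CSPRNG."""
    return "".join(secrets.choice(BASE32_ALPHABET) for _ in range(length))


code = generate_code()
# Each base32 character carries 5 bits, so 16 characters give 80 bits of entropy.
print(len(code), "characters,", 5 * len(code), "bits")
```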


§ VI — pricing

Pay per token.
In TAO.

We apply a 15% flat markup over the underlying compute cost on every model, at every scale. No tiers. No minimums. No maximums. No volume discounts you have to qualify for. No subscription. No seat fees. Pay for exactly what you use in TAO — Bittensor's native settlement token.

our margin
15%
Flat. Always. Verified on-chain via the token contract. No hidden fees, no burst pricing.
minimum spend
τ0
No minimum deposit. Top up any amount. Unused balance stays in your on-chain wallet.
billing unit
token
Charged per token, not per request. Partial requests are prorated. Streaming counts at completion.
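
The flat markup is a single multiplication. A minimal sketch, assuming the 15% is applied multiplicatively to the underlying compute cost; the function names are illustrative only.

```python
from decimal import Decimal

MARKUP = Decimal("0.15")  # flat 15% margin stated above


def price_with_markup(compute_cost: Decimal) -> Decimal:
    """Price charged to the user for a given underlying compute cost."""
    return compute_cost * (1 + MARKUP)


def compute_cost_from_price(price: Decimal) -> Decimal:
    """Invert the markup to recover the underlying compute cost."""
    return price / (1 + MARKUP)


print(price_with_markup(Decimal("1.00")))
```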

§ VII — questions

Things people
actually ask

A lost code is unrecoverable. We store only the Argon2id hash of your code, never the code itself. If you lose it, generate a new one from the dashboard. Moving any remaining TAO to the new derived address requires the old code, so back the code up before you fund it. This is a deliberate design choice: recovery mechanisms are attack surfaces.
Prompt content is never logged. We record only aggregate metadata — token counts, latency, model slug — for billing and monitoring. The TEE enforcement means we cannot access prompt content even if we wanted to: TLS terminates inside the enclave, not on the host.
Faraday runs on Bittensor's inference subnet. TAO is the native settlement token — there is no conversion overhead or custodial exposure on our side. You fund directly from your own wallet; no intermediary holds your funds. USDC on-ramps are on the roadmap for later in 2026.
API responses carry an attestation receipt in the X-Faraday-Attestation response header (base64-encoded) when you set "x_fara_attest": true in the request. The receipt's signature chains to the silicon vendor's root certificate. Our open-source fara verify CLI downloads the receipt, checks the signature against Intel/AMD/NVIDIA root certificates, and prints a pass/fail with the full certificate chain. Run it before any sensitive session.
Neither we nor an attacker on the host can read your traffic, because TLS terminates inside the TEE. The TLS private key is generated inside the enclave at boot and never exported to the host. An attacker with full root access to the physical machine, or to our infrastructure, sees only ciphertext. The attestation receipt proves the TLS key is enclave-owned.
Streaming is supported. Set "stream": true in your request body. The response uses standard server-sent events (SSE), identical to the OpenAI streaming format. All 12 models support streaming. Token billing accrues at the end of the stream, not per chunk. There is no latency penalty for streaming vs. non-streaming.
The enclave firmware, the fara verify attestation CLI, and the token derivation logic are fully open source on GitHub. The billing layer, orchestration, and router are currently closed-source. We intend to open more of the stack over time — verifiability is the point of everything we build.
There is no enterprise plan yet. Faraday is currently self-serve only. If you need dedicated capacity, custom SLAs, compliance documentation, or a private subnet deployment, reach out via X or the contact address in the footer. We are evaluating enterprise options for later in 2026.

§ VIII — docs

API
reference

Base URL

All API requests use HTTPS. There is no HTTP fallback.

https://api.faraday.space/v1
TLS terminates inside the TEE enclave, not on a load balancer. The TLS certificate's public key is included in the attestation receipt so you can verify the connection end-to-end.

Authentication

All requests require a bearer token in the Authorization header. Tokens follow the format fara-XXXX-XXXX-XXXX where each segment is 4 base32 characters. Generate yours from the dashboard — it is created client-side and never transmitted in plaintext.

Authorization: Bearer fara-XKQJ-NV7R-M4TL
Never share your token. Anyone with it can spend your TAO balance. There is no way to revoke a token — if compromised, generate a new one and move your balance to the new derived address.
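
A client can catch malformed tokens before any request is sent. The sketch below checks the fara-XXXX-XXXX-XXXX shape described above, assuming the uppercase RFC 4648 base32 alphabet (A-Z, 2-7); is_well_formed and auth_header are hypothetical helpers, not part of any Faraday SDK.

```python
import re

# "fara-" followed by three hyphen-separated groups of 4 base32 characters.
TOKEN_RE = re.compile(r"fara-[A-Z2-7]{4}-[A-Z2-7]{4}-[A-Z2-7]{4}")


def is_well_formed(token: str) -> bool:
    """Check only the shape of the token; it says nothing about validity or balance."""
    return TOKEN_RE.fullmatch(token) is not None


def auth_header(token: str) -> dict:
    """Build the Authorization header, refusing obviously malformed tokens."""
    if not is_well_formed(token):
        raise ValueError("token does not match fara-XXXX-XXXX-XXXX")
    return {"Authorization": f"Bearer {token}"}


print(auth_header("fara-XKQJ-NV7R-M4TL"))
```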

Quickstart

Send your first request in under 60 seconds:

  • Generate a token at faraday.space/dashboard
  • Fund your derived TAO address (any amount ≥ 0.001 τ)
  • Copy the curl below, substitute your token, run it
curl https://api.faraday.space/v1/chat/completions \
  -H "Authorization: Bearer fara-XKQJ-NV7R-M4TL" \
  -H "Content-Type: application/json" \
  -d '{
    "model":      "microsoft/phi-4",
    "messages":   [{"role":"user","content":"Hello from the enclave!"}],
    "max_tokens": 64
  }'

Endpoints

The API follows the OpenAI Chat Completions request and response format and adds Faraday-specific endpoints for balance and attestation.

method   path                   description
POST     /v1/chat/completions   Create a chat completion (supports streaming)
GET      /v1/models             List all available attested models
GET      /v1/balance            Return current TAO balance for the authenticated token
GET      /v1/attest             Return latest attestation receipt for the enclave serving your request
GET      /v1/attest/:model      Return attestation receipt for a specific model's enclave
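
The GET endpoints can be called with nothing but the Python standard library. The sketch below only builds the requests; the JSON shape of each response is not documented here, so fetch_json simply returns whatever the server sends. build_get and fetch_json are illustrative names.

```python
import json
import urllib.request

BASE_URL = "https://api.faraday.space/v1"


def build_get(path: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for a path such as "/models"."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_json(request: urllib.request.Request):
    """Send the request and decode the JSON body (response shape is not assumed)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


models_request = build_get("/models", "fara-XKQJ-NV7R-M4TL")
balance_request = build_get("/balance", "fara-XKQJ-NV7R-M4TL")
print(models_request.get_method(), models_request.full_url)
# To actually call the API: fetch_json(models_request)
```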

Request parameters

POST /v1/chat/completions accepts standard OpenAI fields. Faraday-specific extensions are prefixed with x_fara_.

parameter       required   type             description
model           required   string           Model slug from the models table (e.g. qwen/qwen3-235b-a22b)
messages        required   array            Array of message objects with role and content
max_tokens      optional   integer          Maximum tokens to generate. Defaults to model maximum.
temperature     optional   float            Sampling temperature 0–2. Default 1.0.
stream          optional   boolean          Enable SSE streaming. Default false.
top_p           optional   float            Nucleus sampling probability mass. Default 1.0.
stop            optional   string | array   Up to 4 stop sequences.
x_fara_attest   optional   boolean          Include full attestation receipt in response headers. Default false.
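
The constraints in the table can be enforced client-side before a request is sent. A minimal sketch: build_chat_request is a hypothetical helper that applies the documented defaults and limits (temperature 0–2, at most 4 stop sequences) and returns a JSON-serialisable body.

```python
from typing import Optional, Sequence, Union


def build_chat_request(
    model: str,
    messages: list,
    *,
    max_tokens: Optional[int] = None,
    temperature: float = 1.0,
    top_p: float = 1.0,
    stream: bool = False,
    stop: Union[str, Sequence[str], None] = None,
    attest: bool = False,
) -> dict:
    """Assemble a request body for POST /v1/chat/completions."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    # Accept a single string or a sequence, as the table allows both.
    stop_list = [stop] if isinstance(stop, str) else list(stop or [])
    if len(stop_list) > 4:
        raise ValueError("at most 4 stop sequences are allowed")

    body = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    if max_tokens is not None:  # omitted means "model maximum"
        body["max_tokens"] = max_tokens
    if stop_list:
        body["stop"] = stop_list
    if attest:  # Faraday-specific extension
        body["x_fara_attest"] = True
    return body


body = build_chat_request(
    "microsoft/phi-4",
    [{"role": "user", "content": "Hello"}],
    max_tokens=64,
    stop="END",
    attest=True,
)
print(body)
```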

Streaming

Set "stream": true to receive a server-sent events (SSE) stream. Each event is a data: line containing a JSON delta chunk, identical to the OpenAI format. The stream ends with data: [DONE].

Token billing is calculated at the end of the full stream. Abandoned streams (client disconnect) are billed for tokens generated up to the point of disconnect.

# SSE response format
data: {"id":"cmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"cmpl-abc","choices":[{"delta":{"content":" from"},"index":0}]}
data: {"id":"cmpl-abc","choices":[{"delta":{"content":" the"},"index":0}]}
data: {"id":"cmpl-abc","choices":[{"delta":{"content":" enclave"},"index":0}]}
data: [DONE]
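
If you are not using an OpenAI SDK, the stream can be parsed by hand. The sketch below reads data: lines like the ones shown above, stops at [DONE], and joins the content deltas; collect_content is an illustrative name and the input is a list of already-decoded text lines.

```python
import json


def collect_content(lines) -> str:
    """Join the content deltas from an iterable of SSE text lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


sample = [
    'data: {"id":"cmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    'data: {"id":"cmpl-abc","choices":[{"delta":{"content":" from"},"index":0}]}',
    "data: [DONE]",
]
print(collect_content(sample))  # Hello from
```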

Error codes

Errors follow the OpenAI error envelope: {"error":{"code":"...","message":"...","type":"..."}}

http   code                   meaning
401    invalid_token          Token not recognised or malformed. Check prefix and format.
402    insufficient_balance   Your TAO balance is below the estimated cost of the request.
404    model_not_found        The requested model slug does not exist or is not live.
429    rate_limit_exceeded    Too many requests. Back off and retry with exponential delay.
503    enclave_unavailable    The enclave is rebooting for attestation refresh. Retry in ~30 s.
500    inference_error        Unexpected error inside the enclave. Not billed. Retry is safe.
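
One way to act on the table is to split errors into those worth retrying (429, 503, 500) and those that need a change on your side (401, 402, 404). The sketch below parses the error envelope and applies that split; parse_error is a hypothetical helper, and the message and type values in the sample body are made up.

```python
import json

# Statuses the table above describes as safe or expected to retry.
RETRYABLE_STATUSES = {429, 500, 503}


def parse_error(status: int, body: str) -> dict:
    """Extract code and message from the error envelope and flag retryable statuses."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}  # body was not JSON; fall back to defaults
    error = payload.get("error", {}) if isinstance(payload, dict) else {}
    if not isinstance(error, dict):
        error = {}
    return {
        "status": status,
        "code": error.get("code", "unknown"),
        "message": error.get("message", ""),
        "retryable": status in RETRYABLE_STATUSES,
    }


# Sample body: only the envelope shape comes from the docs.
example = parse_error(
    402,
    '{"error":{"code":"insufficient_balance","message":"Balance too low","type":"billing_error"}}',
)
print(example["code"], example["retryable"])
```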

Rate limits

Rate limits are applied per token, not per IP. Current defaults:

  • 60 requests / minute across all models
  • 200 000 tokens / minute (input + output combined)
  • No daily cap — limits are rolling 60-second windows

When a limit is hit the response is 429 rate_limit_exceeded. The Retry-After header contains the number of seconds to wait. If you need higher limits, contact us — we can raise them on request.
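
A retry loop needs to know how long to wait. The sketch below prefers the Retry-After value when the server sends one and otherwise falls back to exponential delay with a cap; the base and cap values are arbitrary choices, not Faraday recommendations, and retry_delay is an illustrative name.

```python
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # The docs describe Retry-After as a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # unparseable header: fall through to exponential delay
    return min(cap, base * 2 ** attempt)


for attempt in range(4):
    print(attempt, retry_delay(attempt))
```

With the default cap of 60 seconds the delay never exceeds one rate-limit window.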


Verify the TEE yourself

Install the open-source fara CLI (Go, Linux/macOS):

# install
go install github.com/faraday-space/fara@latest

# verify the enclave serving qwen3-8b right now
fara verify --model qwen/qwen3-8b

# output (example)
✓ enclave firmware hash  0xd4e9...f2a1  matches published digest
✓ AMD SEV-SNP VCEK cert  chained to AMD root CA
✓ TLS public key         matches cert inside attestation
✓ all checks passed — this enclave is genuine and unmodified

Attestation response header

Pass "x_fara_attest": true in any request (or set X-Fara-Attest: 1 as a request header) to receive the full attestation receipt in the response:

# response headers (excerpt)
X-Faraday-Attestation: <base64-encoded receipt>
X-Faraday-Enclave-ID:  amd-sev-snp::prod::0xd4e9f2a1
X-Faraday-Fw-Digest:   sha256:d4e9...f2a1

# decode and verify manually
echo "$RECEIPT" | base64 -d | fara verify --stdin
Attestation receipts are refreshed every 6 hours during scheduled enclave reboots. The X-Faraday-Attest-Expires header contains the Unix timestamp of the next refresh.
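
The headers above can be handled in a few lines before handing the receipt to fara verify. This sketch only base64-decodes the receipt and computes the time until the next refresh; the internal structure of the receipt is not described here, so no attempt is made to parse or verify it. The header values in the example are placeholders, and lookups assume the header names arrive in the casing shown.

```python
import base64
import time
from typing import Optional


def receipt_bytes(headers: dict) -> bytes:
    """Decode the base64 attestation receipt; verification is left to fara verify."""
    return base64.b64decode(headers["X-Faraday-Attestation"], validate=True)


def seconds_until_refresh(headers: dict, now: Optional[float] = None) -> float:
    """Seconds remaining until the Unix timestamp in X-Faraday-Attest-Expires."""
    current = time.time() if now is None else now
    return int(headers["X-Faraday-Attest-Expires"]) - current


# Placeholder values for illustration only.
headers = {
    "X-Faraday-Attestation": base64.b64encode(b"example-receipt").decode(),
    "X-Faraday-Attest-Expires": "1700021600",
}
print(len(receipt_bytes(headers)), "receipt bytes")
print(seconds_until_refresh(headers, now=1700000000), "seconds until refresh")
```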