Faraday

Sealed-room LLM inference

§ II how

Three steps.
That's it.

01 generate

Get your token

Open the dashboard. Your 16-character bearer token is generated client-side, hashed with Argon2id, and never transmitted in plaintext.

fara-XKQJ-NV7R-M4TL

02 fund

Deposit TAO

Send TAO to your derived Bittensor address. No email, no credit card, no KYC. Balance reflects on-chain state within one block.

03 call

Use the API

Standard OpenAI-compatible endpoint. Drop in your bearer token. Works with any existing OpenAI SDK, LangChain, or LlamaIndex setup.

# completions
curl https://api.faraday.space/v1/chat/completions \
  -H "Authorization: Bearer fara-XKQJ-NV7R-M4TL" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [{"role":"user","content":"Hello"}],
    "max_tokens": 512
  }'

# drop-in replacement for OpenAI
import openai

client = openai.OpenAI(
    base_url="https://api.faraday.space/v1",
    api_key="fara-XKQJ-NV7R-M4TL",
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=512,
)
print(response.choices[0].message.content)

// drop-in replacement for OpenAI
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.faraday.space/v1",
  apiKey:  "fara-XKQJ-NV7R-M4TL",
});

const response = await client.chat.completions.create({
  model:     "qwen/qwen3-235b-a22b",
  messages:  [{ role: "user", content: "Hello" }],
  max_tokens: 512,
});
console.log(response.choices[0].message.content);

# streaming: works exactly like OpenAI
import openai

client = openai.OpenAI(
    base_url="https://api.faraday.space/v1",
    api_key="fara-XKQJ-NV7R-M4TL",
)

stream = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Explain TEEs"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

§ III models

12 attested models

15% flat markup · no tiers · no minimums

model slug	enclave	ctx	input (τ / M)	output (τ / M)	status
qwen/qwen3-235b-a22b	NVIDIA H100 CC	128k	0.32	1.14	live
qwen/qwen3-32b	NVIDIA H100 CC	128k	0.12	0.38	live
qwen/qwen3-8b	AMD SEV-SNP	128k	0.04	0.14	live
google/gemma-3-27b-it	Intel TDX	128k	0.09	0.28	live
google/gemma-3-12b-it	Intel TDX	128k	0.04	0.13	live
deepseek/deepseek-r1	NVIDIA H100 CC	64k	0.41	1.60	live
deepseek/deepseek-v3	NVIDIA H100 CC	64k	0.22	0.88	live
meta/llama-4-scout	AMD SEV-SNP	256k	0.08	0.24	live
meta/llama-3.3-70b	AMD SEV-SNP	128k	0.18	0.52	live
mistral/mistral-small-3.1	Intel TDX	128k	0.06	0.18	live
microsoft/phi-4	Intel TDX	16k	0.03	0.09	live
cohere/command-r-plus	NVIDIA H100 CC	128k	0.28	0.96	live
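
The cost of a single request follows from the table by simple arithmetic. The sketch below assumes the listed prices are the final per-million-token rates a caller pays; `PRICES_TAO_PER_M` and `estimate_cost_tao` are hypothetical names for illustration, not part of any Faraday SDK.

```python
# (input, output) prices in TAO per million tokens, copied from the models table.
PRICES_TAO_PER_M = {
    "qwen/qwen3-235b-a22b": (0.32, 1.14),
    "deepseek/deepseek-r1": (0.41, 1.60),
    "microsoft/phi-4": (0.03, 0.09),
}


def estimate_cost_tao(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in TAO (assumes table prices are final)."""
    input_price, output_price = PRICES_TAO_PER_M[model]
    # Prices are quoted per million tokens, so scale the token counts down.
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # 1,000 prompt tokens and 512 completion tokens on the largest Qwen model.
    print(f"{estimate_cost_tao('qwen/qwen3-235b-a22b', 1_000, 512):.8f} TAO")
```

Comparing an estimate like this against `GET /v1/balance` before sending a large request is one way to avoid a 402 `insufficient_balance` response.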

§ IV why TEEs

Hardware cannot
lie to you

A Trusted Execution Environment is a hardware-isolated region inside a CPU or GPU where memory is encrypted in silicon. The host OS, hypervisor, and cloud provider cannot read it: not with root access, not with physical access.

Before inference begins, the silicon vendor signs an attestation receipt: a cryptographic proof that the correct, unmodified enclave firmware is running on genuine hardware. You can verify that proof yourself, any time, for free.

This is not a promise in a terms-of-service document. It is a property of physics and mathematics.

⬡

Intel TDX

Trust Domain Extensions. Hardware-level VM isolation with encrypted memory pages and TDREPORT-based remote attestation baked into 4th-gen Xeon silicon.

⬡

AMD SEV-SNP

Secure Encrypted Virtualization with Secure Nested Paging. Memory integrity enforced at page granularity; VCEK-signed attestation reports chained to the AMD root.

⬡

NVIDIA H100 CC

Confidential Computing mode on Hopper. GPU VRAM encrypted by the GPC; RIM-based attestation from NVIDIA OCSP. First production GPU-side TEE at AI scale.

§ V identity

No passwords.
No email. Nothing.

your account · base32 · 16 chars

XKQJNV7RM4TLPW2F

generated in your browser
hashed with Argon2id before use
we cannot recover it, by design

Your identity is a 16-character base32 code generated entirely client-side. We store only its Argon2id hash; your raw code never touches our servers.
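
As an illustration of what "16-character base32" means in practice, here is a minimal sketch that generates such a code from a cryptographic random source. It assumes the standard RFC 4648 alphabet (A-Z, 2-7), which matches the sample code above; `generate_code` is a hypothetical helper, the real dashboard runs in the browser, and the Argon2id hashing step is left out because the Python standard library does not provide it.

```python
import base64
import secrets

# RFC 4648 base32 alphabet (assumed; it matches the sample code on this page).
BASE32_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ234567")


def generate_code() -> str:
    """Return a random 16-character base32 code (80 bits of entropy)."""
    # 10 random bytes = 80 bits = exactly 16 base32 characters, so no '=' padding.
    return base64.b32encode(secrets.token_bytes(10)).decode("ascii")


if __name__ == "__main__":
    code = generate_code()
    print(code, len(code))
```

Each base32 character carries 5 bits, so 16 characters hold 80 bits of randomness.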

There are no accounts, no email addresses, no passwords to phish, no OAuth flows to compromise. The code is your key. Treat it like a private key: back it up offline.

Lost your code? There is no recovery path. This is intentional. Recovery mechanisms are attack surfaces. We removed the attack surface entirely.

Your derived Bittensor address is deterministic from your code. TAO balance is always readable on-chain; it belongs to you, not Faraday.

§ VI pricing

Pay per token.
In TAO.

We apply a 15% flat markup over the underlying compute cost on every model, at every scale. No tiers. No minimums. No maximums. No volume discounts you have to qualify for. No subscription. No seat fees. Pay for exactly what you use in TAO, Bittensor's native settlement token.
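
The markup arithmetic can be sketched as follows. This assumes a quoted price already includes the markup, so that price = compute cost × 1.15; `split_price` is a hypothetical helper used only for illustration.

```python
MARKUP = 0.15  # flat markup stated on this page


def split_price(price_tao: float) -> tuple:
    """Split a marked-up price into (underlying compute cost, margin)."""
    # price = cost * (1 + MARKUP), so divide to recover the underlying cost.
    base = price_tao / (1 + MARKUP)
    return base, price_tao - base


if __name__ == "__main__":
    base, margin = split_price(1.15)
    print(f"compute cost {base:.4f} TAO, margin {margin:.4f} TAO")
```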

our margin

15%

Flat. Always. Verified on-chain via the token contract. No hidden fees, no burst pricing.

minimum spend

τ0

No minimum deposit. Top up any amount. Unused balance stays in your on-chain wallet.

billing unit

token

Charged per token, not per request. Partial requests are prorated. Streaming counts at completion.

§ VII questions

Things people
actually ask

It is unrecoverable. We store only the Argon2id hash of your code, never the code itself. If you lose it, generate a new one from the dashboard and move any remaining TAO balance on-chain to your new derived address. This is a deliberate design choice: recovery mechanisms are attack surfaces.

Prompt content is never logged. We record only aggregate metadata (token counts, latency, model slug) for billing and monitoring. TEE enforcement means we cannot access prompt content even if we wanted to: TLS terminates inside the enclave, not on the host.

Faraday runs on Bittensor's inference subnet. TAO is the native settlement token, so there is no conversion overhead or custodial exposure on our side. You fund directly from your own wallet; no intermediary holds your funds. USDC on-ramps are on the roadmap for later in 2026.

Every API response includes a signed attestation receipt from the silicon vendor in the X-Faraday-Attestation response header (base64-encoded). Our open-source fara verify CLI downloads the receipt, checks the signature against Intel/AMD/NVIDIA root certificates, and prints a pass/fail with the full certificate chain. Run it before any sensitive session.

No. TLS terminates inside the TEE. The TLS private key is generated inside the enclave at boot and never exported to the host. An attacker with full root access to the physical machine, or to our infrastructure, sees only ciphertext. The attestation receipt proves the TLS key is enclave-owned.

Yes. Set "stream": true in your request body. The response uses standard server-sent events (SSE), identical to the OpenAI streaming format. All 12 models support streaming. Token billing accrues at the end of the stream, not per chunk. There is no latency penalty for streaming vs. non-streaming.

The enclave firmware, the fara verify attestation CLI, and the token derivation logic are fully open source on GitHub. The billing layer, orchestration, and router are currently closed-source. We intend to open more of the stack over time; verifiability is the point of everything we build.

Not yet. Faraday is currently self-serve only. If you need dedicated capacity, custom SLAs, compliance documentation, or a private subnet deployment, reach out via X or the contact address in the footer. We are evaluating enterprise options for later in 2026.

§ VIII docs

API
reference

Base URL

All API requests use HTTPS. There is no HTTP fallback.

https://api.faraday.space/v1

ℹ

TLS terminates inside the TEE enclave, not on a load balancer. The TLS certificate's public key is included in the attestation receipt so you can verify the connection end-to-end.

Authentication

All requests require a bearer token in the Authorization header. Tokens follow the format fara-XXXX-XXXX-XXXX where each segment is 4 base32 characters. Generate yours from the dashboard; it is created client-side and never transmitted in plaintext.

Authorization: Bearer fara-XKQJ-NV7R-M4TL
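
A quick client-side format check can catch malformed tokens before they produce a 401. The sketch below assumes the RFC 4648 base32 alphabet (A-Z, 2-7), which is consistent with the sample token; `TOKEN_RE` and `looks_like_token` are hypothetical names, and passing this check says nothing about whether the token is actually valid or funded.

```python
import re

# fara- prefix followed by three 4-character base32 segments (alphabet assumed).
TOKEN_RE = re.compile(r"fara-[A-Z2-7]{4}-[A-Z2-7]{4}-[A-Z2-7]{4}")


def looks_like_token(token: str) -> bool:
    """Return True if the string matches the documented token shape."""
    return TOKEN_RE.fullmatch(token) is not None


if __name__ == "__main__":
    print(looks_like_token("fara-XKQJ-NV7R-M4TL"))
```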

⚠

Never share your token. Anyone with it can spend your TAO balance. There is no way to revoke a token; if compromised, generate a new one and move your balance to the new derived address.

Quickstart

Send your first request in under 60 seconds:

1. Generate a token at faraday.space/dashboard
2. Fund your derived TAO address (any amount ≥ 0.001 τ)
3. Copy the curl below, substitute your token, run it

curl https://api.faraday.space/v1/chat/completions \
  -H "Authorization: Bearer fara-XKQJ-NV7R-M4TL" \
  -H "Content-Type: application/json" \
  -d '{
    "model":      "microsoft/phi-4",
    "messages":   [{"role":"user","content":"Hello from the enclave!"}],
    "max_tokens": 64
  }'

Endpoints

The API is a strict superset of the OpenAI Chat Completions API.

method	path	description
POST	/v1/chat/completions	Create a chat completion (supports streaming)
GET	/v1/models	List all available attested models
GET	/v1/balance	Return current TAO balance for the authenticated token
GET	/v1/attest	Return latest attestation receipt for the enclave serving your request
GET	/v1/attest/:model	Return attestation receipt for a specific model's enclave
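
The GET endpoints need only the bearer token, so they can be called without an SDK. This is a minimal standard-library sketch; `build_get` and `fetch_json` are hypothetical helpers, and because the response schemas are not documented here, the body is returned as decoded JSON without interpretation.

```python
import json
import urllib.request

BASE_URL = "https://api.faraday.space/v1"


def build_get(path: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for a path such as '/balance'."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_json(path: str, token: str):
    """Send the request and decode the JSON body (schema not documented here)."""
    with urllib.request.urlopen(build_get(path, token), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_get("/models", "fara-XKQJ-NV7R-M4TL")
    print(req.get_method(), req.full_url)
```

For example, `fetch_json("/balance", token)` would issue `GET /v1/balance` for the authenticated token.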

Request parameters

POST /v1/chat/completions accepts standard OpenAI fields. Faraday-specific extensions are prefixed with x_fara_.

parameter	type	description
model (required)	string	Model slug from the models table (e.g. `qwen/qwen3-235b-a22b`)
messages (required)	array	Array of message objects with `role` and `content`
max_tokens (optional)	integer	Maximum tokens to generate. Defaults to model maximum.
temperature (optional)	float	Sampling temperature 0–2. Default 1.0.
stream (optional)	boolean	Enable SSE streaming. Default false.
top_p (optional)	float	Nucleus sampling probability mass. Default 1.0.
stop (optional)	string | array	Up to 4 stop sequences.
x_fara_attest (optional)	boolean	Include full attestation receipt in response headers. Default false.
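
The constraints in the table can be checked locally before a request is sent. The sketch below covers only the rules stated above (required fields, temperature range, stop-sequence limit); `validate_chat_request` is a hypothetical helper, and the server may enforce rules not listed here.

```python
def validate_chat_request(body: dict) -> list:
    """Return a list of problems found in a chat completion request body."""
    problems = []
    if not isinstance(body.get("model"), str):
        problems.append("model is required and must be a string")

    messages = body.get("messages")
    if not isinstance(messages, list):
        problems.append("messages is required and must be an array")
    elif any(not isinstance(m, dict) or "role" not in m or "content" not in m
             for m in messages):
        problems.append("each message needs role and content")

    # Defaults below mirror the table: temperature 1.0, no stop sequences.
    if not 0 <= body.get("temperature", 1.0) <= 2:
        problems.append("temperature must be between 0 and 2")

    stop = body.get("stop", [])
    if isinstance(stop, str):
        stop = [stop]  # a single string counts as one stop sequence
    if len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    return problems


if __name__ == "__main__":
    print(validate_chat_request({"model": "microsoft/phi-4", "temperature": 3}))
```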

Streaming

Set "stream": true to receive a server-sent events (SSE) stream. Each event is a data: line containing a JSON delta chunk, identical to the OpenAI format. The stream ends with data: [DONE].

Token billing is calculated at the end of the full stream. Abandoned streams (client disconnect) are billed for tokens generated up to the point of disconnect.

# SSE response format
data: {"id":"cmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"cmpl-abc","choices":[{"delta":{"content":" from"},"index":0}]}
data: {"id":"cmpl-abc","choices":[{"delta":{"content":" the"},"index":0}]}
data: {"id":"cmpl-abc","choices":[{"delta":{"content":" enclave"},"index":0}]}
data: [DONE]
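
If you are not using an SDK, the stream can be consumed by hand. This is a minimal sketch that assumes each event arrives as a single `data:` line, as in the sample above; `iter_sse_content` is a hypothetical helper name.

```python
import json


def iter_sse_content(lines):
    """Yield text deltas from an iterable of SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    sample = [
        'data: {"id":"cmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}',
        'data: {"id":"cmpl-abc","choices":[{"delta":{"content":" from"},"index":0}]}',
        "data: [DONE]",
    ]
    print("".join(iter_sse_content(sample)))
```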

Error codes

Errors follow the OpenAI error envelope: {"error":{"code":"...","message":"...","type":"..."}}

http	code	meaning
401	invalid_token	Token not recognised or malformed. Check prefix and format.
402	insufficient_balance	Your TAO balance is below the estimated cost of the request.
404	model_not_found	The requested model slug does not exist or is not live.
429	rate_limit_exceeded	Too many requests. Back off and retry with exponential delay.
503	enclave_unavailable	The enclave is rebooting for attestation refresh. Retry in ~30 s.
500	inference_error	Unexpected error inside the enclave. Not billed. Retry is safe.
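
One way to turn the table into client behaviour is a small retry policy. The sketch below treats 429, 500, and 503 as retryable, following the "meaning" column, and everything else as final. The 60-second cap on exponential backoff is an assumption, and `retry_delay` is a hypothetical helper.

```python
from typing import Optional

# Retryable per the table: 429 (back off), 500 (retry is safe), 503 (retry in ~30 s).
RETRYABLE_STATUSES = {429, 500, 503}


def retry_delay(status: int, attempt: int,
                retry_after: Optional[float] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None if retrying will not help."""
    if status not in RETRYABLE_STATUSES:
        return None  # 401, 402, 404: fix the token, balance, or model slug instead
    if retry_after is not None:
        return retry_after  # a server-provided Retry-After value wins
    if status == 503:
        return 30.0  # enclave reboot, per the table
    return min(60.0, 2.0 ** attempt)  # exponential backoff, capped (assumed cap)


if __name__ == "__main__":
    for attempt in range(4):
        print(attempt, retry_delay(429, attempt))
```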

Rate limits

Rate limits are applied per token, not per IP. Current defaults:

60 requests / minute across all models
200 000 tokens / minute (input + output combined)
No daily cap; limits are rolling 60-second windows

When a limit is hit the response is 429 rate_limit_exceeded. The Retry-After header contains the number of seconds to wait. If you need higher limits, contact us; we can raise them on request.
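
To stay under the request limit without relying on 429 responses, a client can keep its own rolling window. This is a minimal sketch of that idea with the 60 requests per 60 seconds default; `RollingWindowLimiter` is a hypothetical class, it tracks requests only (not the token-per-minute limit), and the server's own accounting remains authoritative.

```python
import time
from collections import deque


class RollingWindowLimiter:
    """Client-side guard: at most max_events within any rolling window."""

    def __init__(self, max_events: int = 60, window_s: float = 60.0,
                 clock=time.monotonic):
        self.max_events = max_events
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._events = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if the window is full."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._events and now - self._events[0] >= self.window_s:
            self._events.popleft()
        if len(self._events) >= self.max_events:
            return False
        self._events.append(now)
        return True


if __name__ == "__main__":
    limiter = RollingWindowLimiter()
    print(limiter.try_acquire())
```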

Verify the TEE yourself

Install the open-source fara CLI (Go, Linux/macOS):

# install
go install github.com/faraday-space/fara@latest

# verify the enclave serving qwen3-235b right now
fara verify --model qwen/qwen3-235b-a22b

# output (example)
✓ enclave firmware hash  0xd4e9...f2a1  matches published digest
✓ AMD SEV-SNP VCEK cert  chained to AMD root CA
✓ TLS public key         matches cert inside attestation
✓ all checks passed: this enclave is genuine and unmodified

Attestation response header

Pass "x_fara_attest": true in any request (or set X-Fara-Attest: 1 as a request header) to receive the full attestation receipt in the response:

# response headers (excerpt)
X-Faraday-Attestation: <base64-encoded receipt>
X-Faraday-Enclave-ID:  amd-sev-snp::prod::0xd4e9f2a1
X-Faraday-Fw-Digest:   sha256:d4e9...f2a1

# decode and verify manually
echo "$RECEIPT" | base64 -d | fara verify --stdin

ℹ

Attestation receipts are refreshed every 6 hours during scheduled enclave reboots. The X-Faraday-Attest-Expires header contains the Unix timestamp of the next refresh.
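
The attestation headers can also be handled programmatically. The sketch below decodes the base64 receipt into raw bytes and computes how long the receipt has until the next refresh. It assumes the headers are available as a plain dict with the exact names shown above; the receipt's internal format is not described here, so the bytes are not parsed, and both function names are hypothetical.

```python
import base64
import time


def decode_receipt(headers: dict) -> bytes:
    """Base64-decode the X-Faraday-Attestation header into raw receipt bytes."""
    return base64.b64decode(headers["X-Faraday-Attestation"], validate=True)


def seconds_until_refresh(headers: dict, now=None) -> float:
    """Seconds until the Unix timestamp in X-Faraday-Attest-Expires."""
    expires = int(headers["X-Faraday-Attest-Expires"])
    current = time.time() if now is None else now
    return expires - current  # negative means the receipt is already stale


if __name__ == "__main__":
    example = {
        "X-Faraday-Attestation": base64.b64encode(b"placeholder").decode("ascii"),
        "X-Faraday-Attest-Expires": str(int(time.time()) + 6 * 3600),
    }
    print(len(decode_receipt(example)), round(seconds_until_refresh(example)))
```

The decoded bytes are what `fara verify --stdin` consumes in the manual example above; cryptographic verification itself is left to that CLI.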
