Nova Brain Inference · Local + Edge
Local Ollama · MLX · Edge AI

Inference where
you live.

On-device Ollama, MLX-accelerated Qwen, and Cloudflare Workers AI for burst capacity. Zero telemetry. Zero round-trips. Zero surrender to a vendor's cloud.

Models loaded · —
Total params · —
Tokens / sec · —
Library

Local model arsenal.

Every model on disk, with quantization, context length, and footprint. Hot-swap with one click. Fall back to MLX or Workers AI in milliseconds.
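The library card maps onto Ollama's `/api/tags` endpoint, which reports each installed model's name, bytes on disk, quantization level, and parameter count. A minimal sketch of fetching that list (localhost:11434 is Ollama's default port; the response fields follow Ollama's documented shape):

```typescript
// Shape of one entry in Ollama's /api/tags response.
interface OllamaModel {
  name: string;
  size: number; // bytes on disk
  details: { quantization_level: string; parameter_size: string };
}

// Fetch every model Ollama has on disk.
async function listLocalModels(
  base = "http://localhost:11434",
): Promise<OllamaModel[]> {
  const res = await fetch(`${base}/api/tags`);
  if (!res.ok) throw new Error(`Ollama unreachable: ${res.status}`);
  const { models } = (await res.json()) as { models: OllamaModel[] };
  return models;
}

// Footprint in GB (one decimal), for the library table.
const toGB = (bytes: number): number => +(bytes / 1024 ** 3).toFixed(1);
```

`toGB` is just the display helper for the footprint column; the hot-swap itself is a matter of sending the next request with a different `model` name, since Ollama loads models on demand.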

Lab

Streaming inference console.

Tokens stream in as they're generated. Cmd+Enter to send. Falls back to /api/nova-cloud/infer when the local runtime is offline.

Idle
Prompt model · —
Ready. Pick a model, write a prompt, send.
— tok · — ms · — t/s
Model
System prompt
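The console's streaming path can be sketched as follows. Ollama's `/api/generate` with `stream: true` emits newline-delimited JSON chunks, each carrying a `response` token; `/api/nova-cloud/infer` is this dashboard's own fallback route, so its request/response shape is an assumption here:

```typescript
// Pure helper: pull complete NDJSON lines out of a buffer, returning
// the tokens found plus whatever partial line is left over.
function drainNDJSON(buf: string): { tokens: string[]; rest: string } {
  const lines = buf.split("\n");
  const rest = lines.pop() ?? ""; // last piece may be an incomplete line
  const tokens: string[] = [];
  for (const line of lines) {
    if (!line.trim()) continue;
    const chunk = JSON.parse(line) as { response?: string };
    if (chunk.response) tokens.push(chunk.response);
  }
  return { tokens, rest };
}

// Stream tokens from local Ollama; fall back to the cloud route
// (non-streaming here for brevity) when local is offline.
async function* streamTokens(model: string, prompt: string) {
  const local = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({ model, prompt, stream: true }),
  }).catch(() => null);
  if (!local?.ok || !local.body) {
    const cloud = await fetch("/api/nova-cloud/infer", {
      method: "POST",
      body: JSON.stringify({ model, prompt }),
    });
    yield ((await cloud.json()) as { response: string }).response;
    return;
  }
  const reader = local.body.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    const drained = drainNDJSON(buf);
    buf = drained.rest;
    yield* drained.tokens;
  }
}
```

Keeping the NDJSON parsing in a pure helper means partial chunks split across network reads are handled once, and the UI loop stays a plain `for await` over tokens.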
Embeddings

Vector space, visualized.

Embed any text and see all dimensions as a heatmap. Compare cosine similarity to your previous query. Zero RAG vendors. Zero leaks.

Embed text
Cosine vs. previous
No embedding yet.
Dimension heatmap
— dims
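The similarity readout above is plain cosine similarity between the current and previous embedding vectors. A sketch, assuming Ollama's `/api/embeddings` endpoint as the local embedder (`nomic-embed-text` is an assumed model choice, not named by this page):

```typescript
// Cosine similarity: dot product over the product of magnitudes.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Embed text locally via Ollama's /api/embeddings endpoint.
// The model name is an assumption for illustration.
async function embed(
  text: string,
  model = "nomic-embed-text",
): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    body: JSON.stringify({ model, prompt: text }),
  });
  return ((await res.json()) as { embedding: number[] }).embedding;
}
```

The heatmap then just colors each raw dimension of the returned vector; nothing leaves the machine.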
Benchmarks

How fast, how good, how lean.

Live throughput, per-model quality on Nova's eval suite, and real-time memory pressure. The numbers don't lie.

Throughput · tokens / second
Quality radar · per-model evals
Memory pressure · per loaded model (GB)
Live
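The throughput and memory figures can be read straight off Ollama's own metrics: the final chunk of a generate call carries `eval_count` (tokens produced) and `eval_duration` (nanoseconds spent producing them), and `/api/ps` reports the byte size of each loaded model. A sketch of the two conversions:

```typescript
// tokens/sec from the metrics Ollama attaches to the final stream
// chunk of /api/generate: eval_count tokens over eval_duration ns.
function tokensPerSecond(evalCount: number, evalDurationNs: number): number {
  return evalCount / (evalDurationNs / 1e9);
}

// Memory per loaded model in GB, from the `size` field (bytes)
// of an entry in Ollama's /api/ps response.
function loadedGB(sizeBytes: number): number {
  return sizeBytes / 1024 ** 3;
}
```

Polling `/api/ps` on an interval is enough to drive the live memory-pressure chart; no instrumentation beyond Ollama's own reporting is required.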