Inference where you live.
On-device Ollama, MLX-accelerated Qwen, and Cloudflare Workers AI for burst capacity. Zero telemetry. Zero round-trips. Zero surrender to the vendor cloud.
Local model arsenal.
Every model on disk, with its quantization, context length, and memory footprint. Hot-swap with one click. Fall back to MLX or Workers AI in milliseconds.
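A minimal sketch of how that inventory could be pulled, assuming Ollama on its default localhost:11434 port: GET /api/tags returns every model on disk with its size and quantization level (context length lives behind a separate /api/show call). The field names follow Ollama's public API; the rest is illustrative, not Nova's actual code.

```ts
// Sketch: list the local model inventory from Ollama's /api/tags endpoint.
type OllamaModel = {
  name: string;
  size: number; // bytes on disk
  details: { parameter_size: string; quantization_level: string };
};

async function listLocalModels(base = "http://localhost:11434"): Promise<OllamaModel[]> {
  const res = await fetch(`${base}/api/tags`);
  if (!res.ok) throw new Error(`Ollama unreachable: ${res.status}`);
  const { models } = (await res.json()) as { models: OllamaModel[] };
  return models;
}

// Print each model's name, quantization, and on-disk footprint in GB.
for (const m of await listLocalModels()) {
  console.log(`${m.name}  ${m.details.quantization_level}  ${(m.size / 1e9).toFixed(1)} GB`);
}
```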
Streaming inference console.
Streams tokens as they're generated. Cmd+Enter to send. Falls back to /api/nova-cloud/infer when the local runtime is offline.
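The stream-with-fallback path might look something like the sketch below: try Ollama's /api/generate, which streams newline-delimited JSON chunks, and fall back to the /api/nova-cloud/infer route named above when the local connection is refused. The model name is an assumption, as is the idea that the cloud route streams the same NDJSON shape; neither is Nova's actual contract.

```ts
// Sketch: stream tokens from local Ollama, bursting to the cloud route when
// the local runtime is unreachable. Model name and cloud payload shape are
// illustrative assumptions.
async function* streamTokens(prompt: string): AsyncGenerator<string> {
  let res: Response;
  try {
    res = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      body: JSON.stringify({ model: "qwen2.5", prompt, stream: true }),
    });
  } catch {
    // fetch throws on connection refusal: fall back to the cloud endpoint.
    res = await fetch("/api/nova-cloud/infer", {
      method: "POST",
      body: JSON.stringify({ prompt }),
    });
  }
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    // Ollama streams newline-delimited JSON, one chunk object per line.
    let nl: number;
    while ((nl = buf.indexOf("\n")) >= 0) {
      const line = buf.slice(0, nl).trim();
      buf = buf.slice(nl + 1);
      if (!line) continue;
      const chunk = JSON.parse(line) as { response?: string };
      if (chunk.response) yield chunk.response;
    }
  }
}

// Usage: render tokens as they arrive.
for await (const tok of streamTokens("Explain MLX in one line.")) {
  process.stdout.write(tok);
}
```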
Vector space, visualized.
Embed any text and see every dimension as a heatmap. Compare it to your previous query by cosine similarity. Zero RAG vendors. Zero leaks.
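As a sketch of that comparison: embed both texts through Ollama's local /api/embeddings route, then compute cosine similarity directly. The embedding model name below is an assumption; any local embedding model works the same way.

```ts
// Sketch: embed two texts locally and compare them by cosine similarity.
// The model name "nomic-embed-text" is an illustrative assumption.
async function embed(text: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding;
}

// cos(a, b) = a.b / (|a||b|): 1.0 is same direction, 0.0 is orthogonal.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const [prev, cur] = await Promise.all([
  embed("how do I quantize a model?"),
  embed("what does Q4_K_M mean?"),
]);
console.log(cosine(prev, cur).toFixed(3));
```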
How fast, how good, how lean.
Live throughput, per-model quality on Nova's eval suite, and real-time memory pressure. The numbers don't lie.
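Throughput, at least, falls straight out of data Ollama already reports: the final chunk of a /api/generate stream carries eval_count (generated tokens) and eval_duration (nanoseconds), so tokens per second is a single division. A sketch, not Nova's metrics pipeline:

```ts
// Sketch: tokens per second from the stats Ollama reports in the final
// chunk of a /api/generate stream.
type OllamaStats = { eval_count: number; eval_duration: number };

function tokensPerSecond(s: OllamaStats): number {
  return s.eval_count / (s.eval_duration / 1e9);
}

// e.g. 412 tokens over 3.2e9 ns (3.2 s) -> ~128.8 tok/s.
console.log(tokensPerSecond({ eval_count: 412, eval_duration: 3.2e9 }).toFixed(1));
```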