CURRENT PROGRESS

27 tokens/s on 8B-parameter models

Consumes under 10 W during inference

Costs under $100

Everything you see in the video runs locally (speech-to-text, text-to-speech, and the LLM)

Exponential progress coming!

Now, we're pushing performance per dollar
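To make "performance per dollar" concrete, here is a back-of-the-envelope sketch using the figures above (27 tokens/s, 10 W, $100); the variable names are illustrative, and the numbers are the post's claimed values, not independent measurements:

```python
# Efficiency metrics derived from the claimed figures above.
throughput_tps = 27.0   # tokens per second on an 8B-parameter model
power_w = 10.0          # watts drawn during inference
cost_usd = 100.0        # total hardware cost

energy_per_token_j = power_w / throughput_tps   # joules spent per token
tokens_per_joule = throughput_tps / power_w     # tokens generated per joule
perf_per_dollar = throughput_tps / cost_usd     # tokens/s per dollar of hardware

print(f"{energy_per_token_j:.2f} J/token")      # 0.37 J/token
print(f"{tokens_per_joule:.2f} tok/J")          # 2.70 tok/J
print(f"{perf_per_dollar:.2f} tok/s per $")     # 0.27 tok/s per $
```

At roughly 0.37 J per token, a full day of continuous generation would use well under a kilowatt-hour.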

Want to contribute custom inference firmware, hardware, or a novel AI model? Let's talk!

High-performance computing setup for AI inference