Distributed Inference Bench runs LLMs across physical Android devices and edge nodes. Compare TTFT, tokens/sec, thermal/battery impact, and route inference to the best target — in real time.
Replace the mock image with a screenshot of your actual /bench view for authenticity.
Measure what matters on phones, not just GPUs. Reproduce runs, compare models, and pick the best device dynamically.
TTFT, tokens/sec, error rate, temperature, battery drain — unified across devices and models.
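As a rough illustration, a unified per-run record could look like the TypeScript sketch below; the field names and the per-device summary helper are assumptions for clarity, not the actual Bench schema.

```ts
// A minimal sketch of a unified benchmark record, assuming illustrative field names.
interface BenchRecord {
  deviceId: string;        // phone or edge node that ran the job
  model: string;           // e.g. "llama-3.2-3b-q4" (example identifier)
  ttftMs: number;          // time to first token, in milliseconds
  tokensPerSec: number;    // steady-state decode throughput
  errorRate: number;       // fraction of failed requests in the run
  maxTempC: number;        // peak device temperature during the run
  batteryDrainPct: number; // battery consumed by the run, in percent
}

// Summarize median TTFT per device so results stay comparable across models.
function medianTtftByDevice(records: BenchRecord[]): Map<string, number> {
  const byDevice = new Map<string, number[]>();
  for (const r of records) {
    const list = byDevice.get(r.deviceId) ?? [];
    list.push(r.ttftMs);
    byDevice.set(r.deviceId, list);
  }
  const medians = new Map<string, number>();
  for (const [device, values] of byDevice) {
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    medians.set(
      device,
      sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2,
    );
  }
  return medians;
}
```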
Pin prompts, seeds, adapters, and versions. Export CSVs. Compare apples‑to‑apples.
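A minimal sketch of what a pinned run spec and CSV export could look like, assuming illustrative field names rather than the real configuration format:

```ts
// Everything that must be pinned for a run to be reproducible (names are assumptions).
interface RunSpec {
  promptSetId: string;    // pinned prompt set
  seed: number;           // fixed sampling seed
  adapter: string;        // adapter / LoRA identifier
  modelVersion: string;   // exact model build
  runtimeVersion: string; // exact on-device runtime build
}

// Serialize result rows to CSV for export, escaping quotes per RFC 4180.
function toCsv(rows: Array<Record<string, string | number>>): string {
  if (rows.length === 0) return "";
  const headers = Object.keys(rows[0]);
  const escape = (v: string | number) => `"${String(v).replace(/"/g, '""')}"`;
  const body = rows.map((row) => headers.map((h) => escape(row[h])).join(","));
  return [headers.join(","), ...body].join("\n");
}
```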
Route requests to the best target in real time, based on device health and recent performance.
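One way such routing can work, sketched with an assumed target shape and an example thermal threshold rather than the gateway's actual policy:

```ts
// Pick the live target with the best recent throughput, skipping devices
// that are unhealthy or running hot. Shape and threshold are assumptions.
interface Target {
  id: string;
  healthy: boolean;            // agent heartbeat seen within the last interval
  tempC: number;               // latest reported device temperature
  recentTokensPerSec: number;  // rolling decode throughput
}

const THERMAL_LIMIT_C = 42; // example threshold; tune per device

function pickTarget(targets: Target[]): Target | undefined {
  return targets
    .filter((t) => t.healthy && t.tempC < THERMAL_LIMIT_C)
    .sort((a, b) => b.recentTokensPerSec - a.recentTokensPerSec)[0];
}
```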
Per‑device adapters, tags, and schedules. Handle thermal throttling gracefully.
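A sketch of a per-device entry and a simple eligibility check that backs off after throttling; the field names, run-window format, and cooldown value are illustrative assumptions:

```ts
// Per-device configuration: assigned adapter, tags, and a local run window.
interface DeviceConfig {
  deviceId: string;
  adapter: string;                                  // adapter assigned to this device
  tags: string[];                                   // e.g. ["pixel-8", "lab-2"]
  schedule: { startHour: number; endHour: number }; // local hours when jobs may run
  lastThrottleAt?: Date;                            // last reported thermal throttling
}

const THROTTLE_COOLDOWN_MS = 10 * 60_000; // example 10-minute cooldown

// A device may receive work only inside its window and after it has cooled down.
function canDispatch(cfg: DeviceConfig, now: Date): boolean {
  const hour = now.getHours();
  const inWindow = hour >= cfg.schedule.startHour && hour < cfg.schedule.endHour;
  const cooledDown =
    !cfg.lastThrottleAt ||
    now.getTime() - cfg.lastThrottleAt.getTime() > THROTTLE_COOLDOWN_MS;
  return inWindow && cooledDown;
}
```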
Landing mimics your /bench aesthetics — cards, tables, crisp typography, dark mode.
Single HTML file. Drop into Cloudflare Pages, link /bench, and you’re live.
Embed a public view, or keep private and link out. Below is a placeholder iframe — point it to /bench or a read‑only view.
Agents run on phones and edge nodes. The gateway orchestrates jobs, collects metrics, and surfaces insights in the Bench UI. Route traffic programmatically via the API, or use the UI to compare devices and dispatch runs.
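For programmatic routing, a dispatch call could look roughly like this; the endpoint path, payload fields, and jobId response are assumptions for illustration, not the documented gateway API:

```ts
// Submit a job to the gateway and let it pick the best available target.
// Endpoint and payload shape are hypothetical.
async function dispatchJob(gatewayUrl: string, apiKey: string): Promise<string> {
  const res = await fetch(`${gatewayUrl}/api/jobs`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "llama-3.2-3b-q4",            // example model id
      prompt: "Summarize this paragraph.", // example prompt
      routing: "best-available",           // gateway chooses the target
    }),
  });
  if (!res.ok) throw new Error(`dispatch failed: ${res.status}`);
  const { jobId } = (await res.json()) as { jobId: string };
  return jobId;
}
```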
Charts are decorative placeholders. Swap with real images or a lightweight chart embed later.
We can provision specific phones, build adapters, and share a read‑only Bench for your team.
Or just hit /bench to explore on your own.