Evalyard combines a hosted dashboard with a real Android device lab, reporting TTFT, tokens/sec, P50/P95, throttling, temperature, and battery metrics.
The self-service dashboard is coming soon.
TTFT, tokens/sec, error rate, temperature, battery drain — unified across devices and models.
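As a rough illustration of how these metrics relate, the sketch below derives TTFT, tokens/sec, and P50/P95 from per-request timestamps. The `RequestSample` fields, the throughput definition, and the nearest-rank percentile method are assumptions made for this example, not Evalyard's actual schema or method.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    # Hypothetical record; field names are assumptions, not Evalyard's schema.
    sent_at: float         # seconds, when the request was sent
    first_token_at: float  # seconds, when the first token arrived
    done_at: float         # seconds, when the last token arrived
    tokens: int            # number of generated tokens


def ttft(s: RequestSample) -> float:
    """Time to first token, in seconds."""
    return s.first_token_at - s.sent_at


def tokens_per_sec(s: RequestSample) -> float:
    """One possible throughput definition: tokens divided by decode time."""
    decode_time = s.done_at - s.first_token_at
    return s.tokens / decode_time if decode_time > 0 else 0.0


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


sample = RequestSample(sent_at=1.0, first_token_at=1.25, done_at=3.25, tokens=40)
print(ttft(sample), tokens_per_sec(sample))
```

P50 and P95 then come from calling `percentile` over the TTFT values collected for a given device and model.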
Pin prompts, seeds, adapters, and versions. Export CSVs. Compare apples-to-apples.
Send requests to the best target in real time based on health and performance.
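One way to picture routing by health and performance is a filter-then-rank pass over candidate devices. The sketch below is illustrative only: the `DeviceStatus` fields, the temperature and error-rate cutoffs, and the "lowest P95 wins" rule are assumptions, not Evalyard's routing policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceStatus:
    # Hypothetical snapshot of a device; all fields are assumptions.
    name: str
    healthy: bool
    p95_latency_s: float
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    temperature_c: float


def pick_target(devices: list[DeviceStatus],
                max_temp_c: float = 45.0,
                max_error_rate: float = 0.05) -> Optional[DeviceStatus]:
    """Return the eligible device with the lowest P95 latency, or None."""
    eligible = [
        d for d in devices
        if d.healthy
        and d.temperature_c < max_temp_c
        and d.error_rate <= max_error_rate
    ]
    return min(eligible, key=lambda d: d.p95_latency_s, default=None)


fleet = [
    DeviceStatus("phone-a", True, 1.8, 0.01, 39.0),
    DeviceStatus("phone-b", True, 1.2, 0.00, 47.5),   # fast, but too hot
    DeviceStatus("phone-c", False, 0.9, 0.00, 35.0),  # fastest, but unhealthy
]
target = pick_target(fleet)
print(target.name if target else "no eligible device")
```

Filtering before ranking keeps an overheated or failing device from being chosen simply because its recent latency looks good.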
Per-device adapters, tags, and schedules. Handle thermal throttling gracefully.
This landing page mirrors the Evalyard dashboard's aesthetics: cards, tables, crisp typography, and dark mode.
Runs on Android (USB/TCP). Minimal overhead; metrics batched to avoid perturbing latency.
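Batching is a common way to keep measurement from distorting what is being measured: samples are buffered in memory and sent in groups rather than one at a time. The sketch below shows that general pattern. `MetricBatcher`, its parameters, and the `send` callback are hypothetical and are not the Evalyard agent's actual implementation.

```python
import time
from typing import Callable, Optional


class MetricBatcher:
    """Buffer metric samples and flush them in groups."""

    def __init__(self, send: Callable[[list], None],
                 max_batch: int = 50, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send            # called with a list of samples
        self._max_batch = max_batch  # flush once this many are buffered
        self._max_age_s = max_age_s  # or once the oldest sample is this old
        self._clock = clock
        self._buffer: list = []
        self._oldest: Optional[float] = None

    def record(self, sample: dict) -> None:
        """Append a sample; flush only when a size or age limit is hit."""
        now = self._clock()
        if not self._buffer:
            self._oldest = now
        self._buffer.append(sample)
        too_many = len(self._buffer) >= self._max_batch
        too_old = now - self._oldest >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far as one batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)


batches = []
batcher = MetricBatcher(batches.append, max_batch=3)
for i in range(7):
    batcher.record({"ttft_s": 0.25, "seq": i})
batcher.flush()  # send the remainder
print([len(b) for b in batches])
```

In this sketch the flush still runs on the caller's thread when a limit is reached; a production agent would more likely hand the batch to a background worker.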
Bring your own devices or use phones from our lab, then plug Evalyard into your stack in four steps.
Install the agent on Android (USB or TCP).
Register model & adapter per device in the UI.
Run benchmarks or call the API for routing.
POST /api/...
Content-Type: application/json

{
  "prompt": "Summarize this message...",
  "device_tag": "fastest",
  "max_tokens": 128
}
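For readers who prefer code, the request above could be assembled roughly as follows. This is a sketch under assumptions: the endpoint path is elided in the example, so the full URL is left as a caller-supplied argument, and the `build_request` helper is hypothetical. Authentication and the response format are not described here, so both are omitted.

```python
import json
import urllib.request


def build_request(url: str, prompt: str, device_tag: str = "fastest",
                  max_tokens: int = 128) -> urllib.request.Request:
    """Build (but do not send) a JSON POST with the fields shown above."""
    body = json.dumps({
        "prompt": prompt,
        "device_tag": device_tag,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL for illustration only; substitute the real endpoint.
req = build_request("https://example.invalid/api/placeholder",
                    "Summarize this message...")
print(req.get_method(), req.data.decode("utf-8"))
```

Once the real URL and any required credentials are known, the request could be sent with `urllib.request.urlopen(req)`.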
We can provision specific phones, build adapters, and share a read-only dashboard for your team. No spam.
Early access: the first 15 teams get 90% off their first 4 months of Evalyard, then 30% off forever. Device rental is billed separately.
Need fully isolated infrastructure or shipped devices? Ask about Enterprise Fabric.