Benchmark & route LLMs on real Android devices

Evalyard combines a hosted dashboard with a real Android device lab. It reports TTFT (time to first token), tokens/sec, P50/P95 latency, throttling, temperature, and battery metrics.

The self-service dashboard is coming soon.


Get early access.
TTFT
First-token latency per device & model
Tokens/sec
Sustained throughput under load
Thermals & Battery
Temperature, throttle, drain
Qualcomm
Samsung
Google
MediaTek
Arm
OpenAI

Real-world phones, real metrics

Transparent metrics

TTFT, tokens/sec, error rate, temperature, battery drain — unified across devices and models.
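
For readers unfamiliar with these metrics, here is a minimal Python sketch of how TTFT, tokens/sec, and P50/P95 could be derived from raw timestamps. The function names, the nearest-rank percentile method, and the choice to measure throughput between the first and last token are assumptions made for illustration; Evalyard's own definitions may differ.

```python
import math


def ttft_ms(request_sent_s: float, first_token_s: float) -> float:
    """Time to first token, in milliseconds."""
    return (first_token_s - request_sent_s) * 1000.0


def tokens_per_sec(n_tokens: int, first_token_s: float, last_token_s: float) -> float:
    """Decode throughput: tokens after the first, divided by the time they took.

    Assumption: throughput excludes the wait for the first token.
    """
    duration_s = last_token_s - first_token_s
    if n_tokens < 2 or duration_s <= 0:
        return 0.0
    return (n_tokens - 1) / duration_s


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100 (P50 is p=50, P95 is p=95)."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]
```

With per-request TTFT samples collected in a list, `percentile(samples, 50)` and `percentile(samples, 95)` give the P50 and P95 figures under this definition.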

Reproducible runs

Pin prompts, seeds, adapters, and versions. Export CSVs. Compare apples-to-apples.

Smart routing

Send requests to the best target in real time based on health and performance.
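
To illustrate the idea of health- and performance-based routing, here is a small Python sketch. It is not Evalyard's routing logic: the `DeviceStats` fields, the `pick_target` function, and the ranking rule (lowest P95 TTFT, ties broken by higher throughput) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceStats:
    """Hypothetical snapshot of one device's recent health and performance."""
    name: str
    healthy: bool
    throttled: bool
    p95_ttft_ms: float
    tokens_per_sec: float


def pick_target(devices: list[DeviceStats]) -> Optional[DeviceStats]:
    """Pick a healthy device, preferring ones that are not thermally throttled."""
    candidates = [d for d in devices if d.healthy and not d.throttled]
    if not candidates:
        # Fall back to throttled devices rather than failing the request.
        candidates = [d for d in devices if d.healthy]
    if not candidates:
        return None
    # Lowest P95 TTFT wins; higher throughput breaks ties.
    return min(candidates, key=lambda d: (d.p95_ttft_ms, -d.tokens_per_sec))
```

Falling back to throttled devices is one way to degrade gracefully when every phone is running hot; a stricter policy could return `None` instead.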

Device control

Per-device adapters, tags, and schedules. Handle thermal throttling gracefully.

Clean UI

This landing page mirrors the look of the Evalyard dashboard: cards, tables, crisp typography, and dark mode.

Lightweight agent

Runs on Android (USB/TCP). Minimal overhead; metrics batched to avoid perturbing latency.

How it works

Bring your own devices or use phones from our lab, then plug Evalyard into your stack in four steps.

Step 1 · Agent
Install on device

Install the agent on Android (USB or TCP).

Step 2 · Adapter
Wire models

Register model & adapter per device in the UI.

Step 3 · Benchmarks
Measure & route

Run benchmarks or call the API for routing.

Step 4 · API
Call Evalyard
POST /api/...
Content-Type: application/json

{
  "prompt": "Summarize this message...",
  "device_tag": "fastest",
  "max_tokens": 128
}
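
The request above could be built from Python roughly as follows. This is a sketch using only the standard library. The function name `build_routing_request` is hypothetical, and the endpoint URL is left as a parameter because the page does not give the full path; authentication, if required, is not shown.

```python
import json
import urllib.request


def build_routing_request(
    url: str,
    prompt: str,
    device_tag: str = "fastest",
    max_tokens: int = 128,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST matching the example body above."""
    body = json.dumps(
        {"prompt": prompt, "device_tag": device_tag, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response format is not documented on this page, so parsing is left out.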

Get early access

We can provision specific phones, build adapters, and share a read-only dashboard for your team. No spam.

Early access: first 15 teams get 90% off their first 4 months of Evalyard, then 30% off forever. Device rental is billed separately.

FAQ

Quick answers
Do the plans include devices?
By default it’s BYOD (bring your own Android phones). Device rental / dedicated racks are available for Fabric and Enterprise on request.
What are device-hours?
Time a phone is actively running your jobs. Hitting the limit? Pause runs or enable pay-as-you-go overage.
How do I access the dashboard now?
The self-service dashboard is not publicly available yet. Please book a private demo. We’ll walk you through the metrics and provide screenshots from the current version.
Can I cancel anytime?
Yes — monthly billing, cancel anytime. No long-term lock-in.

Need fully isolated infrastructure or shipped devices? Ask about Enterprise Fabric.

Vote on what we build next

Tell us what to build →
High-load stress testing for on-device LLMs
Automated output grading / evals
Image-based / multimodal models
Plugins / SDK for game engines
Per-device battery & thermal tracking