Can I see a live demo?

Yes — open the public demo or request a private walkthrough.

Evalyard — Real-device LLM Bench & Routing

Price: 49 USD
Availability: InStock

Real-world phones, real metrics

Transparent metrics

Time to first token (TTFT), tokens/sec, error rate, temperature, and battery drain, unified across devices and models.
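For a sense of how the two latency metrics relate, here is a minimal sketch. It assumes each run records a request start time and one timestamp per generated token. The function name and return keys are hypothetical and are not Evalyard's API.

```python
from typing import Sequence


def latency_metrics(request_start: float, token_times: Sequence[float]) -> dict:
    """Compute TTFT and decode throughput from per-token timestamps (seconds).

    Hypothetical helper for illustration; not part of Evalyard.
    """
    if not token_times:
        raise ValueError("no tokens were generated")
    # Time to first token: delay between sending the request and the first token.
    ttft = token_times[0] - request_start
    # Throughput is measured over the decode phase only (after the first token),
    # so a slow prefill does not drag down the tokens/sec figure.
    decode_time = token_times[-1] - token_times[0]
    decoded = len(token_times) - 1
    tokens_per_sec = decoded / decode_time if decode_time > 0 else 0.0
    return {"ttft_s": ttft, "tokens_per_sec": tokens_per_sec}
```

Measuring throughput after the first token is one common convention; a tool may instead divide all tokens by total wall time, so check which definition a given report uses before comparing numbers.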

Reproducible runs

Pin prompts, seeds, adapters, and versions. Export CSVs. Compare apples-to-apples.
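To illustrate what pinning a run and exporting it might look like, here is a minimal sketch. The `RunSpec` fields and the `export_csv` helper are assumptions made for this example, not Evalyard's actual schema or export format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class RunSpec:
    """Everything pinned for a run (hypothetical field names)."""
    prompt_id: str
    seed: int
    adapter: str
    model_version: str


def export_csv(rows: list[dict], spec: RunSpec) -> str:
    """Write one CSV row per measurement, repeating the pinned spec on each row."""
    spec_cols = [f.name for f in fields(spec)]
    metric_cols = sorted({key for row in rows for key in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=spec_cols + metric_cols, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Each row carries the full spec, so two exports can be compared
        # only where every pinned column matches.
        writer.writerow({**asdict(spec), **row})
    return buf.getvalue()
```

Repeating the spec on every row makes each CSV self-describing, at the cost of some redundancy.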

Smart routing

Send requests to the best target in real time based on health and performance.
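One simple way to think about routing on health and performance is sketched below. The `Target` fields, the error-rate threshold, and the "fastest healthy target" policy are assumptions for illustration; Evalyard's routing logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    name: str
    healthy: bool
    error_rate: float      # fraction of recent requests that failed
    tokens_per_sec: float  # recent decode throughput


def pick_target(targets: list[Target], max_error_rate: float = 0.05) -> Optional[Target]:
    """Return the fastest healthy target, or None if none qualifies."""
    eligible = [t for t in targets if t.healthy and t.error_rate <= max_error_rate]
    return max(eligible, key=lambda t: t.tokens_per_sec, default=None)
```

Filtering on health first and ranking on speed second keeps a fast but failing device from attracting traffic.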

Device control

Per-device adapters, tags, and schedules. Handle thermal throttling gracefully.
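As one possible reading of "gracefully", the sketch below pauses a device when it gets hot and resumes only after it has cooled past a lower threshold, so the device does not flap between states. The class name and both temperature thresholds are hypothetical.

```python
class ThermalGate:
    """Pause a device at `pause_at` and resume only once it cools below `resume_at`."""

    def __init__(self, pause_at: float = 45.0, resume_at: float = 40.0):
        if resume_at >= pause_at:
            raise ValueError("resume_at must be below pause_at")
        self.pause_at = pause_at
        self.resume_at = resume_at
        self.paused = False

    def update(self, temp_c: float) -> bool:
        """Feed the latest temperature reading; return True if jobs may run."""
        if self.paused:
            # Stay paused until the reading drops below the lower threshold.
            if temp_c < self.resume_at:
                self.paused = False
        elif temp_c >= self.pause_at:
            self.paused = True
        return not self.paused
```

The gap between the two thresholds (hysteresis) is what prevents rapid pause and resume cycles near a single cutoff.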

Clean UI

The landing page mirrors the look of the Evalyard dashboard: cards, tables, crisp typography, and dark mode.

Lightweight agent

Runs on Android over USB or TCP. Overhead is minimal: metrics are batched so that reporting does not perturb latency measurements.
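The batching idea can be sketched as a small buffer that sends samples in groups instead of one at a time. This is an illustration in Python with hypothetical names; it says nothing about how the actual agent is implemented.

```python
from typing import Callable


class MetricBatcher:
    """Buffer samples and hand them to `send` in batches (hypothetical helper)."""

    def __init__(self, send: Callable[[list], None], batch_size: int = 50):
        self._send = send
        self._batch_size = batch_size
        self._buffer: list = []

    def record(self, sample: dict) -> None:
        """Store a sample; flush automatically once the batch is full."""
        self._buffer.append(sample)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, e.g. at the end of a run."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)
```

Fewer, larger sends mean less I/O competing with inference while a measurement is in progress.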

How it works

Bring your own devices or use phones from our lab, then plug Evalyard into your stack in four steps.

Step 1 · Agent

Install on device

Install the agent on Android (USB or TCP).

Step 2 · Adapter

Wire models

Register a model and an adapter for each device in the UI.

Step 3 · Benchmarks

Measure & route

Run benchmarks or call the API for routing.

Step 4 · API

Call Evalyard

POST /api/...
Content-Type: application/json

{
  "prompt": "Summarize this message...",
  "device_tag": "fastest",
  "max_tokens": 128
}
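A minimal sketch of building that request body in Python follows. The endpoint path is elided above, so the sketch stops at serializing the JSON. The function name and its defaults are assumptions taken from the sample values, not documented API behavior.

```python
import json


def build_route_request(prompt: str, device_tag: str = "fastest", max_tokens: int = 128) -> bytes:
    """Serialize the JSON body shown above (hypothetical helper).

    Send the result with the header Content-Type: application/json.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    body = {"prompt": prompt, "device_tag": device_tag, "max_tokens": max_tokens}
    return json.dumps(body).encode("utf-8")
```

The returned bytes can be passed as the request body to any HTTP client once the full endpoint URL is known.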

Get early access

We can provision specific phones, build adapters, and share a read-only dashboard for your team. No spam.

Early access: first 15 teams get 90% off their first 4 months of Evalyard, then 30% off forever. Device rental is billed separately.

FAQ

Quick answers

Do the plans include devices?

By default it’s BYOD (bring your own Android phones). Device rental / dedicated racks are available for Fabric and Enterprise on request.

What are device-hours?

Time a phone is actively running your jobs. Hitting the limit? Pause runs or enable pay-as-you-go overage.
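As a worked example, the sketch below sums active job time across phones. It assumes each phone runs one job at a time and that device-hours simply add up across phones; the exact billing rules are not stated here, so treat this as an estimate only.

```python
from datetime import datetime


def device_hours(jobs: list[tuple[datetime, datetime]]) -> float:
    """Sum the active time of all (start, end) job intervals, in hours.

    Hypothetical helper: assumes intervals on the same phone do not overlap.
    """
    total_seconds = sum((end - start).total_seconds() for start, end in jobs)
    return total_seconds / 3600.0
```

Under these assumptions, two phones that each run a 90-minute job use 3.0 device-hours in total.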

How do I access the dashboard now?

The self-service dashboard is not publicly available yet. Please book a private demo. We’ll walk you through the metrics and provide screenshots from the current version.

Can I cancel anytime?

Yes — monthly billing, cancel anytime. No long-term lock-in.

Need fully isolated infrastructure or shipped devices? Ask about Enterprise Fabric.

Vote on what we build next

Tell us what to build →

High-load stress testing for on-device LLMs
Automated output grading / evals
Image-based / multimodal models
Plugins / SDK for game engines
Per-device battery & thermal tracking

Benchmark & route LLMs on real Android devices
