Head of AI/ML
Head of Research (AI Evaluation)
Vals AI · AI Infrastructure / AI Evaluation & Benchmarking · 5-20 employees · 📍 San Francisco Bay Area
Vals AI needs a founding research leader to define the science of LLM evaluation — building the benchmarks and methodologies that determine which AI models get trusted and deployed at scale.
AI Maturity
advanced
Reports To
CEO
Why Role Exists
New position
Company Stage
startup
Source
LinkedIn
Key Responsibilities
- Advance the science of LLM evaluation: develop new paradigms beyond judge models, static benchmarks, and human-in-the-loop (HITL) methods for long-horizon, real-world tasks
- Oversee Vals' full research portfolio, setting direction across active and future projects
- Publish high-impact research intended to shape field-wide methodology
- Recruit, build, and lead the research team from near-zero
- Partner directly with enterprise customers and frontier lab partners on applied evaluation problems
Requirements
- PhD in ML/NLP (completed or in progress) or equivalent frontier industry research track record
- Deep expertise in the LLM evaluation landscape: benchmarks, failure modes, judge-model approaches, HITL methodologies
- Research orientation toward real-world deployability over easily-gamed benchmarks
- Strong written and verbal communication for publishing, presenting, and customer/lab dialogue
- Ability to work full-time onsite in San Francisco
Signals
- ⚠ No salary range disclosed despite 'highly competitive' claim
- ⚠ Company description cut off mid-sentence ('About Us: Fo...') — incomplete posting
- ⚠ No previous postings, making traction and culture hard to verify independently
- ⚠ Strict onsite-only requirement significantly limits candidate pool
- ✓ Clear, intellectually coherent product thesis — evaluation as infrastructure is a credible high-value wedge
- ✓ Explicit acknowledgment of existing research portfolio and enterprise + lab partnerships suggests real traction
- ✓ Strong equity offer implied for early research leader at founding-stage company
- ✓ Research ambition is field-level, not just product-level — rare mandate for a commercial role
- ✓ Relocation support offered, signaling willingness to invest in the right candidate
- ✓ Full benefits stack, including meals, health, and 401(k)