Vals AI

Member of Technical Staff - Research

Posted about 23 hours ago

Office, San Francisco, SE

About the Role

We are looking for exceptional researchers and research engineers to design and build the next generation of AI benchmarks. You will create high-impact, challenging evaluations that push the boundaries of what we can measure in foundation models. This role is perfect for someone with deep research expertise who wants to see their work directly influence how the world evaluates AI systems.

You will lead the design and development of novel benchmarks that assess real-world capabilities of LLMs. Our benchmarks shape how foundation models are developed and how generative AI applications are built. We work with every major foundation model lab, along with leading financial institutions and the application-layer companies pushing the frontier forward. Our work has been featured by the Wall Street Journal, Washington Post, and Bloomberg.

We are building the standard for evaluating the ability of LLMs to perform real-world tasks. You will be at the forefront of defining what that standard looks like.

What You'll Do

  • Design and develop novel, high-impact benchmarks that assess challenging real-world capabilities

  • Conduct research to ensure our benchmarks are valid, reliable, and meaningful

  • Collaborate with foundation model labs and enterprises to understand evaluation needs

  • Analyze model performance across benchmarks and communicate findings

  • Publish research findings and contribute to the broader evaluation research community

  • Work closely with the infrastructure team to implement your benchmark designs at scale

  • Stay current with the latest developments in LLM capabilities and evaluation methodologies

Requirements

  • Advanced research experience: Master's degree or PhD in Computer Science, NLP, Machine Learning, or related field. Undergrads with very strong research backgrounds may also be considered.

  • Publication track record: Published papers in reputable venues (NeurIPS, ICML, ACL, EMNLP, etc.) with a focus on NLP, ML evaluation, or benchmarking

  • Research methodology: Strong understanding of experimental design, statistical analysis, and evaluation frameworks

  • Technical skills: Proficiency in Python for research and experimentation

  • Communication: Ability to clearly communicate complex research ideas to both technical and non-technical audiences

  • Collaboration: Experience working in research teams and integrating feedback

  • Portfolio: Demonstrated track record of impactful research work

  • Location: We are an in-person team based in San Francisco. We will support your relocation or transportation as needed.

Nice to Haves

  • Experience specifically in LLM evaluation or benchmarking research

  • Familiarity with foundation model architectures and capabilities

  • Experience working with industry partners or in applied research settings

  • Background in areas like human-computer interaction, psychology, or domain-specific evaluation

  • Experience at early-stage startups or research labs

  • Contributions to open-source evaluation tools or datasets

What We Offer

  • Highly competitive salary and meaningful ownership. Excellence is well rewarded.

  • Relocation and transportation support

  • Health/dental insurance coverage

  • Lunch and dinner provided, free snacks/coffee/drinks

  • Unlimited PTO

  • Opportunity to publish and present your work

About Us

Founding team: The core methodology behind this platform comes from NLP evaluation research we did at Stanford. We raised a $5M seed from some of the top institutional and angel investors in the Valley. Our team has prior work experience at NVIDIA, Meta, Microsoft, Palantir and HRT. Collectively, we have over 300 citations in our published work. Our early team includes Stanford PhDs, ex-Jane Street quants, and the first designer at Snorkel.

Tech stack: We use Python for most things at Vals. Our platform is built on Django, with a React frontend. All of our infrastructure runs on AWS, using CDK for infrastructure as code (IaC).

What We're Looking For

  • Learning velocity: The role encompasses a wide variety of tasks. Rather than expecting you to be an expert on Day 1, we are looking for someone who can learn new skills and technologies extremely quickly.

  • Ownership: Working in a small, talent-dense team, we expect everyone to show initiative to build where it's needed, not where it's asked. We strive for autonomy over consensus. This is especially true for this role.

  • Intensity: The LLM landscape is constantly changing. Foundation model labs are continuously pushing the frontier. The unicorn companies that will emerge from this technology shift are being built now. Those that win will have an incredibly high speed of execution.

  • Solution-oriented mindset: We're looking for people who see opportunities to craft solutions at each juncture, not those who pass hard problems to others or admit defeat.


Job details
Workplace: Office
Location: San Francisco
Experience: SE

Private, domain-specific benchmarks in legal, tax, and finance.

Employees: 20
Industry: Software Development
Headquarters: San Francisco
Company location: San Francisco, US

Key team members

Jun Sung An

Alex Pattison

Braden Wicker

Rayan K.
