Research Scientist
Sonoma, CA
April 1st, 2026
San Francisco (On-Site)

About the Role

We are looking for exceptional researchers and research engineers to design and build the next generation of AI benchmarks. You will create high-impact, challenging evaluations that push the boundaries of what we can measure in foundation models. This role is perfect for someone with deep research expertise who wants to see their work directly influence how the world evaluates AI systems.

You will lead the design and development of novel benchmarks that assess real-world capabilities of LLMs. Our benchmark shapes how foundation models are developed and generative AI applications are built. We work with all the major foundation model labs, some of the largest financial institutions, and hospital systems in the world. Our work has been featured by the Wall Street Journal, Washington Post, and Bloomberg.

We are building the standard for evaluating the ability of LLMs to perform real-world tasks. You will be at the forefront of defining what that standard looks like.

What You'll Do

- Design and develop novel, high-impact benchmarks that assess challenging real-world capabilities
- Conduct research to ensure our benchmarks are valid, reliable, and meaningful
- Collaborate with foundation model labs and enterprises to understand evaluation needs
- Analyze model performance across benchmarks and communicate findings
- Publish research findings and contribute to the broader evaluation research community
- Work closely with the infrastructure team to implement your benchmark designs at scale
- Stay current with the latest developments in LLM capabilities and evaluation methodologies

Requirements

- Advanced research experience: Master's degree or PhD in Computer Science, NLP, Machine Learning, or related field. Undergrads with very strong research backgrounds may also be considered.
- Publication track record: Published papers in reputable venues (NeurIPS, ICML, ACL, EMNLP, etc.) with a focus on NLP, ML evaluation, or benchmarking
- Research methodology: Strong understanding of experimental design, statistical analysis, and evaluation frameworks
- Technical skills: Proficiency in Python for research and experimentation
- Communication: Ability to clearly communicate complex research ideas to both technical and non-technical audiences
- Collaboration: Experience working in research teams and integrating feedback
- Portfolio: Demonstrated track record of impactful research work
- Location: We are an in-person team based in San Francisco. We will support your relocation or transportation as needed.

Nice to Haves

- Experience specifically in LLM evaluation or benchmarking research
- Familiarity with foundation model architectures and capabilities
- Experience working with industry partners or in applied research settings
- Background in areas like human-computer interaction, psychology, or domain-specific evaluation
- Experience at early-stage startups or research labs
- Contributions to open-source evaluation tools or datasets