Research Engineer
Fluency is enabling the autonomous Enterprise. (In person)

You're needed to push the boundaries of what our models can understand. We're not prompt engineering chatbots. We're building evaluation frameworks and research systems that measure, improve, and validate enterprise intelligence at a scale nobody has attempted.

Fluency is looking for a Research Engineer to design experiments, build evaluation infrastructure, and drive model quality for our process conformance, productivity measurement, and AI impact analysis across Fortune 500 organisations.

The Problem Space

You'll be developing the methodology and systems that determine whether our models actually work. Screenshots, OCR text, application metadata, behavioral signals: the inputs are messy and the ground truth is ambiguous. The challenge is building rigorous evaluation frameworks that quantify model performance and identify improvement opportunities.

This means:
Designing evaluation pipelines that measure accuracy, precision, and recall across classification tasks
Building ground truth datasets from ambiguous, real-world enterprise data
Running systematic prompt engineering experiments to optimize LLM performance
Developing A/B testing infrastructure for model comparison
Researching novel approaches to process understanding, activity classification, and intent extraction
Quantifying cost-accuracy tradeoffs across different model architectures and prompting strategies
Building automated world-model training infrastructure from our ontology

The playbook doesn't exist.
You'll write it.

We're backed by tier-one VCs such as Accel and by research labs such as those at Princeton, and we're hitting an inflection point with enterprises all around the globe. You'll work directly with the founders and our engineering team on technical challenges that span LLM evaluation, experimental design, and applied research.

About The Role

We're looking for someone with:
Strong Python fundamentals and software engineering discipline
LLM prompt engineering and optimization (token efficiency, few-shot design, chain-of-thought)
Experience evaluating model performance: accuracy measurement, error analysis, regression detection
Ability to read, synthesize, and apply ML research papers
Statistical literacy: understanding when results are meaningful vs. noise
Comfort with ambiguity and novel problem domains
A Computer Science background, with a caveat: if you don't have a CS background, you're challenged to beat one of the founders in a 1:1 whiteboard duel on data structures and algorithms (DS&A), judged by Hung. Neither founder has a formal CS background, but come prepped.

There will be an expectation to stay up to date on business context, which could involve:
Watching key customer calls
Interacting with customers
Helping with product thinking

Strongly Preferred
Experience building evaluation frameworks and benchmarking systems
Ground truth dataset creation and annotation pipeline experience
Experience with hybrid LLM/rule-based systems
OCR, document understanding, or computer vision background
Cost optimization for LLM-heavy systems
Classification and NLP systems experience
Published research or formal research methodology training
Familiarity with process mining or workflow analysis
Interesting personal projects that demonstrate depth

Our Customers
We work with some of the world's largest enterprises, including:
Financial services enterprises (Aon)
Manufacturing enterprises (Misumi)
And many more across the enterprise spectrum (PVH)

Our Culture
You're expected to be in love with the craft. You're expected to like laughing. You're expected to want to work on novel problems.
You're expected to find satisfaction in novelty. You're expected to solve under obscurity.

Our Values
1. In hesitation lies destruction; in action, glory.
2. Those who merely meet expectations abandon the pursuit of greatness.
3. One who dwells within the forum must regard it as hallowed ground.
4. One who has not tasted the grapes declares them sour.
5. One who sits alone at the feast misses the richness of the table.

Location
Full-time, in-person role based in San Francisco, CA.
We offer E-3 visa sponsorship, with a relocation stipend, for Australians.

Compensation
US$150K - $320K salary, depending on candidate and experience
Substantial equity: every offer includes ownership
Mac, Linux, or Windows, your call
High-impact work with global enterprises
Technical, product-led founders

Don't apply if:
You want hybrid or remote work
You don't like working hard and with insane velocity
You want to work a 9-to-5
You're not comfortable with rapid iteration
You think evaluation is grunt work
You've never shipped a model or evaluation system to production
You don't have personal projects
You dislike constraints (we have them: cost, latency, and accuracy tradeoffs are real)
You aren't ambitious
You don't have a good reason for wanting to work at an early-stage company

Hiring Process
Resume screen
1:1 with a founder
Technical deep-dive on past research and evaluation work
Work through a real problem with the team, usually as a live coding exercise
Offer

We strongly encourage applicants from underrepresented backgrounds to apply. Diverse teams build better products (see value #5).

Compensation Range: $120K - $320K