Senior Software Engineer, Research
Base Pay Range
$175,000.00/yr - $220,000.00/yr
About Us
At Sully.ai, we’re building the most impactful healthcare company on earth. We believe that access to a great doctor is a basic human right. Today, that’s not a reality. Delays, misdiagnoses, administrative chaos, and burnout plague the system.
Mission
One Human, One Doctor. We build AI teammates that augment clinicians—scribes, nurses, receptionists, translators—all powered by our own world‑class models and deployed in real‑world care.
Traction
450+ organizations signed in 16 months
AI agents cut admin work by ~2.8 hours daily and reduce onboarding by 85%
5M+ clinical tasks completed to date, serving 36+ specialties
Raised $25M from YC, Eric Yuen, Amity, Semper Virens
Patented AI architecture (MedCon-1) outperforms GPT‑4.5, Gemini, Claude on clinical reasoning tasks
Requirements
Sully requires A‑players capable of delivering a year's worth of output in 4 months.
What You’ll Do
Build and optimize core research infrastructure: evaluation pipelines, agent workflows, hallucination detectors, coding benchmarks, and research‑to‑production integrations.
Design, implement, and scale agentic systems across backend, frontend, and model integrations, collaborating closely with research and co‑founders.
Own reliability, observability, and performance across agents (logging, tracing, instrumentation, safety checks).
Ship research‑proven features into production within 7 days.
Develop shared tools, SDKs, and internal products that accelerate iteration across Research, QA, and Engineering.
What You Must Bring
Senior‑level full‑stack engineering experience in React, TypeScript, and Node.js.
Proven ability to design, ship, and scale LLM‑powered applications.
Expertise in API design, streaming, and CI/CD pipelines.
Strong cloud infrastructure background (AWS, GCP, or Azure).
Track record of building reliable systems with measurable performance and error budgets.
First‑Month Focus
Audit all cross‑agent flows for UI/UX consistency, correctness, and performance gaps.
Implement shared components, typed schemas, and contract‑driven interfaces for reliability.
Establish instrumentation for frontend performance, agent consistency, latency, and model round‑trip tracing.
Improve or replace brittle evaluation or agent pipelines identified during onboarding.
Partner with Research to productionize at least one new capability.
90 Day OKRs
Deliver production‑grade agentic workflows with