AI Systems & Inference Frameworks Engineer
New York, NY
March 28th, 2026
About Us

Most AI is frozen in place: it doesn't adapt to the world. We think that's backwards. Our mandate is to build efficient intelligence that evolves in real time. Our vision is AI systems that are flexible, personalized, and accessible to everyone. We believe efficiency is what makes this possible; it's how we expand access and ensure innovation benefits the many, not the few. We believe in talent density: bringing together the best and most driven individuals to push the boundaries of continual adaptation. We're looking for builders and creative thinkers ready to shape the next era of intelligence.

The Role

You'll work directly with our founders to design and build the inference and optimization systems that power our core product. This role bridges research and production, combining deep exploration of inference techniques with hands-on ownership of scalable, high-performance serving infrastructure. You'll own the full lifecycle of LLM inference, from experimentation and performance analysis to deployment and iteration in production, thriving in a zero-to-one environment and helping define the technical foundations of our inference stack.

Responsibilities

- Inference Research & Systems: design and build our LLM inference stack from zero to one, exploring and implementing advanced techniques for low-latency, high-throughput serving of language and multimodal models.
- Frameworks & Optimization: develop and optimize inference using modern frameworks (e.g., vLLM, SGLang, TensorRT-LLM), experimenting with batching strategies, KV-cache management, parallelism, and GPU utilization to push performance and cost efficiency.
- Software-Hardware Co-Design: collaborate closely with founders and model developers to analyze bottlenecks across the stack, co-optimizing model execution, infrastructure, and deployment pipelines.

Qualifications

- Strong experience building and optimizing LLM inference systems in production or research environments
- Hands-on expertise with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or similar
- Deep performance mindset with experience in GPU-backed systems, latency/throughput optimization, and resource efficiency
- Solid understanding of transformer inference, serving architectures, and KV-cache-based execution
- Strong programming skills in Python; experience with CUDA, Triton, or C++ a plus
- Comfort working in ambiguous, zero-to-one environments and driving research ideas into production systems
- Nice to have: experience with model quantization or pruning, speculative decoding, multimodal inference, open-source contributions, or prior work in systems or ML research labs

Above all, we're looking for great teammates who make work feel lighter and aren't afraid to go out on a limb with bold ideas. You don't need to be perfect, but you do need to be adaptable. We encourage you to apply, even if you don't check every box.

Benefits

- Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.
- Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
- Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
- Well-Being: Comprehensive medical benefits and generous paid time off.