Forward Deployed Engineer (Inference & Post-Training)
Occupations:
Software Developers
Computer Systems Engineers/Architects
Data Scientists
Computer and Information Research Scientists
Computer Occupations, All Other

Industries:
Business Schools and Computer and Management Training
Other Schools and Instruction
Vocational Rehabilitation Services
Junior Colleges
Investigation and Security Services

Job Description

Location & Work Arrangement
Location: San Francisco, CA
Work Mode: On-site, Remote
Company: Together AI

Job Description Summary
A Forward Deployed Engineer (FDE) acts as a hands-on technical partner to strategic production AI teams, leveraging high-quality models for large-scale inference. This role serves as a deep-domain specialist in inference optimization, fine-tuning pipelines, and production deployment, collaborating with Solutions Architects to ensure customer success and platform adoption.

Machine Learning & Artificial Intelligence

Key Responsibilities
Inference Engine Optimization: Select, configure, and optimize inference engines based on hardware, model architecture, and workload.
Configuration & Performance Tuning: Develop configurations for POCs and benchmarks; tune KV cache, speculative decoding, tensor parallelism, and quantization strategies.
Post-Training & Fine-Tuning: Drive RL training runs and guide customers through LoRA, SFT, DPO, RLHF, and GRPO pipelines.
Strategic Customer Alignment: Serve as the primary technical contact for strategic accounts; monitor endpoint configurations and ensure milestone achievement.
Opinionated Onboarding: Establish alignment during onboarding to ensure optimal configurations from day one.
Product Feedback Loop: Influence software and model roadmaps by surfacing field insights and driving early feature adoption.

Qualifications
Experience: 5+ years in technical roles with a focus on inference systems, open-source LLM deployment, or post-training workflows.
Inference Engines: Expert-level hands-on experience with vLLM, TensorRT-LLM, or SGLang.
Optimization Expertise: Deep knowledge of KV cache tuning, speculative decoding, tensor/pipeline parallelism, and quantization.
Post-Training: Experience with LoRA, SFT, DPO, RLHF, and GRPO fine-tuning pipelines.
Model Awareness: Broad knowledge of state-of-the-art open-source models for selection based on use cases and hardware.
Coding Skills: Strong Python proficiency for production environments.

Compensation & Benefits
Base Salary Range: $270,000 - $300,000 OTE (US Full-time)
Additional Compensation: Startup Equity + Benefits
Benefits: Health insurance, flexible remote work policy.