AI Application Engineer
Role: AI Application Engineer
Location: Santa Clara, CA (3 days onsite per week)

Overview:
We are seeking an AI Application Engineer to support the development and delivery of next-generation AI-powered applications built on NVIDIA infrastructure. This role will focus on production-grade LLM application engineering, RAG quality, prompt engineering, AI safety, and orchestration of complex multi-step AI pipelines.

Day-to-Day Responsibilities:
- Design, develop, and optimize production-grade LLM-powered applications
- Own AI quality, RAG accuracy, prompt engineering, and AI safety across multiple applications
- Develop and maintain multi-step LLM orchestration pipelines using LangChain, LlamaIndex, or custom frameworks
- Implement and optimize RAG pipelines, including chunking strategies, embedding selection, reranking, and hybrid search
- Design multi-turn conversational AI experiences with context management and session memory
- Integrate NVIDIA technologies, including NIM, NeMo, NeMo Guardrails, and Riva, into enterprise AI applications
- Build automated evaluation pipelines for model quality, hallucination detection, regression testing, and release gating
- Perform latency profiling and optimization across multi-step LLM call chains
- Implement AI safety guardrails, including prompt injection prevention, jailbreak mitigation, and topical control
- Collaborate with globally distributed engineering and product teams to deliver scalable AI solutions
- Support deployment, monitoring, and continuous improvement of AI applications in production environments

Basic Qualifications:
- 4–7 years of software engineering experience, with at least 2 years focused on production LLM application development
- Expert-level experience with Python for AI/ML application development and async programming
- Strong expertise in prompt engineering, including system prompts, few-shot prompting, and instruction tuning
- 3+ years of hands-on experience with multi-step LLM orchestration frameworks such as LangChain or LlamaIndex
- 3+ years of experience designing and optimizing RAG pipelines and retrieval systems
- 3+ years of experience with vector databases, similarity search tuning, and reranking techniques
- 3+ years of hands-on experience with NVIDIA NIM, NeMo, NeMo Guardrails, and Riva
- 3+ years of experience implementing AI safety and guardrails for customer-facing applications
- Strong knowledge of automated AI evaluation frameworks such as RAGAS or TruLens
- 3+ years of experience profiling and optimizing latency in multi-step AI pipelines
- Ability to work onsite in Santa Clara, CA

Preferred Qualifications:
- Experience with adaptive learning systems or recommendation engines
- Knowledge graph integration experience with RAG architectures
- Experience with multi-agent orchestration patterns
- ServiceNow API integration experience
- Prior experience building AI products on NVIDIA infrastructure
- Experience with streaming LLM response handling and real-time AI applications

Technology Stack:
- Python
- LangChain
- LlamaIndex
- NVIDIA NIM
- NeMo
- NeMo Guardrails
- NVIDIA Riva
- Vector databases
- RAGAS / TruLens
- LLM APIs and orchestration frameworks

Education:
Bachelor’s degree in Computer Science, Engineering, Artificial Intelligence, or equivalent work experience.