Lead AI Engineer (FM Hosting, LLM Inference)/Remote
Job Title: Lead AI Engineer (FM Hosting, LLM Inference)
Location: Remote

Job Summary
We are looking for an experienced Lead AI Engineer to design, deploy, and optimize large-scale Foundation Model (FM) hosting and LLM inference platforms. The ideal candidate will lead AI infrastructure initiatives, improve model serving performance, and build scalable, secure, and cost-efficient AI systems for enterprise applications.

Key Responsibilities
- Design and manage scalable infrastructure for hosting foundation models and LLMs.
- Develop and optimize high-performance inference pipelines for low latency and high throughput.
- Deploy and manage models using containerized and distributed environments.
- Work with GPU acceleration, model quantization, batching, caching, and inference optimization techniques.
- Implement APIs and microservices for AI model serving.
- Monitor system reliability, availability, scalability, and cost efficiency.
- Collaborate with AI/ML teams to productionize machine learning and generative AI models.
- Lead architecture decisions for model deployment, orchestration, and observability.
- Ensure security, governance, and compliance for AI infrastructure.
- Mentor engineering teams and drive AI platform best practices.

Required Skills
- Strong expertise in Python and backend system development.
- Hands-on experience with LLM serving frameworks such as vLLM, TensorRT-LLM, or Text Generation Inference.
- Experience with distributed computing, GPU infrastructure, and Kubernetes.
- Knowledge of transformer architectures, model optimization, and inference tuning.
- Experience with cloud platforms such as Amazon Web Services, Microsoft Azure, or Google Cloud.
- Familiarity with Docker, CI/CD pipelines, and infrastructure automation.
- Understanding of vector databases, embeddings, and retrieval systems.
- Strong debugging, performance tuning, and problem-solving skills.
- Excellent leadership and stakeholder communication abilities.

Preferred Qualifications
- Bachelor's or Master's degree in Computer Science, AI, Machine Learning, or related field.
- Experience deploying open-source or enterprise LLMs in production environments.
- Knowledge of MLOps and observability tools.
- Exposure to RAG architectures, fine-tuning, and AI agents is a plus.

Tools & Technologies
- Python, FastAPI
- vLLM / TensorRT-LLM
- Kubernetes, Docker
- PyTorch, CUDA
- Ray, Triton Inference Server
- Vector Databases (Pinecone, Milvus, FAISS)
- Amazon Web Services / Microsoft Azure / Google Cloud
- CI/CD & Monitoring Tools