Software Engineer

Core Experience
- Hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral / Mixtral in on-prem or private environments
- Strong proficiency in Python for LLM inference, prompt engineering, and integration
- Experience with CPU-based inference, model quantization, and performance tuning

Vector Databases & RAG
- Practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector
- Proven implementation of Retrieval-Augmented Generation (RAG) pipelines
- Experience generating and managing embeddings and metadata filtering

Security & Governance
- Understanding of data privacy, air-gapped deployments, and enterprise security requirements
- Experience implementing access controls and audit logging

Nice to Have
- Experience with LangChain or LlamaIndex
- Exposure to Rust, Go, or C++ for high-performance services
- Familiarity with Docker and Kubernetes for on-prem deployments
- Knowledge of inference frameworks (e.g., vLLM, llama.cpp, Hugging Face Transformers)
- Prior work in regulated or enterprise environments

Deliverables
- Reference architecture and deployment guidance
- Working prototype (LLM + vector DB + RAG)
- Documentation and knowledge transfer to internal teams