Sr Full Stack Engineer – Generative AI & Python
ARCHIVED
Dice is the leading career destination for tech experts at every stage of their careers. Our client, InfoVision, Inc., is seeking the following. Apply via Dice today!

Job title: Full Stack Developer
Location: Irving, TX
Duration: Long-term

ROLE SUMMARY:
We are seeking a Full Stack Developer – AI & Cloud to design, build, and deploy scalable enterprise applications at the intersection of Java/Python server-side development, AWS cloud services, and AI/LLM edge deployments.

KEY RESPONSIBILITIES:
Design and develop robust server-side applications and RESTful microservices using Java (Spring Boot) and Python, ensuring scalability, security, and high availability across distributed systems.
Architect and deploy cloud-native solutions on AWS leveraging services including Lambda, ECS, API Gateway, SageMaker, S3, and EventBridge.
Fine-tune open-weight LLM models (e.g., LLaMA, Mistral, Phi) using frameworks such as Hugging Face PEFT and LoRA for domain-specific enterprise use cases.
Deploy and manage AI/LLM inference runtimes on edge devices including laptops, on-premise servers, and network routers using tools such as Ollama, llama.cpp, or TensorRT-LLM.
Build and maintain CI/CD pipelines for containerized microservices and edge AI model deployments using Docker, Kubernetes, and AWS DevOps tooling.
Conduct code reviews, contribute to architectural decisions, and mentor junior engineers on AI-integrated full stack development practices.

REQUIRED QUALIFICATIONS:
10+ years of full stack development experience with strong server-side proficiency in Java (Spring Boot) and Python.
Telecom Industry experience is a must.
Hands-on experience building and deploying microservices on AWS, including services such as Lambda, ECS, API Gateway, and SageMaker.
Demonstrated experience fine-tuning LLM models using Hugging Face Transformers, PEFT, or LoRA.
Proven ability to deploy and optimize LLM inference on edge devices (CPU/edge GPU) using runtimes such as Ollama, llama.cpp, or ExecuTorch.
Proficiency with containerization and orchestration tools including Docker and Kubernetes.
Strong understanding of RESTful API design, event-driven architectures, and distributed microservices patterns.