AI Systems Engineer

Boston, MA
Onsite 4 days per week

Role Summary

Join the AI Studio of an innovative construction industry client in Boston as an AI Systems Engineer, a hybrid role responsible for architecting and building both:

- The distributed systems backbone that powers enterprise-scale AI, and
- The agentic and LLM-driven capabilities transforming construction workflows

This role sits at the intersection of platform engineering and applied AI. You will design scalable APIs, event-driven services, and reliable infrastructure while also implementing multi-model AI agents, retrieval pipelines, and AI orchestration frameworks that operate in real-world production environments.

You will help define how AI is built, deployed, observed, and scaled across the client's national operations.

Responsibilities

AI & Agentic Systems Product Engineering & Deployment

- Design and implement production-grade RAG architectures
- Build and deploy multi-model AI agents leveraging AWS Bedrock and LLM providers (Claude, GPT, Llama, Titan, etc.)
- Implement dynamic model routing strategies based on task complexity, cost, and latency
- Develop multi-agent orchestration frameworks enabling collaborative workflows (planner, retriever, executor, summarizer)
- Design safe tool invocation patterns and guardrails for enterprise AI agents
- Optimize inference pipelines for cost, performance, and reliability
- Implement evaluation frameworks to measure model performance, hallucination rates, and response quality
- Design fallback and degradation strategies for model outages or latency spikes

Distributed Systems & Platform Architecture

- Architect and evolve service-oriented and event-driven systems supporting AI workloads
- Design REST/GraphQL APIs with clear versioning, authentication, and backward compatibility strategies
- Implement asynchronous processing pipelines using queues, event buses, and workflow orchestration
- Ensure reliability through idempotent consumers, retry strategies, circuit breakers, and dead-letter queues
- Make informed tradeoffs between relational, NoSQL, and vector storage systems
- Build services that are observable, traceable, and production-ready
- Define and document architectural standards for AI platform services
- Implement LLMOps: cost monitoring, latency optimization, usage analytics, and model versioning
- Enforce security, governance, and access standards in line with enterprise policies

Collaboration & Technical Leadership

- Work closely with product managers, site AI engineers, and data scientists to iterate rapidly in Agile sprints
- Communicate technical progress clearly to non-technical stakeholders; contribute to internal AI playbooks and templates

Qualifications

- 6+ years of professional software engineering experience (not including vibe coding)
- Demonstrated experience designing distributed or service-oriented systems in production
- Strong backend engineering skills in Python, and at least one of Java, NodeJS, Rust, or Kotlin
- Experience building and deploying event-driven architectures (SNS/SQS, Kafka, EventBridge, etc.)
- Experience integrating LLMs into production systems (Bedrock, OpenAI, Anthropic, etc.)
- Hands-on experience with RAG pipelines, vector databases, and building multi-agent AI systems
- Deep understanding of:
  - Distributed system failure modes
  - API lifecycle management
  - Concurrency and consistency tradeoffs
  - LLM cost, latency, and reliability constraints
  - Tuning AI agents for accuracy and performance

Preferred

- Experience building internal AI platforms or shared infrastructure
- Exposure to large-scale SaaS or mission-critical systems
- Experience designing multi-agent or orchestration frameworks
- Experience with Databricks Lakehouse architecture
- Prior experience in construction, manufacturing, or operational industries
