Principal, AI Platform Engineering

Over the last 20 years, Ares’ success has been driven by our people and our culture. Today, our team is guided by our core values – Collaborative, Responsible, Entrepreneurial, Self-Aware, Trustworthy – and our purpose to be a catalyst for shared prosperity and a better future. Through our recruitment, career development and employee-focused programming, we are committed to fostering a welcoming and inclusive work environment where high-performance talent of diverse backgrounds, experiences, and perspectives can build careers within this exciting and growing industry.

Job Description

We are seeking an exceptional Principal AI Platform Engineer to design and build an enterprise-grade generative AI platform from the ground up. This is a leadership role that combines deep technical expertise in AI systems architecture with the strategic vision to shape how our organization scales AI capabilities across all business domains. You will architect a comprehensive platform spanning model gateways, retrieval services, model registries, prompt libraries, and deployment pipelines. This platform will enable teams across the firm to build, deploy, and operationalize AI applications with confidence, compliance, and security.

Key Responsibilities

Platform Architecture & Design
Design and build a foundational AI platform that enables secure, scalable, and compliant generative AI across the enterprise
Architect multi-LLM gateway capabilities to support diverse model providers, allowing teams to leverage best-of-breed models for different use cases
Establish platform standards and patterns that balance flexibility, safety, governance, and performance

Core Platform Components
Develop a multi-LLM gateway: a unified interface for accessing multiple LLM providers with load balancing, fallback handling, and cost optimization
Build retrieval services for RAG (Retrieval-Augmented Generation): enterprise search, semantic indexing, and document retrieval at scale
Create a model registry and governance layer: a centralized catalog of models, versions, fine-tuning metadata, performance metrics, and compliance tracking
Design a prompt library with version control: an organizational repository for prompts with testing, evaluation, and A/B testing capabilities
Implement a Model Context Protocol (MCP) gateway to enable secure integration between AI applications and external tools, APIs, and data sources
Build FinOps infrastructure: cost tracking, optimization, and allocation across models, usage patterns, and business units

Agent-to-Agent (A2A) Workflows
Design an orchestration framework for complex, multi-step AI workflows across applications
Enable reliable, scalable execution of chained AI operations with state management and error recovery
Integrate with the broader data ecosystem for workflow triggers and data pipelines

Data Gateway Integration
Partner with data platform teams to design AI-native data access patterns
Enable secure, governed access to enterprise data for RAG and model training
Build metadata and lineage tracking for AI-consumed data

Deployment & DevOps
Design sandbox-to-production pipelines: safe, repeatable processes for testing and deploying AI applications
Implement CI/CD for AI models, including versioning, testing, promotion, and rollback capabilities
Build observability and monitoring: telemetry, performance metrics, cost tracking, and compliance auditing
Establish disaster recovery and high-availability patterns

Collaboration & Enablement
Work closely with the Data Products team to align platform capabilities with data governance and analytics infrastructure
Partner with AI Enablement teams to provide tools, SDKs, documentation, and best practices that democratize AI development
Lead technical discussions on platform strategy, roadmap, and trade-offs across the organization
Build the internal developer experience and drive platform adoption

Security Architecture & Implementation
Design and implement a comprehensive security architecture aligned with the firm's cyber and information security guidelines
Build authentication and authorization frameworks: role-based access control (RBAC), attribute-based access control (ABAC), and service-to-service authentication
Implement encryption standards: encryption at rest (AES-256 or equivalent) and in transit (TLS 1.2+) for all sensitive data
Design secure API gateways and service boundaries with rate limiting, request validation, and DDoS protection
Implement secrets management: secure storage and rotation of credentials, API keys, and certificates
Build comprehensive audit logging and monitoring, with all access, modifications, and security events recorded in immutable audit trails
Partner with Infosec and Security Operations to implement continuous security monitoring and threat detection

Governance, Compliance & Risk Management
Ensure platform compliance with regulatory requirements: SOC 2 Type II, data residency, and audit trails
Implement data governance: classify data sensitivity levels, enforce data handling policies, and ensure appropriate access controls
Build model governance: track model provenance, versioning, training data lineage, and approval workflows for production deployment
Prevent data exfiltration and prompt injection attacks through input validation, output filtering, and rate limiting
Establish responsible AI practices: bias detection, fairness assessment, and explainability requirements
Manage third-party vendor security: assess LLM provider security postures, data processing agreements, and compliance certifications
Create a model risk assessment framework to evaluate models for regulatory, market, and operational risks before production deployment
Work with Compliance, Legal, and Risk teams to ensure the platform meets all governance requirements and documentation standards

Required Qualifications
10+ years of software engineering experience, with 5+ years building large-scale distributed systems or platform infrastructure
3+ years of hands-on experience with generative AI, LLMs, RAG systems, or AI infrastructure, in either production systems or applied research
Deep expertise in one or more of Python, Go, Rust, or Java; experience building APIs and orchestration systems
Strong understanding of LLM architectures, prompting strategies, fine-tuning, and RAG design patterns
Demonstrated experience with model serving (vLLM, Ollama, TensorFlow Serving), vector databases, and embedding models
Proficiency in cloud platforms (AWS, GCP, Azure) and containerization/orchestration (Docker, Kubernetes)
Experience designing and building multi-tenant, secure platform systems with strong governance and observability
Demonstrated expertise in security: architecture, secure coding practices, authentication/authorization, encryption, and threat modeling
Experience with compliance frameworks and security certifications such as SOC 2, ISO 27001, GDPR, or similar
Track record of leading technical initiatives from architecture through production deployment
Excellent communication skills, including the ability to explain complex technical and security concepts to executives and cross-functional teams

Preferred Qualifications
Experience in financial services, private equity, or alternative assets technology environments
Familiarity with LangChain, LlamaIndex, or similar AI orchestration frameworks
Experience with MLOps tools and practices: model versioning, feature stores, experiment tracking
Knowledge of eval frameworks, retrieval evaluation, or AI model benchmarking
Experience with data governance platforms or metadata management systems
Experience building zero-trust architectures or implementing security controls in cloud-native environments
Contributions to open-source AI/ML projects or publications in the AI/ML space
Experience building developer platforms or internal tools that drive organizational adoption

Compensation
The anticipated base salary range for this position is listed below. Total compensation may also include a discretionary performance-based bonus.
Note that the range takes into account a broad spectrum of qualifications, including, but not limited to, years of relevant work experience, education, and other relevant qualifications specific to the role.

$300,000 - $350,000

The firm also offers a robust benefits package. Ares U.S. Core Benefits include comprehensive Medical/Rx, Dental and Vision plans; a 401(k) program with company match; Flexible Spending Accounts (FSA); Health Savings Accounts (HSA) with company contribution; Basic and Voluntary Life Insurance; Long-Term Disability (LTD) and Short-Term Disability (STD) insurance; an Employee Assistance Program (EAP); and a Commuter Benefits plan for parking and transit.

Ares offers a number of additional benefits, including access to a world-class medical advisory team, a mental health app that includes coaching, therapy and psychiatry, a mindfulness and wellbeing app, a financial wellness benefit that includes access to a financial advisor, new parent leave, reproductive and adoption assistance, emergency backup care, a matching gift program, an education sponsorship program, and much more.

There is no set deadline to apply for this job opportunity. Applications will be accepted on an ongoing basis until the search is no longer active.