Job Opportunity for Principal Azure Capacity Manager with our Banking Client
Occupations:
Computer and Information Systems Managers
Computer Systems Engineers/Architects
Computer Systems Analysts
Network and Computer Systems Administrators
General and Operations Managers
Industries:
Office Administrative Services
Employment Services
Other Professional, Scientific, and Technical Services
Business Support Services
Other Financial Investment Activities

Job Title: Principal Azure Capacity Manager
Location: New York, NY 10286 (Hybrid: 4 days onsite, 1 day remote)
Duration: 12+ months (possible extension)
Pay Range: $80 - $90/hr W2

Overview:
Principal Azure Capacity Manager (Consultant) to lead capacity planning and optimization for an Azure public cloud project operating to High requirements. This role ensures adequate, resilient capacity and buffer across compute, storage, network, and platform services; supports Site Reliability Engineers (SREs) with performance and reliability goals; and drives evidence-based compliance with program High control expectations.
• Own the end-to-end Capacity Management operating model for Azure services in scope of the High program: planning, modeling, forecasting, monitoring, tuning, and governance.
• Ensure sufficient capacity and engineered buffer to meet service-level objectives (SLOs), recovery objectives (RTO/RPO), and regulatory/contractual requirements, with particular focus on U.S.-only region restrictions and continuous monitoring.
• Partner closely with SREs to operationalize capacity practices through IaC, gated change control, performance baselines, autoscaling policies, and resilience patterns.
• Contribute to documentation and evidence (e.g., SSP updates, control narratives, POA&M items, continuous monitoring artifacts).

Core Responsibilities:
• Capacity Planning & Optimization
   o Build and maintain service-level capacity models covering App Services, databases, storage, messaging, networking, Key Vault/HSM, and other Azure/PaaS components.
   o Establish buffer standards (e.g., N+1/N+2, % headroom, runway-in-weeks) per service criticality; validate against demand curves, failover scenarios, and maintenance events.
   o Implement autoscaling strategies (horizontal/vertical) with guardrails on quotas and throttling; tune scaling triggers based on SLIs/SLOs (latency, error rate, saturation).
   o Run baseline and trend analysis on utilization, throughput, and performance; convert findings into actionable tuning, reservations/savings plans, and architecture changes.
   o Forecast demand from product roadmaps, release plans, and business growth; translate into capacity plans and procurement reservations with lead-time and runway targets.
• Change & Configuration Management
   o Participate in CABs as the capacity and security representative; enforce gated approvals tied to documented capacity/security impact analyses.
   o Keep cryptographic mechanisms (e.g., Key Vault, managed HSM) under configuration management with versioned inventories and approved FIPS-validated modules.
   o Document capacity-related changes with confidentiality/integrity/availability assessments; update SSP and system documentation when changes affect control implementations.
   o Integrate vulnerability and flaw remediation tracking with capacity risk considerations; maintain POA&M entries and support continuous monitoring evidence.
• Secure System/Service Acquisition & Region Restrictions
   o Ensure external services supporting capacity (e.g., third-party telemetry or scaling tools) conform to applicable requirements, with documented oversight and continuous monitoring.
   o Enforce U.S./U.S. Territories-only processing, storage, logging, backups, DR, and support operations for High impact systems; validate region selection in capacity plans.
   o Contribute to reviews of policies and procedures relevant to capacity (SA-1).
• Demand Management & Financial Stewardship
   o Balance performance, resilience, and cost: leverage Reservations/Savings Plans, rightsizing, storage tiering, and scheduled scaling; maintain transparent reporting to stakeholders.
   o Integrate capacity signals into incident/change/problem processes; drive proactive capacity adjustments before risk materializes in production.
• Resilience, DR, & Performance Engineering
   o Perform criticality analysis to prioritize capacity for high-criticality components; align hardening, monitoring, backup/DR, and buffer policies to criticality tiers.
   o Validate DR capacity (hot/warm/cold) for failover scenarios; ensure buffer and quotas are reserved for disaster events without impacting steady-state performance.
• Metrics & Reporting
   o Define and publish capacity KPIs: utilization, saturation, headroom %, runway weeks, scaling efficacy, quota consumption, DR readiness, cost-to-performance efficiency.
   o Provide dashboards and executive-ready reports for continuous monitoring submissions, audits, and program governance.

Must Have Skills:
• Bachelor's degree in computer science or related discipline; advanced degree preferred.
• 10–12+ years in infrastructure capacity/performance engineering across compute, storage, network, and platform services; financial services experience is a plus.
• Demonstrated experience operating in regulated environments; familiarity with FedRAMP High concepts and evidence requirements.
• Strong data analysis skills; capable of translating telemetry and forecasts into clear decisions and stakeholder communications.
• Process orientation with disciplined change/config management; ability to manage upstream/downstream dependencies across IT operations and finance.
• Experience coordinating cross-functional engineering teams and aligning delivery across multiple platforms and tools.
• Familiarity with Azure services and concepts (e.g., Entra ID, managed identities, Azure SQL/MI, storage, networking, policies, RBAC) from a PM perspective.
• Excellent communication, stakeholder management, and executive-facing presentation skills.
• Ability to translate complex technical requirements into clear plans, milestones, and measurable outcomes.
• Azure capacity ecosystem: Monitor/Log Analytics/Metrics, Advisor, Cost Management, Reservations/Savings Plans, quotas/limits management.
• Compute/container scaling: AKS, VMSS, App Service; HPA/VPA, autoscaling policies; performance testing (k6/JMeter); observability (Prometheus/Grafana).
• Storage and database performance: tiering, IOPS/throughput planning, caching, indexing, and connection management.
• Networking and security capacity: Azure Firewall, NSGs, private endpoints, Bastion; throughput/latency planning and allow-listing discipline.
• Cryptography services: Key Vault, managed HSM; FIPS-validated modules; key lifecycle capacity considerations.
• IaC and config management: Terraform/Bicep/ARM; Ansible/Chef; integration with gated CI/CD.
• Governance: Azure Policy/Blueprints/Initiatives for configuration baselines and region restrictions; SSP and evidence artifact production.
• Building capacity models for multi-region architectures with strict U.S.-only constraints.
• DR planning and execution with validated failover capacity and documented evidence.
• POA&M management and continuous monitoring submissions in a FedRAMP context.
• Collaboration with SREs on SLIs/SLOs, error budgets, and reliability patterns.