Databricks Systems Engineer
We are looking for a Databricks/Cloud Engineer to work onsite in Washington, DC (near Farragut North Metro). This person will act as the hands-on technical owner of the Databricks platform supporting the Enterprise Data Platform and will be responsible for platform operations, security, and governance configuration from end to end, ensuring the environment is compliant, reliable, and cost-controlled and that it enables secure analytics and AI/ML workloads at scale.

Duties/Responsibilities
- Administer Databricks account and workspaces across SDLC environments; standardize configuration, naming, and operational patterns.
- Configure and maintain clusters/compute, job compute, SQL warehouses, runtime versions, libraries, repos, and workspace settings.
- Implement platform monitoring/alerting, operational dashboards, and health checks; maintain runbooks and operational procedures.
- Provide Tier 2/3 operational support: troubleshoot incidents, perform root-cause analysis, and drive remediation and preventive actions.
- Manage change control for upgrades, feature rollouts, configuration changes, and integration changes; document impacts and rollback plans.
- Enforce least privilege across platform resources (workspaces, jobs, clusters, SQL warehouses, repos, secrets) using role/group-based access patterns.
- Configure and manage secrets and secure credential handling (secret scopes / key management integrations) for platform and data connectivity.
- Enable and maintain audit logging and access/event visibility; support security reviews and evidence requests.
- Administer Unity Catalog governance: metastores, catalogs/schemas/tables, ownership, grants, and environment/domain patterns.
- Configure and manage external locations, storage credentials, and governed access to cloud object storage.
- Partner with governance stakeholders to support metadata/lineage integration, classification/tagging, and retention controls where applicable.
- Coordinate secure connectivity and guardrails with cloud/network teams: private connectivity patterns, egress controls, firewall/proxy needs.
- Configure cloud integrations required for governed data access and service connectivity (roles/permissions, endpoints, storage integrations).
- Implement cost guardrails: cluster policies, auto-termination, scheduling, workload sizing standards, and capacity planning.
- Produce usage/cost insights and optimization recommendations; address waste drivers (idle compute, oversized clusters, inefficient jobs).
- Automate administration and configuration using APIs/CLI/IaC (e.g., Terraform) to reduce manual drift and improve repeatability.
- Maintain platform documentation: configuration baselines, security/governance standards, onboarding guides, and troubleshooting references.
- Design and implement backup and disaster recovery procedures for workspace configurations, notebooks, Unity Catalog metadata, and job definitions; maintain recovery runbooks and perform periodic DR testing aligned to RTO/RPO objectives.
- Monitor and optimize platform performance, including SQL warehouse query tuning, cluster autoscaling configuration, Photon enablement, and Delta Lake optimization guidance (OPTIMIZE, VACUUM, Z-ordering strategies).
- Administer Delta Live Tables (DLT) pipelines and coordinate with data engineering teams on pipeline health, data quality monitoring, failed job remediation, and pipeline configuration best practices.
- Manage third-party integrations and ecosystem connectivity, including BI tool integrations (e.g., Power BI) and external metadata catalog integrations.
- Implement Databricks Asset Bundles (DABs) for standardized deployment patterns; automate workspace resource deployment (jobs, pipelines, dashboards) across SDLC environments using bundle-based CI/CD workflows.
- Conduct capacity planning and scalability analysis, including forecasting concurrent user/workload growth, platform scaling strategies, and proactive resource allocation during peak usage periods.
- Facilitate user onboarding and enablement, including new user/team onboarding procedures, training coordination, workspace access provisioning, and creation of self-service documentation/guides.

Qualifications
- Hands-on experience administering Databricks (workspace administration, clusters/compute policies, jobs, SQL warehouses, repos, runtime management) and expertise using the Databricks CLI.
- Strong Unity Catalog administration: metastores, catalogs/schemas, grants, service principals, external locations, storage credentials, governed storage access.
- Identity & Access Management proficiency: SSO concepts, SCIM provisioning, group-based RBAC, service principals, least-privilege patterns.
- Security fundamentals: secrets management, secure connectivity, audit logging, access monitoring, and evidence-ready operations.
- Cloud platform expertise (AWS): IAM roles/policies, object storage security patterns, networking basics (VPC concepts), logging/monitoring integration.
- Automation skills: scripting and/or IaC using Terraform/CLI/REST APIs for repeatable configuration and environment promotion.
- Experience implementing data governance controls (classification/tagging, lineage/metadata integrations) in partnership with governance teams.
- CI/CD practices for jobs/notebooks/config promotion across SDLC environments.
- Understanding of lakehouse concepts (e.g., Delta, table lifecycle management, separation of storage/compute).
- SQL proficiency and data engineering fundamentals for troubleshooting query performance issues, understanding ETL/ELT workflow patterns, and debugging data pipeline failures; basic Python/Scala familiarity for notebook/code issue diagnosis.
- Experience with compliance and regulatory frameworks (FedRAMP, HIPAA, SOC 2, or similar), including implementation of data residency requirements, retention policies, and audit-ready evidence collection.
- Hands-on experience with AWS security and networking services, including PrivateLink, Secrets Manager/Systems Manager integration, CloudWatch/CloudTrail integration, S3 bucket policies, cross-account access patterns, and KMS encryption key management.
- Experience administering Databricks serverless compute, Workspace Git integrations (GitLab), Databricks Asset Bundles (DABs) for deployment automation, and modern workspace features supporting DevOps workflows.
- SLA/SLO management and stakeholder communication skills; ability to define platform service levels, produce operational reports, translate technical issues to business stakeholders, and manage vendor relationships (Databricks account teams).

Education/Experience/Certifications/Accreditations
- Bachelor's degree in a related field or equivalent practical experience.
- 7+ years in cloud/data platform administration and operations, including 4+ years supporting Databricks or similar platforms.
- Databricks Platform Administrator/Databricks AWS Platform Architect
- Databricks Certified Data Engineer Associate/Professional
- AWS Certified Solutions Architect Associate or Professional (preferred)