Cloudera Public Cloud Platform Engineer
NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.

We are currently seeking a Sys. Integration Sr. Specialist Advisor to join our team in Plano, Texas (US-TX), United States (US).

Job Description:

Job Summary

We are seeking a highly skilled Cloudera Public Cloud Platform Engineer to operate and manage the end-to-end CDP platform ecosystem, including data services, NiFi, Kafka, AI/ML platforms, and enterprise observability.

This role is responsible for ensuring availability, scalability, security, and performance of all platform services supporting data, analytics, and AI workloads across environments.

The ideal candidate brings strong expertise in CDP on-prem, public cloud services, cloud infrastructure, Kubernetes-based runtime environments, and platform observability, supporting high-concurrency, mission-critical workloads at multi-terabyte to petabyte scale.

This role is critical to ensuring uninterrupted operation of data, analytics, and AI platforms; any degradation directly impacts downstream business reporting, data pipelines, and model execution.

Key Responsibilities

CDP Platform & Multi-Service Operations
- Own end-to-end operational responsibility for Cloudera Public Cloud services across Dev / Stage / UAT / Prod:
  - CDE, CDW, COD, CDL, CDF (NiFi), CDV, CAI, Kafka
- Ensure multi-cluster stability, workload isolation, and SLA adherence
- Support onboarding and operations of multiple applications across environments
- Manage and support multi-environment, multi-cluster deployments with strict isolation, governance, and release coordination across Dev/UAT/Prod

AI/ML Platform Operations
- Operate and support Cloudera AI (CAI) environments:
  - AI Workbenches, AI Studios
  - Model training and development environments
  - AI inference endpoints and model serving
- Troubleshoot:
  - Resource contention (CPU/GPU)
  - Model deployment/runtime failures

CDP Runtime & Kubernetes-Aware Operations
- Operate CDP services running on Cloudera-managed Kubernetes infrastructure
- Apply strong understanding of containerized workloads and Kubernetes concepts for troubleshooting
- Diagnose and resolve:
  - Pod failures, restarts, and resource contention
  - Spark job failures in containerized environments (CDE)
  - Service-to-service communication issues
- Analyze logs and metrics to identify runtime failures and performance issues
- Collaborate with Cloudera support for managed service-level issues

Data Integration & Platform Services
- Operate and support:
  - CDF (NiFi) for ingestion pipelines
  - CDV (Data Visualization) for reporting workloads
  - Octopai for data lineage and catalog integration
- Ensure reliability and performance of data pipelines and integrations
- Monitor and troubleshoot Kafka environments:
  - Topic configurations, partitions, and replication
  - Consumer lag and throughput issues
  - Broker connectivity and performance bottlenecks

Security, Governance & SDX Administration
- Implement and manage:
  - Kerberos, TLS/SSL, Ranger policies
- Administer SDX for:
  - Centralized security
  - Metadata and policy enforcement
- Support Atlas and Octopai integration
- Manage and troubleshoot user access and identity mapping across layers, including:
  - Cloud IAM roles and permissions
  - CDP users/groups and identity providers
  - Ranger policies for fine-grained data access
- Resolve access-related issues impacting:
  - Data access (S3/ADLS)
  - Query execution (CDW/CDE)
  - Application and service-level permissions

Cloud Infrastructure & Networking
- Troubleshoot:
  - S3 / ADLS storage issues
  - IAM roles and permissions
  - VPC, subnets, routing, security groups
  - Bastion host access and connectivity
- Ensure secure and reliable connectivity across services
- Understand and troubleshoot S3-based data lake patterns, including:
  - Bucket structure, prefix design, and access patterns
  - Performance issues related to small files, request rates, and throughput limits
  - Encryption (SSE-S3, SSE-KMS) and access policies
- Manage and troubleshoot cross-account IAM roles and access patterns for CDP environments
- Ensure secure access between:
  - CDP environments and cloud resources
  - Multiple AWS accounts (dev/prod separation)

Disaster Recovery & Resiliency
- Support and validate disaster recovery and failover strategies across CDP environments
- Ensure backup, recovery, and environment resiliency for critical workloads
- Participate in DR drills and recovery validation

Observability, Monitoring & Alerting (Critical)
- Implement and manage end-to-end observability:
  - Metrics, logs, and alerting
- Use:
  - Cloudera observability, Cloudera Manager, Prometheus, Grafana
- Monitor:
  - Cluster health
  - Workload performance
  - AI inference endpoints
- Enable proactive issue detection and prevention
- Define and implement SLIs/SLOs and alerting thresholds to ensure platform reliability and performance
- Support high-severity (P1/P2) incident response, triage, and resolution within defined SLAs

Operational Support & On-Call
- Participate in on-call rotation to support 24/7 platform operations
- Respond to production incidents, alerts, and service disruptions within defined SLAs
- Handle P1/P2 incidents, including triage, troubleshooting, and resolution
- Perform root cause analysis (RCA) and implement preventive measures

Upgrades, Patching & Platform Lifecycle
- Execute:
  - CDP upgrades and version management
  - Security patches and hotfixes
- Perform:
  - Rolling upgrades
  - Validation and rollback strategies

Performance Optimization & Cost Efficiency
- Optimize:
  - Platform-level performance (Spark, Hive, Impala workloads)
  - Cluster utilization and workload distribution
- Drive:
  - Autoscaling strategies
  - Cost optimization (FinOps practices)

Automation & Operational Excellence
- Utilize and support existing automation frameworks for:
  - Platform provisioning
  - Monitoring and alerting
  - Routine operational tasks
- Work with infrastructure teams that manage Infrastructure-as-Code (Terraform) for environment setup and changes
- Leverage scripting (Python / Shell) for:
  - Operational support
  - Task automation
  - Troubleshooting and diagnostics
- Maintain and follow runbooks, SOPs, and operational procedures to ensure consistent platform operations

Required Skills
- Strong experience with Cloudera CDP Public Cloud
- Expertise in:
  - Cloud platforms (AWS/Azure/GCP)
  - Kubernetes concepts (troubleshooting-focused)
- Hands-on with:
  - CDE, CDW, CDF (NiFi), CAI
- Knowledge of:
  - IAM, networking, observability tools
  - Platforms operating at multi-terabyte to petabyte scale with high concurrency workloads
- Hands-on experience with:
  - Kafka (or similar streaming platforms) including monitoring, troubleshooting, and performance tuning
- Experience with Cloudera CDP CLI (Command Line Interface) for:
  - Platform operations and administration
  - Job execution and service management (CDE/CDW/CDL)
  - Automation of routine operational tasks
- Strong working knowledge of:
  - Cloud IAM (AWS IAM / Azure AD) including roles, policies, and cross-service access
  - User and group mapping across CDP, cloud IAM, and Ranger policies
  - Troubleshooting access issues across storage (S3/ADLS), CDP services, and data access layers

Preferred Skills
- Experience with:
  - Modernization of legacy data platforms/applications to Cloudera CDP Public Cloud
  - Migration and onboarding of workloads to CDE, CDW, and CAI environments
  - Supporting hybrid or multi-environment transitions (on-prem → cloud)
- Familiarity with:
  - Cloud platforms (AWS, Azure, GCP) including storage, IAM, and networking concepts
  - Kubernetes-based runtime environments (troubleshooting-focused)
- Strong scripting and automation skills (Python, Shell, Terraform) for platform operations

What You'll Work On
- Enterprise-scale Cloudera CDP platform supporting data engineering, analytics, and AI workloads across multiple applications
- Modernization of legacy platforms and applications into cloud-native CDP services
- Operational support and scaling of:
  - Data services (CDE, CDW, CDF, CDL)
  - AI/ML platforms (CAI, inference, workbenches)
- Platform performance optimization, observability, and reliability engineering for mission-critical workloads

About NTT DATA:
NTT DATA is a $30 billion trusted global innovator of business and technology services.
We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is a part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. Visit us at us.nttdata.com

NTT DATA endeavors to make https://us.nttdata.com accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at https://us.nttdata.com/en/contact-us. This contact information is for accommodation requests only and cannot be used to inquire about the status of applications.

NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. For our EEO Policy Statement, please click here. If you'd like more information on your EEO rights under the law, please click here. For Pay Transparency information, please click here.