Splunk Architect
Job title: Splunk Architect
Location: 100% Remote Role
Duration: 12 Months

Top 5 hard skills:

Splunk Architecture & Administration
Core Competencies:
- Design and maintain distributed Splunk deployments (search heads, indexers, forwarders, deployers)
- Manage indexer clustering and search head clustering for high availability
- Configure data inputs, parsing, and index management
- Implement role-based access control (RBAC) and authentication integration
- Performance tuning and capacity planning
Data Onboarding:
- Design and implement data onboarding strategies for diverse data sources
- Create and maintain props.conf and transforms.conf for data parsing and routing
- Develop source type definitions and field extractions
- Configure input specifications and monitor data quality post-onboarding
- Establish data retention policies and index lifecycle management
Splunk HTTP Event Collector (HEC):
- Configure and manage HEC endpoints for REST API-based data ingestion
- Implement HEC tokens with appropriate permissions and index routing
- Troubleshoot HEC connectivity, authentication, and data formatting issues
- Scale HEC deployments for high-volume event ingestion
- Integrate cloud-native applications and serverless functions with HEC
Splunk DB Connect:
- Install, configure, and maintain DB Connect app across search heads
- Create database connections and manage JDBC drivers for various database types
- Design and schedule database inputs (rising column, batch, and tail inputs)
- Optimize SQL queries for performance and minimize database load
- Configure database identity management and credential security
- Troubleshoot connection issues, query timeouts, and data ingestion gaps
Relevance: Essential for maintaining platform health, scalability, ensuring data availability across the enterprise, and enabling seamless integration of diverse data sources into the Splunk ecosystem

AWS Infrastructure & Services
Core Competencies:
- Deploy and manage EC2 instances for Splunk components with proper sizing
- Configure VPCs, security groups, NACLs, and networking for secure Splunk communication
- Implement EBS storage optimization and snapshot strategies for Splunk data
- Leverage S3 for SmartStore architecture and backup solutions
- Use AWS Systems Manager, CloudWatch, and Auto Scaling for monitoring and automation
Relevance: Critical for cost-effective, secure, and resilient infrastructure supporting enterprise-scale log aggregation

Infrastructure as Code (IaC) & Automation
Core Competencies:
- Terraform or CloudFormation for provisioning Splunk infrastructure
- Ansible, Puppet, or Chef for Splunk configuration management
- Python/Bash scripting for custom automation tasks
- CI/CD pipeline integration (Jenkins, GitLab CI, GitHub Actions)
- Version control with Git for infrastructure and configuration code
Relevance: Enables repeatable deployments, reduces human error, and accelerates disaster recovery and scaling operations

Monitoring, Logging & Troubleshooting
Core Competencies:
- Create Splunk monitoring dashboards and alerts for platform health
- Implement log forwarding strategies using universal/heavy forwarders
- Troubleshoot data ingestion issues, search performance, and cluster health
- Integrate AWS CloudWatch metrics with Splunk for unified monitoring
- Analyze Splunk internal logs (_internal, _introspection, _audit indexes)
Relevance: Ensures platform reliability, rapid incident response, and proactive identification of issues before they impact users

Security & Compliance
Core Competencies:
- Implement encryption in-transit (SSL/TLS) and at-rest for Splunk data
- Configure AWS IAM roles and policies following least-privilege principles
- Ensure compliance with standards (PCI-DSS, HIPAA, SOC 2) for log data
- Implement backup and disaster recovery procedures
- Secure API access and credential management (AWS Secrets Manager, HashiCorp Vault)
Relevance: Protects sensitive log data, maintains audit trails, and ensures regulatory compliance in enterprise environments

Cribl Stream & Cribl Edge - Data Pipeline Optimization
Cribl Stream (LogStream) Competencies:
- Deploy and manage Cribl Stream architecture (Leader nodes, Worker nodes, Worker groups)
- Configure data sources and destinations for multi-platform routing (Splunk, S3, other SIEMs)
- Design and implement pipelines for data transformation, enrichment, and reduction
- Create routes and filters to optimize data flow and reduce ingestion costs
- Implement data sampling, aggregation, and redaction for compliance and cost savings
- Configure event breakers, parsers, and field extractions within Cribl
- Manage Cribl packs for pre-built data optimization solutions
- Integrate Cribl Stream with Splunk HEC and S3 for hybrid storage strategies
- Monitor pipeline performance and troubleshoot data flow issues
- Implement GitOps workflows for Cribl configuration management
Cribl Edge Competencies:
- Deploy and manage Cribl Edge fleets for distributed edge data collection
- Configure Edge nodes as lightweight agents replacing traditional forwarders
- Implement centralized management of Edge fleets through Cribl Cloud or Stream Leader
- Collect data from edge sources (logs, metrics, Windows events, syslog)
- Perform edge-side data processing to reduce bandwidth and central processing load
- Configure auto-discovery and dynamic data source management
- Manage Edge node updates, configuration versioning, and fleet-wide deployments
- Monitor Edge node health and connectivity across distributed environments
- Implement edge-to-cloud data routing strategies for hybrid architectures

Incident Management & Service Request Support
Core Competencies:
Incident Response:
- Triage and respond to platform incidents following ITIL or similar frameworks
- Diagnose and resolve P1/P2 incidents affecting Splunk availability or data ingestion
- Perform root cause analysis (RCA) and create post-incident reports
- Coordinate with cross-functional teams during major incidents
- Implement corrective and preventive actions to reduce incident recurrence
- Maintain on-call rotation and provide 24/7 platform support
Service Request Management:
- Process user access requests (account creation, role assignments, permission changes)
- Handle data onboarding requests for new applications and data sources
- Fulfill infrastructure change requests (index creation, retention policy updates, capacity expansion)
- Coordinate app installations and updates on search heads
- Provision and configure new forwarders, HEC tokens, or DB Connect inputs
- Create custom dashboards and reports based on user requirements
Ticket Management & Communication:
- Utilize ticketing systems (ServiceNow, Jira Service Management, Remedy)
- Document troubleshooting steps and resolution procedures
- Maintain SLA compliance for incident response and service request fulfillment
- Communicate effectively with stakeholders on status updates and timelines
- Create and maintain knowledge base articles for common issues
- Escalate complex issues to vendors (Splunk Support, AWS Support) when necessary
Proactive Support:
- Conduct health checks and performance reviews
- Identify trending issues and implement preventive measures
- Provide user training and guidance on Splunk best practices
- Participate in change advisory board (CAB) meetings for platform changes
Relevance: Ensures rapid resolution of platform issues, maintains high availability and user satisfaction, and provides structured support that aligns with enterprise IT service management practices
Essential for maintaining operational excellence and meeting business-critical