
IoT Cloud Application Architect

Nextgenpros | Roswell, GA | April 12th, 2026
Position: IoT Cloud Application Architect
Location: Roswell, GA (Onsite)
Type: Contract
Duration: 12+ months
Experience: 10+ years

The EMS IoT Lead will own system-level triage, debugging, RCA, and resolution across the IoT ecosystem (device, firmware, cloud, and mobile), with a strong focus on US-time incident response and platform stability.

We are specifically looking for candidates who:
- Can own end-to-end system behavior across device → firmware → cloud → mobile
- Have strong experience in debugging production systems, performing deep RCA, and driving fixes to closure
- Are comfortable identifying the right resolution strategy (hotfix, config change, rollback, or escalation)
- Can work closely with offshore engineering teams to drive execution
- Can communicate clearly and consistently with Customer Support, NOC, and Marketing teams during incidents
- Have hands-on exposure to IoT and cloud ecosystems (ClearBlade, AWS, MQTT, Datadog, MongoDB, etc.)

Ideal candidate profile
Please prioritize candidates who have grown from Senior Engineer / Tech Lead into Architect or System Owner roles, with real-world experience handling production incidents, platform stability, and cross-functional coordination.

About the Role
We are seeking an experienced EMS IoT Lead / Architect to provide system-level technical ownership for a large-scale connected IoT ecosystem. This role is responsible for platform stability, production issue resolution, and continuous improvement across devices, firmware, cloud services, mobile applications, and integrations.

The EMS IoT Lead operates as the primary engineering authority during production incidents, leading triage, debugging, root cause analysis (RCA), fix strategy selection, and closure coordination. The role requires close collaboration with engineering, operations, customer support, and customer-facing teams to ensure reliable and consistent user experiences.

Key Responsibilities

System-Level IoT Ownership
- Own end-to-end technical accountability across the full IoT stack:
  - Device connectivity and telemetry
  - Firmware behavior and state management
  - Cloud services and data pipelines
  - Mobile applications and APIs
- Understand and debug end-to-end connectivity flows from device and firmware through cloud platforms to mobile applications
- Diagnose issues related to connectivity failures, message loss, latency, retries, state synchronization, and data inconsistencies
- Prioritize issues based on customer impact, severity, and recurrence, not component boundaries

Incident Management & US-Time Response
- Act as the primary engineering escalation point during US business hours
- Lead real-time investigation and response for:
  - Production incidents
  - NOC escalations
  - Customer-facing issues
- Evaluate and select the most appropriate resolution strategy, including:
  - Hotfixes
  - Configuration changes
  - Rollbacks
  - Permanent code fixes
- Drive rapid mitigation to stabilize incidents while minimizing customer impact

Debugging, RCA & Resolution Leadership
- Lead deep debugging and root cause analysis across distributed systems
- Analyze logs, telemetry, metrics, and traces across device, cloud, and application layers
- Determine whether issues can be resolved via:
  - Tactical fixes
  - Operational or configuration changes
  - Architectural or design changes
- Drive fixes to completion, coordinating development, validation, deployment, and verification until issues are fully resolved in production
- Ensure all resolved issues include clear RCA documentation and corrective actions

Cross-Functional & Offshore Team Collaboration
- Work closely with:
  - Cloud engineering teams
  - Mobile engineering teams
  - Firmware and platform teams
- Collaborate with offshore engineering teams, providing:
  - Clear RCA context
  - Technical direction
  - Execution priorities
- Enable effective follow-the-sun execution while maintaining ownership and continuity

Customer Support & Stakeholder Communication
- Partner closely with Customer Support and NOC teams during incidents and escalations
- Communicate issue status, impact, and resolution progress clearly and consistently
- Coordinate with Marketing and customer-facing teams to support accurate and aligned customer messaging during incidents or service degradations
- Ensure timely and transparent communication throughout the issue lifecycle

Escalation & Governance
- Escalate issues to core engineering or product teams only when they cannot be resolved through EMS
- Prepare high-quality escalation packages, including:
  - Completed RCA
  - Reproduction steps
  - Impact assessment
  - Design or architectural considerations
- Maintain tracking and visibility of escalated issues through closure

Process Improvement & Platform Stability
- Establish and enforce standards for:
  - Issue intake quality
  - Triage consistency
  - RCA documentation
  - Closure and communication
- Analyze trends and recurring issues to identify systemic risks
- Drive continuous improvements to reduce incident frequency and improve platform reliability

Technology Environment
The role requires hands-on familiarity with modern IoT and cloud platforms, including:

Cloud & Platform
- AWS (compute, networking, deployments, monitoring)
- ClearBlade IoT Platform
- Datadog (logs, metrics, tracing, incident analysis)
- MongoDB or similar NoSQL databases

IoT & Messaging
- MQTT-based device communication
- Device telemetry and command/control patterns
- Understanding of firmware interaction and device lifecycle concepts

Applications & APIs
- RESTful and event-driven APIs
- Mobile application interaction with cloud platforms
- Familiarity with iOS/Android release lifecycles and crash analysis concepts

Qualifications

Required
- Bachelor’s degree in Computer Science, Engineering, or related field
- 10+ years of experience in IoT platforms, distributed systems, or cloud-native architectures
- Proven experience in:
  - Production incident management
  - Debugging complex system issues
  - Root cause analysis
- Strong communication and decision-making skills under pressure

Preferred
- Experience with large-scale IoT or connected device platforms
- Background in telecom, industrial IoT, or consumer electronics ecosystems
- Experience working with NOC, customer support, and customer-facing teams

Success Metrics
Success in this role will be measured by:
- Reduction in incident resolution time (MTTR)
- Improved platform stability and reliability
- Reduction in recurring and systemic issues
- Quality of RCA and fix execution
- Effective collaboration across engineering, operations, and support teams

Why This Role Matters
This role plays a critical part in ensuring:
- Reliable and consistent customer experiences
- Stable and scalable IoT platform operations
- Engineering teams can focus on innovation while production stability is maintained

This position is ideal for a senior technologist who thrives in system ownership, problem-solving, and operational leadership within complex IoT environments.