Staff Software Engineer - Supernal

About Supernal

Supernal helps small-to-medium businesses hire their first AI employee. Our AI teammates are built using intelligent, agentic workflows deployed on a proprietary platform. We deliver working, value-generating AI Employees (not tools) that handle real business processes alongside human teams.

The Role

We're looking for a Staff/Principal Software Engineer to own and evolve the core platform that powers our AI employees. This is a technical leadership position responsible for the systems that enable our agents to scale reliably: the Django backend, distributed task infrastructure, event-driven architecture, Kubernetes deployments, and observability stack.

You'll work across the full system, from database query optimization to Helm chart tuning to designing new platform abstractions. You'll be a force multiplier for the engineering team, driving architectural decisions, eliminating scaling bottlenecks, and establishing patterns that make the platform more robust and developer-friendly.

This role reports to the Director of Engineering and involves significant autonomy in shaping technical direction.

What You'll Own

- Drive platform architecture decisions and align the team on scalable patterns and long-term maintainability
- Review a high volume of code, design docs, and architectural proposals for scalability, reliability, security, and operability
- Be a technical mentor and force multiplier: unblock engineers, raise the bar on production readiness, and establish platform best practices
- Own and evolve the core backend platform (Django/DRF/ASGI) performance and correctness
- Scale async execution across Celery + Dramatiq + Temporal/Cortex; implement resilient workflow patterns (retries, circuit breakers, graceful degradation)
- Optimize PostgreSQL/pgvector (query tuning, connection pooling) and caching strategies
- Maintain and improve Kubernetes deployment infrastructure (GKE, Helm, Terraform/OpenTofu) and CI/CD + rollout strategies. Own KEDA autoscaling policies and resource allocation across worker pools.
- Own reliability of RabbitMQ, Redis, and PostgreSQL infrastructure; lead incident response and post-mortems
- Extend OpenTelemetry + Datadog instrumentation, dashboards, alerts, and SLOs; profile and reduce latency/memory bottlenecks

What We're Looking For

Required

- 10+ years building and operating production backend systems at scale
- Deep expertise in Python (Django preferred) and relational databases (PostgreSQL)
- Hands-on experience with Kubernetes, Helm, and cloud infrastructure (GCP preferred)
- Strong background in distributed systems: message queues, event sourcing, workflow orchestration
- Production experience with async task systems (Celery, Dramatiq, or similar)
- Track record of debugging complex production issues across multiple services
- Ability to work autonomously and drive technical initiatives without close supervision
- Clear technical communication: able to explain tradeoffs and build consensus

Preferred

- Experience with Temporal or similar workflow engines
- Background in LLM infrastructure, RAG systems, or AI/ML platforms
- Familiarity with OpenTelemetry, Datadog, or similar observability stacks
- Experience with KEDA or other Kubernetes autoscaling solutions
- Contributions to multi-tenant SaaS platform architecture
- History of improving developer experience and platform abstractions

What Success Looks Like

- Platform services maintain high availability with predictable performance under load
- Scaling bottlenecks are identified and resolved proactively
- New features ship faster because platform primitives are well-designed and documented
- Incidents are rare, quickly detected, and thoroughly addressed
- Engineers across the team adopt platform patterns and best practices
- Technical debt is systematically identified and paid down
- You're a trusted technical voice in architectural discussions

Compensation & Logistics

- Compensation: Competitive salary commensurate with experience (Staff/Principal level)
- Location: Remote
- Type: Full-time
- Requirements: Overlap with Americas timezones for collaboration; reliable high-speed internet