Staff Software Engineer - Hardware Validation

Overview

Staff Software Engineer - Hardware Validation
Location: Mountain View, CA

Lightmatter is leading the revolution in AI data center infrastructure. The company invented the world's first 3D-stacked photonics engine, Passage™, capable of connecting thousands to millions of processors at the speed of light in extreme-scale data centers for the most advanced AI and HPC workloads.

In this role, you will lead the development of a comprehensive High Temperature Operating Life (HTOL) Test Software system. Your work will involve designing, implementing, and maintaining a scalable multi-chassis testing platform that performs automated stress and performance testing with real-time monitoring and comprehensive data collection capabilities.

Responsibilities

System Design & Development: Architect, build, and maintain scalable architecture for a multi-chassis HTOL testing system.
Orchestration: Develop containerized applications for deployment at scale using Python-based services for chassis coordination and management.
Hardware Monitoring & Management: Create hardware abstraction layers and develop APIs that represent hardware systems, providing essential capabilities for monitoring and management of those systems.
Manage Data: Develop data collection pipelines handling sensor data and performance metrics.
Deploy and Update Software: Create automated deployment and testing pipelines using CI/CD best practices.
Collaboration with Front-End Teams: Work closely with the frontend team to ensure seamless integration of backend APIs with applications.
Testing & Documentation: Write automated tests to monitor the reliability and performance of the system; maintain clear and concise documentation for troubleshooting.
Performance and Reliability: Continuously monitor and optimize performance to reduce response times and improve system scalability; ensure uptime in production environments; establish capacity planning procedures.

Qualifications

BS and 12+ years of experience or MS and 8+ years of experience; degree in Computer Science, Electrical Engineering, or related field.
Expert level Python, knowledge of web frameworks such as FastAPI, Flask, Django; strong understanding of API design principles and best practices.
Experience with containerization and orchestration technologies such as Docker and Docker Compose.
Experience with one or more databases such as MongoDB, PostgreSQL, Redis, time-series databases.
Familiarity with testing frameworks such as pytest and integration testing, performance testing tools.
Experience with CI/CD tools such as GitHub Actions/Runners and Infrastructure as Code tools such as Ansible.
Experience with hardware integration or embedded systems; interfacing with BMCs, FPGAs, temperature sensors, thermal management, power management systems.

Nice-to-have skills

Familiarity with real-time data handling and communication protocols, such as gRPC, TCP/IP, WebSockets, message brokers or similar technologies.
Experience with high-availability, mission-critical systems.
Experience in the Semiconductor Industry: HTOL, wafer-level testing, burn-in systems, reliability testing.
Professional Certifications: Agile/Scrum certifications.

Benefits & Compensation

We offer competitive compensation. The base salary range for this role is determined based on location, experience, educational background, and market data.

Salary Range: $160,000 - $200,000 USD
Retirement Savings Matching Program
Life Insurance (Basic, Voluntary & AD&D)
Generous Time Off (Vacation, Sick & Public Holidays)
Paid Family Leave
Short Term & Long Term Disability
Training & Development
Flexible, hybrid workplace model
Equity grants (applicable to full-time employees)

Export Control

Candidates should have capacity to comply with the federally mandated requirements of U.S. export control laws.