Bilingual Mandarin Observability Engineering (Platform Development)

Responsibilities

Observability System Development
Develop and enhance the observability platform, focusing on the four key pillars: Metrics, Logging, Tracing, and Profiling, and build full-stack observability capabilities.

Platform & Architecture Design
Design and develop observability-related platforms and systems, including monitoring platforms, distributed tracing systems, logging services, compute engines (stream processing, real-time alerting, time-series analysis), alerting systems, and eBPF-based solutions.

Performance & Reliability
Ensure high performance and high availability of observability infrastructure under high-concurrency environments. Continuously optimize and iterate on technology and products to support both domestic and global observability architecture, data compliance, and infrastructure stability.

AI + Observability Integration
Implement observability solutions for AI infrastructure and AI applications, and explore AI-powered observability (AI + Observability).
Improve system stability in AI scenarios and enhance the usability and efficiency of traditional observability tools.

Qualifications

Education & Experience
Bachelor's degree or above in Computer Science or a related field, with 3+ years of relevant experience.

Programming Skills
Proficient in Java or Go, with strong knowledge of concurrency, distributed systems, and performance optimization.

Observability Stack Expertise
Familiar with cloud-native observability tools and ecosystems, including but not limited to OpenTelemetry, CAT, SkyWalking, Prometheus, VictoriaMetrics, ELK, ClickHouse, and eBPF, with a solid understanding of Kubernetes.

Core Infrastructure Knowledge
Familiar with foundational technologies such as Linux, networking, storage, and message queues (MQ); a deep understanding of their underlying principles is preferred.

Preferred Qualifications
Experience with AI-related technologies, including but not limited to PyTorch, Spring AI, Langfuse, and OpenClaw.

Problem-solving & Collaboration
Strong problem identification and resolution skills, with the ability to summarize, analyze, and collaborate across teams.

Learning & Ownership
Curious and proactive in learning new technologies, with strong ownership and the ability to work under pressure.

Language Skills
Business-level fluency in both English and Mandarin, with the ability to communicate effectively in an international team environment.