Spark Developer (Search Integration)
ARCHIVED
Location: Pleasanton, CA

We are looking for a Spark Developer with OpenSearch/Algolia expertise who can design, build, and optimize scalable data pipelines that ingest, transform, and index large-scale datasets into search engines for fast retrieval. You will use Spark SQL to process data from various sources (S3, Kafka) for real-time indexing in OpenSearch or Algolia.

Focus: Spark (ETL/Streaming) + Search Engines (OpenSearch/Algolia)
Objective: Power real-time, relevant, and fast search experiences.

Key Responsibilities:
Data Pipelines: Design, develop, and maintain high-performance Spark jobs (PySpark) to process, transform, and clean large datasets.
Index Management: Ingest data into OpenSearch or Algolia, optimizing index strategy, mapping, and document structure for maximum search efficiency.
Optimization: Tune Spark applications (data partitioning, caching, shuffle tuning) and search engines (query performance, indexing speed).
Streaming/Batch: Implement both batch ETL jobs and real-time streaming solutions (Spark Streaming/Kafka) to keep search indexes up to date.
Collaboration: Work with backend teams to integrate search functionality into applications and debug search relevance issues.

Required Skills and Qualifications:
Core Spark: Strong experience with the Apache Spark RDD/DataFrame APIs and PySpark.
Search Tech: Experience in indexing, querying, and managing clusters in OpenSearch (an open-source fork of Elasticsearch) or Algolia.