
Data Acquisition Engineer

This is a full-time remote position focused on developing systems for large-scale web crawling and data acquisition to support the training of frontier models for software development.

Key Responsibilities

Design and operate a large-scale web crawler for acquiring publicly accessible data.
Develop specialized crawlers targeting high-value sources to enhance data recall.
Collaborate with teams to align data acquisition with model training needs and build ingestion pipelines.

Required Qualifications

Strong background in distributed systems and experience with large-scale data pipelines.
Proficiency in Python and experience with web crawling or large-scale data extraction.
Familiarity with cloud platforms (AWS) and container orchestration (Kubernetes, Docker).
Understanding of data privacy and responsible crawling practices.
Experience in building pre-training datasets for large language models is a plus.
