Data Engineering Intern
Occupations:
- Data Scientists
- Data Warehousing Specialists
- Software Developers
- Social Science Research Assistants
- Computer Programmers

Industries:
- Architectural, Engineering, and Related Services
- Web Search Portals, Libraries, Archives, and Other Information Services
- Educational Support Services
- Other Heavy and Civil Engineering Construction
- Management, Scientific, and Technical Consulting Services

Data Pipeline Intern (Summer 2026)

Location: West Hollywood, CA (Onsite)
Compensation: $38,000 to $43,000 (pro-rated) + $500 completion bonus tied to deliverables
Duration: May to August 2026
Employment Type: Internship
Work Authorization: US Citizen, Green Card, or OPT Student Visa

About the Opportunity

Vocator is conducting a retained search on behalf of a high growth AI focused company building infrastructure that powers next generation machine learning systems.

Our client is an early stage, well capitalized organization focused on transforming real world operational data into structured, high quality training assets used by leading AI labs. They operate with a small, high ownership team where individuals contribute directly to production systems and outcomes.

About the Role

This is a hands-on technical internship embedded within a live data pipeline. From day one, you will work with real datasets moving through ingestion, processing, and delivery.

This is not a passive or research-only role.
You will contribute directly to systems that clean, structure, evaluate, and prepare data for downstream AI use.

The ideal candidate is technically strong, detail oriented, and comfortable working with messy real world data.

What You Will Own

Data Pipeline
- Ingest, clean, and structure raw operational datasets across multiple formats such as messaging exports, project management data, financial records, and code repositories
- Support development and testing of data quality scoring models
- Perform exploratory analysis to identify patterns, anomalies, and gaps in incoming data

Privacy and Processing
- Apply PII detection and removal techniques to ensure data meets strict privacy standards
- Assist in building and testing anonymization workflows
- Document data lineage and processing steps across the pipeline

Research and Tooling
- Research techniques related to data quality, dataset curation, and model readiness
- Build lightweight scripts and tools to improve pipeline efficiency
- Support technical evaluation of incoming datasets alongside engineering leadership

What We Are Looking For
- Currently enrolled in an undergraduate or graduate program in computer science, data science, statistics, or a related field
- Strong proficiency in Python
- Experience with Pandas, NumPy, and at least one data visualization library
- Exposure to machine learning, NLP, or data engineering concepts
- Ability to write clean, well documented, and reproducible code
- Strong problem solving skills and attention to detail
- Comfortable working with ambiguous or unstructured data

What You Will Gain
- Hands-on experience building and working within a production data pipeline
- Exposure to how real world data is prepared for AI training systems
- Practical understanding of data quality, privacy, and processing at scale
- Opportunity to contribute to systems used by leading AI organizations

Why This Role
- Work on real systems, not side projects or theoretical work
- High ownership and direct impact from day one
- Small team environment with strong technical exposure
- Opportunity to convert into future roles based on performance