AI Engineer Internship | Data Pipeline (Unpaid)
About TechSnif

At TechSnif, we use AI to transform how people consume tech news. Our platform aggregates stories from dozens of sources, clusters related coverage using vector similarity, and synthesizes original multi-source articles, delivering concise, trustworthy reporting without the noise.

Based in San Francisco, California, TechSnif is built on the belief that tech news should be fast, factual, and accessible to both humans and AI agents. Our platform serves readers through a modern web experience while also exposing structured data via a published MCP server, CLI, and JSON API, making us one of the first agent-native news platforms.

We're proud to serve thousands of readers across the tech industry and are building at the intersection of AI-generated content, real-time data pipelines, and modern web infrastructure.

At TechSnif, we're on a mission to make tech news smarter for everyone, people and machines alike.

What You'll Do

- Work directly with the Claude (Anthropic), OpenAI, and Gemini (Google) APIs to build and improve AI-powered content synthesis, claim verification, and image generation pipelines.
- Generate and manage vector embeddings using OpenAI's embedding models, powering topic clustering via pgvector and cosine similarity in PostgreSQL.
- Build and optimize data ingestion pipelines that process RSS feeds from dozens of sources in real time, handling deduplication, attribution tracking, and content extraction.
- Develop and refine prompt engineering strategies for article synthesis, political content filtering, entity grounding, and social media hook generation.
- Work with Docker, GCP Cloud Run Jobs, and Cloud Scheduler to deploy and monitor production pipelines that run every 30 minutes.
- Contribute to the image generation pipeline: subject classification, AI image generation/reshooting, quality review, watermarking, and CDN upload via Cloudflare R2.
- Participate in the full development lifecycle, from prototyping and testing through production deployment, gaining real-world experience shipping AI systems that serve thousands of readers daily.
- Receive mentorship and code review from experienced engineers, with opportunities to influence the architecture and direction of the AI pipeline.

What You'll Need to Succeed

- Currently pursuing or have completed a Bachelor's or Master's degree in Computer Science, Data Science, or Machine Learning, or have equivalent experience.
- Strong foundation in Python or TypeScript/Node.js, and comfort reading and writing code that integrates with external APIs.
- Familiarity with, or willingness to learn, concepts like vector embeddings, semantic similarity, prompt engineering, and LLM output parsing.
- Previous related experience in software engineering or AI/ML (applicable coursework, personal projects, internships, Kaggle, open-source contributions, etc.).
- Interest in natural language processing, content generation, or information retrieval systems.
- Results-oriented, with the ability and interest to learn new technologies and adapt quickly to new requirements and environments.
- Excellent verbal and written communication skills; can effectively articulate technical decisions and trade-offs.
- Ability to thrive in a fast-paced, rapidly growing startup environment and to work independently with minimal supervision.

Perks & Benefits

- Mentorship Opportunities: Gain direct guidance from our engineering leadership, with hands-on learning and regular code reviews.
- Cutting-Edge AI Stack: Work hands-on with Claude, GPT, Gemini, pgvector, and production AI pipelines, not tutorials or toy projects.
- Ship Real Product: The pipeline you work on synthesizes articles read by thousands of people daily. Your code runs in production from day one.

TechSnif is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all team members.