JOBSEARCHER

Data Scientist - StefanBrain

Build the self-learning loop that makes StefanBrain smarter every week.

Role Summary

StefanBrain is an AI-powered marketing platform used by 80+ DTC businesses spending over $100M/month combined on digital advertising. We are building the data backbone that powers automated performance analysis, creative intelligence, and, increasingly, autonomous ad optimization for those brands.

In the Data Scientist role, you’ll work directly with our Technical Lead on the agentic architecture, self-learning systems, eval and benchmarking infra, and the applied LLM / ML research that turns raw data and chats into actual intelligence inside the product. This is a research-leaning IC role with a strong applied bias. You’ll partner closely with the Technical Lead day-to-day; expect to run experiments in parallel rather than wait for tickets.

AI-native by default. Claude Code, Cursor, and Codex are part of the daily stack. We expect 5–10x throughput from aggressive use of AI tooling.

Role at a Glance

$100K – $220K
Remote (global)  ·  Overlap with the 7:00–11:00 PM CET (1:00–5:00 PM EST) window

JOB TYPE: Full-time
ENGAGEMENT: Independent contract to start
EXPERIENCE: 3–7 years
REPORTS TO: Technical Lead
SKILLS: Python  ·  LLMs & prompt engineering  ·  Agentic systems  ·  RAG & retrieval  ·  Embeddings  ·  Evals / LLM-as-judge  ·  Fine-tuning  ·  Multimodal (image / video)  ·  Data curation

Why You Should Apply

Your work is actually on the frontier (agentic architecture, self-learning loops, multimodal generation, LLM observability at production scale, fine-tuning where it earns its keep), and it's load-bearing inside a product that 80+ DTC brands already use to spend $100M+ a month. You'll work directly with the Technical Lead on research that ships, not research that dies in a deck. You're not getting handed tickets; you're running experiments in parallel and pushing what works into production.
The parent company is profitable, the product has PMF, and the team is small enough that your work has nowhere to hide. If you've been doing applied LLM work inside an org that still treats it as a side experiment, the gap between what you could be building and what you're allowed to build closes the day you join.

What You’ll Do

Agentic systems, memory, and context. Impact: this is what makes StefanBrain feel like a senior marketing operator who knows the user and stays sharp over long sessions. Build the agent architecture with the Technical Lead (multi-agent orchestration, tool use, skills as reusable agent capabilities, the agentic chat loop), plus user / business / conversation memory and context-window strategy (retrieval, selection, summarization, eviction). Make the product feel personal and persistent without blowing the context budget.

Internal and external knowledge learning and distillation. Impact: this is the moat, the product compounding every week on knowledge curated from both inside StefanBrain and the wider world. Use both internal data (chats, agent traces, completions, prompt and skill outcomes, our own content, eval data) and external data (ad libraries, competitor sites, marketing content, transcripts, scraped knowledge) to learn and improve the product. Build the frameworks and environments that preprocess, clean, classify, and structure that data so it’s actually usable for building new tools and skills, evolving the agentic system, or training and fine-tuning models where it earns its keep. Heavy on data-cleaning and classification discipline: much of the moat lives in the quality of these curated knowledge bases.

Evals, benchmarks, and quality. Impact: this is how we know we’re improving instead of just shipping changes. Build the eval framework so every model swap, prompt version, skill change, and agent change is measured against real usage: golden datasets, LLM-as-judge, A/B tests in production.
Partner with the Data Analyst on observability so changes are evaluated against real signal, not vibes.

Applied ML and AI research. Impact: this is how raw data becomes the recommendations, scores, predictions, and content that make the product valuable. Light fine-tuning where it pays off, embeddings, retrieval / RAG, classification, ranking, and multimodal work (image and video generation) where it earns its keep. Read papers, prototype fast, ship what works into the product. We are not training LLMs from scratch; the work is at the application layer.

SECONDARY RESPONSIBILITIES

Mine production chats, prompts, and traces for research signal (what’s working, what’s failing, where new patterns emerge) and feed it back into the loop.

Bring strong opinions on data contracts to the Data Engineer (what’s missing, what needs to be saved, what schema you need) and on observability and KPIs to the Data Analyst.

Build AI-assisted research workflows (Claude Code skills, agentic experimentation harnesses, automated eval runners) and treat improving them as part of the job.

Stay close to the DTC marketing domain so models are evaluated on what actually matters to users, not on what’s easy to measure.

What We’re Looking For

MUST-HAVES

3–7 years in data science, applied ML, or AI research with hands-on production work, not just notebooks.

End-to-end ML and messy-data work. You have taken problems from raw, often unstructured data to models running in production: scraping and curation at scale, cleaning, classification, train / validation / test discipline, backtesting, deployment, monitoring. You know what overfitting looks like and how to avoid kidding yourself with leaked test sets.

Strong Python and the modern ML stack; confident across whichever frameworks the problem calls for.

Deep applied LLM work.
Prompt engineering, building reusable skills and agent capabilities, evals (golden datasets, LLM-as-judge, regression tests for prompt / skill / model changes), embeddings, retrieval / RAG, light fine-tuning where it earns its keep.

Agentic systems with memory and context management. You have built multi-step agent workflows that shipped, and thought hard about user / business / conversation memory, context-window strategy, retrieval and selection, summarization and eviction.

Research mindset, applied bias: you read papers, run experiments, and prototype fast, but the deliverable ships.

STRONG-TO-HAVES

Self-improvement / autoresearch loop experience.

Multimodal experience (image generation and video generation) and the prompt and quality work around it.

DTC / direct-response / paid-acquisition background, or strong willingness to ramp on the domain.

Hybrid search and advanced retrieval patterns at scale.

Feedback-loop ML systems shipped in production.

Compensation

Independent contractor engagement. Paid monthly in USD. Compensation varies by region and experience. You’ll share your preferred comp when applying.