AI/ML Engineer, Production ML Systems

About the Role

We are building a verifiable execution layer for autonomous agents in crypto. The system verifies that an agent’s proposed transaction matches user intent before funds move onchain.

Today, the architecture combines deterministic parsing with LLM-based semantic verification. Over time, the platform is evolving toward a dynamic arbiter trained on transaction-intent data using synthetic perturbations and reinforcement-learning-style feedback.

This role is focused on turning ML research and model-training work into reliable production systems.

What You’ll Build

You will own the production path from model idea to deployed system.

That includes synthetic data generation, training pipelines, evaluation harnesses, inference services, monitoring, regression detection, and feedback loops that improve system quality without sacrificing reliability.

This is neither a research-only role nor a backend-only integration role. The ideal candidate can operate in an open-ended problem space and turn research prototypes into production-grade ML systems.

What You’ll Do

- Productionize ML systems end to end: training jobs, evaluations, inference services, deployment workflows, monitoring, and iteration loops.
- Build and maintain model training and fine-tuning pipelines for arbiter and transaction-intent models.
- Fine-tune and evaluate LLMs or smaller task-specific models using approaches such as SFT, DPO, GRPO, or related post-training methods.
- Design robust evaluation systems for transaction-intent alignment, semantic correctness, regression detection, and model failure analysis.
- Integrate trained models into production backend systems through APIs, inference pipelines, caching, fallbacks, observability, and reliability tooling.
- Collaborate with research teams to turn papers, prototypes, and experimental ideas into scoped engineering plans and shipped systems.
- Read ML and RL research papers critically, identify implementable ideas, and communicate tradeoffs clearly to engineers and researchers.
- Help build synthetic data and labeling infrastructure, including transaction data, inferred intents, perturbations, candidate generations, and reward signals.
- Work cross-functionally on systems interacting with DeFi protocols such as Uniswap, CoW Protocol, Compound, Polymarket, and Hyperliquid.

Required Experience

- Proven ability to ship ML models into real products, not just notebooks, demos, or offline experiments.
- Reinforcement learning experience, especially applied RL, preference optimization, DPO, GRPO, reward modeling, or policy optimization.
- Experience building synthetic data generation, labeling, ranking, or reward-scoring pipelines.
- Familiarity with LLM post-training workflows: SFT, preference datasets, rejection sampling, reward-based filtering, or multi-candidate evaluation.
- Strong Python engineering skills, including clean code, testing, debugging, code review, and maintainable system design.
- Hands-on experience training neural networks, including debugging training runs, interpreting loss curves, tuning hyperparameters, and managing datasets.
- Experience with LLM training, fine-tuning, post-training, or smaller specialized language/semantic models.
- Strong understanding of model evaluation, including offline evaluations, regression suites, error analysis, and production monitoring.
- Experience deploying ML-backed services: APIs, inference workers, queues, observability, versioning, rollback paths, and iteration workflows.
- Ability to translate research into engineering plans, implementation milestones, and production constraints.
- Strong technical writing skills: RFCs, design docs, model cards, evaluation reports, and architecture notes.
- Meaningful PST/EST timezone overlap.

Nice to Have

- Crypto or DeFi familiarity, especially protocol-level understanding of swaps, lending, prediction markets, perpetuals, transaction calldata, or smart-contract execution.
- Experience with production agent engineering: tool-use planning and reasoning, orchestration/state management, guardrails, evaluation harnesses, and reliability/observability.
- Experience with graph reasoning, agent evaluation, or self-improving agents.
- TypeScript experience.