JOBSEARCHER

Machine Learning Engineer

Alis Software
Ny, WAL
May 24th, 2026
Hi, good day!

If you are interested in the job role below, please reply with your updated resume and contact details.

Role: ML Engineer
Hybrid: 3 days onsite – NYC, NY
Type: Contract
Duration: Long Term

Candidates whose primary background is MLOps platform work (DAG orchestration, Terraform, Kubernetes administration, generic CI/CD pipelines) will not be a fit. We need a senior-level engineer who can profile a transformer, rewrite its serving path for a 2–3x latency reduction, tune an HNSW index, and tell us which SageMaker instance type will hit our p95 target at the lowest cost.

Roles & Responsibilities:

- Design, build, and scale ML-powered inference systems that process large volumes of text, image, and video data to power news-based intelligence products.
- Productionize and optimize state-of-the-art models and inference pipelines. These models include, but are not limited to:
  - DistilBERT for Named Entity Recognition (NER) over hundreds of thousands of search queries/day
  - TransNetV2 for video shot boundary detection at scale, for archival as well as real-time video
  - SBERT for embedding generation from textual descriptions
  - External multimodal APIs for image/video captioning
- Support hybrid search architectures by defining embedding/re-ranking interfaces, evaluation metrics, and inference performance requirements; partner with search/platform engineers on index configuration, sharding, and cluster tuning.
- Design and implement scalable data processing pipelines across hybrid CPU/GPU environments to handle millions of media assets.
- Partner with MLOps and platform engineering to enable the reliable deployment and operation of ML systems, contributing to:
  - Distributed inference architectures
  - Cloud-based execution (e.g., AWS EC2, Batch, Lambda, SageMaker)
  - Efficient resource utilization across workloads
- Optimize inference latency and throughput across distributed workloads using cloud-based resources (AWS EC2, Batch, Lambda, SageMaker, etc.).
- Build resilient asynchronous processing systems for large-scale workloads, ensuring:
  - Reliability (retries, fault tolerance)
  - Efficiency (caching, deduplication)
  - Observability (metrics, logging, traceability)
- Work closely with data scientists and product teams to iterate on models, improve performance, and deliver measurable impact in production.

Requirements:

- 8+ years of experience building ML inference systems.
- Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.
- Experience with TensorFlow (SavedModel, tf.data, XLA, TFLite) and PyTorch (TorchScript, ONNX, FastAPI/TorchServe).
- Hands-on experience optimizing inference pipelines on AWS infrastructure, across different types of media assets.
- Experience with video frameworks/tools (e.g., FFmpeg) and with large-scale frame-level inference.
- Demonstrated experience monitoring and debugging model latency, memory, and pipeline throughput.
- Experience with hybrid search architectures (BM25 + vector search + cross-encoder reranking).
- Familiarity with OpenAI APIs or other foundation model providers.
- Familiarity with open-source HuggingFace LLMs.
- Experience with data pipeline and workflow orchestration tools (e.g., Airflow).