Machine Learning Systems Intern

Hybrid SSM-Transformer models have a unique advantage for on-chip memory efficiency:
- SSM layers compress sequence history into a fixed-size recurrent state.
- Attention layers store key-value caches that grow with context length.

This leads to an important design question: for a given model configuration and maximum context length, can on-chip SRAM be sized so that inference runs entirely on chip, eliminating the need for slower off-chip HBM or DRAM?

What the intern will work on:

The intern will model and analyze memory behavior during inference of hybrid SSM-Transformer models, with a focus on avoiding off-chip memory accesses. Key responsibilities include:
- Modeling data movement between SRAM and HBM/DRAM during inference
- Sweeping parameters such as:
  - SRAM capacity
  - Context length
  - Model dimensions
- Mapping the feasibility boundary where inference can be performed fully on chip
- Breaking down per-layer memory working sets
- Identifying when and why memory spills occur
- Exploring tiling and scheduling strategies to extend the no-spill region
- Validating analytical results through simulation
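As a rough illustration of the first-order question above (not the team's actual model), the sketch below splits inference memory into a part that does not grow with context (weights plus fixed-size SSM states) and a part that grows linearly with context (the attention KV cache). It then solves for the largest context length that fits in a given SRAM capacity. All names (`HybridConfig`, `max_on_chip_context`, etc.) and the example dimensions are hypothetical, and the model deliberately ignores activations, scratch buffers, and tiling, which a per-layer working-set analysis would add.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HybridConfig:
    """Hypothetical hybrid SSM-Transformer configuration (all fields illustrative)."""
    n_ssm_layers: int
    n_attn_layers: int
    d_inner: int            # SSM channel dimension (assumed)
    d_state: int            # SSM recurrent state size per channel (assumed)
    n_kv_heads: int         # key/value heads per attention layer
    head_dim: int
    n_params: int           # total weights, assumed to be resident in SRAM
    bytes_per_elem: int = 2  # e.g. FP16/BF16


def fixed_bytes(cfg: HybridConfig, batch: int = 1) -> int:
    """Memory independent of context length: weights plus SSM recurrent states."""
    weights = cfg.n_params * cfg.bytes_per_elem
    ssm_state = cfg.n_ssm_layers * batch * cfg.d_inner * cfg.d_state * cfg.bytes_per_elem
    return weights + ssm_state


def kv_bytes_per_token(cfg: HybridConfig, batch: int = 1) -> int:
    """KV-cache bytes added per context token (keys + values, all attention layers)."""
    return 2 * cfg.n_attn_layers * batch * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def working_set_bytes(cfg: HybridConfig, context_len: int, batch: int = 1) -> int:
    """Total footprint at a given context length under this simplified model."""
    return fixed_bytes(cfg, batch) + context_len * kv_bytes_per_token(cfg, batch)


def max_on_chip_context(cfg: HybridConfig, sram_bytes: int, batch: int = 1) -> Optional[int]:
    """Largest context length that fits in SRAM, or None if even the fixed part spills."""
    headroom = sram_bytes - fixed_bytes(cfg, batch)
    if headroom < 0:
        return None
    return headroom // kv_bytes_per_token(cfg, batch)


# A small, made-up configuration used only to exercise the functions.
example_cfg = HybridConfig(
    n_ssm_layers=4, n_attn_layers=2, d_inner=256, d_state=16,
    n_kv_heads=2, head_dim=64, n_params=1_000_000,
)

if __name__ == "__main__":
    # Sweep SRAM capacity to trace a (coarse) feasibility boundary.
    for mib in (1, 2, 4, 8):
        print(f"{mib} MiB SRAM -> max on-chip context: {max_on_chip_context(example_cfg, mib * 2**20)}")
```

Because the SSM state term is constant, only the attention layers move the boundary as context grows; under these assumptions, converting an attention layer to an SSM layer raises the maximum on-chip context for a fixed SRAM budget. Sweeping other parameters (model dimensions, batch size) works the same way.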