Software Engineer, Data [33019] (New York)
We provide organizations with invaluable foresight, empowering them to anticipate outcomes and proactively make the right decisions at the right time, every time.

We're a small, dedicated, mission-driven team and we intend to stay that way. We believe the best work happens when exceptionally talented people are given ownership, trust and the space to operate without bureaucratic friction. We work with urgency and intellectual honesty and expect new team members to match our velocity. We seek individuals who thrive at the frontier, who push beyond conventional limits, who bring curiosity and conviction in equal measure, and who want their work to have demonstrable impact in the world. If you're energized by the idea of a small team doing things that feel impossible, let's build together.

ABOUT THE ROLE
As a Data Engineer, you'll build and scale the data acquisition and enrichment infrastructure that makes our simulations accurate. The Data team owns the pipelines that ingest, process, and serve the diverse real-world data sources our simulation engine depends on, from public demographic datasets to proprietary consumer behavior signals.
You'll work on turning messy, heterogeneous data into clean, structured inputs that power every simulation we run.

RESPONSIBILITIES
- Design and build scalable data ingestion pipelines for diverse sources: public datasets (census, labor statistics), licensed proprietary data, and web-scraped sources
- Develop the data enrichment layer that joins location-level behavioral data with demographic profiles, workplace characteristics, and consumer behavior markers
- Build and maintain systems for processing unique data types: foot-traffic patterns, cross-shopping behavior, and trade area demographics
- Implement data quality monitoring and validation to ensure incoming data meets accuracy thresholds before it reaches the simulation engine
- Collaborate with AI/ML and research teams to identify and integrate new data sources that improve simulation fidelity
- Own the data infrastructure that serves enriched datasets to the simulation pipeline at the speed and reliability production demands

YOU MAY BE A FIT IF
- You've built production data pipelines that ingest and process data from multiple heterogeneous sources at scale
- You have experience working with geospatial data, census datasets, or similar public/proprietary data sources
- You care deeply about data quality and have built systems to detect, flag, and remediate data issues automatically
- You're comfortable with the full data lifecycle: acquisition, cleaning, transformation, storage, and serving
- You have strong SQL skills and experience with both OLTP and analytical databases
- You can work independently to scope, plan, and execute data infrastructure projects

STRONG CANDIDATES MAY ALSO
- Have experience with geospatial processing (reverse geocoding, census block mapping, trade area analysis)
- Have built ETL/ELT pipelines for alternative data (foot traffic, mobility, transaction data)
- Have worked with imputation techniques for handling missing or sparse data
- Have familiarity with demographic modeling or population statistics

LOCATION
This role is based in New York City. This is an in-person company and during this exciting period of hypergrowth, we work 6 days a week in office. Candidates are expected to be located within the New York City metropolitan area or open to relocation.

BENEFITS
We take care of our people. In addition to a competitive base salary and equity participation, we offer comprehensive medical, vision, and dental coverage, visa sponsorship and relocation support, and various other benefits and perks.