Software Engineer, Data Processing & Privacy - 26-00382
Additional Notes: Data privacy and legal environments, working in Python and with Claude, handling/processing PII. Soft skills: attention to detail, reliability, comfort with reviews/audits.

About the role
The client is seeking a detail-oriented Software Engineer on a contract basis to build and run data processing pipelines for datasets used in our research. You'll take raw, heterogeneous inputs (text, code, documents, structured exports) and turn them into clean, well-structured, privacy-safe outputs ready for downstream use.

The work spans ingestion, format normalization, data quality, privacy handling (including PII de-identification), and the supporting tooling that makes the pipeline reliable and self-serve. You'll iterate closely with internal teams on QA findings and harden the pipeline so each new dataset is cheaper than the last.

Responsibilities
Build and extend per-source processing for new data types as they arrive
Ingest and normalize raw exports across many formats into consistent, well-structured outputs
Handle privacy requirements, for example PII detection and de-identification, to meet our internal compliance bar
Run data quality QA: automated checks plus LLM-assisted review to flag gaps, malformed inputs, and incompleteness
Iterate on internal feedback: root-cause issues, fix, re-run, re-deliver
Build supporting tools: auditing, data exploration, monitoring, simple search over processed data
Land cleaned data with the right storage layout and access controls
Document and harden the pipeline so each new dataset is cheaper than the last

You may be a good fit if you:
Have 4+ years of software engineering experience, with substantial time on data pipelines
Are a proficient user of Claude / Claude Code for day-to-day engineering and know when to verify its output
Are genuinely detail-oriented
Have high integrity and take handling real people's personal data seriously
Are comfortable with sustained, careful data work and find satisfaction in getting it right
Can work independently, ship reliably, and communicate clearly about progress and edge cases
Are proficient in Python and comfortable working across many heterogeneous, semi-structured formats (JSON, NDJSON, code, HTML/XML dumps, archives)

Strong candidates may also have experience with:
PII detection and anonymization techniques
Working with large, messy, semi-structured text and code corpora
Data quality monitoring and validation
Cloud storage and access-control patterns (S3/GCS, IAM)
Building internal tools or self-serve data platforms for researchers
Information retrieval, search, or RAG systems