
Job Level
Mid-Level, Senior
Tech Stack
Azure, Cloud, Python
About the role
- Design, develop, and maintain Python-based record linkage applications using Splink and DuckDB.
- Implement and customize probabilistic matching algorithms (Fellegi-Sunter, thresholds, similarity measures).
- Conduct data wrangling and transformation to standardize and link large datasets.
- Collaborate with cross-functional teams, including developers, database engineers, cloud architects, and quality engineers, to ensure seamless integration.
- Optimize and troubleshoot performance issues for projects ranging from 100K to 200M+ records.
- Participate in code reviews and enforce best practices to maintain high code quality and sustainability.
- Develop training materials, technical documentation, and user guides to support client teams.
- Support compliance with accessibility and security standards for all deliverables.
- Continuously explore emerging technologies to improve efficiency, scalability, and accuracy of record linkage workflows.
Requirements
- Must be a U.S. citizen and able to obtain and maintain a T2 Public Trust clearance.
- Bachelor’s degree in computer science, data science, or a related technical field (master’s preferred).
- Minimum 5 years of professional experience in software development with demonstrated expertise in Python and data engineering.
- Strong knowledge of:
  - Splink or equivalent probabilistic record linkage frameworks.
  - Data wrangling, cleaning, and transformation workflows.
  - DuckDB or other modern database management systems.
  - Statistical algorithms for entity resolution and deduplication (Fellegi-Sunter model).
  - Cloud environments (Microsoft Azure preferred).
- Familiarity with federal security standards, including identity verification, background investigations, and handling of sensitive data (PII/SBU).
- Excellent problem-solving, analytical, documentation, and communication skills.
- Preferred certifications: Python-related or cloud certifications (e.g., Microsoft Azure Data Engineer).