Design, develop, and maintain Python-based record linkage applications using Splink and DuckDB.
Implement and customize probabilistic matching algorithms (Fellegi-Sunter model, match thresholds, similarity measures).
Conduct data wrangling and transformation to standardize and link large datasets.
Collaborate with cross-functional teams, including developers, database engineers, cloud architects, and quality engineers, to ensure seamless integration.
Optimize and troubleshoot performance issues for projects ranging from 100K to 200M+ records.
Participate in code reviews and enforce best practices to maintain high code quality and sustainability.
Develop training materials, technical documentation, and user guides to support client teams.
Support compliance with accessibility and security standards for all deliverables.
Continuously explore emerging technologies to improve efficiency, scalability, and accuracy of record linkage workflows.
Requirements
Must be a U.S. citizen and able to obtain and maintain a T2 Public Trust clearance.
Bachelor’s degree in Computer Science, Data Science, or a related technical field (master’s preferred).
Minimum 5 years of professional experience in software development with demonstrated expertise in Python and data engineering.
Strong knowledge of:
Splink or equivalent probabilistic record linkage frameworks.
Data wrangling, cleaning, and transformation workflows.
DuckDB or other modern database management systems.
Statistical algorithms for entity resolution and deduplication (Fellegi-Sunter model).
Cloud environments (Microsoft Azure preferred).
Familiarity with federal security standards, including identity verification, background investigations, and handling of sensitive data (PII/SBU).
Excellent problem-solving, analytical, documentation, and communication skills.
Preferred certifications: Python-related or cloud certifications (e.g., Microsoft Azure Data Engineer).