Collect, clean, and aggregate patient-level data from Electronic Health Records (EHRs), lab systems, and external databases.
Apply tokenization techniques to replace direct identifiers while preserving data integrity for linkage and analysis.
Perform or support expert determination de-identification processes to evaluate and minimize re-identification risk in datasets.
Ensure all activities comply with HIPAA and institutional privacy policies, particularly within HISEC environments.
Manage and maintain secure data pipelines and repositories for sensitive patient information.
Assist in the development of standard operating procedures (SOPs) for secure data aggregation and handling.
Support the generation of anonymized datasets for research, analytics, or external data sharing initiatives.
Monitor data quality, flag inconsistencies, and ensure proper documentation of data provenance and transformation.
Provide support for audits, security assessments, and internal data governance reviews.
Design and implement end-to-end data aggregation solutions tailored to business requirements using Python and PySpark.
Synthesize findings, develop recommendations, and communicate results to clients and internal teams.
Assume project management responsibility for data aggregation implementation on each project with minimal supervision, including managing onshore/client communication, leading meetings, and drafting agendas.
Manage multiple projects simultaneously while meeting deadlines.
Requirements
Engineering or Master’s degree in Computer Science or a related field from a Tier-I/Tier-II institution, with evidence of strong academic performance.
5+ years of relevant consulting industry experience
Deep understanding of data management best practices, data modeling, and data analytics
Strong understanding of U.S. pharmaceutical datasets and their applications
Experience with tokenization tools/methods and expert determination for de-identification.
Strong logical thinking and problem-solving skills, along with the ability to collaborate.
Familiarity with HISEC standards or other high-security frameworks
Strong programming skills in **Python** and working knowledge of **PySpark**.
Proficiency in Microsoft Office products, cloud data warehouses (e.g., Snowflake, Redshift), and job orchestration tools (e.g., Airflow, Databricks Workflows)
Strong understanding of HIPAA, GDPR, and related data privacy regulations.
Excellent verbal and written communication skills, with the ability to convey complex information clearly to diverse audiences
Strong organizational abilities and effective time management skills
Ability to thrive in an international matrix environment and willingness to support US clients by working with 3-4 hours of overlap with US business hours.
Benefits
Health insurance
Retirement plans
Paid time off
Flexible work arrangements
Professional development