
Data Operations Engineer Intern
Abacus Insights
internship
Location Type: Remote
Location: United States
About the role
- Monitor production data pipelines and systems, identifying failures, latency issues, schema changes, and data quality anomalies.
- Debug pipeline failures by analyzing logs, metrics, SQL outputs, and upstream/downstream dependencies.
- Assist in root cause analysis (RCA) for data incidents and contribute to implementing corrective and preventive solutions.
- Support the maintenance and optimization of ETL/ELT workflows to improve reliability, scalability, and performance.
- Automate recurring data operations tasks using Python, shell scripting, or similar tools to reduce manual intervention.
- Assist with data mapping, transformation, and normalization efforts, including alignment with Master Data Management (MDM) systems.
- Collaborate on the generation and validation of synthetic test datasets for pipeline testing and data quality validation.
- Shadow senior engineers to deploy, monitor, and troubleshoot data workflows on AWS, Databricks, and Kubernetes-based environments.
- Ensure data integrity and consistency across multiple environments (development, staging, production).
- Clearly document bugs, data issues, and operational incidents in Jira and Confluence, including reproduction steps, impact analysis, and resolution details.
- Communicate effectively with cross-functional, onsite, and offshore teams to escalate issues, provide status updates, and track resolutions.
- Participate in Agile ceremonies and follow structured incident and change management processes.
Requirements
- Strong interest in data engineering, data operations, and production data systems.
- Currently pursuing or recently completed a Master’s degree in Computer Science, Data Science, Engineering, Statistics, or a related quantitative discipline.
- Solid understanding of ETL/ELT architectures, including ingestion, transformation, validation, orchestration, and error handling.
- Proficiency in SQL, including complex joins, aggregations, window functions, and debugging data discrepancies at scale.
- Working knowledge of Python for data processing, automation, and operational tooling.
- Familiarity with workflow orchestration tools such as Apache Airflow, including DAG design, scheduling, retries, and dependency management.
- Experience or exposure to data integration platforms such as Airbyte, including connector-based ingestion, schema evolution, and sync monitoring.
- Understanding of Master Data Management (MDM) concepts and tools, with exposure to platforms such as Rhapsody, Onyx, or other enterprise MDM solutions.
- Knowledge of data pipeline observability, including log analysis, metrics, alerting, and debugging failed jobs.
- Exposure to cloud platforms (preferably AWS), with familiarity in services such as S3, Lambda, EMR, EKS, or managed data processing services.
- Ability to communicate technical issues clearly and concisely, including writing actionable bug reports and collaborating on incident resolution.
- Strong documentation habits and attention to detail in operational workflows.
Benefits
- Unlimited paid time off – recharge when you need it
- Work from anywhere – flexibility to fit your life
- Comprehensive health coverage – multiple plan options to choose from
- Equity for every employee – share in our success
- Growth-focused environment – your development matters here
- Home office setup allowance – one-time support to get you started
- Monthly cell phone allowance – stay connected with ease
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
ETL, ELT, SQL, Python, Apache Airflow, data integration, Master Data Management, data pipeline observability, cloud platforms, data processing
Soft Skills
communication, collaboration, documentation, attention to detail, problem-solving, root cause analysis, status updates, cross-functional teamwork, incident resolution, Agile methodology