Tech Stack
Apache, Cloud, Kubernetes, Python, Scala, Spark, SQL, TypeScript
About the role
- Optimize large-scale data pipelines for ingestion, transformation, and processing.
- Develop robust, reusable code in Python and Spark to support distributed data workflows (see the sketch after this list).
- Manage and tune Spark jobs on cloud-based platforms with Kubernetes orchestration.
- Implement scalable data solutions for storage and retrieval.
- Drive reliability, performance, and cost efficiency across cloud infrastructure.
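For a sense of the day-to-day work, here is a minimal PySpark sketch of an ingest-transform-load workflow. The bucket paths, column names, and aggregation are hypothetical, chosen only to illustrate the shape of the pipelines described above:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-pipeline").getOrCreate()

# Ingest: read raw events from cloud storage (path and format are assumptions).
raw = spark.read.json("s3a://example-bucket/raw/events/")

# Transform: drop malformed rows and aggregate events per user per day.
daily = (
    raw.filter(F.col("user_id").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("user_id", "event_date")
       .agg(F.count("*").alias("event_count"))
)

# Load: write partitioned Parquet for efficient downstream retrieval.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3a://example-bucket/curated/daily_counts/"
)

spark.stop()
```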
Requirements
- Strong Python experience
- Experience automating job monitoring, optimization, and debugging at scale (see the monitoring sketch after this list)
- Experience working with any of the major cloud providers
- Excellent communication skills with the ability to work in cross-functional teams
- TS/SCI clearance with CI polygraph preferred
- Experience with Apache Spark
- Background in building and maintaining CI/CD pipelines
- Knowledge of Kubernetes and containerization
- Experience building dashboards using notebook-based tools such as Jupyter and Databricks
- Knowledge of Scala, SQL and R
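To illustrate the job-monitoring automation mentioned above, here is a small Python sketch that polls Spark's standard /api/v1 monitoring REST API for failed stages. The endpoint URL is a placeholder; in practice it would come from your cluster's service discovery, and the alert would feed a pager or metrics system rather than print:

```python
import requests

# Hypothetical Spark UI address; the /api/v1 monitoring API itself is standard Spark.
SPARK_UI = "http://localhost:4040/api/v1"

def failed_stages(app_id: str) -> list[dict]:
    """Return the failed stages for a given Spark application."""
    resp = requests.get(
        f"{SPARK_UI}/applications/{app_id}/stages",
        params={"status": "failed"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

def check_applications() -> None:
    """Scan all known applications and report any with failed stages."""
    apps = requests.get(f"{SPARK_UI}/applications", timeout=10).json()
    for app in apps:
        failed = failed_stages(app["id"])
        if failed:
            # A real pipeline would page on-call or emit a metric here.
            print(f"{app['name']}: {len(failed)} failed stage(s)")

if __name__ == "__main__":
    check_applications()
```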
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Python, Apache Spark, Kubernetes, CI/CD pipelines, SQL, Scala, R, data ingestion, data transformation, data processing
Soft skills
communication, cross-functional teamwork
Certifications
TS/SCI w/CI Poly