Salary
💰 $107,120 - $160,680 per year
Tech Stack
Apache, AWS, Cloud, Docker, Hadoop, HDFS, Java, Kafka, Kubernetes, MapReduce, PySpark, Python, Scala, Spark
About the role
- Integral team member of Data Engineering team responsible for design and development of Big Data solutions
- Partner with domain experts, product managers, analysts, and data scientists to develop Big Data pipelines in Hadoop or Snowflake
- Deliver a data-as-a-service framework and move legacy workloads to a cloud platform
- Discuss and capture requirements with stakeholders and document data flows
- Recommend access-rights solutions and drive the best options into implementation
- Work with data scientists to build client pipelines using heterogeneous sources and provide engineering services for data science applications
- Research and assess open-source technologies and integrate them into design and implementation
- Serve as technical expert and mentor other team members on Big Data and Cloud tech stacks
- Define needs around maintainability, testability, performance, security, quality and usability for data platform
- Drive implementation of consistent patterns, reusable components, and coding standards for data engineering processes
- Convert SAS-based pipelines into languages like PySpark or Scala for Hadoop/non-Hadoop ecosystems (a brief PySpark sketch follows this list)
- Tune Big Data applications on Hadoop and non-Hadoop platforms for optimal performance
- Evaluate IT developments and recommend systems alternatives or enhancements
- Handle day-to-day staff management: resource management, work allocation, mentoring/coaching
- Assess risk in business decisions and drive compliance with laws, rules, regulations and policy
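To illustrate the SAS-to-PySpark conversion mentioned above, here is a minimal sketch of a SAS DATA step rewritten as a PySpark job. The dataset, column names, and HDFS paths are hypothetical placeholders, not details from the posting.

```python
# Minimal sketch: a SAS DATA step (filter + derived column) rewritten in PySpark.
# The table, columns, and paths below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sas_to_pyspark_sketch").getOrCreate()

# SAS original (for reference):
#   data approved_claims;
#     set claims;
#     where status = 'APPROVED';
#     net_amt = claim_amt * 0.9;
#   run;
claims = spark.read.parquet("hdfs:///data/claims")  # hypothetical source

approved_claims = (
    claims
    .filter(F.col("status") == "APPROVED")            # SAS WHERE clause
    .withColumn("net_amt", F.col("claim_amt") * 0.9)  # SAS derived variable
)

approved_claims.write.mode("overwrite").parquet("hdfs:///data/approved_claims")
```

Row-at-a-time SAS logic generally maps onto set-based DataFrame transformations like these; stateful constructs (RETAIN, BY-group processing) usually become window functions instead.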
Requirements
- 5+ years of experience in Hadoop/Big Data technologies
- 3+ years of experience in PySpark
- 2+ years of experience in Snowflake (preferred)
- 2+ years of experience developing data solutions on Google Cloud or AWS
- Certifications preferred
- Hands-on experience with Python/PySpark/Scala and basic libraries for machine learning
- Experience with containerization and related technologies (e.g. Docker, Kubernetes) is a plus
- 1 year of Hadoop administration experience preferred
- 1+ year of SAS experience preferred
- Comprehensive knowledge of the principles of software engineering and data analytics
- Advanced knowledge of the Hadoop ecosystem and Big Data technologies
- Hands-on experience with HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka, Kudu, Solr (a short streaming sketch follows this list)
- Knowledge of agile (scrum) development methodology is a plus
- Strong development/automation skills
- Proficient in programming in Java or Python; prior Apache Beam/Spark experience a plus
- System-level understanding: data structures, algorithms, distributed storage & compute
- Bachelor’s degree/University degree or equivalent experience
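Since the stack above pairs Kafka with HDFS and Spark, here is a minimal sketch of a Structured Streaming job that lands Kafka events on HDFS. The broker address, topic, and paths are hypothetical, and the job assumes the spark-sql-kafka connector package is on the classpath.

```python
# Minimal sketch: stream Kafka events to Parquet files on HDFS.
# Broker, topic, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka_to_hdfs_sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .load()
)

query = (
    events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("parquet")
    .option("path", "hdfs:///data/events")             # hypothetical sink
    .option("checkpointLocation", "hdfs:///checkpoints/events")
    .start()
)
query.awaitTermination()
```

The checkpoint location is what lets the file sink recover exactly-once output on restart; in practice it lives on durable storage alongside the sink.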
Benefits
- In addition to salary, Citi’s offerings may also include, for eligible employees, discretionary and formulaic incentive and retention awards.
- medical, dental & vision coverage
- 401(k)
- life, accident, and disability insurance
- wellness programs
- paid time off packages (planned time off - vacation; unplanned time off - sick leave; paid holidays)
- More information about Citi’s competitive employee benefits is available at citibenefits.com
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Hadoop, PySpark, Snowflake, Python, Scala, HDFS, MapReduce, Hive, Spark, Kafka
Soft skills
mentoring, stakeholder communication, resource management, work allocation, risk assessment, compliance, team collaboration, problem-solving, documentation, best practices implementation
Certifications
Big Data certifications, Cloud certifications