Salary
💰 $84,407 - $126,611 per year
Tech Stack
Ansible, AWS, Azure, Cassandra, Cloud, Docker, Go, Google Cloud Platform, Grafana, Hadoop, HDFS, Java, Kafka, Kubernetes, MySQL, NoSQL, Postgres, Prometheus, Python, Scala, Spark, Terraform
About the role
- Ensure the reliability, scalability, and performance of FreeWheel systems and data platforms.
- Manage infrastructure, optimize system reliability, automate daily operations, and resolve technical issues impacting upstream/downstream platforms.
- Design and implement monitoring and alerting systems; join on-call shifts to respond to and resolve issues.
- Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery.
- Analyze and optimize data storage, query performance, and data flows to reduce latency and improve processing speed.
- Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to ensure high availability.
- Work with engineering teams on capacity planning and scaling to handle traffic growth.
- Support FreeWheel-powered live events.
- Maintain cloud access management and governance, and enforce compliance practices across cloud environments.
- Document architecture, configurations, and operational procedures; provide training and knowledge sharing.
- Ensure platforms meet security standards and compliance requirements.
- Collaborate with engineering, product, and project management teams to support product design and implementation.
Requirements
- 1-3 years of experience as an SRE, DevOps or Operations Engineer.
- Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure).
- Hands-on experience with Terraform and infrastructure as code (IaC) principles.
- Proficiency in automation tools and frameworks (e.g. Ansible, Terraform, Kubernetes, Docker) for automating system deployment and maintenance.
- Familiarity with modern data architectures and technologies, including big data platforms (e.g., Kafka, Hadoop, Spark) and distributed storage (e.g., Cassandra, HDFS, AWS S3).
- Extensive experience in database management (e.g. NoSQL databases, MySQL, PostgreSQL).
- Programming Skills: Proficient in at least one programming language, such as Python, Go, Java, or Scala, with the ability to write efficient scripts and automation tools.
- System Monitoring and Log Management: Familiar with using monitoring and log management tools such as Prometheus, Grafana, ELK Stack, or other similar tools.
- Troubleshooting and Debugging: Strong debugging and troubleshooting skills, with the ability to quickly identify and resolve production issues.
- Team Collaboration and Communication: Excellent communication skills with the ability to convey technical information clearly and concisely to both technical and non-technical stakeholders.
- Proactive learner eager to grow in operations and governance.
- Education: Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field.
- Relevant Work Experience: 2-5 years (note: the posting lists both 1-3 years and 2-5 years).