Salary
💰 $152,100 - $203,900 per year
Tech Stack
AWS, Cloud, Distributed Systems, Docker, EC2, Flux, Go, Kubernetes, Packer, Python, PyTorch, Splunk, TensorFlow, Terraform, VMware
About the role
- Design, manage, and maintain critical infrastructure for both software development and globally deployed production resources.
- Collaborate on the provisioning of cloud infrastructure in AWS using Terraform to ensure consistency and scalability.
- Maintain and manage multiple Kubernetes clusters across both cloud and on-premises environments.
- Implement and enforce best practices for secure software development and deployment in alignment with industry standards.
- Monitor, troubleshoot, and optimize build and deployment processes to maximize efficiency and minimize downtime.
- Collaborate with cross-functional teams, including developers and security experts, to ensure systems meet operational requirements.
- Develop, maintain, and enhance CI/CD pipelines using GitLab to support build automation, unit testing, and integration testing.
- Continuously evaluate and implement tools and technologies to improve workflows and platform reliability.
Requirements
- BS degree in Computer Science.
- 5+ years of experience in DevOps, Site Reliability Engineering, or a related field.
- Extensive AWS knowledge: EC2, ECS/EKS, Lambda, ELB, ASGs, Route 53, KMS, SSM, IAM, S3, ACM, VPC, RDS, ElastiCache.
- Proficiency with modern observability practices: application monitoring, tracing, and profiling tools (e.g., Datadog, New Relic, OpenTelemetry, Splunk).
- Proficiency with GitLab CI, Terraform, Helm, and Packer.
- Demonstrated experience designing and managing CI/CD pipelines for complex software platforms.
- In-depth knowledge of containers and container orchestration technologies: Docker, Kubernetes.
- Experience with Terraform or other infrastructure as code tooling.
- Strong scripting skills in Python, Bash, or similar languages.
- Familiarity with modern security practices for protecting sensitive assets in distributed systems.
- Exceptional problem-solving skills, with a proactive and collaborative mindset.
- Preferred: Experience working with media and entertainment pipelines or pre-release content workflows.
- Preferred: Proficiency with Golang, Python, or C++.
- Preferred: Experience with modern AI/ML frameworks (e.g., TensorFlow, PyTorch, Hugging Face) and their integration into operational workflows.
- Preferred: Knowledge of container security tools and systems, such as Falco or Aqua Security.
- Preferred: Experience with emerging deployment systems like ArgoCD or Flux for GitOps workflows.
- Preferred: Familiarity with serverless computing paradigms and technologies such as AWS Lambda or Google Cloud Run/Functions.
- Preferred: Understanding of high-performance computing systems in cloud environments.
- Preferred: Experience administering VMware vSphere clusters.