Tech Stack
Cloud, Distributed Systems, Docker, Kubernetes, Linux, Python
About the role
- Deploy and operationalize vendor-provided platforms in our service cloud, starting with proof-of-concept environments to validate dependencies, workflows, and performance.
- Build and maintain distributed infrastructure that supports large-scale log ingestion, data processing, and scenario validation.
- Automate workflows and pipelines using Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution.
- Integrate simulation and drive logs (e.g., Parquet, world model data) with validation platforms to enable end-to-end coverage analysis.
- Provide visualization and reporting capabilities to surface validation results, coverage metrics, and actionable insights for developers and stakeholders.
- Define and manage access controls, monitoring, and security policies to ensure compliance while enabling smooth collaboration across internal and vendor teams.
- Partner closely with internal teams and external vendors to troubleshoot issues, refine SLAs, and continuously improve operational reliability and scalability.
Requirements
- BS/MS in Computer Science, Engineering, or a related STEM field, or equivalent experience.
- 5+ years of professional experience in infrastructure, distributed systems, or platform engineering.
- Hands-on experience with Linux systems, Kubernetes/Docker, and CI/CD pipelines.
- Strong scripting/development skills in Python and Bash, with exposure to C++ and/or Go.
- Familiarity with the Bazel build/test automation framework.
- Experience with data/log ingestion workflows and distributed compute/storage systems.
- Strong debugging, problem-solving, and communication skills to work across internal and vendor teams.