Salary
💰 $184,000 - $229,000 per year
Tech Stack
Ansible, Apache, AWS, Cloud, Distributed Systems, Docker, Elasticsearch, Go, Kafka, Kubernetes, Python, Spark, Splunk, Terraform
About the role
- Collaborate with development and quality engineering to build and maintain the continuous integration pipeline from development to production.
- Engage in overall software architecture from design to implementation, monitoring, and testing.
- Drive Corelight SaaS Cloud architecture, working closely with Engineering, Product, and other technical leaders.
- Drive SaaS operations improvements, including cost, monitoring, security, and change management controls.
- Design, develop, and maintain robust, scalable infrastructure for machine learning pipelines.
- Implement automation, disaster recovery, and system resilience best practices.
- Work in an Agile development team to deliver service features end-to-end, from design through production deployment and monitoring.
- Engage in hands-on, in-depth analysis, review, and design of the cloud infrastructure to ensure high availability, resilience, and adherence to stringent SLOs.
- Work closely with offshore teams on various development projects.
Requirements
- 10+ years of experience with enterprise distributed system architecture, public cloud infrastructure, observability, and Infrastructure as Code.
- Strong programming skills in Python or Go.
- Experience with Infrastructure as Code (Terraform, Pulumi) and Ansible.
- Hands-on experience with Kubernetes, Kafka, Elasticsearch, Docker, and containers.
- Experience with CI/CD practices, pipelines, monitoring and alerting tools, and automated test suite frameworks such as GitLab and cloud DevOps tools.
- Experience with current SRE/DevOps best practices.
- Experience architecting, building, and scaling platforms and distributed systems that require high availability, resilience, and adherence to stringent SLOs.
- Knowledgeable in distributed systems, redundancy and high availability, and performance optimization.
- Experience in designing and implementing infrastructure for machine learning pipelines using Apache Spark or Apache Flink.
- Solid understanding of distributed systems and big data technologies.
- Familiarity with AWS, particularly Lambda, APIGW, MSK, EMR, AppSync, EKS, MLOps.
- Experience optimizing, troubleshooting, managing, and deploying complex, large-scale cloud infrastructure (preferred).
- Experience in backup strategies and Disaster Recovery (preferred).
- Knowledge of network-based security detections and attack techniques (preferred).
- Experience with Search and Analytics tools like Splunk, Elasticsearch (preferred).
- Experience working in a distributed team (preferred).
- Familiarity with compliance requirements such as FedRAMP, GDPR, and SOC 2 (preferred).
- Familiarity with security and risk mitigation (authentication, encryption, anomaly detection) in cloud-based environments.