Senior Site Reliability Engineer

Red Cell Partners

full-time

Posted on: 8/14/2025

Location: 🇺🇸 United States

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Distributed Systems, Docker, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform

About the role

About Trase Systems: AI, Uncomplicated. Trase empowers enterprise leaders to harness the full potential of AI without the associated complexity and risks, offering an end-to-end solution for deploying, managing, and optimizing AI in the enterprise.
Trase is at the forefront of AI Agent innovation, topping the Hugging Face GAIA Leaderboard for Generalized AI Assistants, ahead of Google, Meta, Microsoft, and OpenAI. We are leveraging our cutting-edge technologies to develop mission-critical agentic applications in Healthcare, Oil & Gas, and National Security.
About the Role: You will build and maintain the resilient, scalable infrastructure that powers cutting-edge AI, ensure the reliability and performance of complex distributed systems, and automate, monitor, and optimize the platforms that enable ML innovation.
You will work closely with ML engineers, software engineers, and product teams to build and operate the infrastructure that runs our advanced AI agents and machine learning models.
Responsibilities: Design, Build, and Maintain Core Infrastructure; Automate Everything; Ensure System Reliability and Performance; Manage ML Infrastructure and Pipelines; Incident Response and Post-Mortems; Implement and Enhance Observability; Capacity Planning and Cost Optimization; Foster a Culture of Reliability.
Requirements: Proven SRE and DevOps Experience; Cloud Infrastructure Expertise; Proficiency in Infrastructure as Code; Containerization and Orchestration Mastery; Strong Programming and Scripting Skills; Experience with Monitoring and Observability Tools; CI/CD Pipeline Development; Excellent Problem-Solving and Communication Skills; Educational Background.
Benefits: 100% employer-paid health care including medical, dental, and vision for you and your family; Paid maternity and paternity leave; Unlimited PTO; Educational reimbursements; Optional 401(k), FSA, and equity incentives; Mental health benefits through TARA Mind.
Some travel is required.
We are an Equal Opportunity Employer: You’ll receive consideration for employment without regard to race, sex, color, religion, sexual orientation, gender identity, national origin, protected veteran status, or on the basis of disability.

Requirements

Proven SRE and DevOps Experience: Demonstrated experience in a Site Reliability Engineering or DevOps role, managing complex, large-scale production environments.
Cloud Infrastructure Expertise: Hands-on experience with one or more major cloud platforms (GCP, AWS, Azure).
Proficiency in Infrastructure as Code: Strong skills with IaC tools such as Terraform, Ansible, or CloudFormation.
Containerization and Orchestration Mastery: Deep knowledge of Docker and Kubernetes, including experience deploying and managing containerized applications in production.
Strong Programming and Scripting Skills: Proficiency in languages such as Python, with a focus on automation and building reliable software.
Experience with Monitoring and Observability Tools: Expertise in setting up and using monitoring and logging systems like Prometheus, Grafana, or the ELK stack.
CI/CD Pipeline Development: A strong background in building and managing CI/CD pipelines for both software applications and machine learning models.
Excellent Problem-Solving and Communication Skills: The ability to troubleshoot complex issues across the stack and clearly communicate technical concepts to both technical and non-technical stakeholders.
Educational Background: A Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
