Trase

Senior Site Reliability Engineer, Security Clearance

Full-time

Location Type: Remote

Location: Remote • Washington • 🇺🇸 United States

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform

About the role

  • Design, Build, and Maintain Core Infrastructure: Architect and implement scalable, highly available, and secure infrastructure on cloud platforms (GCP, AWS, Azure) to support our AI-driven applications and services.
  • Automate Everything: Develop and maintain automation tools and frameworks to eliminate manual effort in deployment, configuration, and management of our production environment.
  • Ensure System Reliability and Performance: Establish and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our production systems. Proactively identify and resolve performance bottlenecks and availability issues.
  • Manage ML Infrastructure and Pipelines: Collaborate with ML engineers to build and maintain robust CI/CD pipelines for machine learning models, ensuring seamless training, deployment, and monitoring.
  • Incident Response and Post-Mortems: Lead incident response efforts to minimize downtime and conduct thorough post-incident reviews to identify root causes and implement preventative measures.
  • Implement and Enhance Observability: Deploy and manage comprehensive monitoring, logging, and tracing solutions (e.g., Prometheus, Grafana, ELK stack) to provide deep visibility into system health.
  • Capacity Planning and Cost Optimization: Forecast infrastructure needs and optimize resource utilization to ensure our platform can scale efficiently and cost-effectively.
  • Foster a Culture of Reliability: Champion SRE best practices across the engineering organization and mentor team members on reliability, performance, and scalability.

Requirements

  • Proven SRE and DevOps Experience: Demonstrated experience in a Site Reliability Engineering or DevOps role, managing complex, large-scale production environments.
  • Cloud Infrastructure Expertise: Hands-on experience with one or more major cloud platforms (GCP, AWS, Azure).
  • Proficiency in Infrastructure as Code: Strong skills with IaC tools such as Terraform, Ansible, or CloudFormation.
  • Containerization and Orchestration Mastery: Deep knowledge of Docker and Kubernetes, including experience deploying and managing containerized applications in production.
  • Strong Programming and Scripting Skills: Proficiency in languages such as Python, with a focus on automation and building reliable software.
  • Experience with Monitoring and Observability Tools: Expertise in setting up and using monitoring and logging systems like Prometheus, Grafana, or the ELK stack.
  • CI/CD Pipeline Development: A strong background in building and managing CI/CD pipelines for both software applications and machine learning models.
  • Excellent Problem-Solving and Communication Skills: The ability to troubleshoot complex issues across the stack and clearly communicate technical concepts to both technical and non-technical stakeholders.
  • Educational Background: A Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
  • Active Security Clearance required.

Benefits

  • 100% employer-paid, comprehensive health care including medical, dental, and vision for you and your family.
  • Paid maternity and paternity leave for 14 weeks at the employee's normal pay.
  • Unlimited PTO, with management approval.
  • Opportunities for professional development and continued learning with educational reimbursements.
  • Optional 401(k), FSA, and equity incentives available.
  • Mental health benefits through TARA Mind.
