Staff SRE
Lytx, Inc.
Staff SRE responsible for maintaining the availability and reliability of Lytx services across cloud and on-prem infrastructure, mentoring the team, and enhancing incident management processes.
Posted 5/13/2026 • Full-time • Remote • Virginia • 🇺🇸 United States • Lead • 💰 $183,500 - $232,500 per year
Tech Stack
Tools & technologies: AWS, Cloud, DNS, EC2, Grafana, Groovy, Jenkins, Kubernetes, Linux, NGINX, NoSQL, Prometheus, Python, SMTP, SQL, TCP/IP, Terraform, Vault
About the role
Key responsibilities & impact
- Build tools and frameworks to monitor systems and ensure the highest level of uptime in production environments
- Mentor the SRE team on best practices
- Develop a culture of innovation
- Take the lead in enhancing our 24/7 on-call and incident management process
- Build and maintain runbooks
- Contribute to design and documentation of the cloud services and SOPs
- Influence service design by working closely with architects, DBAs, developers, DevOps, and data engineers to bake in reliability, scalability, and cost optimizations early in the development process
- Lead blameless post-mortems
- Take ownership of publishing RCA documents for internal and external consumption
- Lead initiatives with Service Owners to define the SLOs and build SLIs to ensure systems are meeting the SLAs
- Research and evaluate new cloud technologies and vendor offerings to enhance product stability and manageability
- Reduce operational toil and maintain a high degree of automation by adopting IaC-first and GitOps principles
- Acquire and maintain a significant understanding of Lytx production services to ensure timely resolution of production incidents
Requirements
What you’ll need
- 8+ years of experience as an SRE in an AWS environment at a medium- to large-scale organization
- 6+ years of hands-on experience implementing and managing Observability tools (Prometheus, New Relic, Grafana, etc.)
- High degree of proficiency in programming, preferably using Python, Groovy and Bash
- Hands-on experience managing database technologies (SQL and NoSQL)
- 5+ years of experience building infrastructure deployment pipelines using Git, Terraform, Helm, Jenkins/Jenkins X/Argo CD, etc.
- Proficient in designing production environments in the AWS cloud using various AWS services (VPCs, EKS, IAM, AMI, EC2, CloudWatch, CloudTrail, Control Tower, GuardDuty, MSK, S3, Glacier, Gateways, Direct Connect, Route 53, RDS, ALBs, Auto Scaling, etc.)
- Extensive experience with Linux systems and various protocols and technologies (HTTP, REST, TCP/IP, SSL, DNS, SMTP, SSH, NTP, Load Balancing, SQL/NoSQL, Message Brokers, Nginx, Vault, ELK, etc.)
- Hands-on experience with Kubernetes and various container and cloud-native technologies
- Significant experience participating in, implementing, and managing a 24/7 on-call rotation for an SRE team, creating runbooks, building support procedures, and proactively monitoring systems across geographical locations
- Ability to work well under pressure within a technically challenging environment
Benefits
Comp & perks
- Medical, dental and vision insurance
- Health Savings Account
- Flexible Spending Accounts
- Telehealth
- 401(k) and 401(k) match
- Life and AD&D insurance
- Short-Term and Long-Term Disability
- FTO or PTO
- Employee Well-Being program
- 11 paid holidays plus 1 inclusive holiday per year
- Volunteer Time Off
- Employee Referral program
- Education Reimbursement Program
- Employee Recognition and Appreciation program
- Additional perk and voluntary benefit programs
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Groovy, Bash, SQL, NoSQL, Git, Terraform, Helm, Jenkins, Kubernetes
Soft Skills
mentoring, leadership, innovation, ownership, collaboration, communication, problem-solving, pressure management