NOV

Site Reliability Engineer

Site Reliability Engineer responsible for monitoring production systems and leading incident response. Join a high-impact team to optimize system performance and scalability in the oil and gas industry.

Posted 6/23/2026 • full-time • Houston, Texas, 🇺🇸 United States • Mid-Level, Senior

Tech Stack

Tools & technologies
Akka, AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, .NET, Postgres, Prometheus, Python

About the role

Key responsibilities & impact
  • Maintain and monitor production systems for availability, latency, and performance.
  • Lead incident response efforts, including communication, resolution, and postmortem documentation.
  • Design and implement health checks, alerting systems, and automated remediation workflows.
  • Drive root cause analysis and implement permanent resolutions for recurring issues.
  • Set up and maintain full observability stacks (logging, metrics, tracing) using tools like Prometheus, Grafana, Datadog, OpenTelemetry, or ELK.
  • Analyze telemetry and logs to identify trends, anomalies, and opportunities for improvement.
  • Conduct post-incident reviews and use insights to inform future engineering investments.
  • Tune and optimize distributed systems, including Akka.NET actors, for performance and resource efficiency.
  • Work with developers to evolve architecture and improve system throughput, latency, and stability.
  • Optimize PostgreSQL performance, queries, and maintenance strategies.
  • Design and maintain modern CI/CD pipelines using GitHub Actions, Azure Pipelines, or GitLab CI.
  • Automate deployment, testing, and rollback processes to reduce friction and increase deployment frequency.
  • Standardize infrastructure as code practices across environments.

Requirements

What you’ll need
  • 5+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
  • Expertise in Kubernetes and container orchestration at scale.
  • Strong experience with Akka.NET or similar actor-based frameworks.
  • Proficiency with scripting and automation (Bash, PowerShell, Python).
  • Experience with observability tools (Phobos, Datadog, Prometheus, Grafana, OpenTelemetry, ELK).
  • Hands-on experience with cloud platforms (AWS, Azure, or GCP).
  • Strong PostgreSQL knowledge—performance tuning, query optimization, maintenance.
  • Proven ability to lead incident management and drive postmortem processes.
  • A builder’s mindset with high standards for operational excellence and technical ownership.

Benefits

Comp & perks
  • Health insurance
  • Retirement plans
  • Paid time off
  • Flexible work arrangements
  • Professional development
