Site Reliability Engineer
NOV
Site Reliability Engineer responsible for monitoring production systems and leading incident response. Join a high-impact team to optimize system performance and scalability in the oil and gas industry.
Tech Stack
Tools & technologies: Akka, AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, .NET, Postgres, Prometheus, Python
About the role
Key responsibilities & impact
- Maintain and monitor production systems for availability, latency, and performance.
- Lead incident response efforts, including communication, resolution, and postmortem documentation.
- Design and implement health checks, alerting systems, and automated remediation workflows.
- Drive root cause analysis and implement permanent resolutions for recurring issues.
- Set up and maintain full observability stacks (logging, metrics, tracing) using tools like Prometheus, Grafana, Datadog, OpenTelemetry, or ELK.
- Analyze telemetry and logs to identify trends, anomalies, and opportunities for improvement.
- Conduct post-incident reviews and use insights to inform future engineering investments.
- Tune and optimize distributed systems, including Akka.NET actors, for performance and resource efficiency.
- Work with developers to evolve architecture and improve system throughput, latency, and stability.
- Optimize PostgreSQL performance, queries, and maintenance strategies.
- Design and maintain modern CI/CD pipelines using GitHub Actions, Azure Pipelines, or GitLab CI.
- Automate deployment, testing, and rollback processes to reduce friction and increase deployment frequency.
- Standardize infrastructure as code practices across environments.
Requirements
What you’ll need
- 5+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
- Expertise in Kubernetes and container orchestration at scale.
- Strong experience with Akka.NET or similar actor-based frameworks.
- Proficiency with scripting and automation (Bash, PowerShell, Python).
- Experience with observability tools (Phobos, Datadog, Prometheus, Grafana, OpenTelemetry, ELK).
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Strong PostgreSQL knowledge: performance tuning, query optimization, and maintenance.
- Proven ability to lead incident management and drive postmortem processes.
- A builder’s mindset with high standards for operational excellence and technical ownership.
Benefits
Comp & perks
- Health insurance
- Retirement plans
- Paid time off
- Flexible work arrangements
- Professional development