OXIO

Site Reliability Engineer

Site Reliability Engineer designing and implementing the cloud platform for OXIO's telecom services while maintaining production infrastructure.

Posted 5/27/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level, Senior

Tech Stack

Tools & technologies
Ansible, AWS, Azure, Cassandra, Cloud, Distributed Systems, DNS, Docker, Elasticsearch, Firewalls, Go, Grafana, Jenkins, Kafka, Kubernetes, Linux, NoSQL, Perl, Prometheus, Python, Ruby, SaltStack, Splunk, SQL, TCP/IP, Terraform, Unix, VMware

About the role

Key responsibilities & impact
  • Design and implement a cloud platform to support OXIO backend services
  • Automate technical operations: deployments, scaling, recovery, etc.
  • Monitor and maintain mission-critical production infrastructure to ensure maximum uptime
  • Participate in an on-call rotation and contribute to a culture of continuous improvement through blameless postmortems
  • Enable the Engineering, Telecom, and Data Engineering teams by providing the tools they need to operate the services they build

Requirements

What you’ll need
  • Understanding of Linux/Unix systems (most systems are Linux-based).
  • Familiarity with Linux/Unix system internals like process management, filesystems, memory management, and networking.
  • Proficiency in at least one programming language (Python, Go, or Ruby) and strong skills in scripting (Bash, Perl).
  • Experience with infrastructure provisioning tools such as Terraform, CloudFormation, or Ansible.
  • Familiarity with containerization (Docker) and orchestration tools (Kubernetes).
  • Familiarity with monitoring tools like Prometheus, Grafana, or Datadog.
  • Knowledge of setting up alerts, analyzing logs, and creating dashboards for observability.
  • Familiarity with incident management practices (e.g., runbooks, postmortems).
  • Experience participating in an on-call rotation and handling incidents.
  • Experience in setting up and maintaining Continuous Integration/Continuous Delivery pipelines (Jenkins, GitLab CI, CircleCI, etc.).
  • Hands-on experience with cloud providers (AWS, Google Cloud, Azure).
  • Knowledge of virtualization technologies (VMware, KVM) and cloud-native architecture.
  • Understanding of TCP/IP, DNS, HTTP/HTTPS, load balancing, and firewalls.
  • Strong understanding of deployment strategies (canary releases, blue-green deployments, etc.).
  • Familiarity with high availability and failover mechanisms.
  • Familiarity with IAM (Identity and Access Management) and zero trust principles.
  • Experience working with distributed systems (e.g., Kafka, Cassandra, Elasticsearch).
  • Experience building custom monitoring tools or writing complex automation scripts.
  • Functional knowledge of database management (SQL and NoSQL).
  • Familiarity with distributed tracing (Jaeger, OpenTelemetry) and advanced log aggregation strategies (ELK stack, Splunk).
  • Familiarity with performance profiling tools and optimizing application performance under heavy load.
  • Familiarity with load testing and identifying bottlenecks.
  • Familiarity with configuration management using SaltStack for maintaining server configurations.

Benefits

Comp & perks
  • N/A