Nextiva

Site Reliability Engineer I

Full-time

Origin: 🇮🇳 India

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Cloud, DNS, EC2, Firewalls, Grafana, Linux, Perl, Prometheus, Shell Scripting, SQL, TCP/IP, Terraform

About the role

  • Manage, monitor, and optimize Linux-based systems and servers.
  • Troubleshoot OS-level, network, and performance issues.
  • Deploy, configure, and manage AWS services including EC2, S3, RDS, IAM, CloudWatch, and VPC.
  • Optimize cost and performance of AWS environments and implement high-availability and disaster recovery strategies.
  • Develop automation, deployment, and monitoring scripts in Bash, Perl, and SQL.
  • Maintain Infrastructure-as-Code using Terraform or CloudFormation.
  • Implement observability solutions (CloudWatch, Prometheus, Grafana, ELK stack), define and track SLIs, and ensure systems meet their SLOs and SLAs.
  • Respond to incidents, perform root cause analysis, and participate in on-call rotations.
  • Collaborate with development teams to embed reliability best practices and drive improvements in CI/CD and release processes.

Requirements

  • Strong proficiency in Linux administration and troubleshooting.
  • Solid hands-on experience with AWS services (EC2, S3, RDS, IAM, CloudWatch, VPC, etc.).
  • Proficiency in Bash and Perl scripting, and in SQL.
  • Experience with system monitoring and logging tools (CloudWatch, Nagios, ELK, Prometheus, Grafana).
  • Understanding of networking fundamentals (DNS, TCP/IP, VPN, firewalls).
  • Experience with automation/Infrastructure-as-Code tools like Ansible, Terraform, or CloudFormation.
  • Strong problem-solving and incident management skills.