Imubit

Site Reliability Engineer

full-time

Origin: 🇺🇸 United States

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Cloud, Distributed Systems, Go, Google Cloud Platform, Grafana, Kubernetes, Postgres, Prometheus, Python, Splunk, Terraform, Vault

About the role

  • Design, deploy, and maintain Imubit’s cloud infrastructure to provide high uptime, scalability, and security.
  • Leverage public cloud services and tools to improve the efficiency and reliability of our services and workflows.
  • Architect and manage cross-cloud network infrastructure (e.g. subnets, routing tables, IPSec VPNs, Transit Gateways, firewall rules).
  • Engage in and improve the whole lifecycle of services, from inception and design, through deployment, operation, and refinement.
  • Participate in the infrastructure on-call rotation and respond to incidents in a timely manner.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Participate in incident management: swiftly identify and resolve issues to minimize downtime and ensure seamless operations.
  • Collaborate with software developers, DevOps engineers, and other stakeholders to implement robust solutions and drive continuous improvement.

Requirements

  • 4 years of experience maintaining production-level cloud infrastructure, including public cloud services (e.g., AWS, GCP).
  • BA/B.Sc. in Computer Science or equivalent (preferred).
  • Experience with a programming language such as Python or Go.
  • Experience deploying and supporting services in Kubernetes, including GitOps management tools such as ArgoCD.
  • Familiarity with software development principles and concepts (e.g., version control with Git, the software development lifecycle).
  • Experience implementing and using monitoring tools (e.g., New Relic, Splunk, Grafana, Prometheus).
  • Experience managing production databases (e.g., PostgreSQL), including managed services (e.g., AWS RDS).
  • Experience with infrastructure-as-code concepts and tools (e.g., Terraform, Ansible).
  • Experience with secrets management tools (e.g., HashiCorp Vault, AWS Secrets Manager).
  • Interest in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Ability to debug and optimize code and automate routine tasks.
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of ownership and drive.
  • No visa sponsorship is available for this position.