SWEEP

Site Reliability Engineer

full-time

Origin: 🇺🇸 United States

Salary

💰 €60,000 - €80,000 per year

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Docker, Kubernetes, Postgres, Ruby, Ruby on Rails, Terraform

About the role

  • Design, implement, and maintain highly available, scalable, and secure cloud infrastructure for the Sweep Data platform and AI workloads using Infrastructure as Code practices
  • Improve and expand the Datadog-based observability strategy for the Rails application and AI workloads
  • Develop scalable infrastructure to support machine learning model training, deployment, and monitoring
  • Participate in incident response and post-mortem reviews
  • Support critical infrastructure scaling projects and high-traffic systems design
  • Establish team processes including runbooks, workflows, and documentation
  • Collaborate within the SRE guild and across engineering and AI/ML teams
  • Manage day-to-day operations including on-call duties, capacity planning, and proactive system health monitoring
  • Implement security measures and support enterprise customer security requirements including BYOK and data sovereignty compliance
  • Contribute to maintaining SOC 2 Type 2 and ISO 27001 compliance
  • Proactively improve systems and stay up-to-date with industry trends

Requirements

  • Engineering degree in computer science or 3+ years of DevOps/SRE experience
  • Candidates with 5+ years of experience are preferred
  • Good knowledge of AWS (including ECS/Fargate)
  • Docker
  • Terraform
  • PostgreSQL at scale (experience with sharding, clustering, or high-volume scenarios preferred)
  • Datadog expertise strongly preferred
  • Experience with continuous integration and continuous deployment
  • Experience with high-traffic, multi-tenant systems and database scaling strategies
  • Knowledge and experience in data modeling, database design, and data management
  • Strong operational mindset with experience in day-to-day production operations
  • Experience with on-call rotations and production incident management
  • Experience improving observability and monitoring systems
  • Understanding of clean code and clean infrastructure practices
  • English fluency; French is a plus