TWG Global

Site Reliability Engineer

Full-time

Location Type: Hybrid

Location: Jacksonville, Florida, United States

Salary

$120,000–$190,000 per year

About the role

  • Build and maintain infrastructure to support real-time and batch ML workloads
  • Implement observability tools (logging, monitoring, alerting) for model performance and system uptime
  • Design and manage CI/CD pipelines for ML and data applications
  • Ensure high availability, disaster recovery, and rollback capabilities for production environments
  • Manage access controls, secrets, and security policies in collaboration with compliance and IT
  • Troubleshoot incidents, lead postmortems, and drive root-cause resolution
  • Work with U.S. and international teams to provide 24/7 coverage across time zones

Requirements

  • 3–6 years of experience in DevOps, SRE, or backend engineering roles
  • Proficiency with tools such as Docker, Kubernetes, Terraform, GitLab CI/CD or GitHub Actions, and Airflow
  • Strong scripting skills in Python or Bash and familiarity with Linux environments
  • Experience deploying and monitoring ML models or data pipelines in production
  • Knowledge of observability stacks (e.g., Prometheus, Grafana, ELK, Datadog)
  • Familiarity with cloud platforms (e.g., AWS, GCP, or Azure)
  • Strong documentation, problem-solving, and incident response skills

Preferred Qualifications

  • Experience supporting ML/AI workflows using Palantir Foundry
  • Exposure to compliance frameworks like SOC 2, ISO 27001, or financial regulations
  • Knowledge of MLOps frameworks (e.g., MLflow, Kubeflow, SageMaker Pipelines)
  • Ability to automate deployments, testing, and monitoring at scale

Benefits

  • Work on real-world AI applications with high-impact clients
  • Collaborate with world-class data scientists, engineers, and product leaders
  • Flat org structure, high trust, high autonomy
  • Competitive salary plus performance-based incentives