TDCX

Site Reliability Engineer

TDCX is hiring a Site Reliability Engineer to design and maintain highly available systems. The role works with software engineering teams to optimize performance, reliability, and efficiency.

Posted 5/23/2026 • Full-time • Singapore, Singapore 🇸🇬 • Mid-Level, Senior

Tech Stack

Tools & technologies
Docker, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Python, PyTorch, Shell Scripting, TensorFlow

About the role

Key responsibilities & impact
  • Design, build, and maintain highly available, scalable, and fault-tolerant systems
  • Collaborate with software engineering teams to ensure applications are designed with reliability and performance in mind
  • Develop and maintain automation procedures to maximize system efficiency, minimize human intervention, and optimize routine tasks
  • Monitor and analyze system performance to identify and address bottlenecks before they impact users
  • Ensure the infrastructure can handle rapid growth in web traffic and ML data processing
  • Participate in 24/7 on-call rotations (including scheduled shifts and holidays)
  • Practice sustainable on-call response, conduct root-cause analysis, and lead blameless post-mortems to prevent recurrence
  • Implement monitoring tools, define service-level targets (SLIs/SLOs/SLAs), and set up automated alerting and metrics to track system health and performance
  • Implement and maintain security best practices and ensure all systems meet regulatory requirements

Requirements

What you’ll need
  • Bachelor’s or Master’s degree in Computer Science, Information Technology, Computer Engineering, or a related field
  • 3+ years of experience as a Site Reliability Engineer, Systems Engineer, or Software Engineer
  • Proficient in at least one high-level programming language (e.g., Python, Go, C++, or Java) and shell scripting
  • Strong understanding of data structures and algorithms
  • Strong understanding of Linux operating systems, open-source technologies, and network architecture
  • Competent knowledge of relational database systems and database modeling
  • Experience with containers and container orchestration platforms such as Docker and Kubernetes (preferred)
  • Proficiency in or exposure to machine learning frameworks such as TensorFlow, PyTorch, MXNet, or PaddlePaddle (preferred)
  • Hands-on experience with monitoring tools and methodologies (e.g., Prometheus, Grafana)
  • Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment

Benefits

Comp & perks
  • Attractive remuneration and great perks
  • Comprehensive medical, insurance, and social security coverage
  • World-class workspaces
  • Engaging activities and recognition programs
  • Strong learning and development plans for your career growth
  • Positive work culture that enables your future
  • Easy-to-access location with direct public transport links
  • Flexible working arrangements
  • Coaching and mentoring from experts in your field

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, Go, C++, Java, shell scripting, data structures, algorithms, Linux, relational database systems, machine learning frameworks
Soft Skills
strategic thinking, exceptional communication, collaboration, problem-solving, root-cause analysis, blameless post-mortems, sustainable on-call response, cross-functional teamwork, adaptability, time management