TDCX

Site Reliability Engineer

TDCX is hiring a Site Reliability Engineer to design and maintain fault-tolerant systems, collaborating with engineering teams to ensure the reliability and performance of scalable applications.

Posted 6/23/2026 | Full-time | 🇸🇬 Singapore | Mid-Level, Senior

Tech Stack

Tools & technologies
Docker, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Python, PyTorch, Shell Scripting, TensorFlow

About the role

Key responsibilities & impact
  • Design, build, and maintain highly available, scalable, and fault-tolerant systems
  • Collaborate with software engineering teams to ensure applications are designed with reliability and performance in mind
  • Develop and maintain automation procedures to maximize system efficiency, minimize human intervention, and optimize routine tasks
  • Monitor and analyze system performance to identify and address bottlenecks before they impact users
  • Ensure the infrastructure can handle rapid growth in web traffic and ML data processing
  • Participate in 24/7 on-call rotations (including scheduled shifts and holidays)
  • Practice sustainable on-call response, conduct root-cause analysis, and lead blameless post-mortems to prevent recurrence
  • Implement monitoring tools, define SLIs/SLOs/SLAs, and set up automated alerting and metrics to track system health and performance
  • Implement and maintain security best practices and ensure all systems meet regulatory requirements

Requirements

What you’ll need
  • Bachelor’s or Master’s degree in Computer Science, Information Technology, Computer Engineering, or a related field
  • 3+ years of experience as a Site Reliability Engineer, Systems Engineer, or Software Engineer
  • Proficient in at least one high-level programming language (e.g., Python, Go, C++, or Java) and shell scripting
  • Strong understanding of data structures and algorithms
  • Strong understanding of Linux operating systems and open-source technologies, plus a solid understanding of network architecture
  • Competent knowledge of relational database systems and database modeling
Preferred Qualifications:
  • Experience with containers and container orchestration platforms such as Docker and Kubernetes
  • Proficiency in or exposure to machine learning frameworks such as TensorFlow, PyTorch, MXNet, or PaddlePaddle
  • Hands-on experience with monitoring tools and methodologies (e.g., Prometheus, Grafana)
Soft Skills:
  • Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment

Benefits

Comp & perks
  • Attractive remuneration and great perks
  • Comprehensive medical, insurance, and social security coverage
  • World-class workspaces
  • Engaging activities and recognition programs
  • Strong learning and development plans for your career growth
  • Positive work culture that enables your future
  • Easy-to-access location with direct public transport links
  • Flexible working arrangements
  • Coaching and mentoring from experts in your field

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, Go, C++, Java, shell scripting, data structures, algorithms, Linux, relational database systems, database modeling
Soft Skills
strategic thinking, exceptional communication, collaboration, cross-functional teamwork, problem-solving, root-cause analysis, blameless post-mortems, sustainable on-call response, performance optimization, system efficiency