Tech Stack
Tools & technologies: Docker, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Python, PyTorch, Shell Scripting, TensorFlow
About the role
Key responsibilities & impact
- Design, build, and maintain highly available, scalable, and fault-tolerant systems
- Collaborate with software engineering teams to ensure applications are designed with reliability and performance in mind
- Develop and maintain automation procedures to maximize system efficiency, minimize human intervention, and optimize routine tasks
- Monitor and analyze system performance to identify and address bottlenecks before they impact users
- Ensure the infrastructure can handle rapid growth in web traffic and ML data processing
- Participate in 24/7 on-call rotations (including scheduled shifts and holidays)
- Practice sustainable on-call response, conduct root-cause analysis, and lead blameless post-mortems to prevent recurrence
- Define SLIs, SLOs, and SLAs, and implement monitoring, automated alerting, and metrics to track system health and performance
- Implement and maintain security best practices and ensure all systems meet regulatory requirements
Requirements
What you’ll need
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Computer Engineering, or a related field
- 3+ years of experience as a Site Reliability Engineer, Systems Engineer, or Software Engineer
- Proficient in at least one high-level programming language (e.g., Python, Go, C++, or Java) and shell scripting
- Strong understanding of data structures and algorithms
- Strong understanding of Linux operating systems and open-source technologies, plus a solid grasp of network architecture
- Working knowledge of relational database systems and database modeling
- Experience with containers and container orchestration (e.g., Docker and Kubernetes) (preferred)
- Proficiency in or exposure to machine learning frameworks such as TensorFlow, PyTorch, MXNet, or PaddlePaddle (preferred)
- Hands-on experience with monitoring tools and methodologies (e.g., Prometheus, Grafana)
- Strategic thinking, exceptional communication, and the ability to collaborate effectively with cross-functional teams in a fast-paced environment
Benefits
Comp & perks
- Attractive remuneration and great perks
- Comprehensive medical, insurance, and social security coverage
- World-class workspaces
- Engaging activities and recognition programs
- Strong learning and development plans for your career growth
- Positive work culture that enables your future
- Easy-to-access location with direct public transport links
- Flexible working arrangements
- Coaching and mentoring from experts in your field
ATS Keywords
Hard Skills & Tools
Python, Go, C++, Java, shell scripting, data structures, algorithms, Linux, relational database systems, machine learning frameworks
Soft Skills
strategic thinking, exceptional communication, collaboration, problem-solving, root-cause analysis, blameless post-mortems, sustainable on-call response, cross-functional teamwork, adaptability, time management
