
Senior Associate Site Reliability Engineer
NTT DATA, Inc.
full-time
Location Type: Hybrid
Location: Hyderabad • India
About the role
- Monitors system health, performance metrics, and alerts to identify and respond to incidents promptly
- Works with Senior Site Reliability Engineers and teams to diagnose issues, troubleshoot problems, and restore services in a timely manner
- Assists in the deployment and release of software applications and infrastructure changes
- Collaborates with development teams to ensure smooth deployments, implement best practices, and minimize downtime during releases
- Collaborates with senior SREs and operations teams to automate routine tasks and improve operational efficiency
- Assists in capacity planning efforts, monitors resource utilization, and makes recommendations for scaling infrastructure and services based on projected needs
- Collaborates with senior SREs to ensure adequate capacity to meet growing demands
- Documents incidents, their impact, and resolution procedures to maintain an incident knowledge base
- Participates in post-incident reviews, contributes to root cause analysis, and helps implement preventive measures to minimize future incidents
- Collaborates with security teams to implement security best practices and ensure compliance with industry standards and regulations
- Assists in monitoring and responding to security incidents, applying appropriate mitigation measures
- Stays updated with the latest industry trends, emerging technologies, and best practices in Site Reliability Engineering
- Seeks opportunities to expand technical skills and knowledge through training, certifications, and self-study
Requirements
- Familiarity with infrastructure concepts, including cloud platforms (for example, AWS, Azure, Google Cloud)
- Developing knowledge of programming or scripting languages (such as Python, Bash, or PowerShell) and version control systems (such as Git)
- Relevant understanding of Linux/Unix systems and experience working with command-line tools
- Strong problem-solving and analytical skills, with attention to detail
- Excellent communication and collaboration skills, with the ability to work effectively in a team environment
- Passion for automation, reliability, and continuous improvement
- Familiarity with incident management processes, monitoring tools, and configuration management systems is beneficial
- Developing expertise in performance monitoring, optimization, and troubleshooting using tools such as Prometheus, Grafana, or New Relic
- Developing ability to optimize system performance, scalability, and reliability
- Experience with performance monitoring and tuning tools (for example, Prometheus, Grafana, or New Relic) to identify bottlenecks, analyze performance data, and implement optimization strategies
- Developing understanding of security principles, best practices, and compliance requirements
- Bachelor's degree or equivalent in Computer Science, Information Technology, or a related field
- Relevant certifications, such as AWS Certified DevOps Engineer - Professional, Google Cloud Professional DevOps Engineer, or Certified Kubernetes Administrator (CKA) preferred
- Moderate-level hands-on experience in a Site Reliability Engineering role or related roles, including experience in designing and maintaining highly available and scalable systems
- Moderate-level experience in incident response procedures and troubleshooting techniques to identify and resolve system issues
- Moderate-level experience in automation principles and tools (for example, Terraform, Jenkins, Git)
- Moderate-level experience with scripting languages and version control systems, such as Git, demonstrating an ability to automate tasks and work collaboratively
Benefits
- Competitive salary
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Bash, PowerShell, Linux, Unix, performance monitoring, troubleshooting, automation, scalability, incident management
Soft Skills
problem-solving, analytical skills, attention to detail, communication, collaboration, teamwork, passion for continuous improvement
Certifications
AWS Certified DevOps Engineer - Professional, Google Cloud Professional DevOps Engineer, Certified Kubernetes Administrator (CKA)