NTT DATA, Inc.

Senior Associate Site Reliability Engineer

Full-time

Location Type: Hybrid

Location: Hyderabad, India

About the role

  • Monitors system health, performance metrics, and alerts to identify and respond to incidents promptly
  • Works with senior Site Reliability Engineers and other teams to diagnose issues, troubleshoot problems, and restore services in a timely manner
  • Assists in the deployment and release of software applications and infrastructure changes
  • Collaborates with development teams to ensure smooth deployments, implement best practices, and minimize downtime during releases
  • Collaborates with senior SREs and operations teams to automate routine tasks and improve operational efficiency
  • Assists in capacity planning efforts, monitors resource utilization, and makes recommendations for scaling infrastructure and services based on projected needs
  • Collaborates with senior SREs to ensure adequate capacity to meet growing demands
  • Documents incidents, their impact, and resolution procedures to maintain an incident knowledge base
  • Participates in post-incident reviews, contributes to root cause analysis, and helps implement preventive measures to minimize future incidents
  • Collaborates with security teams to implement security best practices and ensure compliance with industry standards and regulations
  • Assists in monitoring and responding to security incidents, applying appropriate mitigation measures
  • Stays updated with the latest industry trends, emerging technologies, and best practices in Site Reliability Engineering
  • Seeks opportunities to expand technical skills and knowledge through training, certifications, and self-study

Requirements

  • Familiarity with infrastructure concepts, including cloud platforms (for example, AWS, Azure, Google Cloud)
  • Developing knowledge of programming or scripting languages (such as Python, Bash, or PowerShell) and version control systems (such as Git)
  • Working understanding of Linux/Unix systems and experience with command-line tools
  • Strong problem-solving and analytical skills, with attention to detail
  • Excellent communication and collaboration skills, with the ability to work effectively in a team environment
  • Passion for automation, reliability, and continuous improvement
  • Familiarity with incident management processes, monitoring tools, and configuration management systems is beneficial
  • Developing expertise in performance monitoring, optimization, and troubleshooting using tools such as Prometheus, Grafana, or New Relic
  • Developing ability to optimize system performance, scalability, and reliability
  • Experience with performance monitoring and tuning tools (for example, Prometheus, Grafana, or New Relic) to identify bottlenecks, analyze performance data, and implement optimization strategies
  • Developing understanding of security principles, best practices, and compliance requirements
  • Bachelor's degree or equivalent in Computer Science, Information Technology, or a related field
  • Relevant certifications, such as AWS Certified DevOps Engineer - Professional, Google Cloud Professional DevOps Engineer, or Certified Kubernetes Administrator (CKA) preferred
  • Moderate-level hands-on experience in a Site Reliability Engineering role or related roles, including experience designing and maintaining highly available and scalable systems
  • Moderate-level experience with incident response procedures and troubleshooting techniques to identify and resolve system issues
  • Moderate-level experience with automation principles and tools (for example, Terraform, Jenkins, Git)
  • Moderate-level experience with scripting languages and version control systems such as Git, demonstrating the ability to automate tasks and work collaboratively

Benefits

  • Competitive salary
  • Professional development opportunities