Site Reliability Engineer II

Verisk

Site Reliability Engineer who develops and maintains scalable, reliable systems at Verisk Analytics, collaborating with development teams to improve performance and resilience.

Posted 6/5/2026 • Full-time • Hyderabad • 🇮🇳 India • Junior, Mid-Level

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

Java, Python, Linux/Unix administration, serverless technologies, containerization, cloud infrastructure, infrastructure-as-code, CI/CD pipeline design, monitoring systems, database technologies

Soft Skills

incident management, disaster recovery, resilience engineering, crisis communication, capacity planning, performance optimization, root cause analysis, preventative measures, collaboration, leadership

Tools & Technologies

Terraform, CloudFormation, AWS, Azure DevOps, Docker, Jenkins, GitLab CI, Dynatrace, load balancing, CDN technologies

Certifications & Qualifications

Bachelor's degree in computer science

Industry Keywords

SRE, DevOps, incident response, monitoring solutions, service reliability, error budgets, production systems, automation, security best practices, observability

Tech Stack

Tools & technologies

AWS, Azure, Cloud, Docker, Java, Jenkins, Linux, Python, Terraform, Unix

About the role

Key responsibilities & impact

Design, implement, and maintain reliable infrastructure systems with a focus on security, scalability, reliability, and automation using tools like Terraform or CloudFormation
Build and maintain scalable and resilient production systems with a focus on automation
Develop and implement monitoring solutions to ensure system health, performance, and availability
Lead incident response, perform root cause analysis, and implement preventative measures
Track SLOs and SLAs to measure service reliability, and use error budgets to drive reliability improvements
Design and implement CI/CD pipelines to enable rapid and reliable software delivery
Partner with development teams to improve application performance, resilience, and scalability
Contribute to capacity planning and performance optimization initiatives
Participate in an on-call rotation to support production systems
Develop and evolve security monitoring, alerting, and incident response

Requirements

What you’ll need

2-4 years of experience in SRE, DevOps, or similar roles with Java or Python knowledge
Expertise in incident management, disaster recovery, and building resilience engineering frameworks
Strong programming skills in at least one language such as Java or Python
Experience with Linux/Unix systems administration
Hands-on experience with serverless (Lambda) and containerization technologies (Docker)
Experience implementing and managing cloud infrastructure (AWS, Azure DevOps)
Advanced understanding of networking concepts, load balancing, security best practices, and CDN technologies
Experience with observability systems (like Dynatrace)
Knowledge of database technologies and their performance characteristics
Demonstrated experience handling incident response and post-mortem analysis
Bachelor's degree in computer science or equivalent practical experience
Deep knowledge of infrastructure-as-code tools (Terraform, CloudFormation)
Knowledge of CI/CD pipeline design and implementation (Jenkins, GitLab CI, Azure DevOps)
Experience building and maintaining comprehensive monitoring and alerting systems
Experience managing high-traffic, mission-critical production environments
Background in capacity planning and performance optimization
Strong incident management skills, including crisis communication

Benefits

Comp & perks

Short Term Incentive
Work Arrangement: Hybrid