
Cloud Operations Engineer
Sophos
full-time
Location Type: Remote
Location: Remote • 🇮🇳 India
Job Level
Junior, Mid-Level
Tech Stack
AWS, Azure, Cloud, Distributed Systems, DNS, Firewalls, Grafana, ITSM, Jenkins, Linux, Python, Shell Scripting, TCP/IP
About the role
- Ensure the continuous availability, performance, and reliability of cloud-hosted and on-premises applications and infrastructure through 24x7 support operations
- Proactively monitor critical systems, swiftly identify and resolve incidents, and escalate issues to the appropriate teams
- Participate in a rotational on-call schedule to provide continuous operational support and rapid incident response for cloud-hosted applications and infrastructure
- Perform real-time monitoring of infrastructure, platforms, and applications to identify anomalies, performance degradation, or service disruptions
- Serve as the first line of defense for incident management by promptly acknowledging alerts, triaging issues, and executing documented runbooks
- Escalate unresolved or critical issues to the appropriate support or engineering teams
- Act as the central point of contact for incident updates, ensuring clear, timely, and accurate communication with stakeholders
- Work closely with application support, DevOps, infrastructure, and network teams to troubleshoot, resolve, and prevent operational issues
- Participate in Root Cause Analysis (RCA) processes following major incidents and contribute to developing preventive measures and service improvement plans
- Follow and maintain standard operating procedures (SOPs), change management policies, and compliance requirements
- Identify and proactively report potential risks, configuration issues, or performance bottlenecks
- Maintain accurate documentation of systems, procedures, and incident logs and contribute to knowledge base articles
Requirements
- Proficiency in managing and troubleshooting services across at least one major cloud provider like AWS or Microsoft Azure
- Familiarity with core cloud services (Compute, Storage, Networking, IAM, Monitoring, Auto Scaling, etc.)
- Hands-on experience with enterprise-grade monitoring tools such as Grafana and CloudWatch
- Ability to configure alerts, dashboards, and automated health checks
- Strong knowledge of ITIL principles and experience with ITSM tools such as PagerDuty and Jira
- Understanding of incident triage, escalation procedures, service restoration, and Root Cause Analysis (RCA)
- Working knowledge of Linux and Windows operating systems in a cloud or hybrid environment
- Familiarity with system administration tasks, shell scripting, and log analysis
- Ability to create and maintain basic scripts using Bash, Python, or PowerShell to automate operational tasks and monitoring functions
- Understanding of CI/CD pipelines, deployment processes, and integration with cloud environments
- Exposure to tools like Git and Jenkins CI/CD is a plus
- Basic understanding of TCP/IP, DNS, VPN, firewalls, load balancers, and cloud networking concepts (VPCs, NSGs, Subnets)
- Familiarity with identity and access management (IAM) and security best practices in a cloud environment
- Experience working with centralized logging solutions (e.g., AWS CloudWatch or Azure Log Analytics)
- Ability to trace incidents and correlate logs across distributed systems
- Strong habit of maintaining accurate operational documentation and runbooks
Good to have
- Solid understanding of cloud-native monitoring and alerting platforms and a sound foundation in cloud technology
- 1 to 2 years of hands-on experience with cloud computing, networking, storage, and database systems, preferably on AWS
- Fundamental grasp of scripting languages such as Python, Bash, and PowerShell, with the ability to automate tasks for efficiency
- Certifications such as RHCSA/RHCE, AWS Certified Solutions Architect (Associate), or Six Sigma would be an advantage
Benefits
- Sophos operates a remote-first working model
- Our people: we innovate and create, with a great sense of fun and team spirit
- Employee-led diversity and inclusion networks that build community and provide education and advocacy
- Annual charity and fundraising initiatives and volunteer days for employees to support local communities
- Global employee sustainability initiatives to reduce our environmental footprint
- Global fitness and trivia competitions to keep our bodies and minds sharp
- Global wellbeing days for employees to relax and recharge
- Monthly wellbeing webinars and training to support employee health and wellbeing
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
cloud computing, incident management, monitoring, scripting, Linux, Windows, CI/CD, networking, log analysis, Root Cause Analysis
Soft skills
communication, problem-solving, collaboration, documentation, incident triage, escalation procedures, risk identification, service restoration, proactive monitoring, stakeholder engagement
Certifications
RHCSA, RHCE, AWS Certified Solutions Architect, Six Sigma