
Production Engineer
InfraCloud Technologies
full-time
Location Type: Remote
Location: Remote • 🇮🇳 India
Job Level
Mid-Level, Senior
Tech Stack
AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Prometheus, Python, ServiceNow, Splunk
About the role
- Own Tier 3 technical escalations from Technical Support and ensure rapid resolution.
- Investigate, triage, and mitigate incidents, ensuring accountability and timely communication.
- Conduct trend and root-cause analysis to identify recurring issues, bug patterns, and product gaps.
- Read and interpret application code to isolate, reproduce, and diagnose complex technical problems.
- Collaborate with Support and Product Engineering to drive systemic improvements and long-term fixes.
- Contribute to the creation and maintenance of runbooks, escalation workflows, and troubleshooting guides.
- Partner with cross-functional teams to improve monitoring, logging, and alerting for production systems.
- Automate repetitive tasks and build tools to improve team efficiency.
- Participate in on-call rotations as part of a 24×7 follow-the-sun model.
Requirements
- 3–5 years of experience in Production Engineering, Technical Support (Tier 3), SRE, or similar roles in a SaaS or enterprise software environment.
- Strong understanding of incident management, troubleshooting, and root cause analysis.
- Ability to read and understand code (Go preferred) to debug issues, analyze stack traces, and collaborate effectively with developers.
- Proficiency with ServiceNow, Jira, Azure DevOps, or equivalent tools.
- Familiarity with monitoring and observability platforms (Grafana, Prometheus, Splunk, etc.).
- Hands-on experience with cloud platforms such as Azure, AWS, or GCP.
- Basic scripting or automation skills (e.g., Python, PowerShell, or Bash).
- Strong communication and cross-functional collaboration skills.
- Data-driven mindset with a focus on efficiency, metrics, and continuous improvement.
Nice to Have
- Experience working in globally distributed, follow-the-sun teams.
- Exposure to AI or automation for incident triage or resolution.
- Experience contributing to DevOps or SRE practices.
- Prior experience in backup, recovery, or data management products.
Benefits
- Fully remote