InfraCloud Technologies

Production Engineer

full-time

Location Type: Remote

Location: Remote • 🇮🇳 India

Job Level

Mid-Level, Senior

Tech Stack

AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Prometheus, Python, ServiceNow, Splunk

About the role

  • Own Tier 3 technical escalations from Technical Support and ensure rapid resolution.
  • Investigate, triage, and mitigate incidents, ensuring accountability and timely communication.
  • Conduct trend and root-cause analysis to identify recurring issues, bug patterns, and product gaps.
  • Read and interpret application code to isolate, reproduce, and diagnose complex technical problems.
  • Collaborate with Support and Product Engineering to drive systemic improvements and long-term fixes.
  • Contribute to the creation and maintenance of runbooks, escalation workflows, and troubleshooting guides.
  • Partner with cross-functional teams to improve monitoring, logging, and alerting for production systems.
  • Automate repetitive tasks and build tools to improve team efficiency.
  • Participate in on-call rotations as part of a 24×7 follow-the-sun model.

Requirements

  • 3–5 years of experience in Production Engineering, Technical Support (Tier 3), SRE, or similar roles in a SaaS or enterprise software environment.
  • Strong understanding of incident management, troubleshooting, and root cause analysis.
  • Ability to read and understand code (Go preferred) to debug issues, analyze stack traces, and collaborate effectively with developers.
  • Proficiency with ServiceNow, Jira, Azure DevOps, or equivalent tools.
  • Familiarity with monitoring and observability platforms (Grafana, Prometheus, Splunk, etc.).
  • Hands-on experience with cloud platforms such as Azure, AWS, or GCP.
  • Basic scripting or automation skills (e.g., Python, PowerShell, or Bash).
  • Strong communication and cross-functional collaboration skills.
  • Data-driven mindset with a focus on efficiency, metrics, and continuous improvement.

Nice to Have

  • Experience working in globally distributed, follow-the-sun teams.
  • Exposure to AI or automation for incident triage or resolution.
  • Experience contributing to DevOps or SRE practices.
  • Prior experience in backup, recovery, or data management products.

Benefits

  • Fully remote
