Tech Stack
AWS, Grafana, Postgres, Prometheus, Python, SQL, Swift, Unix
About the role
- Investigate and resolve client requests by diagnosing issues, applying fixes, and ensuring swift resolution to maintain service quality.
- Monitor and safeguard system health by identifying anomalies, troubleshooting failures, and escalating risks before they impact customers.
- Automate repetitive tasks and design operational tools that increase efficiency, leveraging AI and modern engineering practices to reduce friction.
- Accelerate problem resolution by building root-cause analysis tools that shorten time-to-fix and improve long-term reliability.
- Collaborate across functions (risk, DBA, dev, customer support) to resolve incidents and deliver end-to-end solutions.
- Create and maintain documentation that standardizes troubleshooting methods and enables faster resolution at L1.
Requirements
- Strong experience in Python and SQL (PostgreSQL).
- Confident working across Unix-based environments.
- Hands-on experience with monitoring tools (Grafana, Kibana, Prometheus, Datadog, AWS CloudWatch).
- Proven background in T2–T3 technical support.
- DevOps experience setting up alerts and incident workflows (PagerDuty or similar).
- Ability to design and implement internal tools that reduce downtime and streamline workflows.
- Skilled in writing clear technical documentation.
- English proficiency at B2 level or higher.
- Willingness to work in rotating shifts (24/7, Monday to Sunday).