
Site Reliability Engineer
Tillster
full-time
Location Type: Remote
Location: Portugal
About the role
- Analyze and troubleshoot large-scale distributed systems in the public cloud
- Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity
- Improve and maintain monitoring and logging solutions that measure availability, latency and overall system health of production systems
- Provision and manage cloud infrastructure through automation and infrastructure as code
- Restore healthy operation of applications and services through sustainable incident response and blameless postmortems
- Follow and monitor security and compliance best practices
- Take a proactive approach to spotting problems, areas for improvement, and performance bottlenecks
Requirements
- Ability to program in one or more high-level languages (e.g., TypeScript, Python)
- Experience with configuration management and infrastructure as code (e.g., CloudFormation, Ansible)
- Experience with monitoring and alerting tools (e.g., AWS CloudWatch, New Relic)
- Experience with incident management and on-call tooling (e.g., PagerDuty)
- Ability to gather and analyze metrics to assist in performance tuning and fault finding
- Bachelor's degree from a four-year college or university, or three to four years of related experience and/or training, or an equivalent combination of education and experience
- 3+ years of software engineering and/or IT operations and infrastructure experience preferred
Benefits
- Compensation competitive with the market and geographical location.
- Meal allowance for each day worked available through meal card.
- Home/Office allowance reimbursement per calendar month, pro-rated based on employment start date.
- Health insurance: Tillster pays the premium for employee private health insurance. Employees have the option to add their spouse/dependents at the employee’s cost.
- Holidays: Up to 14 federal and local/municipal holidays in accordance with applicable Portuguese labour laws, dependent on your employment start date.
- Vacation: Up to 22 days of vacation every holiday year, pro-rated based on employment start date.
- Education, Learning & Development: We offer Udemy Learning courses and ongoing learning and development opportunities.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
TypeScript, Python, Infrastructure as Code, CloudFormation, Ansible, Monitoring, Alerting, Performance tuning, Fault finding, Incident management
Soft Skills
Problem solving, Proactive approach, Analytical skills, Communication, Collaboration, Adaptability, Attention to detail, Critical thinking, Organizational skills, Incident response