
Frontline Engineer
TCGplayer (an eBay company)
full-time
Location Type: Remote
Location: Montana • United States
Salary
💰 $103,200 - $178,400 per year
About the role
- Serve as Incident Commander, leading real-time response efforts, managing communication across teams, triaging issues, and driving resolution of high-priority incidents to minimize downtime and business disruption.
- Execute documented runbooks for troubleshooting and resolving production incidents involving AWS services (EC2, CloudWatch, IAM) and Kubernetes clusters (pods, deployments, scaling).
- Collaborate closely with engineering teams post-incident, performing root cause analysis, documenting lessons learned, and driving the implementation of durable solutions.
- Drive operational excellence by measuring and analyzing critical metrics (e.g., MTTR, SLA adherence) to identify improvement opportunities and implement impactful solutions.
- Continuously refine and update operational runbooks and procedures, ensuring alignment with evolving technologies and business needs.
- Proactively contribute to long-term strategic initiatives to improve incident management practices.
Requirements
- A Bachelor’s degree in a technical field or equivalent experience (5+ years) in system administration, infrastructure engineering, or related roles; relevant certifications are a plus.
- Direct experience as an incident commander, including managing live incident calls, coordinating triage efforts, and driving communications during high-pressure situations.
- Strong communication skills with the ability to clearly articulate technical details and strategies to both technical and non-technical stakeholders.
- Excellent problem-solving capabilities, able to stay composed and decisive under pressure during high-impact incidents.
- Hands-on operational experience with AWS in a production environment, specifically executing runbooks, restarting EC2 instances, checking alarms, and pulling logs from CloudWatch.
- Proficiency with Kubernetes, including troubleshooting containerized workloads, understanding pod health, managing deployments, and interacting directly with Kubernetes clusters.
- Experience with scripting (Python, PowerShell, or Bash) to automate operational tasks or assist in incident resolution workflows.
Benefits
- Full range of medical, financial, and/or other benefits
- 401(k) eligibility
- Various paid time off benefits, such as PTO and parental leave
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AWS, Kubernetes, Python, PowerShell, Bash, incident management, troubleshooting, root cause analysis, runbooks, operational metrics
Soft Skills
communication, problem-solving, leadership, decisiveness, collaboration, strategic thinking, composure under pressure, documentation, analytical skills, team coordination
Certifications
Bachelor’s degree in a technical field, relevant certifications