Senior System Engineering – Engineering Operations

AT&T

Senior System Engineer managing production issues and ensuring systems reliability at AT&T. Collaborating with dev teams and automating operations to optimize performance.

Posted 5/21/2026 • full-time • Plano • Texas • 🇺🇸 United States • Senior • 💰 $160,000 - $215,800 per year

Tech Stack

Tools & technologies
Azure, Cloud, Docker, ERP, J2EE, Java, JavaScript, Jenkins, Kubernetes, Linux, Python, Splunk, Spring, SQL

About the role

Key responsibilities & impact
  • Lead the response to production issues, ranging from identifying and troubleshooting problems to implementing immediate fixes.
  • Ensure minimal downtime and adherence to service level agreements (SLAs).
  • Build alerting, monitoring and dashboards that identify problems proactively.
  • Apply strong analytical, technical, and functional skills to diagnose and resolve complex issues in production environments, focusing on immediate impact mitigation; automate recovery processes and routine maintenance tasks to improve system reliability and efficiency.
  • Work with dev teams to implement long-term solutions to prevent recurrence of incidents.
  • Create and maintain documentation for system architecture, configuration, deployment procedures, and troubleshooting guides.
  • Develop and maintain scripts and automation tools to streamline operations, deployment processes, and repetitive tasks.
  • Identify non-functional requirements (reliability, performance, scalability, application logging for observability) and acceptance criteria during design and development, and ensure they are met before moving to production.
  • Monitor application performance using tools such as Dynatrace, AppDynamics, and ELK.
  • Identify bottlenecks and work with dev teams to optimize the performance of applications through code improvements, configuration tuning, and resource optimization.
  • Define SLIs/SLOs and error budgets, with a focus on automation.
  • Work with dev, architecture, and quality engineering teams to identify and document failure patterns as lessons learned from incidents, and follow up to implement remediations that make the application more resilient.
  • Monitor system usage patterns and perform capacity planning to ensure scalability and reliability of applications and services.
  • Participate in security assessments and implement security best practices to safeguard applications and data.
  • Respond promptly to security incidents and vulnerabilities.
  • Work with Release Management on upcoming production changes to identify and mitigate risks.
  • Collaborate with development teams to manage and support application releases and deployments.
  • Ensure changes are rolled out in a controlled manner with minimal impact on production services.
  • Perform proactive problem detection, trend and pattern analysis, impact assessment, and functional analysis of problems.
  • Provide metrics and status reports and review them with leadership and stakeholder communities; establish processes for metrics gathering, reporting, and communication; provide prompt visibility into the status of escalated issues, incidents, and outages to leadership, business partners, and other key stakeholders.
  • Strong verbal and written communication skills.
  • Work closely with Product Development teams to ensure knowledge transfer on system changes well before those changes are operationalized.
  • Provide 24x7 on-call support for agent-facing applications: home-grown J2EE apps as well as SaaS platform apps (Salesforce, Salesforce Marketing Cloud, and MuleSoft).
  • Support large-scale applications in production with an engineering approach (SRE), including Java EE, ERP, and CRM apps, in an operations capacity.
  • Architect and develop web applications.
  • Use observability tools including Dynatrace, AppDynamics, Splunk, ELK, MuleSoft AnyPoint, Quantum Metric, and Catchpoint to create alerts, dashboards, reports, and synthetic monitoring.
  • Apply working experience with integration technologies and API gateways, MuleSoft, and WebLogic.
  • Use object-oriented programming languages (Java, J2EE technologies, JavaScript) and frameworks (Spring).
  • Use automation tools and scripting languages (Python, Shell).
  • Utilize containerization (Docker, Kubernetes) and cloud services (Azure).
  • Employ DevOps practices and tools (CI/CD pipelines, Git, Jenkins).
  • Apply network protocols, load balancing, and security principles.
  • Utilize database SQL queries.
  • Build Linux shell scripts on demand.

Requirements

What you’ll need
  • Bachelor’s degree, or foreign equivalent degree, in Computer Engineering, Computer Science, or Information Technology
  • 3 years of experience in the job offered or a related occupation supporting large scale applications in production with an Engineering approach (SRE)
  • Architecting and developing web applications
  • Using observability tools including Dynatrace, AppDynamics, Splunk, ELK, MuleSoft AnyPoint, Quantum Metric, and Catchpoint to create alerts, dashboards, reports, and synthetic monitoring
  • Understanding and working experience with integration technologies and API Gateways, MuleSoft, WebLogic
  • Utilizing Object Oriented Programming Languages - Java, J2EE technologies, JavaScript, and frameworks (Spring)
  • Using automation tools and scripting languages (Python, Shell)
  • Utilizing containerization (Docker, Kubernetes) and cloud services (Azure)
  • Employing DevOps practices and tools (CI/CD pipelines, Git, Jenkins)
  • Applying network protocols, load balancing, and security principles
  • Utilizing database SQL queries
  • Building Linux shell scripts on demand

Benefits

Comp & perks
  • Medical/Dental/Vision coverage
  • 401(k) plan
  • Tuition reimbursement program
  • Paid Time Off and Holidays (based on date of hire, at least 23 days of vacation each year and 9 company-designated holidays)
  • Paid Parental Leave
  • Paid Caregiver Leave
  • Additional sick leave beyond what state and local law require may be available but is unprotected
  • Adoption Reimbursement
  • Disability Benefits (short term and long term)
  • Life and Accidental Death Insurance
  • Supplemental benefit programs: critical illness/accident hospital indemnity/group legal
  • Employee Assistance Programs (EAP)
  • Extensive employee wellness programs
  • Employee discounts of up to 50% on eligible AT&T mobility plans and accessories, AT&T internet (and fiber where available), and AT&T phone

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Java, J2EE, JavaScript, Spring, Python, Shell scripting, Docker, Kubernetes, SQL, CI/CD
Soft Skills
analytical skills, problem-solving, communication skills, collaboration, documentation, capacity planning, trend analysis, incident response, leadership, knowledge transfer