Senior System Engineering – Engineering Operations
AT&T
Senior System Engineer managing production issues and ensuring system reliability at AT&T, collaborating with dev teams and automating operations to optimize performance.
Posted 5/21/2026
Full-time
Plano • Texas • 🇺🇸 United States
Senior
💰 $160,000 - $215,800 per year
Tech Stack
Tools & technologies: Azure, Cloud, Docker, ERP, J2EE, Java, JavaScript, Jenkins, Kubernetes, Linux, Python, Splunk, Spring, SQL
About the role
Key responsibilities & impact
- Lead the response to production issues, from identifying and troubleshooting problems to implementing immediate fixes.
- Ensure minimal downtime and adherence to service level agreements (SLAs).
- Build alerting, monitoring and dashboards that identify problems proactively.
- Use strong analytical, technical and functional skills to diagnose and resolve complex issues in production environments, focusing on immediate impact mitigation, and automate recovery processes and routine maintenance tasks to improve system reliability and efficiency.
- Work with dev teams to implement long-term solutions to prevent recurrence of incidents.
- Create and maintain documentation for system architecture, configuration, deployment procedures, and troubleshooting guides.
- Develop and maintain scripts and automation tools to streamline operations, deployment processes, and repetitive tasks.
- Identify non-functional requirements (such as reliability, performance, scalability, and application logging for observability) and acceptance criteria during design and development, and ensure they are met before release to production.
- Monitor application performance using tools such as Dynatrace, AppDynamics and ELK.
- Identify bottlenecks and work with dev teams to optimize the performance of applications through code improvements, configuration tuning, and resource optimization.
- Define SLIs/SLOs and error budgets, with a focus on automation.
- Work with dev, architect and quality engineering teams to identify and document failure patterns as lessons learned from incidents, and follow up on the remediations that make the application resilient.
- Monitor system usage patterns and perform capacity planning to ensure scalability and reliability of applications and services.
- Participate in security assessments and implement security best practices to safeguard applications and data.
- Respond promptly to security incidents and vulnerabilities.
- Work with Release Management on upcoming production changes to identify and mitigate risks.
- Collaborate with development teams to manage and support application releases and deployments.
- Ensure changes are rolled out in a controlled manner with minimal impact on production services.
- Perform proactive problem detection, trend and pattern analysis, problem impact assessment, and functional analysis of problems.
- Provide metrics and status reports and review them with leadership and stakeholder communities; establish processes for metrics gathering, reporting and communication; and provide prompt visibility into the status of escalated issues, incidents and outages to leadership, business partners and other key stakeholders.
- Strong verbal and written communication skills.
- Work closely with Product Development teams to ensure knowledge transfer on system changes well before those changes become operational.
- Provide 24x7 on-call support for agent-facing applications: homegrown J2EE apps as well as SaaS platform apps (Salesforce, Salesforce Marketing Cloud and MuleSoft).
- Support large-scale production applications, including Java EE, ERP and CRM apps, in an operations capacity using an engineering (SRE) approach.
- Architect and develop web applications.
- Use observability tools including Dynatrace, AppDynamics, Splunk, ELK, MuleSoft Anypoint, Quantum Metric and Catchpoint to create alerts, dashboards, reports and synthetic monitoring.
- Apply understanding of and working experience with integration technologies and API gateways, MuleSoft, and WebLogic.
- Use object-oriented programming languages and technologies (Java, J2EE, JavaScript) and frameworks (Spring).
- Use automation tools and scripting languages (Python, Shell).
- Utilize containerization (Docker, Kubernetes) and cloud services (Azure).
- Employ DevOps practices and tools (CI/CD pipelines, Git, Jenkins).
- Apply network protocols, load balancing, and security principles.
- Write and use SQL database queries.
- Build Linux shell scripts on demand.
Requirements
What you’ll need
- Bachelor’s degree, or foreign equivalent, in Computer Engineering, Computer Science, or Information Technology
- 3 years of experience in the job offered or a related occupation supporting large-scale applications in production with an engineering approach (SRE)
- Architecting and developing web applications
- Using observability tools including Dynatrace, AppDynamics, Splunk, ELK, MuleSoft Anypoint, Quantum Metric and Catchpoint to create alerts, dashboards, reports and synthetic monitoring
- Understanding of and working experience with integration technologies and API gateways, MuleSoft, and WebLogic
- Using object-oriented programming languages and technologies (Java, J2EE, JavaScript) and frameworks (Spring)
- Using automation tools and scripting languages (Python, Shell)
- Utilizing containerization (Docker, Kubernetes) and cloud services (Azure)
- Employing DevOps practices and tools (CI/CD pipelines, Git, Jenkins)
- Applying network protocols, load balancing, and security principles
- Writing and using SQL database queries
- Building Linux shell scripts on demand
Benefits
Comp & perks
- Medical/Dental/Vision coverage
- 401(k) plan
- Tuition reimbursement program
- Paid Time Off and Holidays (based on date of hire, at least 23 days of vacation each year and 9 company-designated holidays)
- Paid Parental Leave
- Paid Caregiver Leave
- Additional sick leave beyond what state and local law require may be available but is unprotected
- Adoption Reimbursement
- Disability Benefits (short term and long term)
- Life and Accidental Death Insurance
- Supplemental benefit programs: critical illness, accident, hospital indemnity, and group legal
- Employee Assistance Programs (EAP)
- Extensive employee wellness programs
- Employee discounts up to 50% off on eligible AT&T mobility plans and accessories, AT&T internet (and fiber where available) and AT&T phone
ATS Keywords
Hard Skills & Tools
Java, J2EE, JavaScript, Spring, Python, Shell scripting, Docker, Kubernetes, SQL, CI/CD
Soft Skills
analytical skills, problem-solving, communication skills, collaboration, documentation, capacity planning, trend analysis, incident response, leadership, knowledge transfer