
Infrastructure Operations Engineer
BridgePhase
full-time
Location Type: Remote
Location: United States
About the role
- Provide Tier 3 support for complex infrastructure and application-related incidents
- Monitor system health, performance metrics, application logs, and infrastructure telemetry
- Troubleshoot and resolve production issues across AWS infrastructure and Drupal-based platforms
- Support AWS cloud services including compute, storage, networking, and security components
- Investigate and diagnose performance bottlenecks, resource constraints, and configuration issues
- Support CI/CD pipeline operations and troubleshoot deployment or release failures
- Perform root cause analysis for recurring incidents and implement preventive measures
- Coordinate incident response and resolution with development, DevSecOps, security, and infrastructure teams
- Execute routine maintenance tasks including patching, scaling, backups, and system updates
- Support deployment activities and release verification in production environments
- Manage user support tickets and ensure timely resolution within SLA requirements
- Maintain and update technical documentation for operational procedures and known issues
- Implement and maintain monitoring alerts, logging, and automated health checks
- Support disaster recovery testing and business continuity planning
- Ensure compliance with federal security requirements and audit controls
- Interface with federal stakeholders on operational status, issue escalation, and resolution
- Collaborate with AWS support and third-party vendors for escalated technical issues
Requirements
- At least 8 years of total professional experience, including 5+ years in infrastructure operations, cloud engineering, or production support roles
- Prior or current experience supporting government programs (GovCon experience required)
Strong technical knowledge and expertise in:
- AWS core services (EC2, S3, RDS, VPC, ELB/ALB, CloudFront, Route53)
- Cloud security services (IAM, Security Groups, KMS, CloudTrail, GuardDuty)
- Infrastructure monitoring and observability (CloudWatch, Datadog, New Relic, or similar)
- Infrastructure as Code (Terraform, CloudFormation, Ansible)
- CI/CD pipeline operations (Jenkins, GitLab CI, AWS CodePipeline)
- Linux/Unix system administration and command-line tools
- Networking concepts (VPCs, subnets, routing, VPNs, DNS)
- Log aggregation and analysis (CloudWatch Logs, ELK stack, Splunk)
- Container technologies (Docker, ECS, EKS, Kubernetes)
Demonstrated ability to:
- Manage production incidents and escalations
- Troubleshoot complex issues under pressure in live environments
- Perform root cause analysis and implement long-term fixes
- Support security incident response and vulnerability remediation
- Execute change management and configuration control
- Maintain clear, accurate technical documentation
- Work within ITIL or similar service management frameworks
Working knowledge and familiarity with:
- Federal security and compliance requirements (FedRAMP, FISMA, NIST)
- DevSecOps practices and automation tooling
- Backup, recovery, and disaster recovery procedures
- Web application architecture and performance optimization
- Database operations, backup/restore, and performance tuning
- Agile development and operations methodologies
- SLA management, KPIs, and operational reporting
Nice to Have (Strong Plus):
- Hands-on Drupal experience, including operational support, troubleshooting, or performance optimization
- AWS infrastructure engineering experience beyond operations (design, modernization, or large-scale cloud migrations)
- Experience supporting enterprise-scale or mission-critical information sharing platforms
- AWS certifications (Solutions Architect, SysOps Administrator, Security Specialty)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AWS core services, Cloud security services, Infrastructure monitoring, Infrastructure as Code, CI/CD pipeline operations, Linux/Unix system administration, Networking concepts, Log aggregation and analysis, Container technologies, Web application architecture
Soft Skills
Production incident management, Troubleshooting under pressure, Root cause analysis, Change management, Technical documentation, Collaboration, Problem-solving, Communication, Time management, SLA management
Certifications
AWS Solutions Architect, AWS SysOps Administrator, AWS Security Specialty