BridgePhase

Infrastructure Operations Engineer

full-time

Location Type: Remote

Location: United States

About the role

  • Provide Tier 3 support for complex infrastructure and application-related incidents
  • Monitor system health, performance metrics, application logs, and infrastructure telemetry
  • Troubleshoot and resolve production issues across AWS infrastructure and Drupal-based platforms
  • Support AWS cloud services including compute, storage, networking, and security components
  • Investigate and diagnose performance bottlenecks, resource constraints, and configuration issues
  • Support CI/CD pipeline operations and troubleshoot deployment or release failures
  • Perform root cause analysis for recurring incidents and implement preventive measures
  • Coordinate incident response and resolution with development, DevSecOps, security, and infrastructure teams
  • Execute routine maintenance tasks including patching, scaling, backups, and system updates
  • Support deployment activities and release verification in production environments
  • Manage user support tickets and ensure timely resolution within SLA requirements
  • Maintain and update technical documentation for operational procedures and known issues
  • Implement and maintain monitoring alerts, logging, and automated health checks
  • Support disaster recovery testing and business continuity planning
  • Ensure compliance with federal security requirements and audit controls
  • Interface with federal stakeholders on operational status, issue escalation, and resolution
  • Collaborate with AWS support and third-party vendors for escalated technical issues

Requirements

  • At least 8 years of total professional experience, including 5+ years in infrastructure operations, cloud engineering, or production support roles
  • Prior or current experience supporting government programs (GovCon experience required)

Strong technical knowledge and expertise in:

  • AWS core services (EC2, S3, RDS, VPC, ELB/ALB, CloudFront, Route 53)
  • Cloud security services (IAM, Security Groups, KMS, CloudTrail, GuardDuty)
  • Infrastructure monitoring and observability (CloudWatch, Datadog, New Relic, or similar)
  • Infrastructure as Code (Terraform, CloudFormation, Ansible)
  • CI/CD pipeline operations (Jenkins, GitLab CI, AWS CodePipeline)
  • Linux/Unix system administration and command-line tools
  • Networking concepts (VPCs, subnets, routing, VPNs, DNS)
  • Log aggregation and analysis (CloudWatch Logs, ELK stack, Splunk)
  • Container technologies (Docker, ECS, EKS, Kubernetes)

Demonstrated ability to:

  • Manage and escalate production incidents
  • Troubleshoot complex issues under pressure in live environments
  • Perform root cause analysis and implement long-term fixes
  • Support security incident response and vulnerability remediation
  • Execute change management and configuration control
  • Maintain clear, accurate technical documentation
  • Work within ITIL or similar service management frameworks

Working knowledge and familiarity with:

  • Federal security and compliance requirements (FedRAMP, FISMA, NIST)
  • DevSecOps practices and automation tooling
  • Backup, recovery, and disaster recovery procedures
  • Web application architecture and performance optimization
  • Database operations, backup/restore, and performance tuning
  • Agile development and operations methodologies
  • SLA management, KPIs, and operational reporting

Nice to Have (Strong Plus):

  • Hands-on Drupal experience, including operational support, troubleshooting, or performance optimization
  • AWS infrastructure engineering experience beyond operations (design, modernization, or large-scale cloud migrations)
  • Experience supporting enterprise-scale or mission-critical information sharing platforms
  • AWS certifications (Solutions Architect, SysOps Administrator, Security Specialty)