Keyfactor

DevOps Lead

full-time

Origin: 🇺🇸 United States

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Cyber Security, Grafana, Java, Jenkins, Kafka, Kubernetes, Prometheus, Terraform

About the role

  • We are seeking a skilled and proactive DevOps Lead Engineer who can lead operations and deployment of a highly secure platform running on Kubernetes on AWS, Azure and AWS GovCloud. In this role, you will manage the infrastructure, ensure seamless deployments, and engage with customers to understand their needs, resolve issues, and optimize platform performance.
  • You will work closely with both technical teams and customers to bridge the gap between engineering and operations, ensuring a smooth user experience and maintaining high standards of security, compliance, and reliability for our platform.
  • Applicants must hold U.S. citizenship.
  • Platform Operations and Support: Manage and optimize platform operations on Kubernetes running on AWS, Azure and AWS GovCloud.
  • Maintain infrastructure as code (IaC) using tools like Terraform, CloudFormation, or Ansible.
  • Ensure platform scalability, availability, and performance in a mission-critical environment.
  • Monitor, troubleshoot, and resolve infrastructure and application-related issues.
  • Customer Engagement and Support: Serve as the primary technical point of contact for customers during platform operation.
  • Respond to and resolve customer issues, incidents, and requests promptly and professionally.
  • Gather feedback from customers to identify pain points and work with internal teams to resolve them.
  • Participate in supporting customer operations and incident management.
  • Continuous Integration/Continuous Deployment (CI/CD): Build and maintain CI/CD pipelines for automated deployments and updates.
  • Use ArgoCD or FluxCD to manage and automate Kubernetes deployments.
  • Ensure reliable software releases and patches by collaborating with development teams.
  • Perform regular platform updates, configuration management, and environment maintenance.
  • Security and Compliance: Ensure adherence to AWS GovCloud security standards, including encryption, IAM, and network security.
  • Implement and maintain compliance with government regulations and internal security policies (e.g., FedRAMP, NIST).
  • Conduct regular audits and security reviews to proactively identify vulnerabilities and areas for improvement.
  • Collaboration and Documentation: Collaborate with cross-functional teams, including development, operations, and security, to improve system reliability and customer satisfaction.
  • Create and maintain technical documentation, runbooks, and training materials for both internal teams and customers.
  • Provide training and onboarding support for customers and team members as needed.

Requirements

  • Bachelor's degree in Computer Science, Information Technology, or related field (or equivalent experience).
  • 5+ years of experience in a DevOps, Site Reliability Engineering (SRE), or Cloud Operations role.
  • Hands-on experience with Kubernetes on AWS and Azure.
  • Strong experience with IaC tools like Terraform, CloudFormation, or Ansible.
  • Experience with CI/CD tools such as Jenkins, GitLab CI, GitHub Actions, and ArgoCD or FluxCD for managing Kubernetes deployments.
  • Experience managing and monitoring OpenSearch, Kafka, Fluentd, or similar log aggregation services.
  • Experience with Java applications and services in cloud-native environments.
  • Familiarity with networking, security practices, and compliance standards (FedRAMP, NIST).
  • Proven ability to engage with customers, gather requirements, and deliver timely solutions.
  • Excellent problem-solving skills with a focus on customer satisfaction and operational efficiency.
  • Strong communication skills, both written and verbal, with the ability to explain technical concepts to non-technical stakeholders.