
Senior Software Engineer – Infrastructure
Ironflow AI
full-time
Location Type: Hybrid
Location: San Diego • California • United States
About the role
- Design, build, and maintain production infrastructure on AWS (EKS, RDS, ECR, VPC, IAM, Secrets Manager, etc.).
- Develop and manage our Kubernetes clusters: deploy workloads, tune Karpenter node autoscaling, maintain Helm charts, and keep clusters healthy.
- Own and extend our GitOps deployment pipeline: GitHub Actions for CI/CD, ArgoCD for continuous delivery, and Helm for packaging.
- Manage supporting cluster operators including Envoy Gateway, External DNS, cert-manager, Fluent Bit, and the AWS Load Balancer Controller.
- Own and improve our observability stack: Grafana for dashboards, Loki for log aggregation, Tempo for distributed tracing, and Prometheus for metrics.
- Support multi-environment reliability across dev, stage, and production GovCloud accounts.
- Improve system resilience through load testing (Locust), E2E testing (Playwright/Cucumber), and thoughtful capacity planning.
- Contribute to backend services (FastAPI, SQLAlchemy) in Python and TypeScript.
- Work alongside product engineers as a first-class contributor, making architecture decisions that balance speed, cost, and reliability.
- Build developer experience tooling: local dev environments, CI pipeline improvements, and automated testing scaffolds that make the whole team faster.
- Support and extend Temporal-based workflow orchestration for background processing.
- Implement least-privilege IAM policies, IRSA (IAM Roles for Service Accounts), and network segmentation in a GovCloud environment.
- Manage secrets through AWS Secrets Manager and the External Secrets Operator with automated rotation.
- Maintain TLS automation via cert-manager and OIDC authentication flows.
- Enable SOC 2, CMMC, and FedRAMP compliance activities: GRC platform integration, audit logging pipelines, FIPS-validated endpoint configuration, system boundary documentation, and evidence collection for third-party assessments.
Requirements
- 5+ years of professional experience in infrastructure, DevOps, SRE, or platform engineering.
- Deep hands-on experience with AWS services in production (EKS, IAM, Secrets Manager, ECR, RDS). Experience with or strong working knowledge of AWS GovCloud is a significant plus.
- Strong Kubernetes expertise: you've operated clusters, debugged networking issues, managed Helm charts, and tuned workloads.
- Proficiency in Python and/or TypeScript with a genuine interest in writing application code alongside infrastructure work.
- Experience with GitOps workflows: ArgoCD, GitHub Actions, and Helm-based deployments.
- Solid understanding of networking fundamentals (DNS, load balancing, TLS, Kubernetes Gateway API).
- Comfort with Linux systems administration and shell scripting.
- Familiarity with compliance-driven infrastructure: audit logging, access controls, and evidence collection for frameworks like CMMC, FedRAMP, and SOC 2.
- A collaborative, low-ego mindset: you thrive in small, fast-moving teams.
Nice to Have
- Experience with Envoy Gateway or the Kubernetes Gateway API.
- Background in PostgreSQL administration and schema-based multi-tenancy.
- Familiarity with the Grafana observability stack (Loki, Tempo, Prometheus).
- Experience with Karpenter for node autoscaling or cost optimization strategies for cloud spend.
- Experience with Temporal for workflow orchestration.
- Experience in a startup or high-growth environment where you wore many hats.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AWS, Kubernetes, Python, TypeScript, GitOps, Helm, PostgreSQL, Linux, load testing, E2E testing
Soft Skills
collaborative mindset, low-ego mindset, teamwork, architecture decision-making, capacity planning
Certifications
SOC 2, CMMC, FedRAMP