RecargaPay

Head of Engineering – Cloud & Platform

As Head of Cloud & Platform at RecargaPay, you will build a world-class cloud ecosystem and lead multi-disciplinary teams to ensure reliability and efficiency in a regulated fintech environment.

Posted 5/4/2026 • full-time • Remote • 🇧🇷 Brazil • Lead

Tech Stack

Tools & technologies
AWS, Cloud, Distributed Systems, Grafana, Java, Kafka, Kubernetes, Microservices, Node.js, Prometheus, Python, Spring, Spring Boot, Terraform, Vault

About the role

Key responsibilities & impact
  • Define and execute the Cloud and Platform strategy, ensuring alignment with corporate objectives, regulatory frameworks, and cost-efficiency goals.
  • Lead a multi-disciplinary organization covering Cloud Infrastructure, SRE, Platform Engineering, and DevSecOps, fostering collaboration and shared accountability for uptime, security, and performance.
  • Drive modernization of infrastructure and delivery pipelines, enabling a unified, automated, and compliant cloud environment.
  • Partner with executive leadership to define scalable operating models, balancing autonomy for product squads with standardized guardrails and golden paths.
  • Establish a long-term architectural vision for cloud services, platform frameworks, and developer enablement tools.
  • Sponsor AI-assisted engineering adoption to enhance developer productivity, reduce toil, and accelerate delivery (e.g., Copilot, Cursor, LLM-based agents).
  • Serve as the ultimate technical and strategic authority for AWS, Kubernetes, IaC, Observability, and Reliability practices across the organization.
  • Oversee the design, scalability, and governance of the AWS multi-account organization, enforcing security, compliance, and cost policies (Control Tower, SCPs, Service Catalog).
  • Lead the definition and implementation of multi-region, multi-environment architectures ensuring reliability, latency optimization, and disaster recovery readiness (RPO/RTO).
  • Institutionalize well-architected principles (Security, Reliability, Performance, Cost, Sustainability) and drive continuous improvement programs based on regular audits.
  • Evolve network and connectivity architectures (VPC, Transit Gateway, PrivateLink, Global Accelerator) to meet scaling, compliance, and availability goals.
  • Own identity, access, and secrets management lifecycle (IAM least privilege, mTLS, KMS/HSM key rotation, Vault integration).
  • Oversee monitoring and observability frameworks, implementing standards and unified dashboards across all services.
  • Ensure SLO-driven operations, with well-defined SLIs, error budgets, and automated incident management loops.
  • Lead resilience and reliability engineering practices, including chaos engineering, failover drills, dependency fallback design, and proactive degradation handling.
  • Build and scale the company’s Internal Developer Platform (IDP), empowering teams with self-service capabilities for environment provisioning, deployments, and observability.
  • Define golden paths, opinionated tooling, and reusable infrastructure modules, enabling consistent, secure, and fast software delivery across squads.
  • Ensure trunk-based development, progressive delivery (canary, blue/green), automated rollback, and health/SLO-gated deployments are embedded into CI/CD flows.
  • Drive GitOps adoption to achieve deterministic deployments, auditability, and drift detection.
  • Expand event-driven and streaming platforms (e.g., Kafka), defining keying, partitioning, and schema evolution strategies to support scalability and data integrity.
  • Partner with Security and Compliance to embed DevSecOps and Policy-as-Code practices into CI/CD and Kubernetes admission controllers.
  • Establish and lead a FinOps program, optimizing compute, storage, and data transfer costs while ensuring transparency through chargeback/showback models.
  • Define cost-to-serve models per service and implement automated guardrails for budgeting and right-sizing.
  • Integrate cost and performance telemetry into platform dashboards to drive data-informed decision-making.
  • Partner with Finance to align cloud spend forecasts and track savings initiatives tied to architecture decisions.
  • Lead and mentor senior engineering managers and principal engineers, building high-performance, high-accountability teams.
  • Promote a culture of reliability, automation, and continuous improvement through transparent metrics and post-incident learning loops.
  • Establish governance rhythms such as architecture councils, platform guilds, and reliability reviews to align technical direction and eliminate systemic friction.
  • Collaborate closely with Risk, Compliance, and Security to uphold standards like PCI-DSS, SOC 2, ISO 27001, LGPD, and GDPR within cloud and platform operations.

Requirements

What you’ll need
  • Academic background oriented toward Computer Science, Engineering, or Software Development disciplines.
  • Deep expertise in AWS cloud architecture, including multi-account management, VPC design, EKS, ECS, Lambda, and networking topologies.
  • Proven experience with Infrastructure as Code (Terraform, Pulumi) and GitOps automation at scale.
  • Strong understanding of Kubernetes internals, workload orchestration, and cost/performance optimization.
  • Experience implementing SRE and reliability frameworks: SLOs, error budgets, chaos testing, and automated incident remediation.
  • Mastery of observability and monitoring (CloudWatch, Grafana, Datadog, New Relic) with trace/metric/log correlation.
  • Proficiency in security and compliance engineering: IAM, KMS, encryption, secrets lifecycle, policy enforcement (OPA/Rego), and regulatory controls (PCI, LGPD, GDPR).
  • Experience defining and governing API and event-driven architectures (OpenAPI/AsyncAPI, Kafka schema registries).
  • Deep knowledge of progressive delivery, service mesh (e.g., Istio), and DevSecOps pipelines.
  • Strong FinOps acumen: right-sizing, egress optimization, reserved instance and savings plan strategy, and service-level cost attribution.
  • Experience integrating AI-assisted workflows (GitHub Copilot Enterprise, LLM-based linters, and others) into development and CI pipelines, with measurable productivity impact.
  • Extensive hands-on experience in software engineering roles, with solid proficiency in Java (Spring Boot) and working knowledge of Python and asynchronous programming.
  • Strong foundation in Object-Oriented Programming and relational database systems.
  • Solid understanding of web and mobile application architectures, including security, session management, and development best practices.
  • Expertise in Domain-Driven Design and microservices architecture, with proven ability to design high-performance, scalable, and reliable distributed systems.
  • Demonstrated experience defining and executing architectural roadmaps aligned with business and developer-experience goals.
  • Deep knowledge of networking in AWS.
  • Advanced experience architecting VPC topologies, including Transit Gateway, private/public subnet design, NAT gateway cost optimization, and egress control for regulated environments.
  • Hands-on experience implementing observability pipelines at scale, integrating New Relic, CloudWatch, Prometheus, Grafana, and Datadog.
  • Familiarity with EKS internals: node group management, autoscaling, and Kubernetes cost/latency optimization.
  • Proven experience managing multi-region and multi-environment deployments.
  • Expertise in AWS security hardening and compliance controls, including IAM least-privilege modeling, KMS envelope encryption, CloudTrail auditing, GuardDuty detections, and automatic remediation with Lambda/Step Functions.
  • Deep understanding of container security, image signing, ECR scanning, and OPA/Rego policy design for admission controllers.
  • Advanced experience with Infrastructure as Code using Terraform (modules, workspaces, policy enforcement) and Pulumi (multi-language stacks, secrets providers, CI integration).
  • Proven ability to implement GitOps workflows, ensuring deterministic deployments and drift detection.
  • Strong policy-as-code practice to codify security/SRE guardrails across CI/CD and Kubernetes admission controllers.
  • Expertise automating application stack provisioning (app resources, service accounts, IAM bindings, egress controls) through reusable IaC modules and pipelines.
  • Deep understanding of progressive delivery (canary, blue/green, shadow traffic, automated rollback) and service mesh (Istio/Linkerd/App Mesh) for safe deployment strategies.
  • Mastery of resilience and reliability patterns: timeouts, bounded retries with jitter, circuit breakers, bulkheads, back-pressure, outbox/saga orchestration, and graceful degradation.
  • Deep knowledge of event-driven and streaming architectures (Kafka and others), including partitioning strategies, compaction/retention policies, rebalancing, ordering guarantees, exactly-once semantics, and schema evolution via registries.
  • Strong background in data performance engineering: caching (read-through/write-behind), connection pool tuning, pagination/cursoring, latency budgeting, and throughput modeling.
  • Experience with SLO-driven reliability: defining SLIs, error budgets, and reducing alert fatigue via multi-signal correlation.
  • Proficiency with production monitoring tools (New Relic, Grafana, Datadog, CloudWatch) and advanced observability instrumentation.
  • Proven experience building self-service developer platforms (Backstage, Internal Developer Portals) that expose golden paths for application scaffolding, environment provisioning, and secure deployments.
  • Experience implementing event-driven DevEx tooling (e.g., ephemeral environments, automated CI insights, preview deployments).
  • Strong knowledge of API lifecycle management and governance (OpenAPI/AsyncAPI, contract testing, versioning, idempotency, error modeling).
  • Expertise in CI/CD automation and DevSecOps (GitHub Actions, CodeBuild/CodePipeline, artifact provenance, environment promotion, changelog automation).
  • Practical compliance-by-design experience translating PCI-DSS, KYC/AML, GDPR, and LGPD controls into technical patterns (tokenization, segmentation, audit trails, retention/erasure).
  • Experience leading AWS Well-Architected Framework reviews across all pillars (Security, Reliability, Performance, Cost, Operational Excellence, Sustainability).
  • Experience designing cost-aware architectures, balancing performance, resilience, and financial efficiency.
  • Exposure to edge computing and CDN optimization (Lambda@Edge, CloudFront Functions, custom caching policies).
  • Fluent in English and Spanish (Portuguese a plus).

Benefits

Comp & perks
  • Competitive and market-aligned salary.
  • Annual Award Program.
  • Remote work — wherever you are, you’re part of the team!
  • Home office allowance through a monthly deposit in the RecargaPay app.
  • Health and dental plans with no co-pay.
  • Life insurance.
  • Flexible meal allowance (via Flash).
  • TotalPass membership to take care of your health.
  • Language classes.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
AWS, Kubernetes, Infrastructure as Code, Terraform, Pulumi, DevSecOps, Observability, Java, Python, Event-driven architecture
Soft Skills
leadership, collaboration, strategic authority, mentoring, communication, continuous improvement, organizational skills, problem-solving, accountability, adaptability