Develop and Maintain Core Platform Services - Design, build, and maintain highly scalable and resilient microservices supporting platform-wide capabilities. Ensure our cloud platform is modular, extensible, and meets the needs of multiple product teams.
Enhance System Reliability & Observability - Implement and manage robust monitoring and alerting systems using Datadog to ensure operational visibility and proactive issue resolution. Drive best practices for logging, tracing, and monitoring across platform services.
Infrastructure as Code & Cloud Automation - Utilize Infrastructure as Code (IaC) tools such as Terraform and Terragrunt to automate and streamline cloud infrastructure provisioning and management. Optimize deployment pipelines to improve reliability and developer efficiency.
Technical Leadership & Cross-Team Collaboration - Provide technical leadership in system design, architecture, and best practices for building scalable services. Collaborate with product managers, engineers, and other teams to align platform capabilities with business needs. Act as a connector across teams, fostering collaboration and ensuring smooth integrations.
Operational Excellence & Continuous Improvement - Participate in operational reviews, post-mortems, and reliability initiatives to enhance system stability. Create follow-up actions for incident resolution and continuously work to improve system reliability and scalability. Drive efforts to reduce technical debt and improve engineering efficiency through automation and best practices.
Requirements
A deep understanding of authentication and authorization.
Expertise in a range of authentication and authorization protocols.
Strong hands-on experience with multi-tenant authentication architectures and tenant isolation strategies.
Software Development Experience - Proven track record of delivering enterprise-ready, cloud-based systems with a focus on performance, security, and scalability.
Modern Software Practices - Strong proficiency in one or more programming languages: C#, Go, or Java. Experience with API services, distributed systems, and microservice architectures.
Cloud & Site Reliability Engineering (SRE) Skills - Deep understanding of AWS, Google Cloud, or Azure with hands-on experience designing for scalability, observability, and reliability. Knowledge of Kubernetes (EKS/GKE), Docker, and cloud-native application design.
Infrastructure Automation & DevOps - Experience with Infrastructure as Code (Terraform, AWS CDK, Terragrunt). Proficiency in CI/CD tooling such as GitHub Actions, ArgoCD, or Jenkins.
Observability & Monitoring - Hands-on experience with Datadog, Prometheus, or similar monitoring tools to drive operational excellence.
Cross-Team Collaboration & Business Focus - Ability to work effectively across teams, communicate clearly, and drive alignment with multiple stakeholders. A strong understanding of customer needs and how technical solutions align with business objectives.
Platform/Core Services Experience (Highly Desired) - Prior experience on platform, core, or shared services teams, including building foundational services that support multiple product lines and teams.
Benefits
Diversity. Inclusion. They’re more than just words for us. They are the guiding values of how we build our teams, cultivate leaders, and create a culture where people feel connected.
We take care of our employees so they can take care of our customers, who come from all walks of life, just like us. We hire incredible people from diverse backgrounds because when we are different together, we are stronger together.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
microservices, Infrastructure as Code, Terraform, Terragrunt, C#, Go, Java, AWS, Google Cloud, Azure