
Staff Engineer – Applied AI Engineering
Clari
Full-time
Location Type: Hybrid
Location: Bengaluru • India
About the role
- Architect and maintain scalable, reusable platform infrastructure and tools that support the entire application lifecycle, including data ingestion, workload orchestration, deployment, and monitoring for distributed systems and services
- Build standardized, production-ready deployment workflows and templates using tools like Airflow and orchestration frameworks to enable rapid development and consistent deployment patterns
- Implement robust CI/CD pipelines, Docker containerization, artifact registries, and configuration management to support reproducibility, scalability, and governance across platform services
- Optimize platform technologies, including distributed computing frameworks, caching layers, message queues, and real-time data processing systems
- Automate and streamline service deployment, scaling, versioning, and infrastructure provisioning workflows, ensuring consistency, reliability, and adherence to industry best practices
- Ensure reliability, observability, and scalability of production workloads by implementing comprehensive monitoring, alerting, performance profiling, and continuous system evaluation
- Integrate infrastructure components such as Kubernetes orchestration, distributed computing frameworks (Ray, Spark), and cloud solutions (AWS/Azure/GCP) for robust production environments
- Drive infrastructure optimization for high-throughput use cases, including efficient resource utilization (batching, caching, auto-scaling), deployment strategies, configuration management, and system updates at scale
- Partner with data engineering, product, infrastructure, and other stakeholder teams to align platform initiatives with broader company goals, infrastructure strategy, and innovation roadmap
- Contribute actively to internal documentation, onboarding, and training programs, promoting platform adoption and continuous improvement
Requirements
- 8+ years of experience in building distributed systems or platform infrastructure designed for high-scale, data-intensive workloads
- Expert-level proficiency in Python and familiarity with infrastructure tooling (Airflow, Kubernetes, Ray, Terraform), service frameworks (FastAPI, gRPC), and observability platforms (Prometheus, Grafana, Datadog)
- Experience implementing modern DevOps and platform engineering practices, including service lifecycle management, CI/CD, Docker, Kubernetes, artifact registries, and infrastructure-as-code tools (Terraform, Helm, CloudFormation)
- Demonstrated experience working with cloud infrastructure, ideally AWS or GCP, including Kubernetes clusters (GKE/EKS), serverless architectures, and managed services (e.g., Lambda, Cloud Run, ECS)
- Proven experience with distributed data infrastructure: message queues (Kafka, Kinesis), caching systems (Redis, Memcached), database optimization, and building resilient, fault-tolerant architectures
- Experience designing and maintaining real-time data pipelines, including integrations with feature stores, streaming platforms (Kafka, Kinesis), API gateways, and observability solutions
- Familiarity with SQL (PostgreSQL) and NoSQL (MongoDB) databases and with data warehouse modeling; capable of managing complex data queries, joins, aggregations, and transformations
- Solid understanding of platform monitoring and optimization, including identifying performance bottlenecks, latency optimization, cost management, and scaling high-throughput API services efficiently
Benefits
- Flexible working hours and hybrid work opportunities
- Life and accidental coverage
- Mental health support provided by Modern Health
- 100% paid parental leave
- Discretionary paid time off, monthly ‘take a break’ days, and Focus Fridays
- Focus on culture: Charitable giving match, plus in-person and virtual events
Applicant Tracking System Keywords
Hard Skills & Tools
Python, CI/CD, Docker, Kubernetes, Terraform, FastAPI, gRPC, SQL, NoSQL, real-time data processing
Soft Skills
collaboration, documentation, training, continuous improvement