Tech Stack
AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Kafka, PySpark, SDLC, Spark
About the role
- Lead the design and development of distributed systems, data pipelines, and ML infrastructure with a focus on scalability and reliability
- Own end-to-end delivery of key features and services across the full SDLC: design, implementation, testing, deployment, and operations
- Drive innovation in Big Data, Generative AI, and Graph ML by translating emerging technologies into production-ready systems
- Build and optimize scalable, real-time analytics systems powering AI Agents
- Mentor junior and mid-level engineers, provide technical guidance, and promote engineering best practices
- Collaborate across teams to ensure solutions are resilient, secure, and high-performing
Requirements
- Degree in Computer Science, Mathematics, or a related field
- 8+ years of experience across the full software development lifecycle (design, coding, reviews, testing, deployment, operations)
- 5+ years of experience with distributed Big Data systems (e.g., PySpark, Lakehouse, Kafka, Debezium, Hudi, Druid, Flink, Spark Streaming)
- Experience with sensitive or streaming data pipelines, including governance and compliance requirements
- Experience with graph technologies, e.g., graph neural networks (GNNs)
- Proven track record of delivering complex, high-impact software systems in production
- Experience deploying large-scale solutions on cloud platforms (AWS, Azure, GCP)
- Strong problem-solving skills and ability to excel in ambiguous environments
- (Preferred) MS in Computer Science, Machine Learning, or a related discipline
- (Preferred) Hands-on experience building Generative AI solutions (RAG, AI Agents, LLM fine-tuning) in production