Tech Stack
Kubernetes, Linux, NumPy, pandas, Python
About the role
- Design and implement performance benchmarking and analysis frameworks for next-generation AI and HPC workloads
- Work across Python (primary) and C/C++ (for performance-critical modules) to deliver reliable and scalable tools
- Take full technical ownership of the core telemetry engine, including using Jupyter Notebooks and other data analysis frameworks to analyze telemetry results
- Contribute to the DevOps environment, owning CI/CD pipelines and release processes for projects
- Drive technical innovation in the performance engineering ecosystem, including helping build a next-generation agentic AI assistant
- Build tools that run at scale on clusters, clouds, and data centers to help R&D teams and customers root-cause bottlenecks and maximize throughput
Requirements
- B.Sc. in Computer Science or a related engineering field
- 3+ years of professional software development experience
- A proven track record of technical ownership, driving a technical agenda, and problem solving
- Advanced Python development skills, with experience building robust, well-structured, production-grade applications
- C/C++ experience, especially for performance-critical or low-level components
- Experience with modern CI/CD pipelines and DevOps practices
- Linux systems knowledge, including software packaging (RPM, DEB) (preferred)
- Experience with Python data analysis and visualization frameworks (e.g., h5py, pandas, NumPy, Matplotlib/Plotly) (preferred)
- Experience with Slurm, Kubernetes, MPI, or other distributed job orchestration and cluster management systems (preferred)
- Familiarity with agentic AI concepts or frameworks (e.g., RAG techniques, LangChain, LangGraph, LlamaIndex) (preferred)
- Experience contributing to open-source projects (preferred)