AI Performance Optimization Engineer
Bright Vision Technologies
As an AI Performance Optimization Engineer, you will develop and optimize AI workloads with a focus on throughput and latency, collaborating across teams to implement solutions that deliver performance gains.
Tech Stack
Tools & technologies: Distributed Systems, Python
About the role
Key responsibilities & impact
- Profile and optimize end-to-end AI training and inference pipelines for throughput, latency, and cost.
- Identify and eliminate bottlenecks across data loading, model compute, communication, and memory.
- Implement and tune quantization, sparsity, and pruning strategies to reduce model footprint and accelerate inference.
- Optimize distributed training using tensor parallelism, pipeline parallelism, FSDP, and ZeRO-style sharding.
- Tune attention implementations using FlashAttention, paged attention, and related techniques.
- Implement KV cache optimization, continuous batching, and speculative decoding for LLM serving.
- Drive compiler-level optimizations with Triton, XLA, TorchInductor, or TVM, and work with the broader ML framework community to land improvements that yield measurable end-to-end performance gains.
- Optimize data pipelines, sharding strategies, and storage access patterns for high-throughput training.
- Build and maintain rigorous benchmark suites and regression frameworks across workloads.
- Collaborate with ML and platform engineering teams to embed performance best practices into standard pipelines.
- Drive cost-efficiency improvements through model architecture, hardware selection, and scheduling strategies.
- Evaluate new hardware and software offerings and advise on their adoption.
- Document performance tuning playbooks and share findings broadly across engineering teams.
- Stay current with AI systems research and translate advances into production improvements.
Requirements
What you’ll need
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field.
- Six or more years of experience in performance engineering, ML systems, or HPC.
- Strong proficiency in Python and C++.
- Hands-on experience optimizing deep learning workloads on modern GPUs.
- Deep understanding of distributed training and inference techniques.
- Experience with profiling tools across CPU, GPU, and distributed systems.
- Familiarity with model compression techniques and their accuracy implications.
- Strong grasp of memory hierarchies, communication primitives, and parallelism strategies.
- Excellent measurement, debugging, and analytical reasoning skills.
- Strong communication and collaboration skills.
Benefits
Comp & perks
- Comprehensive benefits
- Competitive compensation packages
- Support for work-life balance