Performance and bottleneck analysis of complex, high-performance GPUs and System-on-Chips (SoCs)
Work on hardware models at different levels of abstraction, including performance models, RTL test benches and emulators, to find performance bottlenecks in the system
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption
Understand key performance use cases of the product
Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products
Drive methodologies for improving turnaround time, finding representative datasets, and enabling performance analysis early in the product development cycle
Develop required infrastructure including performance simulators, testbench components and analysis tools
Collaborate with the Deep Learning Automotive team to build real-time, cost-effective computing platforms
Requirements
BE/BTech or MS/MTech in relevant area (PhD is a plus)
2+ years of experience with exposure to performance analysis and complex System-on-Chip (SoC) and/or GPU architectures
Demonstrated history of technical leadership
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture
Expert hands-on competence in programming (C/C++) and scripting (Perl/Python)
Exposure to Verilog/SystemVerilog and SystemC/TLM is a strong plus
Strong debugging and analysis skills (including data and statistical analysis), including use of RTL dumps to debug failures