
Senior Software Engineer, LLM Inferencing, AI Gateway
Wells Fargo
full-time
Location Type: Hybrid
Location: Charlotte • California • North Carolina • United States
Salary
💰 $100,000 - $196,000 per year
About the role
- Lead complex Generative AI initiatives and deliverables within technical domain environments
- Contribute to large-scale planning of strategies
- Design, code, test, debug, and document for projects and programs associated with the technology domain, including upgrades and deployments
- Review moderately complex technical challenges that require an in-depth evaluation of technologies and procedures
- Resolve moderately complex issues and lead a team to meet the needs of existing or potential new clients while leveraging a solid understanding of the function, policies, procedures, or compliance requirements
- Collaborate and consult with peers, colleagues, and mid-level managers to resolve technical challenges and achieve goals
- Lead projects, act as an escalation point, and provide guidance and direction to less experienced staff
- Engineer GPU clusters and node pools; configure NVLink/NVSwitch, NVIDIA GPU Operator, MIG profiles, container runtime, and kernel/driver baselines for high‑throughput LLM/SLM workloads
Requirements
- 4+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
- 1+ years of experience with GPU Inference including NVIDIA CUDA, cuDNN, NVLink/NVSwitch, MIG, NIXL, GPU profiling, and performance tuning on H100/H200 architectures
- 1+ years of experience with GPU orchestration platforms, such as RunAI (collections, queues, quotas, preemption, fair-share scheduling), OpenShift AI (RHOAI), and cluster administration on OCP or GKE
- 1+ years of experience with LLM/SLM serving frameworks, including vLLM, Triton, TensorRT‑LLM/MII, KV‑cache optimization strategies, and FP8/INT4 quantization techniques (AWQ/GPTQ)
- 1+ years of experience working with LLM API gateways, including OAuth2/mTLS authentication, rate‑limiting and quota management, OpenAPI/SDK integration, SLAs, and versioning/deprecation practices
- 2+ years of experience in Generative AI engineering, including LLM/SLM operations, fine‑tuning, evaluation pipelines, and developing model‑specific performance optimization recipes
- 4+ years of experience in Python, including scripting, automation, and model/inference‑related development
Benefits
- Health benefits
- 401(k) Plan
- Paid time off
- Disability benefits
- Life insurance, critical illness insurance, and accident insurance
- Parental leave
- Critical caregiving leave
- Discounts and savings
- Commuter benefits
- Tuition reimbursement
- Scholarships for dependent children
- Adoption reimbursement
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Software Engineering, GPU Inference, NVIDIA CUDA, cuDNN, NVLink, NVSwitch, MIG, GPU profiling, performance tuning, Python
Soft Skills
leadership, collaboration, problem-solving, communication, guidance, team management