LLM Inference Deployment Engineer

EnCharge AI

LLM Inference Deployment Engineer deploying and scaling large language models on energy-efficient AI accelerators, working with AI frameworks and model optimization at EnCharge AI.

Posted 5/22/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level, Senior • 💰 $180,000 - $240,000 per year

Tech Stack

Tools & technologies

Docker, Kubernetes, Python, PyTorch, TensorFlow

About the role

Key responsibilities & impact

Deploy and optimize LLMs (GPT, LLaMA, Mistral, Falcon, etc.) post-training from libraries such as Hugging Face.
Use inference runtimes such as ONNX Runtime and vLLM for efficient execution.
Optimize batching, caching, and tensor parallelism to improve LLM scalability in real-time applications.
Develop and maintain high-performance inference pipelines using Docker, Kubernetes, and inference servers.

Requirements

What you’ll need

Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or related field.
Experience in LLM inference deployment, model optimization, and runtime engineering.
Strong expertise in LLM inference frameworks (PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, DeepSpeed).
In-depth knowledge of the Python programming language for model integration and performance tuning.
Strong understanding of high-level model representations and experience implementing framework-level optimizations for Generative AI use cases.
Experience with containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe).
Strong knowledge of LLM memory optimization strategies for long-context applications.
Experience with real-time LLM applications (chatbots, code generation, retrieval-augmented generation).

Benefits

Comp & perks

💵 $180k - $240k / year • ⏰ Full Time • 🦅 H1B Visa Sponsor • ⛑ DevOps & Site Reliability Engineer (SRE)

EnCharge AI • 11 - 50 employees • Founded 2022 • 🤖 Artificial Intelligence • 🔧 Hardware • 🤝 B2B • 💰 $100M Series B (2025-02)

EnCharge AI develops analog in-memory computing hardware and complementary software to accelerate on-device and edge-to-cloud AI workloads. Its technology includes the EN100 analog AI accelerator and other form factors (chiplets, ASICs, PCIe cards) designed to deliver much higher energy efficiency and compute density, and lower total cost of ownership, for inference compared with conventional GPUs and digital accelerators. EnCharge emphasizes sustainability, data privacy through local processing, and deployment for enterprise and developer customers seeking efficient, scalable AI computation outside traditional cloud infrastructure.

ATS Keywords

✓ Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

LLM inference deployment, model optimization, runtime engineering, Python programming, high-level model representations, framework-level optimizations, memory optimization strategies, real-time LLM applications, batching, tensor parallelism

Certifications

Bachelor’s degree in Computer Science, Master’s degree in Computer Science, Bachelor’s degree in Electrical Engineering, Master’s degree in Electrical Engineering