In Tandem

AI Engineer

AI Engineer optimizing AI infrastructure for family technology applications. Building and running AI features on dedicated GPU hardware to improve efficiency and performance.

Posted 6/16/2026 • Full-time • Remote • Minnesota • 🇺🇸 United States • Mid-Level, Senior • 💰 $100,000 - $135,000 per year

Tech Stack

Tools & technologies
AWS, Docker, Python

About the role

Key responsibilities & impact
  • Run and optimize our self-hosted inference stack
      • Run the inference serving layer on our own GPU hardware: choose and tune the serving stack (vLLM, SGLang, TensorRT-LLM) for high throughput and low latency.
      • Optimize aggressively: tensor parallelism, quantization (FP8, AWQ, GPTQ), KV-cache and prefix caching, continuous batching, speculative decoding, concurrency tuning.
      • Serve multiple models and features off shared hardware: multi-LoRA, routing, and request scheduling that balances internal workloads against latency-sensitive product traffic.
  • Keep our AI fast, efficient, and observable
      • Make our AI workloads efficient: improve latency, throughput, and GPU utilization so we get the most out of what we run.
      • Build the visibility: instrument performance and usage across our AI surfaces so there's clear data on how everything is running.
      • Surface the technical tradeoffs (performance, latency, efficiency) so the people making the calls have what they need to make them.
  • Build AI features and proactive agents
      • Ship the in-app agent layer that helps families coordinate: proactive nudges, smart suggestions, agents that summarize, draft, schedule, and act for busy parents.
      • Build the substrate underneath: tools, memory, orchestration, guardrails, and evaluation harnesses, integrated cleanly with production APIs alongside our architecture team.
      • Work in nimble pairs with feature owners, standing up whatever's needed to test an idea, including a vibe-coded UI when that's the fastest path to a real customer. Ship rough, learn fast, harden what works.

Requirements

What you’ll need
  • 5+ years shipping production software, including meaningful applied AI or ML work.
  • Demonstrated experience running and optimizing self-hosted LLMs on dedicated multi-GPU hardware: a serving stack (vLLM, SGLang, or TensorRT-LLM) and the optimization that comes with it (tensor parallelism, quantization, batching, KV cache).
  • A track record of optimizing inference performance and efficiency (latency, throughput, GPU utilization).
  • Strong Python and engineering fundamentals, with the full-stack range to stand up a quick UI and a genuine desire to work on app-layer features, not only infra.
  • Hands-on with agent frameworks (Claude Agent SDK, LangGraph, or similar), LLM APIs, embeddings, and RAG.
  • Comfortable with AWS and the devops this role owns: Docker, CI/CD, monitoring, and observability.
  • Experience building internal tooling or platforms others depend on. Bonus for Slack apps, MCP, or agent orchestration at team scale.

Benefits

Comp & perks
  • Medical: In Tandem pays 100% of the premium for employees AND 99% for all additional family members
  • 401k: Up to a 4% match with immediate vesting
  • Paid leave for all new parents
  • Learning & Development stipend for employees
  • Paid Time Off: 11 Holidays + Winter Break (3 Days) + Volunteer Time Off (1 Day) + Floating Holiday (1 Day)
  • Personal Time Off: 15 days for 0-1 years of employment, 20 days for 1-3 years of employment
  • Supportive and flexible working environment – work from anywhere!

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, tensor parallelism, quantization, inference performance optimization, latency optimization, throughput optimization, GPU utilization, agent frameworks, LLM APIs, embeddings
Soft Skills
collaboration, problem-solving, adaptability, communication, creativity