Tech Stack
AWS, Azure, Cloud, Docker, ETL, Go, Google Cloud Platform, Grafana, Kafka, Kubernetes, PHP, Prometheus, Python, PyTorch, Scikit-Learn, TensorFlow, Terraform
About the role
- Lead the end-to-end ML model lifecycle: identify machine learning opportunities through data and metric analysis, define and track KPIs, and ship iterations aligned with product and business goals.
- Design, develop, and maintain ML pipelines, including feature engineering, model training, deployment, and monitoring.
- Implement scalable inference services and APIs for real-time and batch predictions.
- Improve model accuracy, inference speed, and robustness through experimentation, hyperparameter tuning, and feature optimization.
- Ensure reliability through comprehensive automated testing, observability, and reproducibility of ML experiments.
- Mentor junior engineers, lead code and model reviews, and contribute to architectural decisions and technical documentation.
- Collaborate with cross-functional teams across product, engineering, and operations, as well as with teams in the US and Japan, to deliver high-impact ML solutions at global scale.
- Architect and operate highly scalable ML services and pipelines to support rapid user and product growth in the US market.
- Leverage large language models (LLMs) and generative AI to enhance search recall, content understanding, and overall user experience.
- Optimize experimentation frameworks to accelerate product iteration and innovation.
Requirements
- Strong hands-on experience across the machine learning model lifecycle: training, deployment, monitoring, and optimization.
- Practical experience leveraging computer vision and natural language processing techniques in production ML systems.
- Ability to independently analyze data and model metrics to ship measurable improvements in production systems.
- Bachelor’s degree in Computer Science, Data Science, Mathematics, or a related field (or equivalent practical experience).
- 5+ years of professional experience developing and operating large-scale ML pipelines and/or backend services in high-traffic production environments, including optimizing models for latency, scalability, and cost efficiency.
- Experience with large language models (LLMs) and generative AI, including techniques such as prompt engineering, fine-tuning, vector search integration (RAG), and responsible production deployment.
- Experience with ML frameworks and pipelines (TensorFlow, PyTorch, scikit-learn, MLflow, Kubeflow, or similar).
- Strong programming expertise in Python; familiarity with Go or PHP is a plus.
- Excellent English communication skills, with the ability to collaborate effectively across functions and regions.
- Demonstrated ability to mentor and guide junior engineers.
- Preferred: Experience deploying and scaling ML services in production environments, including cloud platforms (GCP, AWS, or Azure), containerization (Docker, Kubernetes), CI/CD, Infrastructure as Code (Terraform), and observability (Prometheus, Grafana).
- Preferred: Familiarity with data engineering practices, including feature stores, data preprocessing, ETL pipelines, large-scale data management, and real-time streaming or event-driven architectures (e.g., Kafka, Pub/Sub).
- Preferred: Domain knowledge of marketplace or e‑commerce platforms.
- Preferred: Contributions to open-source projects in ML or related areas; or public technical engagement through blogs, talks, or conferences.
- Preferred: Experience working within large, cross-functional, and geographically distributed teams.
- Language: English: Business level (CEFR B2 or higher), required.
- Language: Japanese: Basic (CEFR A2), optional.