
ML Platform Engineer – MLOps
Docuvera
Full-time
Location Type: Hybrid
Location: Wellington, New Zealand
Job Level
Mid-Level / Senior
Tech Stack
AWS, Docker, DynamoDB, Kubernetes, Neo4j, Postgres, Python, SQL, Terraform
About the role
- You’ll design, build, and run the AI infrastructure that powers Docuvera’s enterprise AI and its AI-driven content management platform.
- Day to day, you’ll turn machine-learning models into reliable products, build scalable data pipelines, and provide the foundation for AI-powered workflows.
- You’ll lead MLOps practices, stand up vector databases and knowledge graphs, and work closely with data scientists and engineers to deploy and monitor models in production.
- Your work enables key programs like company-wide knowledge assistants, AI-assisted bug triage, and intelligent content curation.
- You’ll automate ML workflows, keep AI services highly available, and ensure everything meets life-sciences regulations (FDA 21 CFR Part 11, GxP).
Requirements
Some or all of the following technical skills, experience, and knowledge:
- AWS AI/ML & data: Bedrock, SageMaker, Lambda, S3, Aurora PostgreSQL (Aurora/RDS), DynamoDB, EventBridge, SQS, EKS, ECS, Neptune, OpenSearch; hands-on architecture and cost tuning.
- Relational data & PostgreSQL: Strong SQL, schema design, indexing/partitioning, query tuning, connection management (RDS Proxy), HA/DR (Multi-AZ, read replicas, PITR), CDC/outbox patterns.
- MLOps platforms: MLflow, Kubeflow, and SageMaker Pipelines for lifecycle management, experiment tracking, and automated deployments (a minimal MLflow example follows this list).
- Event-driven systems: EventBridge (rules, schedules, schema registry) and SQS (FIFO/Standard, DLQs, ordering/deduplication) for loosely coupled services at scale (see the SQS worker sketch below).
- Vector search & RAG: Implementing and tuning Pinecone/Milvus/Weaviate/OpenSearch and embedding workflows in production RAG systems (see the retrieval sketch below).
- Data pipelines: Real-time ingestion with Glue, Kinesis, Lambda; integrating enterprise APIs/webhooks; EventBridge buses and SQS workers for reliable, idempotent processing.
- Containers & Kubernetes: Docker, EKS, and serverless model serving; autoscaling for AI workloads.
- Graph databases: Neptune or Neo4j, queried with Gremlin/Cypher/SPARQL (see the Cypher sketch below).
- Programming & automation: Python/Bash and IaC (CDK, Terraform, CloudFormation).
- Model operations: Deploying/monitoring LLMs, embeddings, and custom ML models with performance optimization.
- Enterprise integration: Model Context Protocol (MCP), API gateways, and connectors for systems like Confluence, Jira, and SharePoint.
- Observability & resilience: CloudWatch/New Relic dashboards, SLOs/SLIs, synthetic checks; queue latency/depth alerts; EventBridge failure handling; DB health and slow-query monitoring.
- AI governance: Model risk management, validation frameworks, and compliance logging for regulated AI apps.
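
To make the MLOps-platforms bullet concrete, here is a minimal sketch of MLflow experiment tracking. The tracking URI, experiment name, parameters, and artifact file are hypothetical placeholders, not details from the posting:

```python
import mlflow

# Hypothetical tracking server; in practice this would point at the
# platform's own MLflow deployment.
mlflow.set_tracking_uri("http://mlflow.internal:5000")
mlflow.set_experiment("content-classifier")

with mlflow.start_run(run_name="baseline"):
    # Log hyperparameters and evaluation metrics for this run.
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("f1", 0.87)
    # Artifacts (plots, model files) attach to the same run for auditability.
    mlflow.log_artifact("confusion_matrix.png")
```

Runs logged this way give the experiment history and reproducibility trail that lifecycle management and automated deployment pipelines build on.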
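The event-driven and data-pipeline bullets both call for reliable, idempotent SQS processing. A sketch of one common pattern, using a DynamoDB conditional write as a deduplication claim; the queue URL, table name, and `handle` function are hypothetical:

```python
import boto3
from botocore.exceptions import ClientError

sqs = boto3.client("sqs")
ddb = boto3.client("dynamodb")

# Both names are hypothetical placeholders.
QUEUE_URL = "https://sqs.ap-southeast-2.amazonaws.com/123456789012/ingest-events"
CLAIMS_TABLE = "processed-messages"

def handle(body: str) -> None:
    ...  # hypothetical business logic

def poll_once() -> None:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        try:
            # Claim the MessageId; the conditional write fails if another
            # delivery of this message was already claimed, which makes the
            # handler safe under SQS's at-least-once redelivery.
            ddb.put_item(
                TableName=CLAIMS_TABLE,
                Item={"message_id": {"S": msg["MessageId"]}},
                ConditionExpression="attribute_not_exists(message_id)",
            )
            handle(msg["Body"])
        except ClientError as err:
            if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise
            # Duplicate delivery: skip the work but still delete below.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

A production version would also expire stale claims (e.g., with a DynamoDB TTL) so work interrupted by a crash can be retried, and would rely on the queue's redrive policy to route repeated failures to a DLQ.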
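For the vector search & RAG bullet, a retrieval sketch combining Bedrock embeddings with an OpenSearch k-NN query. The endpoint, index name, vector field, and the choice of the Titan embedding model are all assumptions for illustration:

```python
import json
import boto3
from opensearchpy import OpenSearch

bedrock = boto3.client("bedrock-runtime")
# Hypothetical OpenSearch endpoint.
search = OpenSearch(hosts=[{"host": "search.internal", "port": 9200}])

def embed(text: str) -> list[float]:
    # Titan text embeddings via Bedrock; the model choice is illustrative.
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(resp["body"].read())["embedding"]

def retrieve(query: str, k: int = 5):
    # k-NN search over a vector field named "embedding" (hypothetical schema).
    return search.search(
        index="docs",
        body={"size": k, "query": {"knn": {"embedding": {"vector": embed(query), "k": k}}}},
    )
```

The hits returned here would be passed to an LLM as grounding context, which is the retrieval half of a production RAG system.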
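And for the graph-databases bullet, a small Cypher query against Neo4j via the official Python driver. The connection details and the `Document`/`Topic` schema are hypothetical:

```python
from neo4j import GraphDatabase

# Hypothetical connection details.
driver = GraphDatabase.driver("bolt://neo4j.internal:7687", auth=("neo4j", "secret"))

def related_documents(topic: str) -> list[str]:
    # Cypher traversal: documents linked to a topic node in a knowledge graph.
    query = (
        "MATCH (d:Document)-[:MENTIONS]->(t:Topic {name: $topic}) "
        "RETURN d.title AS title LIMIT 10"
    )
    with driver.session() as session:
        return [record["title"] for record in session.run(query, topic=topic)]
```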
Benefits
- A digital-first, fully flexible working style; we've embraced asynchronous, hybrid working as the norm.
- Modern tools and systems, with a big focus on our use of AI.
- A focus on personal growth, with career, learning, and development tools available, plus dedicated ‘tools down’ personal development time in NZ.
- An additional week of paid leave.
- Staff appreciation leave at Christmas, plus a day off for your birthday.
- All within a tight-knit, supportive, and inclusive global community.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
machine learning, data pipelines, MLOps, SQL, Python, automation, model operations, event-driven systems, graph databases, containers
Soft skills
leadership, collaboration, communication, problem-solving, organizational skills