
AI Platform Ops Engineer
One New Zealand
Full-time
Location Type: Hybrid
Location: New Zealand
About the role
- Lead the evaluation, approval, rollout, rollback, drift detection, and deprecation of AI services.
- Manage models and configurations as code with versioning, test coverage, and audit trails.
- Define and implement guardrails for toxicity, PII, policy violations, hallucination control, and grounding.
- Conduct red-teaming exercises, develop containment strategies, and report on safety posture.
- Align support structures, on-call coverage, escalation paths, and playbooks with Service Management and partners.
- Ensure runbooks are current, usable, and effective for BAU and after-hours support.
- Collaborate with DataOps on feature and embedding stores, retrieval/RAG patterns, lineage, and SLAs to ensure reliability and cost-efficiency.
- Enforce tagging to surface product-level costs, monitor budget variances and anomalies, and drive safe optimisations such as caching, batching, and model sizing.
- Maintain up-to-date standards, diagrams, evaluation reports, and runbooks to ensure clean handovers and reduce single points of failure.
Requirements
- Proven experience managing the full lifecycle of AI models and agents, from evaluation and approval through rollout, rollback, drift detection, and deprecation.
- Skilled in prompt/config/version management as code, RAG patterns, vector/feature stores, and runtime monitoring for latency, quality, and safety.
- Hands-on experience with AWS services such as Bedrock and SageMaker (or equivalents), serverless runtimes, event-driven architecture, CI/CD pipelines, observability tools (CloudWatch/OpenTelemetry), and secure connectivity.
- Practical knowledge of building and operating agentic experiences using Agentforce, integrating with Salesforce workflows (Service, Sales, Knowledge), and aligning runtime signals (safety, performance, cost) to business outcomes.
- Strong understanding of data ingestion, identity resolution, harmonisation, segmentation, activation, and governance, with a focus on how Data Cloud supports AI/RAG use cases and downstream observability.
- Proficient in Terraform-first provisioning across AWS, Snowflake, and Salesforce integrations.
- Experience with HashiCorp Vault for secrets management, PKI, dynamic credentials, policy-as-code, automated rotation, and audit trails.
- Deep knowledge of SLIs/SLOs, tracing, logging, alerting, incident response, post-incident reviews, change automation, rollback strategies, and resilience patterns.
- Comfortable partnering with Support, Service Management, and vendors for BAU and after-hours operations.
Benefits
- Fully subsidised Southern Cross health insurance cover for you and your family.
- A laptop, an unlimited data plan, and a market-leading mobile phone.
- Lifestyle leave, giving you the option to purchase an extra week or two of annual leave.
- Discounts on One New Zealand products, services and much more!
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AI model lifecycle management, prompt management, version management as code, RAG patterns, runtime monitoring, data ingestion, identity resolution, Terraform provisioning, secrets management, SLIs/SLOs
Soft Skills
collaboration, communication, problem-solving, leadership, organizational skills