Articul8 AI

Senior Software Development Engineer in Test, Chaos Engineering Specialist

Full-time

Origin: United States • California

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, Distributed Systems, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Rust

About the role

  • Design, develop, and maintain advanced test automation frameworks that incorporate chaos engineering principles
  • Create and execute chaos experiments that simulate various failure modes and edge cases in our distributed systems
  • Implement monitoring solutions that effectively track system performance, resilience, and failure recovery
  • Establish observability practices that provide deep insights into system behavior during chaos experiments
  • Collaborate with development teams to build resilience into our applications from the ground up
  • Develop metrics and dashboards to visualize system reliability and the impact of chaos experiments
  • Lead post-mortem analyses of system weaknesses uncovered through chaos testing
  • Integrate chaos testing into CI/CD pipelines to validate system resilience continuously
  • Mentor engineers through code reviews, technical sessions, and hands-on guidance in test automation, chaos engineering, and monitoring best practices
  • Contribute to the company's overall testing strategy and quality assurance practices

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field
  • 5+ years of experience in software testing and quality assurance, with at least 2 years focused on chaos engineering
  • Strong programming skills in languages such as Python, Go, and/or Rust
  • Experience with chaos engineering tools such as Chaos Monkey, Gremlin, or similar frameworks
  • In-depth knowledge of monitoring systems like Prometheus, Grafana, ELK Stack, or similar tools
  • Experience implementing observability practices (metrics, logging, tracing) in distributed systems
  • Familiarity with container orchestration platforms like Kubernetes and related chaos tools
  • Experience with SRE practices and principles
  • Strong understanding of CI/CD pipelines and how to integrate testing workflows
  • Experience with cloud platforms (AWS, GCP, Azure) and their monitoring capabilities
  • Excellent communication skills with the ability to present technical findings to various stakeholders
  • Master's degree in Computer Science, Engineering, or related field (preferred)
  • Knowledge of statistical analysis for evaluating test results and system performance (preferred)
  • Experience with distributed systems and microservice architectures (preferred)
  • Contributions to open-source testing or chaos engineering projects (preferred)
  • Familiarity with AI/ML systems and their unique testing challenges (preferred)
  • Relevant certifications in cloud platforms, testing methodologies, or chaos engineering (preferred)