Tech Stack
Ansible, AWS, Azure, Cloud, Distributed Systems, Docker, Grafana, Java, Jenkins, Kubernetes, Node.js, Prometheus, Python, SDLC, Shell Scripting, Splunk, Terraform
About the role
- The Fidelity Asset Management Solutions (FAMS) team is looking for a hardworking, highly motivated Cloud Engineer with strong DevOps and SRE experience to deliver CI/CD pipelines, support developers, and enable highly available, resilient services at scale through automation and Infrastructure as Code.
- We build reliability into our ecosystem by applying best practices in Resiliency Engineering, Automation, Observability & Chaos Testing.
- The team comes from diverse technical backgrounds, and the responsibilities provide the opportunity for a variety of challenges.
- Ideal candidates will have a background in Cloud, DevOps engineering and SRE.
- We are looking for a systems-thinking SRE who has helped teams build CI/CD pipelines and scale through production insights, operational automation, developer guidance, real-time metrics, and automation, automation, and more automation!
Requirements
- Overall experience of 8+ years with a bachelor’s degree or higher in a technology-related field (e.g. Engineering, Computer Science, etc.) required
- 5+ years of hands-on experience with advanced Continuous Integration & Continuous Delivery (Jenkins), deploying and/or supporting highly distributed multi-tiered systems at scale
- 1-2 years of experience in Cloud development (AWS) and migration; experience building and operating highly resilient platforms in AWS cloud environments
- 2-4 years of experience in software development with Python, NodeJS, or Java with a focus on SDLC and automation
- Hands-on experience with container orchestration, preferably with Kubernetes
- Experience operating and implementing distributed & highly concurrent service-based
- Ability to automate with various scripting languages (Python, Shell scripting)
- Experience developing advanced Continuous Integration & Continuous Delivery (CI/CD) pipelines, including software configuration management, test automation, version control, and static code analysis, using Jenkins, Ansible, and Docker
- Experience managing systems using infrastructure as code tools (Terraform, AWS CloudFormation, Ansible)
- Solid understanding of Cloud Computing (AWS) and DevOps concepts including CI/CD pipelines
- Hands-on Kubernetes skills and knowledge.
- Hands-on experience with one or more observability tools (Datadog, Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, etc.)
- Experience with instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale
- Proven experience in maintaining the scalability and resiliency of complex environments
- Proven experience in implementing advanced observability practices and techniques at scale, with the ability to use modern monitoring tools (Datadog, Prometheus, Splunk, etc.)
- Ability to triage, execute root cause analysis, and be decisive under pressure
- Strong communication skills, with the ability to reach both technical and non-technical audiences
- Ability to learn new software, methods, and practices and bring them to our developers
- Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner and build and maintain effective relationships
- Demonstrated expertise in Production support, including incident management, root cause analysis, real-time monitoring, and high availability and reliability of critical systems in a fast-paced environment