Tech Stack
AWS, Azure, Cloud, Docker, Google Cloud Platform, Java, Kubernetes, MySQL, Oracle, Postgres, Python, Splunk, SQL
About the role
- Ensure performance, reliability, and scalability of production systems
- Identify and resolve performance bottlenecks across application code, JVM, databases, and infrastructure
- Optimize SQL queries and JVM configurations; recommend application-level enhancements
- Collaborate with development, QA, and infrastructure teams to integrate performance improvements into deployment pipelines
- Monitor system health using Splunk, Dynatrace, ExtraHop, Foglight, and Wireshark; develop dashboards, alerts, and automated reporting
- Participate in client onboarding by reviewing usage patterns and performing capacity planning
- Provision and optimize infrastructure resources to maintain high availability and performance
- Lead incident response, triage, root cause analysis, and post-incident reviews to improve resilience
Requirements
- 7+ years of hands-on experience in performance engineering, site reliability engineering, or a closely related technical discipline
- Advanced proficiency in SQL and database performance tuning (e.g., Oracle, PostgreSQL, MySQL)
- Deep understanding of JVM internals, memory management, and garbage collection strategies
- Strong programming skills in Java and scripting languages such as Python or Bash
- Familiarity with cloud platforms (AWS, Azure, GCP) and container and orchestration technologies (e.g., Docker, Kubernetes)
- Demonstrated experience with Splunk, Dynatrace, ExtraHop, Foglight, and Wireshark in production environments
- Exceptional analytical, problem-solving, communication, and collaboration skills
- Bachelor’s degree in Computer Science, Engineering, or a related field