Tech Stack
Ansible, AWS, Azure, Cloud, Google Cloud Platform, Linux, Python, TCP/IP, Terraform
About the role
- Lead the definition and development of holistic test strategies, test plans, and test cases for complex data center network solutions, including Layer 2/3, SDN, DCN, EVPN/VXLAN, and high-performance fabrics (e.g., RDMA/RoCE).
- Work closely with network architecture and engineering teams to understand design specifications and translate them into effective test methodologies.
- Identify key performance indicators (KPIs) and scalability metrics for network validation.
- Execute complex network tests hands-on, including functional, performance, scalability, stability, and reliability testing.
- Analyze test results, identify defects, conduct root cause analysis, and work with development teams to resolve issues.
- Perform deep-dive packet analysis and network protocol debugging using tools like Wireshark, tcpdump, and network analyzers.
- Design, develop, and maintain advanced test automation frameworks and scripts (Python, Ansible, Terraform, Robot Framework) to accelerate testing cycles and improve efficiency.
- Evaluate, recommend, and integrate new testing tools and technologies into CI/CD pipelines.
- Drive the adoption of best practices in test automation and continuous integration.
- Lead and mentor a small team of networking test engineers, providing technical guidance and code reviews and fostering a culture of excellence and continuous improvement.
- Act as a technical subject matter expert for network testing within the organization and drive innovation in test methodologies and processes.
- Collaborate cross-functionally with network architecture, development, operations, and product teams to ensure comprehensive test coverage and high-quality deliverables.
- Clearly communicate test progress, risks, and results to stakeholders at all levels and contribute to documentation, runbooks, and operational guides.
Requirements
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field.
- 7+ years of experience in hardware and/or software testing, with at least 5 years focused on enterprise-level storage and server systems.
- 3+ years of proven experience in a lead or senior technical role, mentoring and guiding junior engineers or leading test initiatives.
- Deep expertise in storage technologies including NVMe, SAS/SATA SSDs/HDDs, RAID, distributed file systems (Ceph, Lustre, GPFS), SAN, and NAS.
- Strong understanding of server architectures (x86, ARM, GPU servers), CPU/memory subsystems, PCIe, and power management.
- Knowledge of Baseboard Management Controller (BMC) functionality.
- Proficiency in scripting languages (Python, Bash) for test automation and data analysis.
- Experience with Linux operating systems (Ubuntu, CentOS, RHEL) and command-line tools.
- Familiarity with networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methodologies.
- Experience with test methodologies such as performance testing, reliability testing, stress testing, and fault injection.
- Excellent problem-solving, analytical, and debugging skills.
- Strong communication and interpersonal skills, with the ability to collaborate effectively across diverse teams.
- Preferred: Experience with SDN controllers and orchestration platforms.
- Preferred: Experience with ESXi.
- Preferred: Familiarity with cloud networking concepts (AWS, Azure, GCP).
- Preferred: Knowledge of network security principles and testing.
- Preferred: Relevant industry certifications (CCIE, JNCIE, CCNP Data Center, Linux Foundation Certified Engineer, Mellanox certifications).