Senior Networking Solution Test Engineer - AI Cluster Debugging

Jobgether • Switzerland
Relocation

Jobgether, on behalf of a partner company, is seeking a Senior Networking Solution Test Engineer to ensure the reliability and performance of complex AI clusters. This role requires strong debugging intuition and the ability to reproduce and analyze real-world customer scenarios. The ideal candidate will have experience in Linux-based networking, system testing, and complex debugging environments.

Technical Skills Required
Linux networking tools, Linux debugging utilities, C/C++, Python, Bash, Ansible, NCCL, RoCE, RDMA

Job Description


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Networking Solution Test Engineer - AI Cluster Debugging in Switzerland.

This role sits at the forefront of large-scale AI infrastructure validation, where networking, systems engineering, and artificial intelligence workloads converge. You will be responsible for ensuring the reliability and performance of complex AI clusters built on high-speed interconnect technologies such as NVLink, Ethernet, and InfiniBand. Working in a highly technical and collaborative environment, you will investigate deep system-level issues spanning hardware, drivers, networking stacks, and AI frameworks. The position requires strong debugging intuition and the ability to reproduce and analyze real-world customer scenarios in advanced test environments. You will contribute directly to the stability and scalability of next-generation AI training and inference systems used at massive scale. This is a hands-on engineering role where your analysis and findings directly shape product quality and system performance.

Accountabilities

  • Design and review test strategies and product requirements for NVLink, Ethernet, and InfiniBand-based AI cluster systems.
  • Build and maintain realistic, large-scale test environments replicating customer-like AI infrastructure, including heterogeneous hardware and software stacks.
  • Lead end-to-end system debugging across hardware, firmware, networking, and AI software layers to identify and resolve root causes.
  • Analyze logs, inspect source code, and validate fixes across components such as NICs, DPUs, switches, and AI communication libraries.
  • Collaborate closely with development teams to debug and optimize communication libraries and protocols such as NCCL, RoCE, and RDMA.
  • Define, design, and guide automation efforts for robust testing frameworks producing actionable logs, metrics, and traces.
  • Execute regression, performance, functional, and scalability testing, and deliver clear, data-driven technical reports.
  • Profile and benchmark AI training and inference workloads, correlating application behavior with system and network performance metrics.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or equivalent hands-on experience in systems/network engineering.
  • 8+ years of experience in Linux-based networking, system testing, and complex debugging environments.
  • Strong expertise in Linux networking tools and debugging utilities (e.g., tcpdump, ethtool, iproute2, perf).
  • Proven experience in production-grade troubleshooting, hypothesis-driven debugging, and root cause analysis under pressure.
  • Solid understanding of NIC architecture, offloads, queue management, and driver/firmware interactions.
  • Deep knowledge of AI networking technologies such as NCCL, RoCE, and RDMA.
  • Ability to read, understand, and debug source code in C/C++, Python, or similar languages.
  • Strong scripting and automation skills using Bash, Python, and/or Ansible.
  • Experience working in fast-evolving technical environments with strong adaptability and learning ability.
  • Excellent analytical, communication, and collaboration skills with a strong ownership mindset.

Benefits

  • Competitive compensation aligned with senior-level expertise and Swiss market standards.
  • Opportunity to work on cutting-edge AI cluster and high-performance networking technologies.
  • Exposure to large-scale systems powering advanced AI training and inference workloads.
  • Highly technical, research-driven engineering environment with strong innovation focus.
  • Collaborative international team working on next-generation infrastructure challenges.
  • Access to complex, large-scale test environments and advanced debugging tools.
  • Inclusive workplace culture supporting diversity, equity, and professional growth.
  • Relocation support and accommodation of accessibility needs where applicable.

How Jobgether Works

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

