Supercomputing Engineer (Network)

Etched

Location:
United States, San Jose

Contract Type:
Not provided

Salary:

150000.00 - 275000.00 USD / Year

Job Description:

We are seeking highly motivated and skilled Supercomputing Engineers (Network) to join our team. This team plays a critical role in developing, qualifying, and optimizing high-performance networking solutions for large-scale inference workloads. As a Pod Software Engineer, you will focus on developing and qualifying software that drives communication amongst Sohu inference nodes in multi-rack inference clusters. You will collaborate closely with kernel, platform, and telemetry teams to push the boundaries of peer-to-peer RDMA efficiency.

Job Responsibility:

  • Design, develop, and implement RDMA-based network peering, supporting high-bandwidth, low-latency communication across PCIe nodes within and across racks
  • Develop tests that qualify host processors (x86), NICs, TORs and device network interfaces for high performance
  • Furnish burn-in teams with tests that represent both real-world use cases and workloads for device-to-device networking and extreme-load stress testing
  • Define the key metrics that system software must collect to maintain high availability and performance under extreme communications workloads

Requirements:

  • Proficiency in C/C++
  • Proficiency in at least one scripting language (e.g., Python, Bash, Go)
  • Strong experience with device-to-device networking technologies (RDMA, GPUDirect, etc.), including RoCE
  • Experience with zero-copy networking, RDMA verbs and memory registration
  • Familiarity with queue pairs, completion queues, and transport types
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Ability to work independently and as part of a team
  • Experience with version control systems (e.g., Git)
  • Experience with reading and interpreting hardware logs

Nice to have:

  • Experience with networking technologies like NVLink, InfiniBand, ML Pod interconnects
  • Experience with widely deployed Top of Rack Switches (Cisco, Juniper, Arista, etc.)
  • Knowledge of server virtualization
  • Experience with tracing tools like perf, eBPF, ftrace, etc.
  • Experience with performance testing and benchmarking tools (gprof, VTune, Wireshark, etc.)
  • Familiarity with hardware diagnostic tools and techniques
  • Experience with containerization technologies (e.g., Docker, Kubernetes)
  • Experience with CI/CD pipelines
  • Experience with Rust

What we offer:

  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Supercomputing Engineer (Network)

Software Engineer, Collective Communication

The Workload Networking team is responsible for the collective communication sta...
Location:
United States, San Francisco
Salary:
380000.00 - 555000.00 USD / Year
OpenAI
Expiration Date:
Until further notice
Requirements:
  • Background in low level performance critical software
  • Experience with collective communication is a bonus
  • Have written distributed algorithms using RDMA in the past
  • Are comfortable writing low level performance sensitive CPU and/or GPU code
  • Are familiar with network simulation techniques
Job Responsibility:
  • Collaborate closely with ML researchers to design and implement efficient collective operations in C++ and CUDA
  • Ensure that our largest training jobs take full advantage of the different network transports used in our supercomputers
  • Work on simulations to inform our future supercomputer network designs
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

Talent Sourcer

As we scale, we’re looking for a Talent Sourcer (Supercomputing/ML) to build and...
Location:
United States, San Jose
Salary:
100000.00 - 220000.00 USD / Year
Etched
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience sourcing technical talent in highly competitive markets
  • Deep experience sourcing software, systems, infrastructure, or hardware engineers
  • Highly resourceful, with a love of finding exceptional candidates beyond obvious platforms
  • Thrive in ambiguity and enjoy building sourcing engines from scratch
  • Detail-oriented, organized, and operationally strong
  • Care deeply about candidate experience and employer brand
  • Love working in high-velocity environments with extremely high hiring bars
Job Responsibility:
  • Own top-of-funnel sourcing strategy across priority engineering roles in supercomputing, ML systems, firmware, networking, and distributed systems
  • Build and maintain high-quality talent pipelines through outbound sourcing, referrals, events, research, and creative outreach
  • Partner closely with recruiters and hiring managers to deeply understand role requirements, ideal profiles, and search strategy
  • Develop market maps for niche technical domains and continuously expand our talent network
  • Run high-volume, high-signal outbound campaigns with thoughtful personalization
  • Track sourcing performance, conversion rates, and funnel health
  • Continuously experiment with new sourcing channels, tools, and techniques
  • Deliver a best-in-class candidate experience from first touch onward
What we offer:
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
Employment Type:
Full-time

Strategic Finance Compute Lead

Compute is a key lever for OpenAI and AI progress. We are seeking a Strategic Fi...
Location:
United States, San Francisco
Salary:
185000.00 - 260000.00 USD / Year
OpenAI
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience across strategic finance, private / growth equity, investment banking, strategy & operations, and/or business development with 3+ years of finance operating experience at a high-growth technology company
  • Experience partnering with engineering and product teams to provide financial analysis and insights to critical strategic decisions
  • Good understanding of cloud technology and compute infrastructure
  • Exceptionally strong analytical, financial modeling, and written and oral communication skills
  • Demonstrated track record of thoughtful investment decisions
  • Experience driving operational outcomes under ambitious deadlines
  • Exceptionally strong relationship building, business judgment, and communication skills
  • Bachelor’s degree or equivalent practical experience
Job Responsibility:
  • Own and develop financial models across different elements of compute (GPUs, CPUs, storage and networking)
  • Lead strategic financial analysis for long-term capacity initiatives, working closely with scaling and supercomputing engineering teams
  • Maintain deep expertise on compute contract terms, pricing structures and optimization opportunities
  • Serve as a partner to FP&A and strategic finance teams, aligning compute and infrastructure with broader financial and business strategies
  • Create high-quality Exec and Board-facing presentations
  • Stay abreast of market trends and competitive dynamics to inform and improve our infrastructure strategy
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

Supercomputing Engineer

Etched is building at-scale AI systems that will unlock faster, more efficient i...
Location:
United States, San Jose
Salary:
200000.00 - 275000.00 USD / Year
Etched
Expiration Date:
Until further notice
Requirements:
  • Strong proficiency in C/C++ or Rust for low-level systems programming
  • Deep understanding of Linux internals, kernel/user-space boundaries, and system-level debugging
  • Experience working close to hardware: drivers, DMA, interrupts, memory management, or device control paths
  • Strong debugging skills using logs, tracing, and low-level observability tools
  • Strong communication skills and comfort collaborating across hardware and software teams
Job Responsibility:
  • Architect and implement low-level control-plane software responsible for system bring-up, configuration, and management of cluster-scale AI compute deployments
  • Build system services that interact directly with hardware, firmware, and the operating system
  • Develop telemetry, logging, and tracing infrastructure for diagnosing failures and driving performance improvements
  • Implement orchestration primitives for managing devices, nodes, and racks
  • Profile and tune performance across PCIe, memory, networking, kernel, and runtime layers
  • Collaborate closely with hardware, firmware, kernel, and runtime teams to co-design system interfaces and behavior
What we offer:
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
Employment Type:
Full-time

Industrial Design Intern

The Industrial Design Intern will be responsible for supporting staff Industrial...
Location:
United States, Spring
Salary:
35.00 - 40.25 USD / Hour
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Currently pursuing a Bachelor's degree in Industrial Design, having completed junior year
  • Possess passion for Industrial Design and ability to understand project scope and articulate design details
  • Understanding and use of 2-D and 3-D CAD tools and software packages
  • Creo and Keyshot preferred but not required
  • Adobe Suite
  • Ability to apply analytical and problem-solving skills across complex technical programs
  • Understanding and experience in implementing and creating sketches, renderings, 3D models and prototypes
  • Strong written and verbal communication skills and able to communicate with other disciplines such as mechanical engineers and product marketing
Job Responsibility:
  • Support staff Industrial Designers and Human Factors in researching, creating, and developing concepts and specifications
  • Design portions of the industrial design for physical hardware products and systems
  • Support more senior Industrial Designers on new design projects creating artifacts, such as artwork, sketches, renderings and 3D prototypes
  • Collaborate with local and international team members providing timely support across a range of programs
  • Support multiple programs across multiple business units including servers, storage, networking and supercomputing
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive suite of benefits supporting physical, financial and emotional wellbeing
Employment Type:
Full-time

Supercomputing Intern

Our supercomputing role focuses on the design, development, and deployment of ML...
Location:
United States, San Jose
Salary:
Not provided
Etched
Expiration Date:
Until further notice
Requirements:
  • Progress towards a Bachelor’s, Master’s, or PhD degree in Computer Science, Engineering, or a related technical field
  • Proficiency in C/C++ or Rust
  • Proficiency in Python
  • Strong fundamentals in data structures and algorithms
  • Strong understanding of low-level software engineering
  • Strong understanding of hardware/software co-design
  • Excellent communication and collaboration skills
Job Responsibility:
  • Design, development, and deployment of ML system software required for operating rack-scale systems
  • Work spanning network performance, telemetry creation and processing pipelines, and analysis of system-level health and performance
  • Deployment and provisioning of software frameworks and hardware validation
  • Maintaining secure and performant systems for data center scale ML workloads
What we offer:
  • 12-week paid internship
  • Generous housing support for those relocating
  • Daily lunch and dinner in our office
  • Direct mentorship from industry leaders and world-class engineers
  • Opportunity to work on one of the most important problems of our time
Employment Type:
Full-time

Signal Integrity Engineer

OpenAI’s Hardware organization develops silicon and system-level solutions desig...
Location:
United States, San Francisco
Salary:
225000.00 - 445000.00 USD / Year
OpenAI
Expiration Date:
Until further notice
Requirements:
  • At least 10 years of industry experience
  • Experience designing hardware systems and SerDes testing for data center applications
  • Experience with and good knowledge of system design in SI areas, from chip and SerDes to board and rack level
  • Experience with PCB, connector and cable design
Job Responsibility:
  • Lead system signal integrity (SI) design for AI supercomputer products in data center applications
  • Collaborate with chip, package, boards, rack and system engineers, design partners to drive system SI design and develop innovative interconnect and high-speed technologies
  • Identify and evaluate new technologies and methodologies to improve signal and power integrity in product design, and contribute to the development of new products and technology by providing expertise in signal integrity
  • Perform simulation and modeling to identify and troubleshoot signal integrity issues
  • Lead system interconnect design, bring up and qualification
  • As the scope of the role and team grows, understand and influence roadmaps for hardware partners for our datacenter networks, racks, and buildings
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time

Component and Product Quality Engineer, Interconnects

OpenAI's Hardware organization builds supercompute platforms from silicon and bo...
Location:
United States, San Francisco
Salary:
123000.00 - 285000.00 USD / Year
OpenAI
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in quality engineering, manufacturing quality, supplier quality, or reliability for interconnects or high-speed hardware used in servers, networking, storage, or high-performance compute systems
  • Hands-on experience with high-speed copper interconnect products: connectors and/or cable assemblies
  • Strong command of problem-solving and quality tools: 8D, 5-Whys, Fishbone, PFMEA/control plans, SPC/MSA (gauge R&R), and change control
  • Ability to read and interpret mechanical drawings, GD&T basics, and electrical/interface specifications
  • Experience driving supplier/CM improvements (audits, scorecards, CAPA) and managing nonconformance/MRB workflows
  • Clear written and verbal communication skills
  • Ability to drive alignment across internal teams and external partners
  • Experience with cable manufacturing and assembly processes (wire treatment, resistance welding/laser welding, crimping, overmolding/injection molding, braiding/shielding, plating, and automated test)
  • Ability to travel internationally and work effectively across time zones with ODM/JDM and supplier partners
  • To comply with U.S. export control laws and regulations, candidates for this role may need to meet certain legal status requirements as provided in those laws and regulations.
Job Responsibility:
  • Own end-to-end quality for high-speed interconnect hardware across the product lifecycle: early design influence, supplier/contract manufacturer readiness, qualification, ramp, and fleet quality in lab and data center environments
  • Be the quality lead for advanced interconnect components and assemblies, including high-speed copper cables, cable cartridges, patch panels, backplane/cable-backplane solutions, high-speed connectors, and related electro-mechanical interfaces
  • Partner closely with electrical, mechanical, SI/PI, systems, reliability, operations, and external vendors to prevent escapes and drive rapid, data-driven containment and corrective action
  • Drive quality-by-design: participate in design reviews, DFM/DFx, tolerance stacks, material and plating selections, connector mating strategy, strain relief, and assembly methods to reduce variation and field failures
  • Define and track quality and reliability metrics (DPPM, yield, escapes, RMA/FRACAS trends, Cpk/Ppk where applicable) for interconnects across NPI and mass production
  • Build and execute qualification strategies for cables/connectors/patch panels (mechanical, environmental, electrical, and reliability), including test coverage, sample plans, clear pass/fail criteria, defining installation criteria and processes, optics termination quality management and setting fiber standards criteria
  • Partner with engineering and operations to drive smooth ramp: risk assessments, pilot build learnings, change control, and readiness reviews (EVT/DVT/PVT/MP or equivalent phases)
  • Own supplier and CM performance management: scorecards, audits (process and quality system), and follow-up to close findings with verified effectiveness
  • Work with suppliers to improve manufacturing throughput, stability, and yields for cable and connector assembly processes
  • Lead rapid containment and root-cause investigations for failures found during bring-up, system integration tests, reliability testing, and fleet deployments
What we offer:
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
Employment Type:
Full-time