Supercomputing Test Software Engineer

Etched

Location:
Taipei, Taiwan

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking highly motivated and detail-oriented Software Engineers to join our Supercomputing Testing team. This team plays a critical role in ensuring the reliability and stability of our highest-performance inference server hardware and software. As a Software Engineer on this team, you will design, develop, and execute comprehensive supercomputing test suites, analyze test results, and collaborate with hardware and software engineering teams at Etched and our ODM partners to identify and resolve potential issues. You will be at the forefront of ensuring that our server products meet the highest quality standards before they reach our customers.

Job Responsibility:

  • Design, develop, and implement automated supercomputing test suites using common scripting languages (Python, Go, Bash) and test frameworks, covering all aspects of system operation: boot sequences, root-of-trust, system management, workload deployment, and performance
  • Execute tests on server hardware, monitor system performance and health, and analyze test results
  • Investigate and debug hardware and software failures identified during testing, providing detailed reports and mitigation plans
  • Collaborate with internal and external hardware and software engineering teams to identify root causes of failures and implement corrective actions
  • Contribute to the development and maintenance of the supercomputing testing infrastructure, including portable test environments and automation tools that can run in any environment
  • Create and maintain comprehensive documentation for test plans, test cases, and test results
  • Analyze system performance metrics to identify potential bottlenecks and areas for optimization
  • Participate in continuous improvement efforts to enhance the efficiency and effectiveness of the testing process

Requirements:

  • Proficiency in at least one scripting language (e.g., Python, Bash, Go)
  • Experience with software testing methodologies and tools
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Ability to work independently and as part of a team
  • Experience with version control systems (e.g., Git)
  • Experience with reading and interpreting hardware logs

Nice to have:

  • Experience with hardware burn-in testing or reliability testing
  • Experience with performance testing and benchmarking tools
  • Familiarity with hardware diagnostic tools and techniques
  • Experience with CI/CD pipelines
  • Knowledge of low-level hardware communication protocols (I2C, etc.)
  • Experience with data analysis tools and techniques

What we offer:

  • Competitive compensation packages including generous equity packages
  • Comprehensive insurance coverage and other top-of-market benefits

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time

Work Type:
On-site work