We are seeking a highly motivated and detail-oriented Supercomputing Engineer (Test) to join our team. This team plays a critical role in ensuring the reliability and stability of our highest-performance inference server hardware and software. As an engineer on this team, you will design, develop, and execute comprehensive burn-in test suites, analyze test results, and collaborate with hardware and software engineering teams at Etched and at our ODM partners to identify and resolve potential issues. You will be at the forefront of ensuring that our server products meet the highest quality standards before they reach our customers.
Job Responsibilities:
Test Development: Design, develop, and implement automated burn-in test suites using common scripting languages (Python, Go, Bash) and test frameworks across all aspects of system operation, including boot sequences, root of trust, system management, workload deployment, and performance
Test Execution: Execute burn-in tests on server hardware, monitor system performance and health, and analyze test results
Failure Analysis: Investigate and debug hardware and software failures identified during testing, providing detailed reports and mitigation plans
Collaboration: Collaborate with internal and external hardware and software engineering teams to identify root causes of failures and implement corrective actions
Test Infrastructure: Contribute to the development and maintenance of the burn-in testing infrastructure, including portable test environments and automation tools that can run in any environment
Documentation: Create and maintain comprehensive documentation for test plans, test cases, and test results
Performance Analysis: Analyze system performance metrics to identify potential bottlenecks and areas for optimization
Continuous Improvement: Participate in continuous improvement efforts to enhance the efficiency and effectiveness of the burn-in testing process
Requirements:
Proficiency in at least one scripting language (e.g., Python, Bash, Go)
Experience with software testing methodologies and tools
Strong understanding of operating systems (Linux preferred) and server hardware architectures
Ability to analyze complex technical problems and provide effective solutions
Excellent communication and collaboration skills
Ability to work independently and as part of a team
Experience with version control systems (e.g., Git)
Experience reading and interpreting hardware logs
Nice to have:
Experience with hardware burn-in testing or reliability testing
Knowledge of server virtualization and cloud computing concepts
Experience with performance testing and benchmarking tools
Familiarity with hardware diagnostic tools and techniques
Experience with containerization technologies (e.g., Docker, Kubernetes)
Experience with CI/CD pipelines
Knowledge of low-level hardware communication protocols (e.g., I2C)
Experience with data analysis tools and techniques
What we offer:
Medical, dental, and vision packages with generous premium coverage
A $500-per-month credit for those who waive medical benefits
Housing subsidy of $2k per month for those living within walking distance of the office
Relocation support for those moving to San Jose (Santana Row)
Various wellness benefits covering fitness, mental health, and more