Supercomputing Engineer

Etched

Location:
United States, San Jose

Contract Type:
Not provided

Salary:

200000.00 - 275000.00 USD / Year

Job Description:

Etched is building at-scale AI systems that will unlock faster, more efficient inference for billions of people, and the Supercomputing team is critical in enabling this mission. We are seeking a highly skilled and motivated Engineer to join our Supercomputing team to help build the foundational software that powers our cluster-scale AI compute deployments. This role on the core team involves the development, integration, and debugging of critical system components, including control-plane software, system bring-up, telemetry, orchestration primitives, and performance tuning at the hardware–software boundary.

Job Responsibility:

  • Architect and implement low-level control-plane software responsible for system bring-up, configuration, and management of cluster-scale AI compute deployments
  • Build system services that interact directly with hardware, firmware, and the operating system
  • Develop telemetry, logging, and tracing infrastructure for diagnosing failures and driving performance improvements
  • Implement orchestration primitives for managing devices, nodes, and racks
  • Profile and tune performance across PCIe, memory, networking, kernel, and runtime layers
  • Collaborate closely with hardware, firmware, kernel, and runtime teams to co-design system interfaces and behavior

Requirements:

  • Strong proficiency in C/C++ or Rust for low-level systems programming
  • Deep understanding of Linux internals, kernel/user-space boundaries, and system-level debugging
  • Experience working close to hardware: drivers, DMA, interrupts, memory management, or device control paths
  • Strong debugging skills using logs, tracing, and low-level observability tools
  • Strong communication skills and comfort collaborating across hardware and software teams

Nice to have:

  • Experience with data center orchestration technologies such as Kubernetes and Docker
  • Experience with kernel development, device drivers, or firmware-adjacent software
  • Familiarity with PCIe, NUMA, networking, or high-speed interconnects
  • Experience with tracing and profiling tools such as perf, eBPF, ftrace, or custom instrumentation
  • Experience taking complex systems from early bring-up through stable operation
  • Background in HPC, AI infrastructure, or large-scale compute systems
  • Experience designing system test harnesses and failure-injection frameworks
  • Familiarity with Kubernetes or cluster orchestration at the node or control-plane level

What we offer:

  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Supercomputing Engineer

Supercomputing Software Engineer

We are seeking a highly skilled and motivated Supercomputing Software Engineer t...
Location:
Taiwan, Taipei
Salary:
Not provided
Etched
Expiration Date
Until further notice
Requirements
  • Proficiency in C/C++ or Python
  • Strong understanding of BIOS and BMC firmware architectures
  • Experience with server boot processes
  • Knowledge of root-of-trust and security principles
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Experience with advanced system logging and diagnostic tools
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Experience with version control systems (e.g., Git)
  • Experience with reading and interpreting hardware logs
Job Responsibility
  • Integrate and maintain BIOS and BMC firmware, ensuring robust and efficient server boot processes
  • Measure and Tune System Performance Configuration: Analyze DRAM timings, PCIe configurations, power state transitions, etc., to ensure high performance and maximal reliability
  • Root of Trust and Security: Validate security features, including root-of-trust mechanisms, to protect system integrity and data security
  • Advanced System Logging and Diagnostics: Design and implement advanced system logging and diagnostic capabilities to facilitate efficient troubleshooting and performance analysis
  • Data Center Orchestration Integration: Integrate and optimize node-level data center orchestration technologies, such as Kubernetes and Docker, into the system software stack
  • System Validation and Testing: Develop and execute comprehensive test plans to validate system software functionality, stability, and performance
  • Collaboration and Troubleshooting: Collaborate with hardware and software teams to diagnose and resolve complex system-level issues
What we offer
  • Competitive compensation packages including generous equity packages
  • Comprehensive insurance coverage and other top-of-market benefits
  • Fulltime

Supercomputing Engineer (Test)

We are seeking highly motivated and detail-oriented Supercomputing Engineer (Tes...
Location:
United States, San Jose
Salary:
150000.00 - 275000.00 USD / Year
Etched
Expiration Date
Until further notice
Requirements
  • Proficiency in at least one scripting language (e.g., Python, Bash, Go)
  • Experience with software testing methodologies and tools
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Ability to work independently and as part of a team
  • Experience with version control systems (e.g., Git)
  • Experience with reading and interpreting hardware logs
Job Responsibility
  • Test Development: Design, develop, and implement automated burn-in test suites using common scripting languages (Python, Go, Bash) and test frameworks across all aspects of System Operation including: boot sequences, root-of-trust, system management, workload deployment and performance
  • Test Execution: Execute burn-in tests on server hardware, monitor system performance and health, and analyze test results
  • Failure Analysis: Investigate and debug hardware and software failures identified during testing, providing detailed reports and mitigation plans
  • Collaboration: Collaborate with internal and external hardware and software engineering teams to identify root causes of failures and implement corrective actions
  • Test Infrastructure: Contribute to the development and maintenance of the burn-in testing infrastructure, including portable test environments and automation tools runnable in any environment
  • Documentation: Create and maintain comprehensive documentation for test plans, test cases, and test results
  • Performance Analysis: Analyze system performance metrics to identify potential bottlenecks and areas for optimization
  • Continuous Improvement: Participate in continuous improvement efforts to enhance the efficiency and effectiveness of the burn-in testing process
What we offer
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
  • Fulltime

Supercomputing Engineer (Network)

We are seeking highly motivated and skilled Supercomputing Engineers (Network) t...
Location:
United States, San Jose
Salary:
150000.00 - 275000.00 USD / Year
Etched
Expiration Date
Until further notice
Requirements
  • Proficiency in C/C++
  • Proficiency in at least one scripting language (e.g., Python, Bash, Go)
  • Strong experience with device-to-device networking technologies (RDMA, GPUDirect, etc.), including RoCE
  • Experience with zero-copy networking, RDMA verbs and memory registration
  • Familiarity with queue pairs, completion queues, and transport types
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Ability to work independently and as part of a team
  • Experience with version control systems (e.g., Git)
Job Responsibility
  • Design, develop, and implement RDMA-based network peering, supporting high-bandwidth, low-latency communication across PCIe nodes within and across racks
  • Develop tests that qualify host processors (x86), NICs, TORs and device network interfaces for high performance
  • Furnish burn-in teams with tests that cover both real-world use cases and workloads for device-to-device networking and extreme-load stress testing
  • Define the key metrics that system software must collect to maintain high availability and performance under extreme communications workloads
What we offer
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
  • Fulltime

Supercomputing Test Software Engineer

We are seeking highly motivated and detail-oriented Software Engineers to join o...
Location:
Taiwan, Taipei
Salary:
Not provided
Etched
Expiration Date
Until further notice
Requirements
  • Proficiency in at least one scripting language (e.g., Python, Bash, Go)
  • Experience with software testing methodologies and tools
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures
  • Ability to analyze complex technical problems and provide effective solutions
  • Excellent communication and collaboration skills
  • Ability to work independently and as part of a team
  • Experience with version control systems (e.g., Git)
  • Experience with reading and interpreting hardware logs
Job Responsibility
  • Design, develop, and implement automated supercomputing test suites using common scripting languages (Python, Go, Bash) and test frameworks across all aspects of System Operation including: boot sequences, root-of-trust, system management, workload deployment and performance
  • Execute tests on server hardware, monitor system performance and health, and analyze test results
  • Investigate and debug hardware and software failures identified during testing, providing detailed reports and mitigation plans
  • Collaborate with internal and external hardware and software engineering teams to identify root causes of failures and implement corrective actions
  • Contribute to the development and maintenance of the supercomputing testing infrastructure, including portable test environments and automation tools runnable in any environment
  • Create and maintain comprehensive documentation for test plans, test cases, and test results
  • Analyze system performance metrics to identify potential bottlenecks and areas for optimization
  • Participate in continuous improvement efforts to enhance the efficiency and effectiveness of the testing process
What we offer
  • Competitive compensation packages including generous equity packages
  • Comprehensive insurance coverage and other top-of-market benefits
  • Fulltime

Talent Sourcer

As we scale, we’re looking for a Talent Sourcer (Supercomputing/ML) to build and...
Location:
United States, San Jose
Salary:
100000.00 - 220000.00 USD / Year
Etched
Expiration Date
Until further notice
Requirements
  • 3+ years of experience sourcing technical talent in highly competitive markets
  • Deep experience sourcing software, systems, infrastructure, or hardware engineers
  • Highly resourceful; loves finding exceptional candidates beyond obvious platforms
  • Thrives in ambiguity and enjoys building sourcing engines from scratch
  • Detail-oriented, organized, and operationally strong
  • Cares deeply about candidate experience and employer brand
  • Loves working in high-velocity environments with extremely high hiring bars
Job Responsibility
  • Own top-of-funnel sourcing strategy across priority engineering roles in supercomputing, ML systems, firmware, networking, and distributed systems
  • Build and maintain high-quality talent pipelines through outbound sourcing, referrals, events, research, and creative outreach
  • Partner closely with recruiters and hiring managers to deeply understand role requirements, ideal profiles, and search strategy
  • Develop market maps for niche technical domains and continuously expand our talent network
  • Run high-volume, high-signal outbound campaigns with thoughtful personalization
  • Track sourcing performance, conversion rates, and funnel health
  • Continuously experiment with new sourcing channels, tools, and techniques
  • Deliver a best-in-class candidate experience from first touch onward
What we offer
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
  • Fulltime

Supercomputing Intern

Our supercomputing role focuses on the design, development, and deployment of ML...
Location:
United States, San Jose
Salary:
Not provided
Etched
Expiration Date
Until further notice
Requirements
  • Progress towards a Bachelor’s, Master’s, or PhD degree in Computer Science, Engineering, or a related technical field
  • Proficiency in C/C++ or Rust
  • Proficiency in Python
  • Strong fundamentals in data structures and algorithms
  • Strong understanding of low-level software engineering
  • Strong understanding of hardware/software co-design
  • Excellent communication and collaboration skills
Job Responsibility
  • Design, development, and deployment of ML system software required for operating rack-scale systems
  • Work spanning network performance, telemetry creation and processing pipelines, and analysis of system-level health and performance
  • Deployment and provisioning of software frameworks and hardware validation
  • Maintaining secure and performant systems for data center scale ML workloads
What we offer
  • 12-week paid internship
  • Generous housing support for those relocating
  • Daily lunch and dinner in our office
  • Direct mentorship from industry leaders and world-class engineers
  • Opportunity to work on one of the most important problems of our time
  • Fulltime

Directed Energy Modeling and Simulation Programmer

Stellar Science is a growing Albuquerque-based scientific software development c...
Location:
United States, Albuquerque; Dayton; Tysons Corner
Salary:
Not provided
Stellar Science
Expiration Date
Until further notice
Requirements
  • B.S. in physics, math, engineering, computer science, or a related field
  • M.S. or Ph.D. desired
  • Substantial software development experience
  • Familiarity with state-of-the-art radio frequency or laser modeling and simulation tools
  • Ability to implement, understand, and maintain mathematical and scientific codes
  • Object-oriented design and programming in C++
  • U.S. citizenship + willingness to undergo background investigation and perform some work at customer sites
Job Responsibility
  • Create and extend scientific and engineering analysis applications
  • Develop custom software products using modern technologies, including C++23, massive parallelization, and supercomputing resources
  • Support research and development in modeling and simulation of high-power microwave systems and high energy laser systems, image and signature prediction, and computational electromagnetics
What we offer
  • Eleven paid federal holidays (which may be floated as desired)
  • Three weeks paid time off
  • A generous fully employer-funded SEP IRA
  • Fully employer-funded health insurance
  • Dental insurance
  • Disability insurance
  • Life insurance
  • Fulltime

Member of Technical Staff, Infrastructure Data & Analytics

We are seeking experienced Infrastructure Data & Analytics Engineers to join our...
Location:
United States, Multiple Locations; Mountain View; San Francisco Bay area; New York City metropolitan area
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in computer science or a related technical field AND 8+ years of technical engineering experience with data engineering, analytics, or data science, with increasing technical ownership in a startup environment, AND 6+ years of experience with distributed data processing frameworks and large-scale data systems, OR equivalent experience
  • Master’s degree in computer science or a related technical field AND 12+ years of technical engineering experience with data engineering, analytics, or data science, with increasing technical ownership in a startup environment, AND 10+ years of experience with distributed data processing frameworks and large-scale data systems, OR equivalent experience
  • Proven technical leadership in data engineering, analytics platforms, or large-scale telemetry systems
  • Hands-on experience with ETL orchestration frameworks such as Airflow, Dagster, or similar
  • Strong communication skills; can explain complex systems clearly to senior leaders
Job Responsibility
  • Act as the technical lead and owner for infrastructure analytics across compute, storage, and networking
  • Design and build durable, scalable data pipelines that ingest telemetry from clusters, schedulers, health systems, and capacity trackers into the data warehouse
  • Define and standardize core metrics and semantics (e.g., utilization, occupancy, MFU, goodput, capacity readiness, delivery-to-production)
  • Architect and maintain self-service dashboards and APIs for fleet, cluster, and squad-level visibility
  • Partner closely with stakeholders across Supercomputing Infra, Researchers, Strategy and Executives to ensure metrics reflect operational and business reality
  • Implement robust and fault-tolerant systems for data ingestion and processing
  • Lead data architecture and engineering decisions, applying strong technical judgment to proactively shape executive-level discussions and decisions
  • Identify data gaps and instrumentation issues; drive fixes by influencing upstream engineering teams
  • Establish data quality, validation, documentation, and governance so metrics are trusted and repeatable
  • Fulltime