Research Engineer, Scaling Jobs (On-site work)

2 Job Offers

Research Engineer, Scaling
Join 1X in Palo Alto as a Research Engineer, Scaling. You will design and build production-grade infrastructure for large-scale robot training and efficient inference. Your work optimizing distributed systems and on-device performance will directly impact our fleet. We offer competitive benefits ...
Location
United States, Palo Alto
Salary
180,000.00 - 300,000.00 USD / Year
1X Technologies
Expiration Date
Until further notice
AI Research Engineer, Scaling
Join 1X as an AI Research Engineer, Scaling in Palo Alto. You will design robust infrastructure for large-scale training and inference across our humanoid robot fleet. This role requires expertise in distributed systems (e.g., TorchTitan, TensorRT) and optimizing performance from datacenter to ed...
Location
United States, Palo Alto
Salary
180,000.00 - 300,000.00 USD / Year
1X Technologies
Expiration Date
Until further notice
A Research Engineer, Scaling, is a specialized technical role at the intersection of machine learning research, systems engineering, and high-performance computing. Professionals in these jobs are the architects of scale, responsible for transforming cutting-edge AI prototypes and research models into robust, production-grade systems that operate efficiently at massive scale. Their core mission is to remove computational bottlenecks, ensuring that the primary constraints on progress are data and algorithmic innovation, not hardware limitations. These roles are critical in organizations pushing the boundaries of AI, where the ability to train larger models on bigger datasets and deploy them reliably is a fundamental competitive advantage.

The responsibilities of a Research Engineer, Scaling, are multifaceted. On the training side, they design, build, and maintain distributed training infrastructure that enables seamless large-scale runs spanning hundreds or thousands of accelerators such as GPUs. This involves deep work on fault tolerance, experiment tracking, data pipeline optimization, and frameworks for parallelized training. On the inference side, they optimize model deployment for both datacenter and edge environments, maximizing throughput and minimizing latency through techniques like model quantization, kernel optimization, efficient scheduling, and the use of advanced compilers and serving systems. A unifying thread is a relentless focus on performance: understanding compute architectures, memory hierarchies, and network communication to extract maximum efficiency from every cycle.

The skill set required for these highly technical jobs is demanding and interdisciplinary. A strong foundation in computer science is essential, typically evidenced by an advanced degree, and proficiency in programming languages such as Python and C++ is mandatory.
Candidates must possess a deep, intuitive understanding of distributed systems principles, training scaling laws, and the full stack from algorithmic code to hardware execution. Hands-on experience with distributed training frameworks (e.g., tools in the PyTorch ecosystem such as TorchTitan), inference optimization toolkits (e.g., TensorRT), and performance profiling is standard. Crucially, a Research Engineer, Scaling, must have a mindset geared toward extreme scalability, treating it not as an operational detail but as a foundational enabler of breakthrough AI capabilities. They are the engineers who build the runway from which AI research can take flight, making them pivotal in the most ambitious technology jobs today.
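To make one of the inference-side techniques mentioned above concrete, here is a minimal sketch of symmetric int8 quantization: reducing weight precision to shrink a model and speed up inference. The function names are illustrative only; production systems use toolkits such as TensorRT rather than hand-rolled code like this.

```python
def quantize_int8(weights):
    """Map float weights to int8 using a single symmetric scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    # Round each weight to the nearest representable int8 step.
    return [max(-128, min(127, round(w / scale))) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.51, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight is within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

The trade-off a scaling engineer manages here is accuracy versus footprint: int8 storage is 4x smaller than float32, at the cost of rounding error bounded by the scale factor.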
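The training scaling laws mentioned above are typically paired with back-of-envelope compute estimates. A widely used rule of thumb for dense transformers is roughly 6 FLOPs per parameter per token of training (forward plus backward pass). The sketch below uses entirely hypothetical run parameters and an assumed utilization figure:

```python
def training_flops(n_params, n_tokens):
    """Rule-of-thumb training cost for a dense transformer:
    about 6 FLOPs per parameter per token (forward + backward)."""
    return 6 * n_params * n_tokens

def days_on_cluster(total_flops, n_gpus, flops_per_gpu, utilization=0.4):
    """Wall-clock days given peak per-GPU throughput and an assumed
    sustained utilization (40% is a plausible, not universal, figure)."""
    effective_rate = n_gpus * flops_per_gpu * utilization
    return total_flops / effective_rate / 86_400  # seconds per day

# Hypothetical run: 7e9 parameters, 1e12 tokens, 1024 GPUs at 1e15 FLOP/s peak.
cost = training_flops(7e9, 1e12)  # about 4.2e22 FLOPs
print(f"{days_on_cluster(cost, 1024, 1e15):.1f} days")
```

Estimates like this are how scaling engineers decide whether a bottleneck is worth attacking: halving the utilization gap can matter as much as doubling the cluster.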
