The Crusoe Cloud Software Development team is seeking a passionate and experienced Senior/Staff Software Engineer specializing in Systems Applications. This pivotal role is central to the design and development of our compute platform, with a specific focus on building compute applications for virtualized AI platforms. An understanding of the Linux kernel, virtualization, hardware tuning, distributed systems, object-oriented programming, and low-level systems programming is critical to this role. Excellent communication skills and a desire to work with a wide range of technologies across the Linux stack are both a must. This is a full-time position.
Job Responsibilities:
Design highly reliable and performant Linux applications used to manage our virtualization stack across thousands of AI compute servers in multiple global datacenters
Integrate Crusoe applications with a wide variety of hardware and software AI chip-vendor stacks. Build solutions to optimize and monitor virtualized hardware (GPUs, InfiniBand/RoCE NICs, ephemeral storage, etc.) in cutting-edge AI/HPC environments
Work side by side with our Linux Kernel and Hypervisor teams to ensure our Crusoe applications are seamlessly integrated with a variety of kernels and hypervisors
Analyze and enhance the performance of the entire virtualization stack, from the hypervisor to the virtualized guest OS, with a specific focus on optimizing AI/ML workloads. This includes profiling, bottleneck identification, and implementing low-level optimizations
Diagnose and resolve complex system issues across our virtualization stack (drivers, kernel, hypervisor, guest OS, and Crusoe applications). Work closely with kernel and hypervisor teams to debug and resolve integration challenges
Conduct thorough code reviews to ensure the highest level of software quality, reliability, and security within our compute applications and virtualization stack
Collaborate with other engineering teams, including hardware design, OS development, and AI/ML application teams, to ensure cohesive and integrated product development
Provide technical guidance and mentorship to junior engineers, fostering a culture of technical excellence and collaborative problem-solving within the compute applications team
Requirements:
Experience building applications on Linux kernels, specifically pertaining to virtualization, device drivers, memory management, and process scheduling
Solid understanding of hardware devices such as GPUs, CPUs, InfiniBand and Ethernet NICs, ephemeral disks, and PCI Express
Strong grasp of distributed applications and highly scalable systems design, with a specific focus on communication protocols (gRPC, REST, TCP/IP, etc.), databases (Postgres, Redis), and messaging systems (Pub/Sub, Kafka)
Strong experience building software applications in both higher-level (Golang, Java, Python) and lower-level (C, C++, Rust) languages. Keen eye for clean, maintainable code, and a unit-test-driven mindset
Ability to collaborate with teams across the organization, block out noise, and focus on what needs to get done to get a project across the line
Able to adapt quickly, eager to research new technology, and not easily overwhelmed by unfamiliar tech stacks
General knowledge of hypervisors, virtual machine lifecycles, and Linux KVM tooling
Understanding of how to build GitLab or GitHub CI/CD pipelines that deliver reliable, well-tested code across a multitude of compute platforms
Nice to have:
Experience with virtualization specifically for AI/ML workloads, including GPU virtualization
Previous work debugging or contributing to kernel or hypervisor code, particularly around device management
Experience configuring thousands of live compute nodes in a bare-metal production environment
What we offer:
Restricted Stock Units in a fast-growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSAs
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement
Subscription to the Calm app
MetLife Legal
Company-paid commuter FSA benefit of $300 per month