Join our team as a Systems & Infrastructure Specialist for a high-intensity, expert-level project focused on training and optimizing AI models within intricate, containerized environments. In this terminal-intensive role, you'll apply a systems-first mindset to solve complex infrastructure challenges in real time. This one-time project offers significant opportunities for extension or transition into future phases for those who demonstrate elite technical execution.
Job Responsibilities:
Navigate, troubleshoot, and recover dynamic infrastructure and long-running processes in real time using command-line tools
Master and manage highly containerized environments, including orchestrating Dockerized sandboxes and CI/CD workflows
Build, maintain, and optimize systems for AI model training and high-throughput compute environments
Respond swiftly to system errors, executing dynamic mid-operation replanning and recovery
Collaborate with engineering and AI teams to ensure seamless integration, reliability, and performance
Document system architectures, incident responses, and recovery protocols with meticulous clarity
Contribute expertise to evolving project needs, adapting to new technologies and scaling strategies as required
Requirements:
Terminal-native problem solving
Dynamic infrastructure recovery
Containerized environment mastery
Systems multilingualism
Demonstrated expert proficiency working in terminal environments for system builds, server administration, and infrastructure management
Advanced problem-solving skills for multi-step troubleshooting, filesystem navigation, and process management within containerized settings
Hands-on experience with Python, Bash, JavaScript/TypeScript, Go, Rust, and/or C/C++
Deep familiarity with build systems, package managers, databases, web servers, ML frameworks, version control, and cryptography tools
Proven ability to execute dynamic infrastructure recovery and optimize long-running processes under pressure
Strong written and verbal communication skills, with a passion for precise technical documentation
Systems multilingualism: versatility across operating systems, languages, and emerging DevOps tools
Nice to have:
Prior experience in high-compute environments for AI/ML workloads
Background in Site Reliability Engineering or DevOps roles focused on mission-critical infrastructure
Familiarity with advanced container orchestration and distributed system design