As the first dedicated Infrastructure Engineer at Reducto, you will influence every aspect of our infrastructure from the ground up. You will architect and scale resilient systems for AI and ML workloads, automate cloud infrastructure, and implement monitoring and incident response practices that set the standard for reliability. This role requires technical leadership, hands-on systems engineering, and strong collaboration with our founders and product teams as we build a company around reliability, rapid iteration, and high-impact product delivery.
Job Responsibilities:
Designing, building, and maintaining highly available, scalable infrastructure to support intensive AI/ML workloads and real-time model deployments
Implementing robust monitoring, alerting, and observability systems to ensure system health, performance, and uptime across cloud and on-prem environments
Debugging, optimizing, and automating infrastructure for fast iteration and rapid deployment cycles, focusing on both reliability and developer velocity
Proactively identifying, investigating, and resolving incidents to minimize downtime and maintain world-class service levels for enterprise customers
Collaborating closely with engineers, ML specialists, and founders to shape product, infrastructure, and security strategies
Requirements:
Have 5+ years of hands-on experience in building or supporting production-grade infrastructure and reliability processes for high-throughput systems
Are comfortable with Python or similar languages
Are exceptional at working across cloud platforms, container orchestration (e.g., Kubernetes), networking, and storage technologies
Build your own tools on the fly to diagnose, experiment, and address reliability problems
Bring a quantitative, hands-on approach to system operations, automation, and continuous improvement
Are your own worst critic: you hold an extremely high bar for quality and always aim for robust solutions rather than quick fixes
Nice to have:
Prior experience founding a company or building products/infrastructure in early-stage, high-growth environments
Are excited about automating incident management processes with LLMs/AI
Are driven and ambitious, and care deeply about both technical excellence and collaborative problem-solving
Keep up with the latest trends in cloud, observability, and SRE best practices
Are passionate about open source and have contributed tools or automation to reliability communities
Have built or optimized monitoring, incident response, or high-performance computing systems for demanding AI/ML, fintech, or enterprise clients
What we offer:
Unlimited PTO
Free lunch daily at the office
Reimbursed transportation
Generous health insurance covering medical, dental, and vision
Health and wellness budget, reimbursed up to $150/month