As a Software Engineer on our Cloud Infrastructure team, you'll be at the forefront, architecting and building the foundational systems that power Fireworks AI's generative AI platform. You'll spearhead the creation of one of the world's first virtual clouds, serving AI workloads across the globe and across every cloud provider. Your mission: to deliver reliability, efficiency, and scalability for the world's most innovative AI products.
This is a highly technical role requiring deep expertise in distributed systems, cloud-native infrastructure, and machine learning platforms. You'll work closely with engineering partners, product teams, and infrastructure stakeholders to design solutions that balance performance, cost-efficiency, and operational simplicity across compute, storage, and networking layers.
Job Responsibilities:
Architect and build scalable, resilient, and high-performance backend infrastructure to support distributed training, inference, and data processing pipelines
Lead technical design discussions, mentor other engineers, and establish best practices for building and operating large-scale ML infrastructure
Design and implement core backend services (e.g., job schedulers, resource managers, autoscalers, model serving layers) with a focus on efficiency and low latency
Drive infrastructure optimization initiatives, including compute cost reduction, storage lifecycle management, and network performance tuning
Collaborate cross-functionally with ML, DevOps, and product teams to translate research and product needs into robust infrastructure solutions
Continuously evaluate and integrate cloud-native and open-source technologies (e.g., Kubernetes, Ray, Kubeflow, MLflow) to enhance our platform’s capabilities and reliability
Own end-to-end systems from design to deployment and observability, with a strong emphasis on reliability, fault tolerance, and operational excellence
Requirements:
Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
5+ years of experience designing and building backend infrastructure in cloud environments (e.g., AWS, GCP, Azure)
Proven experience in ML infrastructure and tooling (e.g., PyTorch, TensorFlow, Vertex AI, SageMaker, Kubernetes)
Strong software development skills in languages like Python or C++
Deep understanding of distributed systems fundamentals: scheduling, orchestration, storage, networking, and compute optimization
Nice to have:
Master’s or PhD in Computer Science or related field
Experience leading infrastructure projects supporting large-scale ML/AI workloads or high-throughput systems
Familiarity with infrastructure-as-code, CI/CD tooling, and GitOps practices (e.g., Terraform, Argo CD)
Track record of driving system performance, reliability, and cost-efficiency improvements
Contributions to open-source cloud or ML infrastructure projects