Senior Software Engineer - Together Cloud Infrastructure

Together AI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

160000.00 - 230000.00 USD / Year

Job Description:

Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle, combining the fastest LLM inference engine with state-of-the-art AI cloud infrastructure. As a Senior AI Infrastructure Engineer, you will play a key role in building the next-generation AI cloud platform – a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and gives state-of-the-art ML practitioners self-serve AI cloud services, such as on-demand and managed Kubernetes and Slurm clusters. This platform serves both our internal SaaS products (inference, fine-tuning) and our external cloud customers, spanning dozens of data centers across the world.

Job Responsibility:

  • Design, build, and maintain performant, secure, and highly available backend services/operators that run in our data centers and automate hardware management, such as InfiniBand partitioning, in-DC parallel storage provisioning, and VM provisioning
  • Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs
  • Work on a global multi-exabyte high-performance object store, serving massive datasets for pretraining
  • Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining
  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance

Requirements:

  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • 5+ years of experience writing high-performance, well-tested, production-quality code
  • Demonstrated experience with building and operating high-performance and/or globally distributed micro-service architectures across one or more cloud providers (AWS, Azure, GCP)
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Deep experience with Kubernetes internals a big plus, such as implementing non-trivial Kubernetes operators, device/storage/network plugins, custom schedulers, or patches to these or to Kubernetes itself
  • Deep experience with VMs/hypervisors a big plus, such as QEMU/KVM, cloud-hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, SR-IOV
  • Deep experience with DC networking tech + solutions a big plus, such as VLAN, VXLAN, VPN, VPC, OVS/OVN
  • Experience with Cluster API or similar a big plus
  • Experience working on high-performance compute, networking, and/or storage a big plus
  • Experience virtualizing GPUs and/or InfiniBand a big plus
  • Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
  • Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
  • Experience building IaaS or PaaS systems at scale a plus
  • Experience with DPUs/SmartNICs a plus
  • GPU programming, NCCL, CUDA knowledge a plus

Nice to have:

  • Deep experience with Kubernetes internals
  • Deep experience with VMs/hypervisors
  • Deep experience with DC networking tech + solutions
  • Experience with Cluster API or similar
  • Experience working on high-performance compute, networking, and/or storage
  • Experience virtualizing GPUs and/or InfiniBand
  • Experience building IaaS or PaaS systems at scale
  • Experience with DPUs/SmartNICs
  • GPU programming, NCCL, CUDA knowledge

What we offer:

  • competitive compensation
  • startup equity
  • health insurance
  • other benefits
  • flexibility in terms of remote work

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time

Work Type:
On-site work

Similar Jobs for Senior Software Engineer - Together Cloud Infrastructure

Senior Software Engineer - Together Cloud Platform

About the Role: Together AI is building the AI Acceleration Cloud, an end-to-end...
Location:
United States, San Francisco
Salary:
160000.00 - 230000.00 USD / Year
Together AI
Expiration Date:
Until further notice
Requirements:
  • 5+ years of demonstrated experience in building large scale, fault tolerant, distributed systems and API microservices
  • Experience designing, analyzing and improving efficiency, scalability, and stability of various system resources
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Demonstrated experience with building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
  • Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
  • Experience developing against and managing a relational database, such as PostgreSQL
  • Expert-level programmer in one or more programming languages (Golang preferred)
  • Proficiency in version control practices and integrating IaC with CI/CD pipelines
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
Job Responsibility:
  • Identify, design, and develop foundational backend services that power Together’s cloud platform
  • Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
  • Partner with product teams to understand functional requirements and deliver solutions that meet business needs
  • Write clear, well-tested, and maintainable software and IaC for both new and existing systems
  • Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance
  • Participate in an on-call rotation to address critical incidents when necessary
What we offer:
  • competitive compensation
  • startup equity
  • health insurance
  • flexibility in terms of remote work
Employment Type:
Full-time

Senior Software Engineer

At Optimizely, we're on a mission to help people unlock their digital potential....
Location:
Bangladesh, Dhaka
Salary:
Not provided
Optimizely
Expiration Date:
Until further notice
Requirements:
  • 6+ years of experience as a software engineer
  • Deep practical understanding of modern AI technologies, with hands-on experience applying LLMs (for text analysis, user behaviour, etc.) and active daily use of AI tools
  • Strong experience in modern Kotlin development, Golang, and TypeScript
  • Hands-on experience with Kubernetes, Terraform, and working directly with cloud infrastructure (e.g., AWS, GCP, Azure)
  • Broad experience with a range of server-side technologies, including legacy systems like PHP, Python, and Node.js
  • Experience building, delivering, and maintaining services that comprise modern microservice-based SaaS products
  • Experience with agile delivery workflows and modern software quality techniques like TDD, pair programming, etc
  • Experience developing or contributing to development platforms or software frameworks
  • Bachelor’s Degree (Computer Science or engineering preferred) or equivalent work experience
  • Displaying Technical Expertise
Job Responsibility:
  • Research new opportunities and determine the architecture and design together with the Engineering Manager, Product Manager, User Experience Designer, and neighbouring teams
  • Understand the full problem-space of Content Recommendations, as well as the role of Content Intelligence to assist in our customers' success with their digital experience strategies
  • Actively research, prototype, and apply modern AI technologies to enhance modelling and analysis within our Content Recommendations engine, contributing to our AI-first product strategy
  • Play a key role in evolving our Content Recommendations platform, including the continued migration of critical legacy services (PHP, Python, Node.js) to a modern, scalable Kotlin and Kubernetes-based architecture
  • Level up the team through knowledge sharing, extensive code and design reviews, building common tools, and advocating for better development processes and learning
  • Review and contribute to interview kits for technical roles and provide interview training to engineers
What we offer:
  • Best-in-class compensation plans
  • Two annual festival bonuses
  • Recognition and rewards programs
  • Vacation days
  • Annual Work/Service Anniversary Leave
  • Parental leave (both maternity and paternity)
  • Health insurance
  • Reproductive benefits for both parents
  • Volunteering opportunities to make a difference
  • Chance to work alongside our incredible global team
Employment Type:
Full-time

Senior AI Infrastructure Engineer

Together AI is building the AI Acceleration Cloud, an end-to-end platform for th...
Location:
Netherlands, Amsterdam
Salary:
Not provided
Together AI
Expiration Date:
Until further notice
Requirements:
  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • 5+ years of experience writing high-performance, well-tested, production-quality code
  • Demonstrated experience with building and operating high-performance and/or globally distributed micro-service architectures across one or more cloud providers (AWS, Azure, GCP)
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
  • Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
Job Responsibility:
  • Design, build, and maintain performant, secure, and highly available backend services/operators that run in our data centers and automate hardware management, such as InfiniBand partitioning, in-DC parallel storage provisioning, and VM provisioning
  • Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs
  • Work on a global multi-exabyte high-performance object store, serving massive datasets for pretraining
  • Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining
  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance
Employment Type:
Full-time

Senior Engineer/Technical Lead (DevOps - AWS)

As a DevOps Engineer, you will work together with other cloud engineers, archite...
Location:
India, Ahmedabad
Salary:
Not provided
Arrow Electronics
Expiration Date:
Until further notice
Requirements:
  • 7+ years in DevOps, with a strong focus on automation, cloud infrastructure, and CI/CD practices
  • Advanced knowledge of Terraform, with experience in writing, testing, and deploying modules
  • Extensive experience with AWS services (EC2, S3, RDS, Lambda, VPC, etc.) and best practices in cloud architecture
  • Proven experience in containerization with Docker and orchestration with Kubernetes in production environments
  • Strong understanding of CI/CD processes, with hands-on experience in CircleCI or similar tools
  • Proficient in Python and Linux Shell scripting for automation and process improvement
  • Experience with Datadog or similar tools for monitoring and alerting in large-scale environments
  • Proficient with Git, including branching, merging, and collaborative workflows
  • Experience with Kustomize or similar tools for managing Kubernetes configurations
  • Experience designing and building cloud-native solutions using AWS services for product development at large scale
Job Responsibility:
  • Work together with other cloud engineers, architects, developers, and customer engineering teams
Employment Type:
Full-time

Senior Software Engineer - AI Infrastructure (Scheduler) - CoreAI

The AI Platform organization builds the end-to-end Azure AI stack, from the infr...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C++, C#, Java, Scala, Rust, Go, TypeScript, OR equivalent experience
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility:
  • Work on the design and development of the core AI Infrastructure distributed and in-cluster services that support large scale AI training and inferencing
  • Develop, test, and maintain control plane services written in C#, hosted on Service Fabric or Kubernetes (AKS) clusters
  • Enhance systems and applications to ensure high stability, efficiency and maintainability, low latency, tight cloud security
  • Provide operational support and DRI (on-call) responsibilities for the service
  • Develop and foster a deep understanding of the machine learning concepts, use cases, and relevant services used by our customers
  • Collaborate closely with service engineers, product managers, and internal applied research and data science teams within Microsoft to build better solutions together
  • Provide vision, expertise, and technical leadership to other team members
  • Help to grow talent in these areas
  • Embody our culture and values
Employment Type:
Full-time

Senior Software Engineer, Social Tech

The Social Tech team is now seeking an experienced Senior Software Engineer to j...
Location:
Finland, Helsinki
Salary:
Not provided
Supercell
Expiration Date:
Until further notice
Requirements:
  • Strong backend engineering experience with Java and production services running at scale
  • Experience with AWS and cloud-native development practices
  • Proficiency with infrastructure as code tools such as Terraform, CloudFormation or CDK
  • Experience designing and integrating backend APIs and services using REST and/or gRPC
  • Familiarity with a range of database technologies, both non-relational (e.g. DynamoDB, Redis) and relational, and solid SQL fundamentals
  • Ability to collaborate across disciplines and teams, and communicate clearly with both engineers and non-engineers
  • Being passionate and committed to tasks, and in general an autonomous person with high levels of initiative and energy
  • An open and respectful attitude towards others and their work
Job Responsibility:
  • Design, implementation, deployment and maintenance of highly scalable and available backend services for in-game social features
  • Develop and evolve APIs used by game teams, clarifying concepts, use cases and requirements together with stakeholders
  • Contribute to our Social SDK client that integrates into Supercell games across multiple platforms
  • Improve best practices around reliability, performance, and operability of our social services
  • Periodically offer round-the-clock, first-line support to production environments, as part of a rotating on-call duty
Employment Type:
Full-time

Senior Systems Engineer

We build Uber's infrastructure to deploy and run all database engines and other ...
Location:
Denmark, Aarhus
Salary:
Not provided
Uber
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience
  • BS, MS, or Ph.D. degree in computer science, similar technical field of study, or similar practical experience
  • Background in multiple programming languages, e.g., Golang, C/C++, Python, etc.
  • Strong hands-on experience solving Linux/operating system problems at the software/hardware interface, including the disk, memory, CPU, and network subsystems
  • An inherent drive to collaborate, both within the team and across the organization
  • Excellent written and verbal interpersonal skills, and the ability to write detailed design documents and postmortems
  • A belief that your team can accomplish more together than as separate individuals
  • Attention to detail, particularly around software engineering fundamentals, testing methodologies, and quality
Job Responsibility:
  • Contribute to planning, design and architecture, and building of systems, tooling, and observability in support of reliable workload scheduling, workload discovery, fleet security, host-level insights, and cloud expansion efforts
  • Actively drive collaboration across multiple teams to build alignment and progress
  • Implement solutions in Go with a strong focus on clean, readable code with unit and integration test coverage
  • Take an active part in code change peer reviews to ensure quality and multi-functional sharing across the team
  • Contribute to engineering cultivation in terms of quality, monitoring, and on-call practices
  • Own part of the team's charter and, through that, help set the longer-term direction for the team

Tech Lead

As Tech Lead, you will report directly to the CTO and play a central role in gui...
Location:
Italy, Rome
Salary:
Not provided
Primatec Engineering
Expiration Date:
Until further notice
Requirements:
  • Degree in Computer Science, Engineering, or equivalent professional experience
  • 5+ years of hands-on software development experience, with proven experience as a Tech Lead or senior technical role
  • Strong expertise in cloud-native architectures (preferably AWS)
  • Solid experience with CI/CD pipelines, Infrastructure as Code, and modern DevOps practices
  • Strong understanding of application security, cloud security fundamentals, and compliance-driven environments
  • Excellent communication and leadership skills in both Italian and English
  • A pragmatic, data-driven mindset focused on quality, delivery, and continuous improvement
Job Responsibility:
  • Act as a technical reference for solutions built on AWS, Twilio, HubSpot, and Databricks
  • Lead the technical design and architecture of complex, cloud-native systems
  • Actively contribute to development through hands-on coding when needed
  • Define, maintain, and evolve coding standards, architectural guidelines, and engineering best practices
  • Perform and supervise code reviews, ensuring quality, maintainability, and scalability
  • Collaborate closely with the CTO, Project Managers, and Revenue Team to align on technical roadmaps, estimations, and priorities
  • Support planning and capacity allocation across Front-End, Back-End, and DevOps together with leadership
  • Ensure high-quality delivery by monitoring key engineering KPIs such as velocity, lead time, and defect rate
  • Drive continuous improvement initiatives across tools, processes, and ways of working
  • Embed security-by-design principles into system architecture and development workflows
What we offer:
  • A key technical leadership role with real impact on architecture, security, and delivery quality
  • Strong involvement in strategic technical decisions and organizational scaling
  • A dynamic and innovative environment where ownership and initiative are encouraged
  • Clear career growth paths toward senior technical or engineering leadership roles
  • The opportunity to shape Exelab’s engineering culture and long-term technical vision
Employment Type:
Full-time