Senior Software Engineer - Together Cloud Infrastructure

Together AI

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

160000.00 - 230000.00 USD / Year

Job Description:

Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle, combining the fastest LLM inference engine with state-of-the-art AI cloud infrastructure. As a Senior AI Infrastructure Engineer, you will play a key role in building the next-generation AI cloud platform: a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and equips state-of-the-art ML practitioners with self-serve AI cloud services, such as on-demand and managed Kubernetes and Slurm clusters. This platform serves both our internal SaaS products (inference, fine-tuning) and our external cloud customers, spanning dozens of data centers across the world.

Job Responsibility:

  • Design, build, and maintain performant, secure, and highly available backend services/operators that run in our data centers and automate hardware management, such as InfiniBand partitioning, in-DC parallel storage provisioning, and VM provisioning
  • Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs
  • Work on a global multi-exabyte high-performance object store, serving massive datasets for pretraining
  • Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining
  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance

Requirements:

  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • 5+ years of experience writing high-performance, well-tested, production-quality code
  • Demonstrated experience with building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Deep experience with Kubernetes internals a big plus, such as implementing non-trivial Kubernetes operators, device/storage/network plugins, custom schedulers, or patches to these or to Kubernetes itself
  • Deep experience with VMs/hypervisors a big plus, such as QEMU/KVM, cloud-hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, SR-IOV
  • Deep experience with DC networking tech + solutions a big plus, such as VLAN, VXLAN, VPN, VPC, OVS/OVN
  • Experience with Cluster API or similar a big plus
  • Experience working on high-performance compute, networking, and/or storage a big plus
  • Experience virtualizing GPUs and/or InfiniBand a big plus
  • Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
  • Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
  • Experience building IaaS or PaaS systems at scale a plus
  • Experience with DPUs/SmartNICs a plus
  • GPU programming, NCCL, CUDA knowledge a plus

Nice to have:

  • Deep experience with Kubernetes internals
  • Deep experience with VMs/hypervisors
  • Deep experience with DC networking tech + solutions
  • Experience with Cluster API or similar
  • Experience working on high-performance compute, networking, and/or storage
  • Experience virtualizing GPUs and/or InfiniBand
  • Experience building IaaS or PaaS systems at scale
  • Experience with DPUs/SmartNICs
  • GPU programming, NCCL, CUDA knowledge

What we offer:

  • competitive compensation
  • startup equity
  • health insurance
  • other benefits
  • flexibility in terms of remote work

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time

Work Type:
On-site work

Similar Jobs for Senior Software Engineer - Together Cloud Infrastructure

Senior Software Engineer - Together Cloud Platform

About the Role: Together AI is building the AI Acceleration Cloud, an end-to-end...
Location:
United States, San Francisco
Salary:
160000.00 - 230000.00 USD / Year
Together AI
Expiration Date:
Until further notice
Requirements:
  • 5+ years of demonstrated experience in building large scale, fault tolerant, distributed systems and API microservices
  • Experience designing, analyzing and improving efficiency, scalability, and stability of various system resources
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Demonstrated experience with building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
  • Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
  • Experience developing against and managing a relational database, such as PostgreSQL
  • Expert-level programmer in one or more programming languages (Golang preferred)
  • Proficiency in version control practices and integrating IaC with CI/CD pipelines
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
Job Responsibility:
  • Identify, design, and develop foundational backend services that power Together’s cloud platform
  • Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
  • Partner with product teams to understand functional requirements and deliver solutions that meet business needs
  • Write clear, well-tested, and maintainable software and IaC for both new and existing systems
  • Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance
  • Participate in an on-call rotation to address critical incidents when necessary
What we offer:
  • competitive compensation
  • startup equity
  • health insurance
  • flexibility in terms of remote work
Employment Type:
Full-time

Senior Software Engineer

At Optimizely, we're on a mission to help people unlock their digital potential....
Location:
Bangladesh, Dhaka
Salary:
Not provided
Optimizely
Expiration Date:
Until further notice
Requirements:
  • 6+ years of experience as a software engineer
  • Deep practical understanding of modern AI technologies, with hands-on experience applying LLMs (for text analysis, user behaviour, etc.) and active daily use of AI tools
  • Strong experience in modern Kotlin development, Golang, and TypeScript
  • Hands-on experience with Kubernetes, Terraform, and working directly with cloud infrastructure (e.g., AWS, GCP, Azure)
  • Broad experience with a range of server-side technologies, including legacy systems like PHP, Python, and Node.js
  • Experience building, delivering, and maintaining services that comprise modern microservice-based SaaS products
  • Experience with agile delivery workflows and modern software quality techniques like TDD, pair programming, etc
  • Experience developing or contributing to development platforms or software frameworks
  • Bachelor’s Degree (Computer Science or engineering preferred) or equivalent work experience
  • Displaying Technical Expertise
Job Responsibility:
  • Research new opportunities and determine the architecture and design together with the Engineering Manager, Product Manager, User Experience Designer, and neighbouring teams
  • Understand the full problem-space of Content Recommendations, as well as the role of Content Intelligence to assist in our customers' success with their digital experience strategies
  • Actively research, prototype, and apply modern AI technologies to enhance modelling and analysis within our Content Recommendations engine, contributing to our AI-first product strategy
  • Play a key role in evolving our Content Recommendations platform, including the continued migration of critical legacy services (PHP, Python, Node.js) to a modern, scalable Kotlin and Kubernetes-based architecture
  • Level up the team through knowledge sharing, extensive code and design reviews, building common tools, and advocating for better development processes and learning
  • Review and contribute to interview kits for technical roles and provide interview training to engineers
What we offer:
  • Best-in-class compensation plans
  • Two annual festival bonuses
  • Recognition and rewards programs
  • Vacation days
  • Annual Work/Service Anniversary Leave
  • Parental leave (both maternity and paternity)
  • Health insurance
  • Reproductive benefits for both parents
  • Volunteering opportunities to make a difference
  • Chance to work alongside our incredible global team
Employment Type:
Full-time

Senior AI Infrastructure Engineer

Together AI is building the AI Acceleration Cloud, an end-to-end platform for th...
Location:
Netherlands, Amsterdam
Salary:
Not provided
Together AI
Expiration Date:
Until further notice
Requirements:
  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • 5+ years of experience writing high-performance, well-tested, production-quality code
  • Demonstrated experience with building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
  • Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
Job Responsibility:
  • Design, build, and maintain performant, secure, and highly available backend services/operators that run in our data centers and automate hardware management, such as InfiniBand partitioning, in-DC parallel storage provisioning, and VM provisioning
  • Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs
  • Work on a global multi-exabyte high-performance object store, serving massive datasets for pretraining
  • Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining
  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance
Employment Type:
Full-time

Senior Engineer/Technical Lead (DevOps - AWS)

As a DevOps Engineer, you will work together with other cloud engineers, archite...
Location:
India, Ahmedabad
Salary:
Not provided
Arrow Electronics
Expiration Date:
Until further notice
Requirements:
  • 7+ years in DevOps, with a strong focus on automation, cloud infrastructure, and CI/CD practices
  • Advanced knowledge of Terraform, with experience in writing, testing, and deploying modules
  • Extensive experience with AWS services (EC2, S3, RDS, Lambda, VPC, etc.) and best practices in cloud architecture
  • Proven experience in containerization with Docker and orchestration with Kubernetes in production environments
  • Strong understanding of CI/CD processes, with hands-on experience in CircleCI or similar tools
  • Proficient in Python and Linux Shell scripting for automation and process improvement
  • Experience with Datadog or similar tools for monitoring and alerting in large-scale environments
  • Proficient with Git, including branching, merging, and collaborative workflows
  • Experience with Kustomize or similar tools for managing Kubernetes configurations
  • Experience in designing and building cloud-native solutions using AWS services for product development at large scale
Job Responsibility:
  • Work together with other cloud engineers, architects, developers, and customer engineering teams
Employment Type:
Full-time

Tech Lead

As Tech Lead, you will report directly to the CTO and play a central role in gui...
Location:
Italy, Rome
Salary:
Not provided
Primatec Engineering
Expiration Date:
Until further notice
Requirements:
  • Degree in Computer Science, Engineering, or equivalent professional experience
  • 5+ years of hands-on software development experience, with proven experience as a Tech Lead or senior technical role
  • Strong expertise in cloud-native architectures (preferably AWS)
  • Solid experience with CI/CD pipelines, Infrastructure as Code, and modern DevOps practices
  • Strong understanding of application security, cloud security fundamentals, and compliance-driven environments
  • Excellent communication and leadership skills in both Italian and English
  • A pragmatic, data-driven mindset focused on quality, delivery, and continuous improvement
Job Responsibility:
  • Act as a technical reference for solutions built on AWS, Twilio, HubSpot, and Databricks
  • Lead the technical design and architecture of complex, cloud-native systems
  • Actively contribute to development through hands-on coding when needed
  • Define, maintain, and evolve coding standards, architectural guidelines, and engineering best practices
  • Perform and supervise code reviews, ensuring quality, maintainability, and scalability
  • Collaborate closely with the CTO, Project Managers, and Revenue Team to align on technical roadmaps, estimations, and priorities
  • Support planning and capacity allocation across Front-End, Back-End, and DevOps together with leadership
  • Ensure high-quality delivery by monitoring key engineering KPIs such as velocity, lead time, and defect rate
  • Drive continuous improvement initiatives across tools, processes, and ways of working
  • Embed security-by-design principles into system architecture and development workflows
What we offer:
  • A key technical leadership role with real impact on architecture, security, and delivery quality
  • Strong involvement in strategic technical decisions and organizational scaling
  • A dynamic and innovative environment where ownership and initiative are encouraged
  • Clear career growth paths toward senior technical or engineering leadership roles
  • The opportunity to shape Exelab’s engineering culture and long-term technical vision
Employment Type:
Full-time

Senior Software Engineer, Observability

The AI Infrastructure team at Together AI is at the forefront of building and sc...
Location:
United States, San Francisco
Salary:
160000.00 - 260000.00 USD / Year
Together AI
Expiration Date:
Until further notice
Requirements:
  • 5+ years of demonstrated experience in building large scale, fault tolerant, distributed systems and API microservices
  • Experience designing, analyzing and improving efficiency, scalability, and stability of various system resources
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Demonstrated experience with building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
Job Responsibility:
  • Identify, design, and develop foundational backend services that power Together’s cloud platform
  • Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
  • Partner with product teams to understand functional requirements and deliver solutions that meet business needs
  • Write clear, well-tested, and maintainable software and IaC for both new and existing systems
  • Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance
  • Participate in an on-call rotation to address critical incidents when necessary
What we offer:
  • competitive compensation
  • startup equity
  • health insurance
  • flexibility in terms of remote work
Employment Type:
Full-time

Senior Engineering Manager, Platform

We’re hiring a senior engineering manager to lead Aiven’s Internal Platform Team...
Location:
Finland, Helsinki
Salary:
Not provided
Aiven Deutschland GmbH
Expiration Date:
Until further notice
Requirements:
  • Proven track record of leading experienced teams in successful software product deliveries
  • Communicate cross-functionally and have a strong product sense
  • Experience in recruiting engineering talent
  • Experience in agile software development
  • Comfortable leading initiatives and presenting to groups
  • Fluent communication skills in English, written and verbal
  • Experience building and designing scalable platforms and distributed systems
  • Experience, preferably hands-on, in public cloud infrastructure and networking (AWS, GCP or Azure)
  • Relational database knowledge (PostgreSQL, MySQL, etc.)
  • Streaming service knowledge (Kafka, etc.)
Job Responsibility:
  • Plan the team’s roadmap together with product leadership and the team
  • Manage the team’s backlog and projects
  • Be accountable for the team’s output and performance by supporting the team’s deliveries
  • Grow the team by hiring excellent engineering talent
  • Offer 1-on-1 coaching and support to the team members
  • Create a psychologically safe, high-trust environment where curiosity, healthy debate, and exchange of ideas thrive
  • Facilitate team meetings like plannings, kick-offs and retrospectives
What we offer:
  • Participate in Aiven’s equity plan
  • Balance work and life with our hybrid work policy
  • Choose the equipment you need to set yourself up for success
  • Use your Professional Development Plan budget for learning opportunities
  • Receive holistic wellbeing support through our global Employee Assistance Program
  • Inquire about our Global Time Off Commitment (Parental and Sick Leave, as well as Personal Time)
  • Enjoy country-specific benefits for our global cast
Employment Type:
Full-time

Lead Software Engineer - Python

Soliton is a high-technology, that's it, software company working with the top s...
Location:
India, Bangalore; Coimbatore
Salary:
Not provided
Soliton
Expiration Date:
Until further notice
Requirements:
  • Extensive background as a Python developer with 5-7 years of experience and an engineering degree, preferably but not necessarily in electronics
  • Proficiency in Python asynchronous programming (asyncio) and strong understanding of microservices architecture
  • Good understanding of design patterns and multithreading concepts
  • Hands-on experience with CI/CD pipelines and testing frameworks, specifically pytest, and familiarity with DevOps practices
  • Prior experience leading a team is a must
  • Experience with web frameworks such as FastAPI and Django for building API-driven applications
Job Responsibility:
  • Architect and design service-based systems comprising frontend, backend, databases, and AI services
  • Configure, deploy, and scale systems and solutions using cloud infrastructure such as Azure and AWS
  • Develop, co-develop, and guide the team from the proposed architecture and design to a functional and scalable solution
  • Establish the best coding practices to develop, debug, and design user interfaces
  • Lead and help the team generate and propose innovative ideas with impact analyses to customers
  • Set the benchmark for the high standard of quality and attention to detail required to deliver beautiful and impactful solutions
  • Take the lead to plan for and identify technical risks and issues
  • Work with the Agile team to create a cohesive, common purpose and bring your team together to analyze performance metrics, retrospections, and action plans
  • Implement detailed designs with the required prototypes to explore new concepts
  • Play a key role in understanding the requirements and priorities of the customer and breaking them down for detailed estimation and planning
What we offer:
  • Solitons choose their work hours as long as they take into account the requirements of the job
  • We take special care to support mothers to excel at work while they handle their responsibilities at home
  • Share a portion of our profits with all Solitons
  • Starting from your second year with us, you’ll be eligible to receive a share of the company’s profits
  • Health insurance for employees and families, gym and cycle allowance