Staff Software Engineer, Inference Infrastructure

Cohere

Location:

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers. Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products. Join us on our mission and shape the future!

Job Responsibility:

  • Developing, deploying, and operating the AI platform that delivers Cohere's large language models through easy-to-use API endpoints
  • Working closely with many teams to deploy optimized NLP models to production in low-latency, high-throughput, high-availability environments
  • Interfacing with customers and creating customized deployments to meet their specific needs

Requirements:

  • 5+ years of engineering experience running production infrastructure at a large scale
  • Experience designing large, highly available distributed systems with Kubernetes, and GPU workloads on those clusters
  • Experience with Kubernetes development, production coding, and support
  • Experience with GCP, Azure, AWS, OCI, and multi-cloud, on-prem, or hybrid serving
  • Experience in designing, deploying, supporting, and troubleshooting in complex Linux-based computing environments
  • Experience in compute/storage/network resource and cost management
  • Excellent collaboration and troubleshooting skills to build mission-critical systems, and ensure smooth operations and efficient teamwork
  • The grit and adaptability to solve complex technical challenges that evolve day to day
  • Familiarity with computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence latency and throughput of inference
  • Strong understanding or working experience with distributed systems
  • Experience in Golang, C++, or other languages designed for high-performance, scalable servers

What we offer:

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Staff Software Engineer, Inference Infrastructure

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...
Location:
United States, San Francisco
Salary:
216500.00 - 324500.00 USD / Year
GoFundMe
Expiration Date:
Until further notice
Requirements:
  • 9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
  • Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
  • Extensive experience designing, developing, and operating scalable backend systems
  • Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
  • Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
  • Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
  • Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
  • Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
  • Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)
Job Responsibility:
  • Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
  • Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
  • Design, build, and optimize scalable model training, fine tuning, and inference pipelines, ensuring robust integration with production systems
  • Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
  • Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
  • Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
  • Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
  • Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
  • Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
  • Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure
What we offer:
  • Competitive pay
  • Comprehensive healthcare benefits
  • Financial assistance for things like hybrid work and family planning
  • Generous parental leave
  • Flexible time-off policies
  • Mental health and wellness resources
  • Learning, development, and recognition programs
Employment Type:
Fulltime

Member of Technical Staff, Cloud Infrastructure

As a Software Engineer on our Cloud Infrastructure team, you'll be at the forefr...
Location:
United States, New York, NY; San Mateo, CA; Redwood City, CA
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 5+ years of experience designing and building backend infrastructure in cloud environments (e.g., AWS, GCP, Azure)
  • Proven experience in ML infrastructure and tooling (e.g., PyTorch, TensorFlow, Vertex AI, SageMaker, Kubernetes, etc.)
  • Strong software development skills in languages like Python or C++
  • Deep understanding of distributed systems fundamentals: scheduling, orchestration, storage, networking, and compute optimization
Job Responsibility:
  • Architect and build scalable, resilient, and high-performance backend infrastructure to support distributed training, inference, and data processing pipelines
  • Lead technical design discussions, mentor other engineers, and establish best practices for building and operating large-scale ML infrastructure
  • Design and implement core backend services (e.g., job schedulers, resource managers, autoscalers, model serving layers) with a focus on efficiency and low latency
  • Drive infrastructure optimization initiatives, including compute cost reduction, storage lifecycle management, and network performance tuning
  • Collaborate cross-functionally with ML, DevOps, and product teams to translate research and product needs into robust infrastructure solutions
  • Continuously evaluate and integrate cloud-native and open-source technologies (e.g., Kubernetes, Ray, Kubeflow, MLFlow) to enhance our platform’s capabilities and reliability
  • Own end-to-end systems from design to deployment and observability, with a strong emphasis on reliability, fault tolerance, and operational excellence
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment Type:
Fulltime

Staff Machine Learning Engineer

As a Staff Machine Learning Engineer at Aignostics, you will play a crucial role...
Location:
Germany, Berlin
Salary:
Not provided
Aignostics
Expiration Date:
Until further notice
Requirements:
  • Advanced degree in a relevant field or extensive work experience
  • 8+ years of industry experience, with at least 2 years as Staff Engineer or an equivalent role
  • Proven track record of driving technical excellence and innovation
  • Solid background in data-intensive systems and software architecture, design patterns and clean coding
  • Expert Python programming and fluency in C/C++ or other low-level language(s)
  • Experience with designing and implementing large-scale, distributed ML systems and platforms
  • Proven track record of deploying ML models into production environments
  • Strong knowledge of machine learning fundamentals
  • Experience with deep learning frameworks (e.g. PyTorch and TensorFlow) and state-of-the-art techniques (e.g. generative models)
  • Deep understanding of cloud technologies (e.g. GCP, AWS), containerization and orchestration (Kubernetes)
Job Responsibility:
  • Define and drive the technical architecture and system design principles for our AI platform and infrastructure
  • Work in close collaboration with engineering leads to build flexible frameworks and systems for model training, evaluation and inference across different pathology applications
  • Guide the CTO office, product management and fellow engineering leads through complex decisions by providing expert consultation on feasibility, architecture, trade-offs and risk mitigation strategies, while ensuring alignment with our technical vision
  • Foster technical alignment across teams by establishing shared architectural principles and best practices, facilitating cross-team design reviews to enable consistent decision-making across domains
  • Champion technical excellence by leading strategic initiatives that modernize our architecture and reduce technical debt while measuring and improving our technical health metrics
  • Elevate the technical capabilities of our engineering staff through structured mentoring, workshops and establishing comprehensive technical guidelines that enable teams to make better design decisions
  • Drive innovation by evaluating emerging technologies, leading proof-of-concept initiatives and building support for strategic technical investments that advance our engineering capabilities while ensuring measurable business value
What we offer:
  • Cutting-edge AI research and development, with involvement of Charité, TU Berlin and our other partners
  • Work with a welcoming, diverse and highly international team of colleagues
  • Opportunity to take responsibility and grow your role within the startup
  • Expand your skills by benefitting from our Learning & Development yearly budget of 1,000 € (plus 2 L&D days), language classes and internal development programs
  • Mentoring program in which you’ll learn from great experts
  • Flexible working hours and teleworking policy
  • 30 paid vacation days per year
  • We are family & pet friendly and support flexible parental leave options
  • Pick a subsidized membership of your choice among public transport, sports and well-being
  • Enjoy our social gatherings, lunches and off-site events for a fun and inclusive work environment

Staff Product Security Engineer

We’re looking for a Staff Product Security Engineer to lead the design and imple...
Location:
United States
Salary:
184000.00 - 252000.00 USD / Year
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in product, application, or cloud security engineering
  • Deep understanding of secure SDLC, threat modeling, and secure architecture design
  • Proven expertise with AWS cloud security concepts and best practices
  • Strong experience with container security, orchestration, and runtime protection
  • Proficiency in Python, Java, and/or JavaScript for security automation, code review, and tooling
  • Experience securing AI/ML pipelines, data workflows, or model-serving infrastructure
  • Familiarity with DevSecOps and continuous integration/deployment environments
Job Responsibility:
  • Embed robust security practices throughout the software and AI development lifecycle (SDLC)
  • Lead secure design reviews, threat modeling, and risk assessments for AI-driven products, APIs, and backend services
  • Partner with engineering and product teams to ensure security, privacy, and compliance by design
  • Build and maintain security automation and governance frameworks that integrate seamlessly into development workflows
  • Architect and enforce security controls for AI/ML systems, including model training, data pipelines, and inference environments
  • Identify and mitigate AI-specific attack vectors such as data poisoning, model inversion, prompt injection, and model theft
  • Collaborate with governance and compliance teams to align with ethical AI principles and frameworks like NIST AI RMF and the EU AI Act
  • Implement model provenance, integrity, and auditability controls to ensure responsible and secure AI operations
  • Partner with DevOps and SRE teams to secure service meshes, container networking, and secrets management
  • Drive software supply chain security, including artifact integrity, dependency management, and vulnerability reduction
What we offer:
  • Competitive compensation, benefits, and career growth opportunities
  • Opportunity to shape and drive product security strategy
  • Collaborative and security-minded engineering culture
  • Work on cutting-edge security challenges in a fast-growing company
  • Performance-based bonus, equity, and a generous benefits program
Employment Type:
Fulltime

Staff Infrastructure Software Engineer, Enterprise AI

Scale GP is building the next generation of enterprise-grade Generative AI produ...
Location:
United States, New York; San Francisco
Salary:
216200.00 - 270250.00 USD / Year
Scale
Expiration Date:
Until further notice
Requirements:
  • Proven experience in a senior role
  • 5+ years of full-time software engineering experience
  • Deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm Charts), container orchestration (e.g., Kubernetes) and observability platforms (e.g., Datadog, Prometheus, Grafana)
  • Extensive experience with at least one major cloud provider (AWS, Azure, or GCP)
  • Strong knowledge of security and compliance in enterprise environments, with a focus on access management, data isolation, and customer-specific VPC setups
  • Proficiency in Python or JavaScript/TypeScript, and SQL
Job Responsibility:
  • Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers
  • Lead the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies
  • Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response
  • Drive developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization
  • Solve the toughest engineering problems related to multi-tenancy, data isolation, and high-performance inference at a massive scale, taking end-to-end ownership across the full product lifecycle
What we offer:
  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • Equity-based compensation
  • Additional benefits such as a commuter stipend
Employment Type:
Fulltime

Staff ML Infrastructure Engineer

We are seeking a Staff / Principal ML Infrastructure Engineer to lead the design...
Location:
United States, San Francisco
Salary:
Not provided
Darwin Recruitment GmbH
Expiration Date:
Until further notice
Requirements:
  • 7+ years of software engineering experience, including 3+ years building production ML systems
  • Deep experience with distributed training and inference frameworks (e.g., PyTorch, JAX, TensorFlow)
  • Familiarity with model serving technologies and orchestration (e.g., Triton, Ray, Kubernetes)
  • Strong understanding of GPU/TPU infrastructure, performance optimization, and scalability challenges
  • Proven experience solving reliability, latency, and cost trade-offs in production ML systems
  • Excellent collaboration, communication, and problem-solving skills
Job Responsibility:
  • Design, implement, and maintain high-performance infrastructure for training and serving LLMs
  • Optimize model pipelines for efficiency, latency, and cost at scale
  • Collaborate with ML researchers, platform engineers, and product teams to deploy models safely into production
  • Build monitoring, alerting, and tooling to ensure reliability and observability of large-scale ML systems
  • Evaluate and integrate new frameworks, tools, and architectures to improve ML workflows
  • Provide technical leadership and mentorship to other engineers on the team
What we offer:
  • Flexible work arrangements and competitive compensation
Employment Type:
Fulltime

Staff ML Infrastructure Engineer

We are seeking a Staff / Principal ML Infrastructure Engineer to lead the design...
Location:
United States, New York
Salary:
Not provided
Darwin Recruitment GmbH
Expiration Date:
Until further notice
Requirements:
  • 7+ years of software engineering experience
  • 3+ years building production ML systems
  • Deep experience with distributed training and inference frameworks (e.g., PyTorch, JAX, TensorFlow)
  • Familiarity with model serving technologies and orchestration (e.g., Triton, Ray, Kubernetes)
  • Strong understanding of GPU/TPU infrastructure, performance optimization, and scalability challenges
  • Proven experience solving reliability, latency, and cost trade-offs in production ML systems
  • Excellent collaboration, communication, and problem-solving skills
Job Responsibility:
  • Design, implement, and maintain high-performance infrastructure for training and serving LLMs
  • Optimize model pipelines for efficiency, latency, and cost at scale
  • Collaborate with ML researchers, platform engineers, and product teams to deploy models safely into production
  • Build monitoring, alerting, and tooling to ensure reliability and observability of large-scale ML systems
  • Evaluate and integrate new frameworks, tools, and architectures to improve ML workflows
  • Provide technical leadership and mentorship to other engineers on the team
What we offer:
  • Flexible work arrangements
  • Competitive compensation
Employment Type:
Fulltime

Staff Software Engineer - AI Applications

Vanilla is seeking a Staff Software Engineer - AI Applications with a strong bac...
Location:
United States
Salary:
190000.00 - 210000.00 USD / Year
Vanilla Technologies
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, a related field, or equivalent practical experience
  • 8+ years relevant work experience
  • Proficiency in modern programming languages such as Python or JavaScript
  • Experience with OpenAI, Anthropic, or similar for both chat and API interfaces
  • Deep understanding of machine learning and AI technologies, including the ability to design, train, and implement machine learning models and use natural language processing techniques for automation
  • Production experience with the scalability and best practices of AI infrastructure
  • Must have experience with AI observability, monitoring, and signaling using tools like LangChain or LangGraph
  • Hands-on experience using RAG and chunking to tune LLM performance
  • Experienced with LLM orchestration tooling and decision frameworks
  • Experience or exposure building agentic capabilities and workflows
Job Responsibility:
  • Machine learning and AI: You are passionate and knowledgeable about the current and future state of AI
  • You will use existing Large Language Models to build applied AI applications focused on producing high accuracy rates. Your software engineering skills will come into play here as you take ownership of constructing services to ingest results
  • You will work with product and engineering teams to build models/services that can ingest data, extract key information, and surface insights
  • You can build tooling to support model training, evaluation, inference serving, monitoring and alerting
  • You want to use the latest ML frameworks and open source tools to develop new model training pipelines
  • Hands-on coding: You have direct experience with software engineering and are familiar with modern languages like Python, JavaScript, Go, and Rust
  • You have experience building microservices and understand the tradeoffs of the approach
  • Data handling: You can identify, extract, transform, and load data from disparate sources into a centralized system. You are able to normalize, cleanse, and validate this data
  • Database management: You are able to design and implement schemas, optimize queries, and manage database performance
  • Project management: You must be an effective self-organizer: prioritize tasks, manage resources, and communicate effectively with non-technical stakeholders
What we offer:
  • Flexible paid time off policy and 10 company-wide paid holidays
  • Parental leave, 4 weeks for all full-time employees and up to 12 weeks for birthing parents
  • Medical, dental, and vision benefits coverage for employees and their families
  • 401K eligibility after one month of employment
  • Free estate planning documents
  • Budget for learning & development and home office setup
  • Paid parking or transit for hybrid and in-office employees
Employment Type:
Fulltime