Member of Technical Staff, Training Infra Engineer

Cohere

Location:

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Contribute to and provide strong support for model training pipelines, ship state-of-the-art models to production, and bridge the gap between research and production. We have one of the highest ratios of compute to engineers in the world. We do not delineate strongly between engineering and research: everyone contributes to writing production code and supporting our research effort, depending on individual interest and organizational needs. We have all the compute, data, and talent available for you to do your best work.

Job Responsibility:

  • Design and write high-performance, scalable software for training
  • Improve our training setup from an infrastructure and codebase performance standpoint
  • Craft and implement tools to speed up our training cycles and improve the overall efficacy of our training infrastructure
  • Research, implement, and experiment with ideas on our supercompute and data infrastructure
  • Learn from and work with the best researchers in the field

Requirements:

  • Extremely strong software engineering skills
  • Proficiency in Python and related ML frameworks such as JAX, PyTorch, and XLA/MLIR
  • Experience with distributed training infrastructures (Kubernetes, Slurm) and associated frameworks (Ray)
  • Experience using large-scale distributed training strategies
  • Hands-on experience training large models at scale, including contributions to the tooling and/or setup of the training infrastructure

Nice to have:

  • Papers at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP)

What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for Member of Technical Staff, Training Infra Engineer

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location:
United States, Chevy Chase; New York City; Palo Alto
Salary:
115000.00 USD / Year
Geico
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python; strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility:
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer:
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation, a 401K savings plan vested from day one that offers a 6% match, performance and recognition-based incentives, and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment Type:
Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location:
United States, Palo Alto
Salary:
90000.00 USD / Year
Geico
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python; strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility:
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer:
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation, a 401K savings plan vested from day one that offers a 6% match, performance and recognition-based incentives, and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment Type:
Fulltime

Member of Technical Staff - Data Infra - MAI Superintelligence Team

Help build the world’s most advanced multimodal dataset at Microsoft AI. We are ...
Location:
United States, Mountain View
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years experience in business analytics, data science, software development, data modeling, or data engineering
  • OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 8+ years experience in business analytics, data science, software development, data modeling, or data engineering
  • OR equivalent experience
  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 12+ years experience in business analytics, data science, software development, data modeling, or data engineering
  • OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 15+ years experience in business analytics, data science, software development, data modeling, or data engineering
  • OR equivalent experience
  • 4+ years experience with data governance, data compliance and/or data security
  • Passionate about the role of data in large-scale AI model training
  • Thrive in a highly collaborative, fast-paced environment
  • Have a high degree of expertise and pay close attention to details
Job Responsibility:
  • Design and develop data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video)
  • Own and maintain critical data infrastructure, including Spark, Ray, vector databases, and others
  • Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models
  • Partner with the pretraining and post-training teams to improve our data recipe by rigorous and careful experimentation
  • Embody our culture and values
Employment Type:
Fulltime

Member of Technical Staff - Data Infra - MAI Superintelligence Team

Help build the world’s most advanced multimodal dataset at Microsoft AI. We are ...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years experience in business analytics, data science, software development, data modeling or data engineering work
  • OR Master’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ year(s) experience in business analytics, data science, software development, or data engineering work
  • OR equivalent experience
  • Bachelor’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 8+ years experience in business analytics, data science, software development, data modeling or data engineering work
  • OR Master’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years of business analytics, data science, software development, data modeling or data engineering work experience
  • OR equivalent experience
Job Responsibility:
  • Design and develop data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video)
  • Own and maintain critical data infrastructure, including Spark, Ray, vector databases, and others
  • Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models
  • Partner with the pretraining and post-training teams to improve our data recipe by rigorous and careful experimentation
  • Embody our culture and values
Employment Type:
Fulltime

Member of Technical Staff, Multimodal Infrastructure

Microsoft AI is looking for a Member of Technical Staff, Multimodal Infrastructu...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience in multi-modal data processing: strong proficiency in distributed data processing infrastructure (resource utilization management, fault tolerance, Ray and Spark) and CPU/GPU batch processing optimizations
  • Experience with state-of-the-art model inference and serving frameworks
  • Experience with image/video/audio data processing
  • Experience with common data formats for efficient I/O
  • Experience in multi-modal pretraining and post-training: strong proficiency in deep learning frameworks such as PyTorch, Megatron, and DeepSpeed
  • Knowledge of auto-regressive and diffusion transformer models
  • Experience with distributed training techniques such as data parallelism, model parallelism, and pipeline parallelism
  • Proven experience in at least one of the following areas: image/video generation and editing; efficient architectures (e.g., MoE, window attention)
Job Responsibility:
  • Design, develop and maintain large-scale multimodal data processing pipelines
  • Design, develop and maintain large-scale multimodal model pretraining and post-training frameworks
  • Design, develop and maintain large-scale multimodal model inference and serving frameworks
  • Work with research scientists and product engineers to solve infra-related problems
  • Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven, product development cycle
  • Embody our Culture and Values
Employment Type:
Fulltime

Production Engineering Manager, Rotational Network Engineering (RNE) Program

This is a unique role within Meta's Infrastructure organization. Meta’s Rotational...
Location:
United Kingdom, London
Salary:
Not provided
Meta
Expiration Date:
Until further notice
Requirements:
  • 4+ years of Networking, System Administration, Software Engineering or Product Development experience
  • Familiarity with source control, software development cycles and practices
  • Experience launching and iterating on products, services, tools, or technical frameworks
  • Experience managing an engineering team
  • B.S. in Engineering or equivalent experience
  • Analytical and troubleshooting skills
Job Responsibility:
  • Build a plan for each team member with technical leads and mentors from Network Infrastructure, Backbone, and Datacenter Engineering
  • Establish and foster fruitful working relationships with various stakeholders and teams within Network Infra
  • Manage and grow multi-disciplinary recruiting plans across universities and industry
  • Develop and manage work plans from recruiting, to mentor and task selection to team assignment
  • Manage expectations of all interested parties: define clear program roadmap with key deliverables and milestone dates, maintain program information wiki pages, and identify and communicate risks and adjustments to the overall program to meet recruitment demands from Network Engineering teams
  • Understand the network product delivery cycle
  • Work closely with dedicated recruiting staff to expand the team, including sourcing candidates, interviewing candidates, participating in conferences/events, and on-boarding new employees
  • Influence Network Infrastructure teams to gain their buy-in to the program, obtain agreement to provide mentors and projects, and to consider rotational engineers as one of their hiring pipelines
  • Enable and unblock engineers through coaching, learning, and mentorship programs
  • Responsible for people management of a team of engineers, providing performance reviews, continual feedback, coaching and career growth for direct reports

Shift Supervisor

We're building a world of health around every individual — shaping a more connec...
Location:
United States, Richfield
Salary:
16.50 - 24.00 USD / Hour
CVS Health
Expiration Date:
June 22, 2026
Requirements:
  • Deductive reasoning ability, analytical skills and computer skills
  • Advanced communication skills and supervision skills
  • Ability to work a flexible schedule, including some early morning, overnight and weekend shifts, to work overtime as needed, and to respond to urgent issues at the store when they arise
Job Responsibility:
  • Work effectively with store management and store crews
  • Supervise the store's crew through assigning, directing and following up of all activities
  • Effectively communicate information both to and from store management and crews
  • Assist customers with their questions, problems and complaints
  • Promote CVS customer service culture (Greet, offer help, and thank)
  • Handle all customer relations issues in accordance with company policy and promote a positive shopping experience for all CVS customers
  • Maintain customer/patient confidentiality
  • Price merchandise
  • Stock shelves
  • Execute the displays, sign and inventory of weekly, promotional, and seasonal merchandise
What we offer:
  • Dental
  • Vision
  • Wellness resources
  • Employee discounts
  • Access to certain voluntary benefits
  • Other programs
Employment Type:
Parttime

Mobile Associate, Bilingual Preferred - Retail Sales

This role supports retail operations by engaging customers and facilitating thei...
Location:
United States, St. Louis
Salary:
18.00 - 20.00 USD / Hour
T-Mobile
Expiration Date:
Until further notice
Requirements:
  • High School Diploma/GED (Required)
  • 6 months of customer service and/or sales experience, Retail environment (Preferred)
  • Passionate customer advocate with the desire to be yourself when connecting and having fun doing it! (Required)
  • Competitive drive and proven ability to succeed in a fast-paced sales environment. (Required)
  • Willingness to work alongside peers and store leaders, learning and sharing ideas, while serving customers and providing resolutions to issues. (Required)
  • Effective at balancing customer needs and performance goals. (Required)
  • At least 18 years of age
  • Legally authorized to work in the United States
Job Responsibility:
  • Develop proficiency in customer service and sales to deliver personalized technology and service solutions that meet individual needs
  • Utilize digital tools to demonstrate network coverage, service plans, and product features to enhance customer understanding and engagement
  • Complete required training to build knowledge of retail processes, systems, and wireless technology innovations
  • Collaborate with colleagues across channels to support a seamless customer experience and contribute to team initiatives
  • Also responsible for other duties/projects as assigned by business management as needed
What we offer:
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Free year-round money coaches
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off and up to 12 paid holidays
  • Paid parental and family leave
  • Family building benefits
  • Back-up care
Employment Type:
Parttime