Member of Technical Staff, Training Performance Engineer

Cohere

Contract Type: Not provided

Salary: Not provided

Job Description:

As a Performance Engineer on the Pre-Training team, you will be responsible for optimizing the performance of our advanced language models and systems. Your primary focus will be improving key model training metrics, such as training throughput, and ensuring high accelerator utilization. The team combines expertise in software engineering, machine learning, and low-level kernel design and development to build robust systems and enhance model performance. You will identify and remove performance bottlenecks, develop cutting-edge training and profiling tools in support of Cohere's mission of providing efficient and reliable language understanding and generation capabilities, and drive innovation in the field of natural language processing.

Job Responsibility:

  • Design and write high-performance, scalable software for training
  • Understand architectural modifications and design choices and their effects on training throughput and quality
  • Write low-level CUDA and Triton kernels to squeeze every last bit of performance from our accelerators
  • Research, implement, and experiment with ideas on our supercompute and data infrastructure
  • Learn from and work with the best researchers in the field

Requirements:

  • Extremely strong software engineering skills
  • Proficiency in Python and related ML frameworks such as JAX, PyTorch, and XLA/MLIR
  • Experience writing kernels for GPUs using CUDA, Triton, etc.
  • Experience using large-scale distributed training strategies
  • Familiarity with autoregressive sequence models, such as Transformers

Nice to have:

  • Papers at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP)

What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted: February 20, 2026
Employment Type: Full-time
Work Type: Hybrid work

Similar Jobs for Member of Technical Staff, Training Performance Engineer

Member of Technical Staff, AI Training Infrastructure

As a Training Infrastructure Engineer, you'll design, build, and optimize the in...
Location: United States, San Mateo
Salary: 175000.00 - 220000.00 USD / Year
Company: Fireworks AI
Expiration Date: Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
  • 3+ years of experience with distributed systems and ML infrastructure
  • Experience with PyTorch
  • Proficiency in cloud platforms (AWS, GCP, Azure)
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)
Job Responsibility:
  • Design and implement scalable infrastructure for large-scale model training workloads
  • Develop and maintain distributed training pipelines for LLMs and multimodal models
  • Optimize training performance across multiple GPUs, nodes, and data centers
  • Implement monitoring, logging, and debugging tools for training operations
  • Architect and maintain data storage solutions for large-scale training datasets
  • Automate infrastructure provisioning, scaling, and orchestration for model training
  • Collaborate with researchers to implement and optimize training methodologies
  • Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
  • Troubleshoot complex performance issues in distributed training environments
What we offer:
  • Meaningful equity in a fast-growing startup
  • Comprehensive benefits package
Employment Type: Full-time

Member of Technical Staff, Performance Optimization

We're looking for a Software Engineer focused on Performance Optimization to hel...
Location: United States, San Mateo
Salary: 175000.00 - 220000.00 USD / Year
Company: Fireworks AI
Expiration Date: Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
  • 5+ years of experience working on performance optimization or high-performance computing systems
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
  • Familiarity with PyTorch and performance-critical model execution
  • Experience with distributed system debugging and optimization in multi-GPU environments
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels
Job Responsibility:
  • Optimize system and GPU performance for high-throughput AI workloads across training and inference
  • Analyze and improve latency, throughput, memory usage, and compute efficiency
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling
  • Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
  • Improve support for mixed precision, quantization, and model graph optimization
  • Build and maintain performance benchmarking and monitoring infrastructure
  • Scale inference and training systems across multi-GPU, multi-node environments
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment Type: Full-time

Member of Technical Staff, Cloud Infrastructure

As a Software Engineer on our Cloud Infrastructure team, you'll be at the forefr...
Location: United States, New York, NY; San Mateo, CA; Redwood City, CA
Salary: 175000.00 - 220000.00 USD / Year
Company: Fireworks AI
Expiration Date: Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 5+ years of experience designing and building backend infrastructure in cloud environments (e.g., AWS, GCP, Azure)
  • Proven experience in ML infrastructure and tooling (e.g., PyTorch, TensorFlow, Vertex AI, SageMaker, Kubernetes, etc.)
  • Strong software development skills in languages like Python or C++
  • Deep understanding of distributed systems fundamentals: scheduling, orchestration, storage, networking, and compute optimization
Job Responsibility:
  • Architect and build scalable, resilient, and high-performance backend infrastructure to support distributed training, inference, and data processing pipelines
  • Lead technical design discussions, mentor other engineers, and establish best practices for building and operating large-scale ML infrastructure
  • Design and implement core backend services (e.g., job schedulers, resource managers, autoscalers, model serving layers) with a focus on efficiency and low latency
  • Drive infrastructure optimization initiatives, including compute cost reduction, storage lifecycle management, and network performance tuning
  • Collaborate cross-functionally with ML, DevOps, and product teams to translate research and product needs into robust infrastructure solutions
  • Continuously evaluate and integrate cloud-native and open-source technologies (e.g., Kubernetes, Ray, Kubeflow, MLFlow) to enhance our platform’s capabilities and reliability
  • Own end-to-end systems from design to deployment and observability, with a strong emphasis on reliability, fault tolerance, and operational excellence
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment Type: Full-time

Member of Technical Staff, Research

As a Member of Technical Staff on the Research team, you’ll push the boundaries ...
Location: United States, San Mateo
Salary: 175000.00 - 240000.00 USD / Year
Company: Fireworks AI
Expiration Date: Until further notice
Requirements:
  • Research background in Artificial Intelligence, Machine Learning, Physics, or similar field
  • Experience solving analytical problems using quantitative approaches
  • Experience communicating research to audiences with different backgrounds
  • Experience coding in C/C++, Python, or other similar languages
Job Responsibility:
  • Conduct foundational research to advance the capabilities, efficiency, and reliability of LLMs and multimodal systems
  • Design, implement, and evaluate novel model architectures, training methods, and optimization techniques
  • Collaborate with engineering teams to transition research prototypes into production-grade systems
  • Analyze empirical results, identify performance bottlenecks, and iterate quickly to improve model quality
  • Contribute to internal research strategy by identifying high-impact opportunities and emerging trends in AI
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment Type: Full-time

Senior Staff Engineer

Provide engineering and consulting services for a broad array of projects and cl...
Location: United States, Gahanna, Ohio
Salary: Not provided
Company: Terracon Consultants, Inc
Expiration Date: Until further notice
Requirements:
  • Bachelor’s degree in Engineering
  • Minimum 3-5 years’ experience
  • Valid driver’s license with acceptable violation history
  • Engineer-in-Training (EIT) or Engineering Intern (EI) status required, along with the ability to obtain registration as a Professional Engineer (PE)
Job Responsibility:
  • Follow safety rules, guidelines and standards for all projects
  • Participate in pre-task planning
  • Report any safety issues or concerns to management
  • Integrate quality leadership practices into daily regimen
  • Provide consistent quality standards on proposal and project delivery
  • Manage and execute small to medium projects and sometimes more complex projects
  • Develop the scope of work for routine projects
  • Prepare cost estimates, specifications, and other project related documents
  • Analyze data and perform engineering calculations
  • Coordinate and supervise field investigations, laboratory testing, evaluation of the resulting data, and development of recommendations
What we offer:
  • Medical
  • dental
  • vision
  • life insurance
  • 401(k) plan
  • paid time off and holidays
  • education reimbursement
  • various bonus programs
Employment Type: Full-time

Geotechnical Senior Staff Engineer

Provide engineering and consulting services for a broad array of projects and cl...
Location: United States, Midvale
Salary: Not provided
Company: Terracon Consultants, Inc
Expiration Date: Until further notice
Requirements:
  • Bachelor’s degree in Engineering
  • Minimum 3-5 years’ experience
  • Valid driver’s license with acceptable violation history
  • Engineer-in-Training (EIT) or Engineering Intern (EI) status required, along with the ability to obtain registration as a Professional Engineer (PE)
Job Responsibility:
  • Follow safety rules, guidelines and standards for all projects
  • Participate in pre-task planning
  • Report any safety issues or concerns to management
  • Integrate quality leadership practices into daily regimen
  • Provide consistent quality standards on proposal and project delivery
  • Manage and execute small to medium projects and sometimes more complex projects
  • Develop the scope of work for routine projects
  • Prepare cost estimates, specifications, and other project related documents
  • Analyze data and perform engineering calculations
  • Coordinate and supervise field investigations, laboratory testing, evaluation of the resulting data, and development of recommendations
What we offer:
  • medical
  • dental
  • vision
  • life insurance
  • 401(k) plan
  • paid time off and holidays
  • education reimbursement
  • various bonus programs
Employment Type: Full-time

Engineering Lab Technical Lead

Working in an Engineering R&D laboratory with the responsibility of providing le...
Location: United States, Easton
Salary: Not provided
Company: Victaulic
Expiration Date: Until further notice
Requirements:
  • Proven Leadership Capabilities
  • Ability to effectively provide hands-on job training to technical staff
  • Respect for Job, Co-Workers & Company
  • In-depth knowledge and familiarity with Victaulic products
  • Associate and/or technical degree preferred
  • Minimum five (5) years industrial or lab experience
  • Proficient in blueprint reading, shop math and precision instrument operation
  • Ability to maintain records and logs, utilizing software to prepare charts and graphs
  • Capable of taking verbal and written instructions
Job Responsibility:
  • Lead the technician team, ensuring work orders are processed in a timely manner, supporting company priorities, identifying and resolving problems (including resource capacity limitations), and providing technical advice and development to team members
  • Work with your technical staff to complete work orders, and drive team development (teaching, training, and learning) by helping to seek out internal or external training opportunities
  • Use employee performance appraisals as a professional development tool to provide goals, direction, and guidance for group members
  • Encourage your team’s communication with engineering staff and other departments to keep them advised of developments with their requested support
  • Encourage continuous improvement developments in the lab in accordance with Lean & Safety initiatives
  • Monitor the quality of the team’s work, overseeing team’s adherence to established lab methods and procedures
  • Communicate resource constraints to lab management when they cannot be resolved in a timely manner
  • Be able to perform all tasks of team members following established procedures for doing such work, as required
  • Inspect samples, castings, gaskets, or rubber components for defects and compliance with Engineering Specifications and Drawings
  • Assemble products for hydrostatic, flex, bending moment, low-temperature, and heat-aging tests in compliance with Engineering specifications
Employment Type: Full-time

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...
Location: United States, San Francisco
Salary: 216500.00 - 324500.00 USD / Year
Company: GoFundMe
Expiration Date: Until further notice
Requirements:
  • 9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
  • Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
  • Extensive experience designing, developing, and operating scalable backend systems
  • Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
  • Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
  • Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
  • Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
  • Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
  • Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)
Job Responsibility:
  • Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
  • Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
  • Design, build, and optimize scalable model training, fine-tuning, and inference pipelines, ensuring robust integration with production systems
  • Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
  • Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
  • Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
  • Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
  • Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
  • Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
  • Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure
What we offer:
  • Competitive pay
  • Comprehensive healthcare benefits
  • Financial assistance for things like hybrid work and family planning
  • Generous parental leave
  • Flexible time-off policies
  • Mental health and wellness resources
  • Learning, development, and recognition programs
Employment Type: Full-time