Full-Stack Software Engineer, Inference

Cohere

Location:

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

Job Responsibility:

  • Improve the platform’s auth, billing, and payment systems
  • Add new features to the interactive Playground where customers can try our models
  • Implement new platform features for managing deployments
  • Write and ship minimal code that runs in low-resource environments and has highly stringent deployment mechanisms
  • As security and privacy are paramount, you will sometimes need to reinvent the wheel, and won’t be able to use the most popular libraries or tooling

Requirements:

  • 5+ years of experience writing clean backend code
  • Experience with Golang and React
  • Built payment systems and have experience with subscription or usage-based SaaS, and/or products with a freemium model
  • Strong coding abilities and comfortable working across the stack
  • Worked in both large enterprises and startups
  • Excel in fast-paced environments and can execute while priorities and objectives are a moving target

What we offer:

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for Full-Stack Software Engineer, Inference

Software Engineer, Full-Stack

We’re seeking a Full-Stack Software Engineer to play a highly impactful role in ...
Location:
United States, San Mateo
Salary:
Not provided
Fireworks AI
Expiration Date
Until further notice
Requirements
  • 3 - 7 years of software engineering experience
  • Deeply understand how a product fits into the business landscape
  • Proficiency in TypeScript and Python
  • Be a customer obsessed engineer who loves talking to users and getting feedback
  • Strong ability to make design decisions and craft great experiences
  • Willing to think outside of the box and build a product from scratch for users to serve new needs and use cases
  • Understanding of responsive design, component-based architecture, and UX fundamentals
  • Strong communication and collaboration skills
Job Responsibility
  • Contribute to the Fireworks Platform (developer-facing web app, serverless and on-demand inference, Python SDK) alongside other team members
  • Design and implement full stack technical features to address business problems
  • Ship features that users care about, iterate rapidly and ideate constantly
  • Rapidly prototype and experiment with a data driven focus
  • Own feature development from backend APIs to frontend user interfaces
  • Directly engage with users through various channels (Discord, meetups, etc.) and convert their needs into shipped features
  • Be able to explain why a feature matters to customers as well as its importance in the competitive landscape
  • Be a user of the inference platform to have a deep sense of what’s working and what’s not working in the product
What we offer
  • Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure
  • Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally
  • Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results
  • Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation
  • Fulltime

Full Stack Engineer

Join a small, mission-driven startup building voice-first AI technology to impro...
Location:
United States, San Francisco
Salary:
150000.00 - 185000.00 USD / Year
Orbis Consultants
Expiration Date
Until further notice
Requirements
  • 3+ years full-stack experience, strong backend focus (Node.js, Next.js/React, TypeScript/JavaScript, Nest.js)
  • Experience with REST APIs, SQL/PostgreSQL (Prisma/GraphQL a plus), Docker, CI/CD
  • Startup mindset, collaborative, proactive, and mission-driven
Job Responsibility
  • Build backend infrastructure, APIs, and data pipelines powering AI research and real-time inference
  • Ensure HIPAA-compliant, secure systems that integrate with healthcare software
  • Collaborate with clinical and regulatory teams while shaping startup architecture and strategy
  • Fulltime

Software Engineer, Research - Human Data

OpenAI’s mission is to ensure that artificial general intelligence (AGI) benefit...
Location:
United States; United Kingdom, San Francisco; London
Salary:
230000.00 - 385000.00 USD / Year
OpenAI
Expiration Date
Until further notice
Requirements
  • Strong software engineering fundamentals
  • Experience building production systems at scale
  • Enjoy full-stack development with end-to-end ownership
  • Motivated by high-impact collaboration with research teams and solving novel, ambiguous problems
  • Excited to shape how AI systems learn from human preferences and reflect a broad range of human values
  • Care deeply about inclusive tooling and building systems that enhance model safety, reliability, and usefulness
Job Responsibility
  • Build and maintain robust full-stack systems for feedback collection, data labeling, and evaluation pipelines, while maintaining high levels of security
  • Translate experimental alignment research into scalable production infrastructure, including inference and model training stacks
  • Design and iterate on user-facing tools and backend services to support high-quality data workflows
  • Partner with researchers, engineers, and program leads to shape feedback loops and model interaction paradigms
  • Drive infrastructure improvements that enable faster iteration and scaling across OpenAI’s frontier models, from internal research tooling all the way to production ChatGPT
What we offer
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Fulltime

Head of Inference Kernels

As a core member of the team, you will play a pivotal role in leading a high-per...
Location:
United States, San Jose
Salary:
200000.00 - 300000.00 USD / Year
Etched
Expiration Date
Until further notice
Requirements
  • Experience designing and optimizing GPU kernels for deep learning using CUDA and assembly (ASM)
  • Experience with low-level programming to maximize performance for AI operations, leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance
  • Deep fluency with transformer inference architecture, optimization levers, and full-stack systems (e.g., vLLM, custom runtimes)
  • History of delivering tangible perf wins on GPU hardware or custom AI accelerators
  • Solid understanding of roofline models of compute throughput, memory bandwidth and interconnect performance
  • Experienced in running large-scale workloads on heterogeneous compute clusters, optimizing for efficiency and scalability of AI workloads
  • Scopes projects crisply, sets aggressive but realistic milestones, and drives technical decision-making across the team
  • Anticipates blockers and shifts resources proactively
Job Responsibility
  • Architect Best-in-Class Inference Performance on Sohu: Deliver continuous batching throughput exceeding B200 by ≥10x on priority workloads
  • Develop Best-in-Performance Inference Mega Kernels: Develop complex, fused kernels that increase chip utilization and reduce inference latency, and validate these optimizations through benchmarking and regression testing in production pipelines
  • Architect Model Mapping Strategies: Develop system-level optimizations using a mix of techniques such as tensor parallelism and expert parallelism for optimal performance
  • Hardware-Software Co-design of Inference-time Algorithmic Innovation: Develop and deploy production-ready inference-time algorithmic improvements (e.g., speculative decoding, prefill-decode disaggregation, KV cache offloading)
  • Build Scalable Team and Roadmap: Grow and retain a team of high-performing inference optimization engineers
  • Cross-Functional Performance Alignment: Ensure inference stack and performance goals are aligned with the software infrastructure teams, GTM and hardware teams for future generations of our hardware
What we offer
  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
  • Significant equity package
  • Fulltime

Senior Software Engineer

The AI & Innovation team at Microsoft Suzhou is seeking a highly motivated Senio...
Location:
China, Beijing
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science, Electrical Engineering, or related technical field AND 4+ years of technical engineering experience with coding in languages such as Python, C++, or C#, OR equivalent industry experience
  • 7+ years of software engineering experience with a focus on AI/ML systems
  • Proven experience with one or more of the following: Developing or applying generative AI models
  • Building and optimizing inference pipelines for large AI models on cloud infrastructure
  • Integrating AI features into consumer-facing web or mobile applications at scale
  • Working with programmatic advertising ecosystems
  • Familiarity with cloud services (Azure preferred), microservices architecture, and DevOps practices
  • Hands-on experience in at least two of the three core areas: AI/ML Prototyping: Experience with deep learning frameworks (PyTorch, TensorFlow) and implementing/tuning models from recent literature
  • Video/Graphics Processing: Experience with video codecs (FFmpeg), computer graphics, GPU programming (CUDA), or real-time media pipelines
Job Responsibility
  • Rapid AI Prototyping: Design, build, and iterate on high-potential prototypes for AI-powered video generation, editing, and content understanding
  • System Integration & Productionization: Bridge the gap between research prototypes and production-ready systems
  • Integrate AI video generation capabilities with large-scale advertising platforms and consumer products
  • Full-Stack Development: Develop end-to-end solutions encompassing backend AI service APIs, model inference optimization, and frontend interfaces
  • Cross-Functional Collaboration: Work closely with Applied Scientists, Machine Learning Engineers, Product Managers, and Ads Platform teams
  • Technical Leadership: Drive architectural decisions for scalable, reliable, and cost-effective AI service deployment
  • Mentor junior engineers and promote engineering best practices
  • Live Site Ownership: Participate in on-call rotations and act as a Designated Responsible Individual (DRI) to ensure the health, performance, and reliability of services
  • Fulltime

AI Solutions Architect / Field Application Engineer

We are looking for an AI enthusiast with strong technical fundamentals and custo...
Location:
United States, Austin
Salary:
102320.00 - 153480.00 USD / Year
AMD
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field (or equivalent practical experience)
  • Strong interest in AI/ML technologies and a desire to work across hardware and software layers
  • Hands-on experience with Linux-based systems
  • Programming experience in one or more of the following: Python, C/C++, Bash
  • Familiarity with AI frameworks or tools (e.g., PyTorch, TensorFlow, ONNX, Hugging Face, or similar)
  • Strong communication skills with the ability to explain technical concepts clearly
  • Ability to work effectively in a team-oriented, cross-functional environment
Job Responsibility
  • Serve as a technical point of contact for customers, supporting AI and HPC workloads on AMD CPU and GPU platforms
  • Work directly with customers to understand their use cases, requirements, and constraints, and guide them through solution design and deployment
  • Deliver technical presentations, demos, and architecture walkthroughs to both technical and non-technical audiences
  • Program-manage customer opportunities as they grow in complexity, coordinating activities across internal and external stakeholders
  • Perform hands-on system bring-up including hardware installation, firmware configuration, OS installation, and driver setup
  • Deploy and validate open-source AI and HPC software stacks (e.g., Linux, ROCm, AI frameworks, containers)
  • Run functionality, performance, and scalability benchmarks on CPU and GPU workloads
  • Perform first-level profiling and analysis of applications to identify performance bottlenecks and optimization opportunities
  • Support AI workloads such as training, inference, and data preprocessing across CPU and GPU platforms
  • Develop working knowledge of AMD CPU and GPU architectures and how they impact real-world workloads
  • Fulltime

AI Solutions Architect / Field Application Engineer

We are looking for an AI enthusiast with strong technical fundamentals and custo...
Location:
United States, Austin
Salary:
128400.00 - 192600.00 USD / Year
AMD
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field (or equivalent practical experience)
  • Strong interest in AI/ML technologies and a desire to work across hardware and software layers
  • Hands-on experience with Linux-based systems
  • Programming experience in one or more of the following: Python, C/C++, Bash
  • Familiarity with AI frameworks or tools (e.g., PyTorch, TensorFlow, ONNX, Hugging Face, or similar)
  • Strong communication skills with the ability to explain technical concepts clearly
  • Ability to work effectively in a team-oriented, cross-functional environment
Job Responsibility
  • Serve as a technical point of contact for customers, supporting AI and HPC workloads on AMD CPU and GPU platforms
  • Work directly with customers to understand their use cases, requirements, and constraints, and guide them through solution design and deployment
  • Deliver technical presentations, demos, and architecture walkthroughs to both technical and non-technical audiences
  • Program-manage customer opportunities as they grow in complexity, coordinating activities across internal and external stakeholders
  • Perform hands-on system bring-up including hardware installation, firmware configuration, OS installation, and driver setup
  • Deploy and validate open-source AI and HPC software stacks (e.g., Linux, ROCm, AI frameworks, containers)
  • Run functionality, performance, and scalability benchmarks on CPU and GPU workloads
  • Perform first-level profiling and analysis of applications to identify performance bottlenecks and optimization opportunities
  • Support AI workloads such as training, inference, and data preprocessing across CPU and GPU platforms
  • Develop working knowledge of AMD CPU and GPU architectures and how they impact real-world workloads
  • Fulltime

AI Systems Engineer - Agentic Autonomy

We are seeking an AI Systems Engineer with deep expertise in large language mode...
Location:
United States, Greater Boston
Salary:
140000.00 - 180000.00 USD / Year
HavocAI
Expiration Date
Until further notice
Requirements
  • Bachelor’s, Master’s, or PhD in Computer Science, Machine Learning, Robotics, or a related field
  • Deep hands-on experience building with LLMs and multi-agent/agentic AI frameworks
  • Strong software engineering background in modern ML frameworks, cloud orchestration, and API development
  • Experience integrating AI systems into larger software architectures or robotics/autonomy workflows
  • Understanding of RAG pipelines, tool-use frameworks, LLM function-calling, memory systems, and agent orchestration
  • Experience with safety evaluation, model alignment, or mission-critical AI system validation
  • Ability to lead system-level design discussions and coordinate across multiple engineering disciplines
  • Must be a U.S. Citizen and eligible to obtain a Secret Clearance
Job Responsibility
  • Lead the design and development of LLM-powered software modules for mission reasoning, planning, operator interaction, and autonomous decision support
  • Integrate LLMs and agentic systems into HavocAI’s autonomy architecture, including ROS/ROS2 systems, planning engines, and mission software
  • Build multi-agent, tool-using AI systems that interact with perception data, mission databases, simulation systems, and operator inputs
  • Develop APIs, wrappers, and orchestration layers enabling LLMs to interface safely with embedded, cloud, and edge compute environments
  • Optimize LLM inference pipelines for performance, latency, and reliability in field-deployed systems
  • Evaluate model behavior, perform safety testing, and develop guardrails for mission-critical use cases
  • Collaborate with autonomy, embedded, simulation, and full-stack teams to define requirements and ensure robust system-level integration
  • Guide strategic decisions on model selection, fine-tuning approaches, safety frameworks, and long-term AI architecture
  • Contribute to field testing, operator evaluations, and iterative deployment cycles for AI-augmented autonomy systems
What we offer
  • 100% employer-paid Health, Dental and Vision Insurance for you and your family
  • Life Insurance (Employer Paid)
  • Ability to participate in the company's 401(k) program (Matching)
  • Unlimited PTO policy with an enforced 2 week minimum
  • Equity Package
  • Work / Home Office Stipend
  • Global Entry
  • 16 Week Paid Parental Leave
  • Monthly Health and Wellness Stipend
  • Fulltime