Senior ML Software Engineer - Integration & Quality

Cerebras Systems

Location:
United States; Canada, Sunnyvale

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs. As a Senior Software Engineer on the ML Integration and Quality team, you will play a pivotal role in bringing together and delivering all software and hardware components of the Cerebras AI platform. You will focus on feature integration and quality across software components, as well as pre-deployment and production validation for Cerebras training and inference solutions. In this role, you will shape best practices for testing and debugging, drive effective cross-team communication, and advocate for world-class products.

Job Responsibility:

  • Develop and execute a comprehensive integration and QA strategy aligned with the roadmap of the Cerebras AI solution
  • Apply sound software integration methodology, communicate effectively across teams, and ensure quality
  • Break down complex tasks into smaller ones, solve problems, and help debug issues
  • Automate workflows and testbed setups, and build tools for monitoring and debugging
  • Find creative ways to break Cerebras software and identify potential issues
  • Contribute to developing SW specifications with a focus on ML
  • Drive quality of various software and hardware components of Cerebras AI platform to ensure accuracy, performance and usability of ML training and inference
  • Work in a fast-paced environment and make the prioritization decisions and judgment calls that affect company productivity
  • Define and implement quality metrics to measure product and process quality, and provide actionable insights and recommendations to drive continuous improvement
  • Provide regular updates on quality, key metrics, and risks to engineering and business stakeholders
  • Collaborate with software and product teams to develop clear acceptance criteria and deliver a quality product
  • Execute and deliver with a strong sense of ownership and a focus on quality

Requirements:

  • 5+ years of relevant industry experience in software integration and development
  • Strong automation and programming skills in one or more languages such as Python, C++, or Go
  • Experience testing compute, machine learning, networking, or storage systems within a large-scale enterprise environment
  • Experience debugging issues across distributed, scale-out systems
  • Experience understanding complex systems and putting together thorough test plans
  • Experience working effectively across teams, including product development, product management, customer operations, and field teams
  • Excellent verbal and written communication skills
  • Strong organizational skills, teamwork, and a can-do attitude
  • Experience working with geographically dispersed teams across time zones

Nice to have:

  • Experience working with ML workloads such as LLM/multimodal training or inference
  • Experience with hardware architecture, performance optimizations, compilers and ML frameworks
  • Experience working with distributed systems, cloud and
  • Experience working with microservices deployment, debugging

What we offer:

  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • A simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026

Work Type:
Hybrid work

Similar Jobs for Senior ML Software Engineer - Integration & Quality

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date
Until further notice
Requirements
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Full-time

Senior Software Engineer

We're looking for a Software Engineer to join our Data Department, someone with ...
Location
Spain, Madrid
Salary:
Not provided
Fever
Expiration Date
Until further notice
Requirements
  • Strong backend engineering foundations and a passion for writing high-quality Python code
  • Solid understanding of OOP, software architecture patterns (clean architecture, hexagonal), and design principles
  • Experience working with relational databases and SQL (PostgreSQL, Snowflake, or similar)
  • Familiarity with containerization and deployment workflows (Docker, Kubernetes)
  • Comfortable communicating in English in a cross-functional technical environment
  • Pragmatic mindset — you balance technical quality with business impact and speed of delivery
Job Responsibility
  • Build and maintain backend services and data pipelines that enable ML models and automations to run reliably at scale
  • Design robust systems to automate business processes and make them available through APIs or event-based architectures
  • Translate complex business and analytical needs into technical solutions that create leverage across CRM, Marketing, Product, and Data Science teams
  • Own your services end-to-end, from architecture to deployment and monitoring, applying strong engineering discipline
  • Collaborate closely with Data Science, Machine Learning and Data Engineering to ensure smooth integration of data sources and model infrastructure
What we offer
  • Responsibility from day one and professional and personal growth
  • Opportunity to have a real impact in a high-growth global category leader
  • A compensation package consisting of base salary and the potential to earn a significant bonus for top performance
  • Stock options plan
  • 40% discount on all Fever events and experiences
  • Health insurance and other benefits, such as flexible remuneration with a 100% tax exemption through Cobee
  • English / Spanish Lessons
  • Wellhub Membership
  • Option to receive part of your salary in advance through Payflow
  • Full-time

Senior Software Engineer

At JFrog, we’re reinventing DevOps and MLOps to help the world’s greatest compan...
Location
Israel, Netanya/Tel Aviv
Salary:
Not provided
JFrog
Expiration Date
Until further notice
Requirements
  • 5+ years of proven experience in software development
  • Strong background in designing, developing, and debugging complex distributed systems (e.g., microservices, event-driven architectures)
  • Hands-on experience with containerized environments, microservices, and Kubernetes
  • Proven experience with at least one major cloud provider (e.g., AWS, GCP, Azure)
  • Ability to lead technical discussions, mentor engineers, and drive architectural decisions
Job Responsibility
  • Be an integral part of a highly skilled team working to build the leading MLOps platform in the industry
  • Maintain and evolve the Runtime team’s products, ensuring their reliability and scalability
  • Design and develop a complete hosting system that supports various types of inference, analytics, monitoring, distribution, and more – enabling customers to run large-scale real-time, batch, and streaming ML pipelines
  • Play a key role in shaping our cross-company engineering culture
  • Conduct high-quality design reviews with a strong emphasis on scalability, maintainability, security, and sound use of design patterns
  • Write maintainable, well-tested code in multiple programming languages
  • Continuously improve the efficiency, scalability, and stability of critical system components

Senior Platform Engineer, ML Data Systems

We’re looking for an ML Data Engineer to evolve our eval dataset tools to meet t...
Location
United States, Mountain View
Salary:
137871.00 - 172339.00 USD / Year
Khan Academy
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
  • 5 years of software engineering experience, with 3+ of those years spent working with large ML datasets, especially those in open-source repositories such as Hugging Face
  • Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
  • Experience with data versioning tools (e.g., DVC, LakeFS) and cloud storage systems
  • Familiarity with machine learning workflows — from training data preparation to evaluation
  • Familiarity with the architecture and operation of large language models, and a nuanced understanding of their capabilities and limitations
  • Attention to detail and an obsession with data quality and reproducibility
  • Motivated by the Khan Academy mission “to provide a free world-class education for anyone, anywhere.”
  • Proven cross-cultural competency skills demonstrating self-awareness, awareness of others, and the ability to adopt inclusive perspectives, attitudes, and behaviors to drive inclusion and belonging throughout the organization.
Job Responsibility
  • Evolve and maintain pipelines for transforming raw trace data into ML-ready datasets
  • Clean, normalize, and enrich data while preserving semantic meaning and consistency
  • Prepare and format datasets for human labeling, and integrate results into ML datasets
  • Develop and maintain scalable ETL pipelines using Airflow, DBT, Go, and Python running on GCP
  • Implement automated tests and validation to detect data drift or labeling inconsistencies
  • Collaborate with AI engineers, platform developers, and product teams to define data strategies in support of continuously improving the quality of Khan’s AI-based tutoring
  • Contribute to shared tools and documentation for dataset management and AI evaluation
  • Inform our data governance strategies for proper data retention, PII controls/scrubbing, and isolation of particularly sensitive data such as offensive test imagery.
What we offer
  • Competitive salaries
  • Ample paid time off as needed
  • 8 pre-scheduled Wellness Days in 2026 occurring on a Monday or a Friday for a 3-day weekend boost
  • Remote-first culture that caters to your time zone, with flexibility as needed
  • Generous parental leave
  • An exceptional team that trusts you and gives you the freedom to do your best
  • The chance to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
  • Opportunities to connect through affinity, ally, and social groups
  • 401(k) + 4% matching & comprehensive insurance, including medical, dental, vision, and life.
  • Full-time

Senior Software Engineer, Machine Learning

At Machina Labs, we’re reshaping manufacturing through advanced robotics and mac...
Location
United States, Chatsworth
Salary:
155000.00 - 190000.00 USD / Year
Machina Labs
Expiration Date
Until further notice
Requirements
  • MS or PhD in Data Science, Computer Science, Machine Learning, Statistics, or a related field
  • 4+ years of hands-on experience in machine learning systems, algorithms, and applications (e.g. deep learning, time series analysis, etc.)
  • Experienced and very comfortable coming up with machine learning architectures and training models from scratch
  • Extensive Python programming experience
  • Familiarity with big data platforms (Hadoop, Spark, Hive) and analytics environments (Databricks, SageMaker/Azure ML, Jupyter)
  • Experience in build/release systems and processes
  • Quick learner of new technologies and experienced in fast-paced iterative design
  • Strong communicator with the ability to explain complex topics to technical and non-technical audiences
  • Proven track record of being able to solve complex problems independently and as part of an integrated team
Job Responsibility
  • Identify opportunities for machine learning automation and predictive modeling by analyzing available data and collaborating with engineers and manufacturing process experts
  • Conduct data mining, develop model architectures, train and deploy models, and define metrics aligned with business objectives
  • Design, develop, and deploy ETL and data cleansing processes to extract relevant features for modeling (in collaboration with other team members)
  • Assist team members in data analysis and interpretation
  • Design and conduct experiments to test and validate solutions and models
  • Build a production-ready pipeline supporting multiple machine learning models
  • Develop monitoring tools for data quality and system performance
  • Provide guidance to team members and actively participate in interview processes to hire additional team members
What we offer
  • Medical, Dental, Vision
  • PTO
  • Stock Options
  • Full-time

Senior AI ML Engineer

We are seeking a highly skilled and experienced Assistant Vice President (AVP), ...
Location
India, Pune
Salary:
Not provided
Citi
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, Machine Learning, Statistics, or a related quantitative field
  • Minimum of 6 years of professional experience in Data Science, Machine Learning Engineering, or a similar role, with a strong track record of deploying ML models to production
  • Proven experience in a lead or senior technical role
  • Expert-level proficiency in Python programming, including experience with relevant data science libraries (e.g., Pandas, NumPy, Scikit-learn) and deep learning frameworks (e.g., TensorFlow, PyTorch)
  • Strong hands-on experience designing, developing, and deploying RESTful APIs using FastAPI
  • Solid understanding and practical experience with CI/CD tools and methodologies (e.g., Jenkins, GitLab CI, GitHub Actions, Azure DevOps) for MLOps
  • Experience with MLOps platforms, model monitoring, and model versioning
  • Experience with at least one major cloud provider (e.g., AWS, Azure, GCP) for deploying and managing ML workloads
  • Proficiency in SQL and experience working with relational and/or NoSQL databases
  • Deep understanding of machine learning algorithms, statistical modeling, and data mining techniques
Job Responsibility
  • Design, develop, and implement advanced machine learning models (e.g., predictive, prescriptive, generative AI) to solve complex business problems, from initial data exploration and feature engineering to model training and evaluation
  • Lead the deployment of AI/ML models into production environments, ensuring scalability, reliability, and performance
  • Build and maintain robust, high-performance APIs (using frameworks like FastAPI) to serve machine learning models and integrate them with existing applications and systems
  • Establish and manage continuous integration and continuous deployment (CI/CD) pipelines for ML code and model deployments, promoting automation and efficiency
  • Collaborate with data engineers to ensure optimal data pipelines and data quality for model development and deployment
  • Conduct rigorous experimentation, A/B testing, and model performance monitoring to continuously improve and optimize AI/ML solutions
  • Promote and enforce best practices in software development, including clean code, unit testing, documentation, and version control
  • Mentor junior team members, contribute to technical discussions, and drive the adoption of new technologies and methodologies within the team
  • Effectively communicate complex technical concepts and model results to both technical and non-technical stakeholders.
What we offer
  • Not explicitly stated.
  • Full-time

Senior Principal Data Platform Software Engineer

We’re looking for a Sr Principal Data Platform Software Engineer (P70) to be a k...
Salary:
239400.00 - 312550.00 USD / Year
Atlassian
Expiration Date
Until further notice
Requirements
  • 15+ years in Data Engineering, Software Engineering, or related roles, with substantial exposure to big data ecosystems
  • Demonstrated experience building and operating data platforms or large‑scale data services in production
  • Proven track record of building services from the ground up (requirements → design → implementation → deployment → ongoing ownership)
  • Hands‑on experience with AWS, GCP (e.g., compute, storage, data, and streaming services) and cloud‑native architectures
  • Practical experience with big data technologies, such as Databricks, Apache Spark, AWS EMR, Apache Flink, or StarRocks
  • Strong programming skills in one or more of: Kotlin, Scala, Java, Python
  • Experience leading cross‑team technical initiatives and influencing senior stakeholders
  • Experience mentoring Staff/Principal engineers and lifting the technical bar for a team or org
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
Job Responsibility
Job Responsibility
  • Design, develop, and own delivery of high-quality big data and analytical platform solutions that meet Atlassian’s need to support millions of users with optimal cost, minimal latency, and maximum reliability
  • Improve and operate large‑scale distributed data systems in the cloud (primarily AWS, with increasing integration with GCP and Kubernetes‑based microservices)
  • Drive the evolution of our high-performance analytical databases and their integrations with products, cloud infrastructures (AWS and GCP), and isolated cloud environments
  • Help define and uplift engineering and operational standards for petabyte scale data platforms, with sub‑second analytic queries and multi‑region availability (coding guidelines, code review practices, observability, incident response, SLIs/SLOs)
  • Partner across multiple product and platform teams (including Analytics, Marketplace/Ecosystem, Core Data Platform, ML Platform, Search, and Oasis/FedRAMP) to deliver company‑wide initiatives that depend on reliable, high‑quality data
  • Act as a technical mentor and multiplier, raising the bar on design quality, code quality, and operational excellence across the broader team
  • Design and implement self‑healing, resilient data platforms with strong observability, fault tolerance, and recovery characteristics
  • Own the long‑term architecture and technical direction of Atlassian’s product data platform with projects that are directly tied to Atlassian’s company-level OKRs
  • Be accountable for the reliability, cost efficiency, and strategic direction of Atlassian’s product analytical data platform
  • Partner with executives and influence senior leaders to align engineering efforts with Atlassian’s long-term business objectives
What we offer
  • health and wellbeing resources
  • paid volunteer days
  • Full-time

Staff Software Engineer, Backend

The Staff Engineer will work closely with AI/ML engineers, product managers, app...
Location
United States, NYC
Salary:
160000.00 - 190000.00 USD / Year
Conductor
Expiration Date
Until further notice
Requirements
  • Completed studies in Computer Science, Mathematics, engineering or a related field or equivalent professional experience
  • 8+ years of experience in software development, with experience in product-driven companies
  • Strong expertise in system design, distributed computing, and scalable architecture patterns for handling large datasets and high-throughput applications
  • Proficiency in multiple programming languages with strong Python coding skills. Experience with Java is highly valued
  • Strong database experience including both SQL and NoSQL systems, with knowledge of data modeling and optimization techniques
  • Experience with AI/ML technologies including LLMs, vector databases (e.g., Milvus), embeddings, and ML frameworks
  • Knowledge of MLOps practices, model deployment, and AI system integration in production environments
  • Experience working across the full software development lifecycle including CI/CD, monitoring, testing, and production deployment
  • Proven track record of technical leadership, mentoring engineers, and driving engineering excellence within teams
  • Up-to-date with rapidly-evolving technologies and demonstrated ability to evaluate and adopt new tools and frameworks
Job Responsibility
  • Lead the technical architecture, design, and implementation of large-scale distributed systems and data platforms to support customer needs and business growth
  • Oversee the planning, execution, and successful delivery of complex engineering projects, ensuring adherence to engineering best practices and quality standards
  • Design and build scalable, high-performance backend systems and APIs that handle millions of requests and large datasets efficiently
  • Architect robust data processing pipelines and ETL workflows using modern cloud technologies and distributed computing frameworks
  • Drive technical decision-making across the engineering organization, evaluating trade-offs and establishing engineering standards and practices
  • Lead cross-functional collaboration with product, AI/ML engineering, data engineering, and infrastructure teams to deliver comprehensive solutions
  • Build and maintain CI/CD pipelines, monitoring systems, and deployment automation to ensure reliable software delivery
  • Implement AI/ML capabilities including LLM integration, vector databases, and intelligent content processing workflows
  • Mentor senior and junior engineers, fostering technical excellence and knowledge sharing within the engineering organization
What we offer
  • 100% covered employee medical plan
  • dental & vision plans
  • 401(k) with employer contribution
  • unlimited vacation policy
  • 10 sick days
  • short-term disability
  • long-term disability
  • generous paid parental leave
  • employee assistance program
  • flexible savings accounts
  • Full-time