Senior Software Engineer, ML Infrastructure

Arena Intelligence, Inc.

Location:
United States, Bay Area

Contract Type:
Not provided

Salary:

Not provided

Job Description:

LMArena is seeking a Senior Software Engineer (Infrastructure) to lead the design and development of scalable, high-performance real-time data and API infrastructure. In this role, you’ll architect systems that capture and process large volumes of serving requests in real time, powering the insights that help researchers and developers build the world’s most advanced AI and its applications. Your work will be foundational to how we surface trustworthy, transparent, and timely evaluation signals across the platform. This role is ideal for someone who thrives in fast-moving environments, cares deeply about performance and reliability, and wants to build systems that help the AI community better understand which models are best for their real-world use cases.

Job Responsibility:

  • Architect and scale high-performance, real-time API and data systems
  • Design and implement low-latency pipelines to process and analyze large-scale event streams
  • Ensure reliability through robust data integrity, availability, and consistency mechanisms
  • Mentor and guide engineers on infrastructure best practices, architecture, and performance tuning
  • Collaborate cross-functionally with AI researchers, product leaders, and engineers to anticipate evolving infrastructure needs and deliver resilient, extensible systems

Requirements:

  • 5+ years of experience in software engineering, with a focus on infrastructure or large-scale data and ML systems
  • Deep expertise in distributed systems, stream processing, and scalable backend architecture
  • Proven ability to design and operate low-latency, high-throughput, and fault-tolerant systems
  • Strong foundation in systems design, performance tuning, and building reliable, fault-tolerant services
  • Comfortable in a dynamic, high-ownership, fast-growth environment

Nice to have:

Prior experience with PyTorch model development is a plus.

What we offer:

  • Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs
  • The opportunity to work on cutting-edge AI with a small, mission-driven team
  • A culture that values transparency, trust, and community impact

Additional Information:

Job Posted:
February 20, 2026

Employment Type:
Fulltime
Work Type:
Hybrid work
Similar Jobs for Senior Software Engineer, ML Infrastructure

Senior Software Engineer - Data Infrastructure

We build the data and machine learning infrastructure to enable Plaid engineers ...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date
Until further notice
Requirements
  • 5+ years of software engineering experience
  • Extensive hands-on software engineering experience, with a strong track record of delivering successful projects within the Data Infrastructure or Platform domain at similar or larger companies
  • Deep understanding of one of: ML Infrastructure systems, including Feature Stores, Training Infrastructure, Serving Infrastructure, and Model Monitoring OR Data Infrastructure systems, including Data Warehouses, Data Lakehouses, Apache Spark, Streaming Infrastructure, Workflow Orchestration
  • Strong cross-functional collaboration, communication, and project management skills, with proven ability to coordinate effectively
  • Proficiency in coding, testing, and system design, ensuring reliable and scalable solutions
  • Demonstrated leadership abilities, including experience mentoring and guiding junior engineers
Job Responsibility
  • Contribute towards the long-term technical roadmap for data-driven and machine learning iteration at Plaid
  • Leading key data infrastructure projects such as improving ML development golden paths, implementing offline streaming solutions for data freshness, building net new ETL pipeline infrastructure, and evolving data warehouse or data lakehouse capabilities
  • Working with stakeholders in other teams and functions to define technical roadmaps for key backend systems and abstractions across Plaid
  • Debugging, troubleshooting, and reducing operational burden for our Data Platform
  • Growing the team via mentorship and leadership, reviewing technical documents and code changes
What we offer
  • medical, dental, vision, and 401(k)
  • equity and/or commission
  • Fulltime

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date
Until further notice
Requirements
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
What we offer
  • medical, dental, vision, and 401(k)
  • Fulltime

Senior Software Engineer - Network Enablement (Applied ML)

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date
Until further notice
Requirements
  • Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred)
  • Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark
  • Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference
  • Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics
  • Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline & online parity, monitoring and incident response
Job Responsibility
  • Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows)
  • Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact)
  • Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses
  • Build and operate offline training pipelines and production batch scoring for bank intelligence products
  • Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring
  • Implement model CI/CD, model/version registry, and safe rollout/rollback strategies
  • Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs
  • Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions
  • Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection)
  • Ensure fairness, explainability and PII-aware handling for partner-facing ML features
What we offer
  • medical
  • dental
  • vision
  • 401(k)
  • equity
  • commission
  • Fulltime

Senior Software Engineer II - AI/ML

As a Senior Software Engineer II at Aledade, we maintain, improve, and expand ou...
Location:
United States
Salary:
Not provided
Aledade, Inc.
Expiration Date
Until further notice
Requirements
  • BS/BTech (or higher) in Computer Science, Engineering or a related field
  • 6+ years experience as an engineer building full-stack web applications as part of a cross-functional team
  • 3+ years of experience working with SQL or other database querying language on large multi-table data sets
  • 3+ years of experience acting as a trusted technical decision-maker in a team setting, solving for short-term and long-term business value
  • 3+ years of experience coaching other engineers
Job Responsibility
  • Develop and implement scalable and performant solutions
  • Partner, as a peer, with Engineering Managers, Product Managers, and stakeholders throughout Aledade to develop and execute technical roadmaps using Agile processes
  • Mentor and coach more junior engineers, including through thorough pull request reviews for other developers, and be receptive to critical feedback on your own work
  • Improve AI/ML infrastructure for model development, training, and deployment, with a focus on large language models and other generative AI architectures
  • Design a multi-year vision, shaping the direction of crucial generative AI areas: text generation, image synthesis, multimodal models, and personalized content creation
  • Architect systems to enhance the capabilities and relevance of AI models, making complex data sets more accessible and actionable
  • Design and implement prompt engineering strategies to effectively guide generative AI models
  • Work closely with Product Management, Practices, Sales, Customer Success, and other stakeholders to identify and prioritize applied AI use cases within the organization
  • Analyze product usage patterns and trends to make data-driven decisions and forecasts for generative AI applications
  • Maintain the security of protected patient health information and ensure compliance with relevant regulations in the context of AI
  • Fulltime

Senior ML Platform Engineer

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location:
United States, Boston
Salary:
150000.00 - 210000.00 USD / Year
Whoop
Expiration Date
Until further notice
Requirements
  • Bachelor’s or Master’s Degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years of experience in software engineering with a focus on ML infrastructure, cloud platforms, or MLOps
  • Strong programming skills in Python, with experience in building distributed systems and REST/gRPC APIs
  • Deep knowledge of cloud-native services and infrastructure-as-code (e.g., AWS CDK, Terraform, CloudFormation)
  • Hands-on experience with model deployment platforms such as AWS SageMaker, Vertex AI, or Kubernetes-based serving stacks
  • Proficiency in ML lifecycle tools (MLflow, Weights & Biases, BentoML) and containerization strategies (Docker, Kubernetes)
  • Understanding of data engineering and ingestion pipelines, with ability to interface with data lakes, feature stores, and streaming systems
  • Proven ability to work cross-functionally with Data Science, Data Platform, and Software Engineering teams, influencing decisions and driving alignment
  • Passion for AI and automation to solve real-world problems and improve operational workflows
Job Responsibility
  • Architect, build, own, and operate scalable ML infrastructure in cloud environments (e.g., AWS), optimizing for speed, observability, cost, and reproducibility
  • Create, support, and maintain core MLOps infrastructure (e.g., MLflow, feature store, experiment tracking, model registry), ensuring reliability, scalability, and long-term sustainability
  • Develop, evolve, and operate MLOps platforms and frameworks that standardize model deployment, versioning, drift detection, and lifecycle management at scale
  • Implement and continuously maintain end-to-end CI/CD pipelines for ML models using orchestration tools (e.g., Prefect, Airflow, Argo Workflows), ensuring robust testing, reproducibility, and traceability
  • Partner closely with Data Science, Sensor Intelligence, and Data Platform teams to operationalize and support model development, deployment, and monitoring workflows
  • Build, manage, and maintain both real-time and batch inference infrastructure, supporting diverse use cases from physiological analytics to personalized feedback loops for WHOOP members
  • Design, implement, and own automated observability tooling (e.g., for model latency, data drift, accuracy degradation), integrating metrics, logging, and alerting with existing platforms
  • Leverage AI-powered tools and automation to reduce operational overhead, enhance developer productivity, and accelerate model release cycles
  • Contribute to and maintain internal platform documentation, SDKs, and training materials, enabling self-service capabilities for model deployment and experimentation
  • Continuously evaluate and integrate emerging technologies and deployment strategies, influencing WHOOP’s roadmap for AI-driven platform efficiency, reliability, and scale
What we offer
  • equity
  • benefits
  • Fulltime

Senior Software Engineer

We're looking for a Software Engineer to join our Data Department, someone with ...
Location:
Spain, Madrid
Salary:
Not provided
Fever
Expiration Date
Until further notice
Requirements
  • Strong backend engineering foundations and a passion for writing high-quality Python code
  • Solid understanding of OOP, software architecture patterns (clean architecture, hexagonal), and design principles
  • Experience working with relational databases and SQL (PostgreSQL, Snowflake, or similar)
  • Familiarity with containerization and deployment workflows (Docker, Kubernetes)
  • Comfortable communicating in English in a cross-functional technical environment
  • Pragmatic mindset — you balance technical quality with business impact and speed of delivery
Job Responsibility
  • Build and maintain backend services and data pipelines that enable ML models and automations to run reliably at scale
  • Design robust systems to automate business processes and make them available through APIs or event-based architectures
  • Translate complex business and analytical needs into technical solutions that create leverage across CRM, Marketing, Product, and Data Science teams
  • Own your services end-to-end, from architecture to deployment and monitoring, applying strong engineering discipline
  • Collaborate closely with Data Science, Machine Learning and Data Engineering to ensure smooth integration of data sources and model infrastructure
What we offer
  • Responsibility from day one and professional and personal growth
  • Opportunity to have a real impact in a high-growth global category leader
  • A compensation package consisting of base salary and the potential to earn a significant bonus for top performance
  • Stock options plan
  • 40% discount on all Fever events and experiences
  • Health insurance and other benefits such as Flexible remuneration with a 100% tax exemption through Cobee
  • English / Spanish Lessons
  • Wellhub Membership
  • Possibility to receive in advance part of your salary by Payflow
  • Fulltime

Senior Full Stack Software Engineer

Tutor Intelligence builds software to enable ordinary robots to achieve extraord...
Location:
United States, Watertown
Salary:
140000.00 - 190000.00 USD / Year
Tutor Intelligence
Expiration Date
Until further notice
Requirements
  • Strong programming skills in Python
  • Software engineering tooling: Git, Unix shell, etc.
  • Collaborative nature and social skill set
  • Interest in robotics, AI, solving hard problems, or improving the future of humanity
  • Passion for building things (and just getting stuff done)
Job Responsibility
  • Architecting and engineering core software across one or more of: robot software, backend services, ML services, cloud infrastructure / dev-ops
  • Involvement in new project planning
What we offer
  • generous equity
  • fully covered health + dental
  • unlimited PTO
  • Fulltime

Senior Machine Learning Infrastructure Engineer

As a Senior ML Infrastructure Engineer at Plus, you will design scalable archite...
Location:
United States, Santa Clara
Salary:
160000.00 - 200000.00 USD / Year
PlusAI
Expiration Date
Until further notice
Requirements
  • PhD or MS in Computer Science, Electrical Engineering, or related field
  • Good oral and written communication skills
  • PhD new grad or Masters with 3+ years of software engineering experience with a focus on ML infrastructure or distributed systems
  • Proficiency in Python, C++, SQL
  • Deep understanding of containerization, orchestration technologies, distributed ML workload, and experiment tracking tools (e.g., Docker, Kubernetes, multiprocessing, Kubeflow, and MLflow)
  • Deploy and manage resources across multiple cloud platforms (AWS, GCP, or on-prem environments)
  • Proficiency in at least one deep learning framework, such as PyTorch, and data pipeline tools (e.g., Apache Airflow, Prefect)
  • Strong knowledge of distributed systems, databases, and storage solutions
  • Extensive software design and development skills
  • Ability to learn and adapt to new technologies and contribute in a productive environment
Job Responsibility
  • Design and develop scalable, high-performance systems for training, inference, deploying, and monitoring ML models at scale
  • Build and maintain efficient data pipelines, model versioning systems, and experiment tracking frameworks
  • Collaborate with cross-functional teams, including ML researchers and engineers, to identify bottlenecks and improve platform usability
  • Implement distributed systems and storage solutions optimized for machine learning workloads
  • Drive improvements in CI/CD workflows for ML models and infrastructure
  • Ensure high availability and reliability of the ML platform by implementing robust monitoring, logging, and alerting systems
  • Stay current with industry trends and integrate relevant tools and frameworks to enhance the platform
  • Mentor junior engineers and contribute to a culture of technical excellence
  • Ensure that your work is performed in accordance with the company’s Quality Management System (QMS) requirements and contribute to continuous improvement efforts
  • Ensure team compliance with QMS, monitor quality, and drive process improvements
What we offer
  • Work, learn and grow in a highly future-oriented, innovative and dynamic field
  • Wide range of opportunities for personal and professional development
  • Catered free lunch, unlimited snacks and beverages
  • Highly competitive salary and benefits package, including 401(k) plan
  • Fulltime