As a Software Engineer on the ML Infrastructure team, you will design and build platforms for scalable, reliable, and efficient serving of LLMs. Our platform powers cutting-edge research and production systems, supporting both internal and external use cases across various environments. The ideal candidate combines strong ML fundamentals with deep expertise in backend system design. You’ll work in a highly collaborative environment, bridging research and engineering to deliver seamless experiences to our customers and accelerate innovation across the company.
Job Responsibilities:
Build and maintain fault-tolerant, high-performance systems for serving LLM workloads at scale
Build an internal platform that enables discovery of LLM capabilities
Collaborate with researchers and engineers to integrate and optimize models for production and research use cases
Conduct architecture and design reviews to uphold best practices in system design and scalability
Develop monitoring and observability solutions to ensure system health and performance
Lead projects end-to-end, from requirements gathering to implementation, in a cross-functional environment
Requirements:
4+ years of experience building large-scale, high-performance backend systems
Strong programming skills in one or more languages (e.g., Python, Go, Rust, C++)
Experience with LLM serving and routing fundamentals (e.g., rate limiting, token streaming, load balancing, and budgets)
Experience with LLM capabilities and concepts such as reasoning, tool calling, and prompt templates
Experience with containers and orchestration tools (e.g., Docker, Kubernetes)
Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform)
Proven ability to solve complex problems and work independently in fast-moving environments
Nice to have:
Experience with modern LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or text-generation-inference