Member of Technical Staff, Research Tooling & Data Platform Job at Runway

Member of Technical Staff - Platform Engineer

Platform Engineer to join our team building backend infrastructure for new ML-po...

Location

United States, Palo Alto

Salary:

175000.00 - 350000.00 USD / Year

Inflection AI

Expiration Date

Until further notice

Requirements

Backend engineering experience with Python, TypeScript, or Node.js
Hands-on experience working with production PyTorch models, model checkpoints, and inference logic
Strong knowledge of building APIs and services that are scalable, stable, and secure
Passion for bridging backend engineering and ML systems, especially at the infrastructure layer
Familiarity with tools such as FastAPI, Postgres, Redis, Kubernetes, and React
Desire to be hands-on and contribute to shaping the foundation of a new enterprise ML product
Bachelor’s degree or equivalent in a field related to the position

Job Responsibility

Build and maintain backend services to support LLM integration, inference orchestration, and data flow
Write clean, reliable Python code for experimentation, model integration, and production systems
Collaborate closely with ML researchers to rapidly iterate on product ideas and deploy features
Design and implement infrastructure to handle scalable inference workloads and enterprise-level use cases
Own system components and ensure reliability, observability, and maintainability from day one

What we offer

Diverse medical, dental and vision options
401k matching program
Unlimited paid time off
Parental leave and flexibility for all parents and caregivers
Support of country-specific visa needs for international employees living in the Bay Area
Competitive stock options

Member of Technical Staff, AI Training Infrastructure

As a Training Infrastructure Engineer, you'll design, build, and optimize the in...

Location

United States, San Mateo

Salary:

175000.00 - 220000.00 USD / Year

Fireworks AI

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
3+ years of experience with distributed systems and ML infrastructure
Experience with PyTorch
Proficiency in cloud platforms (AWS, GCP, Azure)
Experience with containerization, orchestration (Kubernetes, Docker)
Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)

Job Responsibility

Design and implement scalable infrastructure for large-scale model training workloads
Develop and maintain distributed training pipelines for LLMs and multimodal models
Optimize training performance across multiple GPUs, nodes, and data centers
Implement monitoring, logging, and debugging tools for training operations
Architect and maintain data storage solutions for large-scale training datasets
Automate infrastructure provisioning, scaling, and orchestration for model training
Collaborate with researchers to implement and optimize training methodologies
Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
Troubleshoot complex performance issues in distributed training environments

What we offer

Meaningful equity in a fast-growing startup
Comprehensive benefits package

Fulltime

Member of Technical Staff, Cloud Infrastructure

As a Software Engineer on our Cloud Infrastructure team, you'll be at the forefr...

Location

United States, New York, NY; San Mateo, CA; Redwood City, CA

Salary:

175000.00 - 220000.00 USD / Year

Fireworks AI

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
5+ years of experience designing and building backend infrastructure in cloud environments (e.g., AWS, GCP, Azure)
Proven experience in ML infrastructure and tooling (e.g., PyTorch, TensorFlow, Vertex AI, SageMaker, Kubernetes, etc.)
Strong software development skills in languages like Python or C++
Deep understanding of distributed systems fundamentals: scheduling, orchestration, storage, networking, and compute optimization

Job Responsibility

Architect and build scalable, resilient, and high-performance backend infrastructure to support distributed training, inference, and data processing pipelines
Lead technical design discussions, mentor other engineers, and establish best practices for building and operating large-scale ML infrastructure
Design and implement core backend services (e.g., job schedulers, resource managers, autoscalers, model serving layers) with a focus on efficiency and low latency
Drive infrastructure optimization initiatives, including compute cost reduction, storage lifecycle management, and network performance tuning
Collaborate cross-functionally with ML, DevOps, and product teams to translate research and product needs into robust infrastructure solutions
Continuously evaluate and integrate cloud-native and open-source technologies (e.g., Kubernetes, Ray, Kubeflow, MLFlow) to enhance our platform’s capabilities and reliability
Own end-to-end systems from design to deployment and observability, with a strong emphasis on reliability, fault tolerance, and operational excellence

What we offer

Meaningful equity in a fast-growing startup
Competitive salary
Comprehensive benefits package

Fulltime

Systems Analyst 3 - Data Engineer

Sammons Financial Group is seeking a Systems Analyst – Data Engineer to design, ...

Location

United States, Sioux Falls; West Des Moines; Chicago

Salary:

82654.00 - 172197.00 USD / Year

Sammons Financial Group

Expiration Date

Until further notice

Requirements

College degree in computer science, information science, or management information systems preferred
Minimum 8 years' IT development experience or equivalent preferred
Effective verbal and written communication skills and the ability to communicate with business partners and other IT staff
Problem-solving skills sufficient to research problems and recommend solutions
Able to work on multiple tasks and meet established deadlines
Able to effectively direct and coordinate the work of other team members on a project without having HR management responsibility for them
Knowledge of computer programming languages as required for the system
Criminal background check required

Job Responsibility

Design, develop, and implement scalable data ingestion, integration, and processing pipelines across cloud platforms (Azure, Snowflake and similar EDW/Lakehouse platforms, AWS)
Develop and manage data orchestration workflows using tools such as Azure Data Factory (ADF), Azure Data Lake (ADLS), dbt, and comparable technologies
Ingest and process large volumes of structured, semi-structured, and unstructured data, including compressed formats (e.g., .tar), and automate extraction, transformation, and loading processes
Design and implement modern data lakehouse architectures, including Iceberg (or similar table formats), to support scalable and high-performance analytics
Develop and maintain data models that accurately represent complex relationships within life insurance and policy administration domains
Integrate enterprise data platforms with internal and external systems (e.g., APIs, Kafka, MuleSoft) to enable real-time and batch data exchange
Collaborate with product owners, architects, analysts, and developers to translate business, functional, and non-functional requirements into scalable technical solutions
Establish and enforce data engineering standards, best practices, and governance controls across ingestion, transformation, and storage layers
Implement data quality validation, reconciliation processes, and error handling to ensure accuracy, consistency, and reliability of data pipelines
Monitor pipeline performance, reliability, scalability, and cost efficiency

What we offer

Comprehensive health coverage for you and your family, including Medical, Dental, Vision, HSA & FSA options, and term life insurance
Competitive compensation with a performance-based incentive program tied to clear goals and individual and/or company success
Invest in your future with our 100% company-funded Employee Stock Ownership Plan (ESOP), plus automatic enrollment in our 401(k)
Work–life balance that means something. Friday afternoons off year-round, generous paid time off, and paid holidays
Commit to your growth with paid development time, tuition reimbursement, and professional development opportunities across industry, individual, and leadership programs
Make an impact beyond the workplace through volunteer time off and our company nonprofit matching gift program, supporting the causes that matter most to you
An ownership culture that inspires: join a connected, values-driven workplace where employees take accountability, support one another, and are empowered to do their best work, together shaping our future shared success

Fulltime

Staff Software Engineer - AI/ML Platform

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Chevy Chase; New York City; Palo Alto

Salary:

115000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or for specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation, a 401K savings plan vested from day one that offers a 6% match, performance and recognition-based incentives, and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Workplace flexibility, including our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Chevy Chase; New York City; Palo Alto

Salary:

115000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or for specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation, a 401K savings plan vested from day one that offers a 6% match, performance and recognition-based incentives, and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Workplace flexibility, including our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Palo Alto

Salary:

90000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or for specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation, a 401K savings plan vested from day one that offers a 6% match, performance and recognition-based incentives, and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Workplace flexibility, including our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Data Scientist

The Data Scientist plays a pivotal role in planning, executing, and delivering m...

Location

United States, Camden

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Master’s or PhD in Computer Science, Data Science, Engineering, Statistics, Applied Mathematics, Operations Research, or a related quantitative field
Specialization in ML, AI, cognitive science, or data science is highly preferred
3-5 years of hands-on experience planning and executing end-to-end data science projects with demonstrated impact on clinical or operational outcomes in business environments
Advanced programming proficiency in Python or R with strong expertise in machine learning frameworks (scikit-learn, TensorFlow, PyTorch) and statistical analysis tools
Expertise in machine learning and statistical techniques including supervised/unsupervised learning, deep learning, NLP, computer vision, regression models, ensemble methods, and experimental design (A/B testing)
Strong data engineering capabilities including SQL/NoSQL database programming, distributed computing tools (Hadoop, Spark, Kafka), data pipeline development, and experience with cloud platforms (AWS, Azure, GCP)
Production ML and MLOps experience including model deployment, monitoring, containerization (Docker, Kubernetes), version control, and applying DevOps principles to data science workflows
Data visualization and communication excellence with ability to create compelling dashboards (Tableau, Power BI), translate complex technical findings into actionable insights, and present to diverse audiences from executives to frontline staff
Cross-functional collaboration skills with proven ability to work in agile environments, partner with stakeholders to align technical solutions with business objectives, and mentor junior team members
Healthcare domain knowledge preferred, particularly experience with Epic EHR systems, clinical workflows, and healthcare data standards, along with relevant certifications (Clarity/Caboodle, Google Cloud ML Engineer, AWS ML Specialist)

Job Responsibility

Collect, clean, and analyze datasets from diverse internal and external sources, applying advanced data wrangling techniques
Acquire access to various databases and source systems (SQL, NoSQL, graph databases) and create data pipelines
Apply statistical analysis and visualization techniques to explore and prepare data
Design, develop, and validate machine learning, statistical, and optimization models
Select appropriate algorithms and models for AI/ML and test them for accuracy, robustness, and fairness
Perform feature selection and engineering
Integrate domain knowledge into ML solutions
Conduct controlled experiments (A/B and multivariate testing)
Collaborate with MLOps, data engineers, and IT to evaluate deployment options
Continuously monitor execution and health of production ML models

Fulltime

Member of Technical Staff, Research Tooling & Data Platform

Runway

Location:
United States

Category:
IT - Software Development

Contract Type:
Not provided

Job Posted:
January 20, 2026
