Machine Learning Systems Engineer Job at Susquehanna International Group (Bala Cynwyd (Philadelphia Area), Pennsylvania)

Machine Learning Systems Engineer

As a Machine Learning Systems Engineer on the AI & ML Platform team, you will bu...

Location

United States

Salary:

145800.00 - 229125.00 USD / Year

Atlassian

Expiration Date

Until further notice

Requirements

Fluency in at least one modern object-oriented programming language (preferably Java/Kotlin)
Understanding and experience with Machine Learning project lifecycle and tools
Understanding of LLMs, best deployment practices and inference optimisation
Experience in building and implementing high-performance RESTful microservices
Experience building and operating large-scale distributed systems using Amazon Web Services (SageMaker, S3, CloudFormation, AWS Security and Networking)
Experience with Continuous Delivery and Continuous Integration

Job Responsibility

Build and scale the core infrastructure to allow software engineers, ML engineers & data scientists to develop, train, evaluate, deploy, and operate Machine Learning models and pipelines
Build systems for product teams like Jira & Confluence to provide access to curated LLMs
Use software development expertise to solve difficult problems, tackling infrastructure and architecture challenges
Lead engineers to drive involved projects from technical design to launch
Collaborate with other teams and internal customers to set expectations, gather input and communicate results
Regularly tackle complex problems in the team, from technical design to launch
Routinely tackle complex architecture challenges and define coding standards & patterns for the team
Lead the team through times of ambiguity, help them adapt and deliver positive impact
Mentor junior members on the team

What we offer

Health coverage
Paid volunteer days
Wellness resources
Bonuses
Commissions
Equity

Fulltime

Principal Machine Learning Systems Engineer

Search Platform powers the search functionality in Atlassian products. The team ...

Location

Salary:

Not provided

Atlassian

Expiration Date

Until further notice

Requirements

10+ years experience in multiple hands-on software/technology leadership roles, with end-to-end responsibility through the software development lifecycle
Worked on scaling ML use cases for 50+ TB of data
Good understanding of PySpark and Databricks jobs scaling challenges
Experience with ML workflows and observability at scale.
Bachelor's degree with a preference for Computer Science degree
Expertise with one or more prominent languages such as Java, Python, Kotlin, Go, or TypeScript is required.
Understanding of SaaS, PaaS, IaaS industry with hands-on experience with public cloud offerings (e.g., AWS, GCP, or Azure)
Experience with Java, Spring, REST, and NoSQL databases
Experience building event-driven systems based on SQS, SNS, Kafka, or equivalent technologies
Knowledge to evaluate trade-offs between correctness, robustness, performance, space and time

Job Responsibility

Handle complex problems in the team from technical design to launch
Determine plans-of-attack on large projects
Solve complex architecture challenges and apply architectural standards to new projects
Lead code reviews & documentation and take on complex bug fixes, especially on high-risk problems
Set the standard for meaningful code reviews
Partner across engineering teams to take on company-wide programmes in multiple projects
Transfer your depth of knowledge from your current language to excel as a Software Engineer
Mentor junior members of the team

What we offer

Atlassians can choose where they work – whether in an office, from home, or a combination of the two
Health and wellbeing resources
Paid volunteer days

Senior Machine Learning Systems Engineer

Our organization drives AI innovation across Jira products. We deliver seamless ...

Location

Salary:

Not provided

Atlassian

Expiration Date

Until further notice

Requirements

Extensive experience building Machine Learning and AI solutions (4+ years)
Proven experience developing, deploying, and maintaining end-to-end ML systems, including data engineering, model serving, and monitoring
Expert proficiency with GenAI frameworks and tools, including developing and fine-tuning large language models (LLMs) and building retrieval-augmented generation (RAG) systems
Expert proficiency in Python and ML frameworks like PyTorch, TensorFlow, or JAX
Experience implementing MLOps, CI/CD pipelines, and automation for continuous training, deployment, and monitoring of ML models

Job Responsibility

Collaborate with software engineers, data scientists, and product managers to solve complex problems
Lead projects from technical design through launch
Partner with teams to achieve impactful results
Deliver robust ML solutions to build AI features reaching millions, including curating ML datasets, fine-tuning open-source LLMs, or accessing proprietary LLMs
Mentor junior members of the team

What we offer

Health and wellbeing resources
Paid volunteer days

Senior Machine Learning Systems Engineer

As a Senior Machine Learning Systems Engineer at Abridge, you’ll play a pivotal ...

Location

United States, San Francisco

Salary:

221000.00 - 260000.00 USD / Year

Abridge

Expiration Date

Until further notice

Requirements

Strong experience in building and deploying machine learning models in production environments
Deep understanding of container orchestration and distributed systems architecture
Expertise in Kubernetes administration, including custom resource definitions, operators, and cluster management
Experience developing APIs and managing distributed systems for both batch and real-time workloads
Excellent communication skills, with the ability to interface between research and product engineering

Job Responsibility

Design, deploy and maintain scalable Kubernetes clusters for AI model inference and training
Develop, optimize, and maintain ML model serving and training infrastructure, ensuring high performance and low latency
Collaborate with ML and product teams to scale backend infrastructure for AI-driven products, focusing on model deployment, throughput optimization, and compute efficiency
Optimize compute-heavy workflows and enhance GPU utilization for ML workloads
Build a robust model API orchestration system
Collaborate with leadership to define and implement strategies for scaling infrastructure as the company grows, ensuring long-term efficiency and performance

What we offer

Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees
Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families
Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA
Paid Parental Leave: Generous paid parental leave for all full-time employees
Family Forming Benefits: Resources and financial support to help you build your family
401(k) Matching: Contribution matching to help invest in your future
Personal Device Allowance: Tax free funds for personal device usage
Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits
Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more
Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals

Fulltime

Research Engineer - Machine Learning and Systems

We are hiring a principal level Research Engineer with deep strength in machine ...

Location

United States, New York City

Salary:

100000.00 - 250000.00 USD / Year

Helpcare AI

Expiration Date

Until further notice

Requirements

PhD in Computer Science, Machine Learning, Computer Graphics, Computer Vision, or related field, or equivalent research track record
Seven or more years of experience in applied ML or research engineering, including significant time in fast-paced or startup settings
Strong publication record in top venues such as NeurIPS, ICLR, ICML, CVPR, ECCV, ICCV, SIGGRAPH, or TOG, with multiple first-author papers or equivalent impactful artifacts
Proven experience training and serving large models at scale, including multi-GPU or multi-node training, distributed data loading, mixed precision, and memory optimization
Fluency in Python and C++ and experience writing efficient CUDA or Triton kernels
Expertise with PyTorch or JAX and modern tooling for experiment tracking, evaluation, and deployment
Demonstrated ability to take ideas from paper to production with measurable impact on users or business outcomes
Strong systems skills including profiling, performance tuning, reliability engineering, and cost awareness
Excellent communication with the ability to work across research and product teams

Job Responsibility

Research, design, and implement models and systems across vision, generative modeling, simulation, rendering, and 3D perception
Build data, training, evaluation, and deployment pipelines with strong observability and reproducibility
Translate research insights into reliable production services that meet product and latency requirements
Contribute hands-on across prototyping, optimization, integration, and scaling
Survey new methods and run grounded evaluations to identify what to adopt and when
Share expertise through design reviews, mentoring, and documentation

What we offer

Relocation support available

Fulltime

Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI

The Enterprise ML Research Lab works on the front lines of this AI revolution. W...

Location

United States, San Francisco; New York

Salary:

218400.00 - 273000.00 USD / Year

Scale

Expiration Date

Until further notice

Requirements

At least 1-3 years of LLM training in a production environment
Passionate about system optimization
Experience with post-training methods like RLHF/RLVR and related algorithms like PPO/GRPO etc.
Demonstrated know-how in operating modern GPU cluster architecture
Experience with multi-node LLM training and inference
Strong software engineering skills, proficient in frameworks and tools such as CUDA, PyTorch, transformers, flash attention, etc.
Strong written and verbal communication skills to operate in a cross-functional team environment
PhD or Masters in Computer Science or a related field

Job Responsibility

Build, profile and optimize our training and inference framework
Post-train state-of-the-art models, developed both internally and by the community, to define stable post-training recipes for our enterprise engagements
Collaborate with ML teams to accelerate their research and development, and enable them to develop the next generation of models and data curation
Create a next-gen agent training algorithm for multi-agent/multi-tool rollouts

What we offer

Comprehensive health, dental and vision coverage
Retirement benefits
A learning and development stipend
Generous PTO
Additional benefits such as a commuter stipend
Equity-based compensation

Fulltime

Machine Learning Data Engineer - Systems & Retrieval

As a Machine Learning Data Engineer - Systems & Retrieval, you will build and op...

Location

United States, Palo Alto

Salary:

Not provided

Zyphra

Expiration Date

Until further notice

Requirements

Strong software engineering background with fluency in Python
Experience designing, building, and maintaining data pipelines in production environments
Deep understanding of data structures, storage formats, and distributed data systems
Familiarity with indexing and retrieval techniques for large-scale document corpora
Understanding of database systems (SQL and NoSQL), their internals, and performance characteristics
Strong attention to security, access controls, and compliance best practices (e.g., GDPR, SOC2)
Excellent debugging, observability, and logging practices to support reliability at scale
Strong communication skills and experience collaborating across ML, infra, and product teams

Job Responsibility

Design and implementation of distributed data ingestion and transformation pipelines
Building retrieval and indexing systems that support RAG and other LLM-based methods
Mining and organizing large unstructured datasets, both in research and production environments
Collaborating with ML engineers, systems engineers, and DevOps to scale pipelines and observability
Ensuring compliance and access control in data handling, with security and auditability in mind

What we offer

Comprehensive medical, dental, vision, and FSA plans
Competitive compensation and 401(k)
Relocation and immigration support on a case-by-case basis
On-site meals prepared by a dedicated culinary team
Thursday Happy Hours

Fulltime

Machine Learning Engineer, Distributed Data Systems

As a Research Engineer, Distributed Data Systems, you will design and scale the ...

Location

United States, San Francisco

Salary:

295000.00 - 445000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

Strong experience with distributed systems and large-scale infrastructure
Detail-oriented, bringing rigor to building and maintaining reliable systems
Excellent software engineering fundamentals and organizational skills
Comfortable with ambiguity and rapid change

Job Responsibility

Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security
Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient
Partner with researchers to deeply understand requirements and translate them into production-ready systems
Harden, optimize, and maintain critical data infrastructure systems that power multimodal training and evaluation

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime
