Senior Member of technical staff (Infrastructure) Job at H Company (London)

Senior Staff Machine Learning Engineer

Help design our AI platform and develop our next generation of machine learning ...

Location

United States, San Francisco

Salary:

216500.00 - 324500.00 USD / Year

GoFundMe

Expiration Date

Until further notice

Requirements

9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields
Experience emphasizing secure, large-scale, distributed system design, AI/ML pipeline development, and implementation
Extensive experience designing, developing, and operating scalable backend systems
Experience applying software engineering best practices such as domain-driven design, event-driven architectures, and microservices
Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices
Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design
Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices
Strong leadership skills, including effective planning and management of complex projects, mentoring of team members, and fostering a collaborative, high-performing engineering culture
Excellent communicator, able to articulate complex technical concepts clearly to both technical and non-technical stakeholders
Bachelor's degree in Computer Science, Software Engineering, or a related technical field (preferred)

Job Responsibility

Design and implement AI platforms to enable scalable and secure access to LLMs from multiple model providers for diverse use cases
Design and implement agentic workflows, agentic tool ecosystems, and LLM prompt management solutions
Design, build, and optimize scalable model training, fine-tuning, and inference pipelines, ensuring robust integration with production systems
Influence technical strategy and approach to developing embedding stores, vector databases, and other reusable assets
Lead initiatives to streamline ML and AI workflows, improve operational efficiency, and establish standardized procedures to achieve consistent, high-quality results across our AI systems
Design and develop backend services and RESTful APIs using Python and FastAPI, integrating seamlessly with ML pipelines and services
Take operational responsibility for team-owned services, including performance monitoring, optimization, troubleshooting, and participation in an on-call rotation
Collaborate with both technical and non-technical colleagues, including data and applied scientists, software engineers, product managers, and business stakeholders, to deliver reliable and scalable ML-driven products
Coach and mentor fellow ML engineers, promoting a culture of collaboration, continuous improvement, and engineering excellence within the team
Employ a diverse set of tools and platforms including Python, AWS, Databricks, Docker, Kubernetes, FastAPI, Terraform, Snowflake, Coralogix, and GitHub to build, deploy, and maintain scalable, highly available machine learning infrastructure

What we offer

Competitive pay
Comprehensive healthcare benefits
Financial assistance for things like hybrid work and family planning
Generous parental leave
Flexible time-off policies
Mental health and wellness resources
Learning, development, and recognition programs

Fulltime

Member of Technical Staff, Infrastructure Data & Analytics

We are seeking experienced Infrastructure Data & Analytics Engineers to join our...

Location

United States, Multiple Locations; Mountain View; San Francisco Bay area; New York City metropolitan area

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science or a related technical field AND 8+ years of technical engineering experience in data engineering, analytics, or data science, with increasing technical ownership in a startup environment, AND 6+ years of experience with distributed data processing frameworks and large-scale data systems, OR equivalent experience
Master's degree in Computer Science or a related technical field AND 12+ years of technical engineering experience in data engineering, analytics, or data science, with increasing technical ownership in a startup environment, AND 10+ years of experience with distributed data processing frameworks and large-scale data systems, OR equivalent experience
Proven technical leadership in data engineering, analytics platforms, or large-scale telemetry systems
Hands-on experience with ETL orchestration frameworks such as Airflow, Dagster, or similar
Strong communication skills; can explain complex systems clearly to senior leaders

Job Responsibility

Act as the technical lead and owner for infrastructure analytics across compute, storage, and networking
Design and build durable, scalable data pipelines that ingest telemetry from clusters, schedulers, health systems, and capacity trackers into the data warehouse
Define and standardize core metrics and semantics (e.g., utilization, occupancy, MFU, goodput, capacity readiness, delivery-to-production)
Architect and maintain self-service dashboards and APIs for fleet, cluster, and squad-level visibility
Partner closely with stakeholders across Supercomputing Infra, Researchers, Strategy and Executives to ensure metrics reflect operational and business reality
Implement robust and fault-tolerant systems for data ingestion and processing
Lead data architecture and engineering decisions, applying strong technical judgment to proactively shape executive-level discussions and decisions
Identify data gaps and instrumentation issues; drive fixes by influencing upstream engineering teams
Establish data quality, validation, documentation, and governance so metrics are trusted and repeatable

Fulltime

Senior Member of Technical Staff - Sys

We are seeking an experienced DevOps & Infrastructure Engineer to lead our infra...

Location

India, Chennai

Salary:

Not provided

Aptiv plc

Expiration Date

Until further notice

Requirements

7-10 years of experience in DevOps, infrastructure management, and CI/CD implementation
Strong programming skills in Python or Go
Experience with Terraform/OpenTofu for infrastructure as code
Proficiency with AWS services and architecture patterns
Experience with configuration management tools such as Puppet and Ansible
Strong understanding of networking concepts and network security principles
Experience implementing security hardening for public-facing infrastructure
Ability to balance technical requirements with business needs
Knowledge of containerization and orchestration (Docker, Kubernetes)

Job Responsibility

Design, implement, and maintain infrastructure across cloud providers (primarily AWS) and physical environments
Develop and enhance GitLab or Jenkins CI/CD pipelines
Create and maintain Infrastructure as Code using Terraform/OpenTofu
Implement automation solutions to streamline development workflows
Ensure security best practices across all infrastructure components
Support development teams with infrastructure needs and performance optimization
Develop infrastructure documentation and knowledge sharing

What we offer

Named Top Workplace for the 10th year in a row
Wind River’s commitment to DEIB
Birthday and Volunteer Time off
Competitive Salary & Benefits Package
Extensive Learning Programs
Wellness Benefits through Unmind

Fulltime

Member of technical staff - Research - Agent

About H: H exists to push the boundaries of superintelligence with agentic AI. B...

Location

France; United Kingdom, Paris; London

Salary:

Not provided

H Company

Expiration Date

Until further notice

Requirements

Senior Experience: Previous demonstrable role(s) as a Staff, Principal, or Senior Engineer (or equivalent Research Scientist) in a Frontier AI Lab with a proven track record of leading complex, end-to-end AI/ML projects from conception to production
Education / Publication: Preferably a PhD (or equivalent research experience) in Machine Learning, Computer Science, or a related field, ideally with a strong publication record in Computer Science (e.g., NeurIPS, ICML, ICLR)
Core Expertise: Deep theoretical and practical expertise in Agentic AI and proven experience building, scaling, and shipping solutions involving foundation models (LLMs/VLMs)
Soft Skills: Collaborative: Enjoys collaboration and thrives in a teamwork-oriented, fast-paced research environment
High-Impact Communicator: Possesses impactful communication skills, with the ability to bridge the gap between research and engineering and articulate complex ideas clearly
Mission-Driven: Genuinely eager to explore and solve the new engineering and research challenges at the frontier of agentic AI

Job Responsibility

Research & Leadership: Design and develop new agents, proposing new research directions, e.g., combining state-of-the-art RL with foundation models (LLMs/VLMs)
Algorithm & Systems Design: Design, implement, and scale complex, high-performance systems for training large-scale agents. This includes both the foundational infrastructure and the novel algorithms, reward models, and sophisticated training environments
Research-to-Production: Collaborate closely with researchers and engineers to implement, test, and productionize new agent logics, learning algorithms, and system architectures
Evaluation & Reliability: Create, manage, and scale massive benchmarks and evaluation systems to rigorously track agent capabilities. You will own system reliability, scalability, and observability for our entire research infrastructure
Mentorship & Standards: Mentor and guide other engineers and researchers on the team, fostering technical excellence. You will establish and enforce engineering standards, tooling, and best practices for both code and research design
Innovation: Conduct thorough code and design reviews, champion technical innovation, and proactively address technical debt to accelerate the R&D lifecycle

What we offer

Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
Collaborate with a fun, dynamic, and multicultural team, working alongside world-class AI talent in a highly collaborative environment
Enjoy a competitive salary
Unlock opportunities for professional growth, continuous learning, and career development

Fulltime

Senior Member of Technical Staff, Multimodal AI

At Cohere, we believe in the power of multimodal AI to revolutionise the way we ...

Location

Salary:

Not provided

Cohere

Expiration Date

Until further notice

Requirements

Exceptional software engineering skills with a proven track record of building robust and scalable systems
Strong command of Python and well-versed in popular deep learning frameworks like JAX, PyTorch, and TensorFlow, with an understanding of their multimodal capabilities
Knowledge of distributed training strategies, especially for large-scale multimodal models
Familiarity with autoregressive models, particularly their application in multimodal tasks such as image or video captioning and speech-to-text generation

Job Responsibility

Design and develop cutting-edge multimodal AI systems, integrating various modalities such as text, speech, and vision
Conduct research and experiments on our advanced compute infrastructure, exploring novel ideas in multimodal representation learning, transfer learning, and more
Collaborate closely with our world-class teams, learning from and contributing to their expertise in the field

What we offer

An open and inclusive culture and work environment
Work closely with a team on the cutting edge of AI research
Weekly lunch stipend, in-office lunches & snacks
Full health and dental benefits, including a separate budget to take care of your mental health
100% Parental Leave top-up for up to 6 months
Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
6 weeks of vacation (30 working days!)

Fulltime

Member of Technical Staff, Software Co-Design AI HPC Systems

Our team’s mission is to architect, co-design, and productionize next-generation...

Location

United States, Mountain View

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Strong background in one or more of the following areas: AI accelerator or GPU architectures; distributed systems and large-scale AI training/inference; high-performance computing (HPC) and collective communications; ML systems, runtimes, or compilers; performance modeling, benchmarking, and systems analysis; hardware–software co-design for AI workloads.
Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.

Job Responsibility

Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.

Fulltime

Staff Data Scientist

Neo Financial seeks an experienced and strategic Staff Data Scientist to provide...

Location

Canada, Calgary

Salary:

Not provided

Neo Financial

Expiration Date

Until further notice

Requirements

10+ years of experience deploying impactful ML models in production, driving commercial outcomes, and leading technical teams
Profound technical expertise in Python (pandas, scikit-learn), XGBoost (tuning, custom objectives, SHAP), and AWS (SageMaker, ECS)
Exceptional ML validation skills, including mitigating data leakage, rigorous OOT testing, and robust backtesting framework design
Extensive hands-on experience with Snowflake / Databricks; strong familiarity with MLflow & dbt, with proven ability to architect and optimize data workflows
A strategic, business-centric mindset; adept at navigating ambiguity, prioritizing in a fast-paced environment, and delivering high-value solutions on tight timelines
Demonstrated experience managing the technical development of data science teams of 5+ members, including mentoring staff and fostering a culture of technical excellence
Excellent communication and stakeholder management skills, able to articulate complex technical concepts to diverse audiences, including executive leadership

Job Responsibility

Spearheading technical strategy and end-to-end delivery of sophisticated ML models across marketing and loyalty
Managing the technical development of data scientists, guiding complex projects, fostering skill growth, and ensuring high-quality model implementation
Championing and evolving model explainability and business trust via advanced SHAP insights, validation reports, and clear cross-functional communication with senior stakeholders
Architecting and enhancing MLOps infrastructure, including automating model pipelines, implementing advanced versioning/drift detection, and streamlining auto-retraining
Establishing and enforcing rigorous model validation frameworks (e.g., advanced OOT validation, sophisticated temporal splits, comprehensive cross-validation) for exceptional model quality, generalization, and compliance
Mentoring and developing data scientists at all levels, leading technical design, reviewing code/model logic, and spearheading knowledge-sharing
Collaborating with executive and business leaders to identify and prioritize high-value data science initiatives, ensuring models address strategic problems and deliver commercial impact
Staying current with data science, ML, and MLOps advancements, and driving adoption of innovative technologies within the team

What we offer

All team members have a stake in Neo’s success and earn meaningful equity through stock options

Fulltime

Cloud Architect

The Cloud Architect role at NTT DATA involves leading the Managed Services portf...

Location

Canada, Remote

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

8–12+ years in Modern Workplace / Endpoint / EUC engineering/Azure, including 3–5+ years in technical leadership for production services
Deep knowledge of Entra ID and Conditional Access, Zero Trust device access patterns, and integration with endpoint security tooling
Strong ITSM background (Incident/Problem/Change), CAB governance, change risk assessment, and RCA/postmortem leadership
Demonstrated SRE mindset: monitoring/alerting, reliability improvements, and automation-driven operational excellence
Azure platform operational experience (governance, RBAC, policy, monitoring, reliability)

Job Responsibility

Partner directly with clients to define managed services strategy, roadmap, and service catalog across cloud platforms, EUC/Intune, identity, security, and infrastructure
Own the managed services operating model, including organizational structure, roles, and service delivery framework
Lead and inspire service managers, technical leads, engineers, and support teams to drive operational excellence and consistent achievement of SLAs and SLOs
Define, implement, and own KPIs and performance metrics across managed services contracts, ensuring transparency and continuous improvement
Establish clear accountability for technical leaders and managers; define, maintain, and evolve role definitions and responsibilities
Partner with technical leadership to design and maintain standardized processes, runbooks, automation patterns, and governance frameworks that enable consistent, repeatable service delivery at scale
Oversee client onboarding, quarterly business reviews, service escalations, customer retention, and long‑term account health
Mentor and develop service leaders, team leads, and technical staff, building strong leadership pipelines and technical depth across the organization
Actively contribute as a senior member of the Cloud Center of Excellence (CCoE), influencing standards, best practices, and strategic direction
