Member of Technical Staff, Compute Orchestration & Scheduling

Microsoft Corporation

Location:
United States, Mountain View

Contract Type:
Not provided

Salary:

139900.00 - 274800.00 USD / Year

Job Description:

Microsoft AI is looking for a Member of Technical Staff, Compute Orchestration & Scheduling to help build the next wave of capabilities of our personalized AI assistant, Copilot. We’re looking for someone who will bring an abundance of positive energy, empathy, and kindness to the team every day, in addition to being highly effective. The right candidate enjoys building world-class consumer experiences and products in a fast-paced environment.

You will actively contribute to the development of the AI models that power our innovative products. You will wear multiple hats and work on engineering, research, and everything in between. Your contributions will span model architecture, data curation, training and inference infrastructure, evaluation protocols, alignment and reinforcement learning from human feedback (RLHF), and many other exciting topics at the cutting edge of AI.

Microsoft AI is building foundational models to develop novel, responsible, and efficient artificial general intelligence. These models require large compute capacity. As a Member of Technical Staff, Compute Orchestration & Scheduling, you will be responsible for designing and building our compute orchestration and scheduling layer on top of Kubernetes and Ray, working on everything from workload placement and scaling to reliability and developer experience. You’ll work closely with research and framework teams to turn their requirements into scalable abstractions, improve cluster efficiency, and ensure our compute platform is observable and easy to operate in production. As a contributing member of the core group of engineers, you will also bring best practices that drive architectural changes and influence the roadmap of relevant software and hardware components. Your work will directly impact the business goals of a wide range of users and facilitate the next wave of growth and innovation in AI.

Job Responsibility:

  • Develop and tune scalable pretraining software for NVIDIA GB200 NVL72 CX8 and AMD MIxxx architectures
  • Benchmark GB200 and AMD MIxxx GPU clusters
  • Gather data and insights to develop the pretraining compute roadmap
  • Care deeply about conversational AI and its deployment
  • Actively contribute to the development of AI models that are powering our innovative products
  • Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively
  • Enjoy working in a fast-paced, design-driven, product development cycle
  • Embody our Culture and Values

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience

Additional Information:

Job Posted:
April 01, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Member of Technical Staff, Compute Orchestration & Scheduling

Member of Technical Staff, Cloud Infrastructure

As a Software Engineer on our Cloud Infrastructure team, you'll be at the forefr...
Location:
United States, New York, NY; San Mateo, CA; Redwood City, CA
Salary:
175000.00 - 220000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 5+ years of experience designing and building backend infrastructure in cloud environments (e.g., AWS, GCP, Azure)
  • Proven experience in ML infrastructure and tooling (e.g., PyTorch, TensorFlow, Vertex AI, SageMaker, Kubernetes, etc.)
  • Strong software development skills in languages like Python or C++
  • Deep understanding of distributed systems fundamentals: scheduling, orchestration, storage, networking, and compute optimization
Job Responsibility:
  • Architect and build scalable, resilient, and high-performance backend infrastructure to support distributed training, inference, and data processing pipelines
  • Lead technical design discussions, mentor other engineers, and establish best practices for building and operating large-scale ML infrastructure
  • Design and implement core backend services (e.g., job schedulers, resource managers, autoscalers, model serving layers) with a focus on efficiency and low latency
  • Drive infrastructure optimization initiatives, including compute cost reduction, storage lifecycle management, and network performance tuning
  • Collaborate cross-functionally with ML, DevOps, and product teams to translate research and product needs into robust infrastructure solutions
  • Continuously evaluate and integrate cloud-native and open-source technologies (e.g., Kubernetes, Ray, Kubeflow, MLFlow) to enhance our platform’s capabilities and reliability
  • Own end-to-end systems from design to deployment and observability, with a strong emphasis on reliability, fault tolerance, and operational excellence
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary
  • Comprehensive benefits package
Employment Type:
Full-time

Member of Technical Staff, Infrastructure Data & Analytics

We are seeking experienced Infrastructure Data & Analytics Engineers to join our...
Location:
United States, Multiple Locations; Mountain View; San Francisco Bay area; New York City metropolitan area
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s Degree in Computer Science or related technical field AND 8+ years of technical engineering experience in data engineering, analytics, or data science, with increasing technical ownership in a startup environment, AND 6+ years of experience with distributed data processing frameworks and large-scale data systems
  • OR equivalent experience
  • Master's Degree in Computer Science or related technical field AND 12+ years of technical engineering experience in data engineering, analytics, or data science, with increasing technical ownership in a startup environment, AND 10+ years of experience with distributed data processing frameworks and large-scale data systems
  • OR equivalent experience
  • Proven technical leadership in data engineering, analytics platforms, or large-scale telemetry systems
  • Hands-on experience with ETL orchestration frameworks such as Airflow, Dagster, or similar
  • Strong communication skills; can explain complex systems clearly to senior leaders
Job Responsibility:
  • Act as the technical lead and owner for infrastructure analytics across compute, storage, and networking
  • Design and build durable, scalable data pipelines that ingest telemetry from clusters, schedulers, health systems, and capacity trackers into a data warehouse
  • Define and standardize core metrics and semantics (e.g., utilization, occupancy, MFU, goodput, capacity readiness, delivery-to-production)
  • Architect and maintain self-service dashboards and APIs for fleet, cluster, and squad-level visibility
  • Partner closely with stakeholders across Supercomputing Infra, Researchers, Strategy and Executives to ensure metrics reflect operational and business reality
  • Implement robust and fault-tolerant systems for data ingestion and processing
  • Lead data architecture and engineering decisions, applying strong technical judgment to proactively shape executive-level discussions and decisions
  • Identify data gaps and instrumentation issues, and drive fixes by influencing upstream engineering teams
  • Establish data quality, validation, documentation, and governance so metrics are trusted and repeatable
Employment Type:
Full-time

Member of Technical Staff, Software Engineer

Help build the infrastructure that powers training, evaluation, and data platfor...
Location:
Switzerland, Zürich
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Strong software engineering background building reliable, scalable production systems (Python preferred)
  • Hands‑on experience supporting large‑scale ML / LLM training, evaluation, or experimentation infrastructure
  • Operating GPU‑heavy workloads in cloud environments using Docker and Kubernetes (scheduling, utilization, isolation)
  • Designing and running data / compute pipelines and orchestration (e.g., Airflow, Argo) with object storage (Azure Blob / S3)
  • Platform reliability and operability: observability, metrics, logging, tracing, alerting (Prometheus, Grafana, OpenTelemetry)
Job Responsibility:
  • Design and build core platform services for scalable training and evaluation, including cluster orchestration, job scheduling, data and compute pipelines, and artifact management
  • Standardize containerized workflows by maintaining Docker images, CI/CD, and runtime configurations, and advocate for best practices in security, reproducibility, and cost efficiency
  • Implement end-to-end observability and operations through metrics, tracing, logging, dashboard development, monitoring, and automated alerts for model training and platform health (using Prometheus, Grafana, OpenTelemetry)
  • Architect and operate services on Azure cloud platforms, managing infrastructure-as-code (Terraform/Helm), secrets, networking, and storage
  • Enhance developer experience by creating tools, CLIs, and portals that simplify job submission, metrics analysis, and experiment management for generalist software engineering and research teams
  • Enforce security and compliance policies for data access, container hardening, and supply-chain integrity, and partner with security and privacy teams to maintain robust practices in multi-tenant environments and secret management
  • Collaborate cross-functionally with data, model, and product teams to align infrastructure roadmaps with training needs, evaluation protocols, and Copilot product goals
Employment Type:
Full-time

Member of Technical Staff, Site Reliability Engineer (HPC)

As Microsoft continues to push the boundaries of AI, we are on the lookout for p...
Location:
United States, Mountain View
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • OR equivalent experience
  • Strong proficiency in Kubernetes, Docker, and container orchestration
  • Knowledge of CI/CD pipelines for Inference and ML model deployment
  • Hands-on experience with public cloud platforms like Azure/AWS/GCP and infrastructure-as-code
  • Expertise in monitoring & observability tools (Grafana, Datadog, OpenTelemetry, etc.)
  • Strong programming/scripting skills in Python, Go, or Bash
  • Solid knowledge of distributed systems, networking, and storage
  • Experience running large-scale GPU clusters for ML/AI workloads (preferred)
Job Responsibility:
  • Reliability & Availability: Ensure uptime, resiliency, and fault tolerance of HPC clusters powering MAI model training and inference
  • Observability: Design and maintain monitoring, alerting, and logging systems to provide real-time visibility into all aspects of HPC systems including GPU, clusters, storage and networking
  • Automation & Tooling: Build automation for deployments, incident response, scaling, and failover in CPU+GPU environments
  • Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless postmortems, and drive continuous improvements
  • Security & Compliance: Ensure data privacy, compliance, and secure operations across model training and serving environments
  • Collaboration: Partner with ML engineers and platform teams to improve developer experience and accelerate research-to-production workflows
What we offer:
  • Competitive compensation, equity options, and comprehensive benefits
Employment Type:
Full-time

Staff Systems Software Engineer, Infrastructure Platform

The Infrastructure Engineering organisation at GM is building a cloud-native pla...
Location:
United States, Austin; Mountain View; Warren
Salary:
Not provided
General Motors
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science or related field, or equivalent work experience
  • 8+ years of software engineering experience with a strong track record of building and operating production distributed systems
  • Deep platform or infrastructure engineering experience, with hands-on work building APIs, schedulers, orchestrators, or similar systems at scale
  • Strong proficiency in Go, with ability to write clean, maintainable, and performant production code for backend services
  • Solid understanding of distributed systems fundamentals including consistency models, failure handling, idempotency, retry patterns, and circuit breakers
  • Experience with cloud-native technologies such as Kubernetes, Nomad, Consul, or similar orchestration and service discovery platforms
  • Strong API design skills with understanding of RESTful patterns, authentication and authorisation models (OIDC, RBAC), versioning strategies, and error handling
  • Deep experience with relational databases, particularly PostgreSQL, including schema design, indexing strategies, query optimisation, and migration management
  • Architectural thinking with ability to evaluate trade-offs, balance simplicity with flexibility, design for current requirements and future growth, and document decisions effectively
  • Strong communication skills with ability to explain complex technical concepts to both engineering and business stakeholders
Job Responsibility:
  • Design and implement core platform services including the API gateway, scheduler, lifecycle orchestrator, and synchronisation services using Go and cloud-native patterns
  • Build RESTful APIs with authentication (OIDC, RBAC), authorisation, versioning, and observability, architecting the inventory database system using PostgreSQL for resource metadata, capabilities, and state management
  • Develop intelligent scheduling and orchestration logic that matches workload requirements to resource capabilities with support for automated pooling, reservation modes, and hybrid allocation strategies
  • Build developer CLI tooling and integrate with the control plane, enabling developers to discover, allocate, and manage infrastructure resources through intuitive commands
  • Implement provisioning workflows that coordinate firmware flashing, health checks, power cycling, and resource validation across diverse automotive hardware configurations
  • Collaborate with stakeholders across Infrastructure Engineering, Quality Engineering, and Hardware Infrastructure to understand workflows and integrate with existing systems
  • Lead architectural discussions, conduct code reviews, document technical decisions, and mentor team members on distributed systems patterns and Go development
  • Work with tools and technologies including Go, PostgreSQL, Kubernetes, Nomad, Consul, RESTful APIs with OIDC authentication and RBAC authorisation, Datadog, S3-compatible object storage (MinIO), CI/CD pipelines, and Git/GitHub
What we offer:
  • From day one, we're looking out for your well-being, at work and at home, so you can focus on realizing your ambitions
Employment Type:
Full-time

Sales assistant

We are looking for a salesperson whose main task is to help our customers. A tru...
Location:
Switzerland, Geneva
Salary:
Not provided
Inditex
Expiration Date:
Until further notice
Requirements:
  • Passionate about fashion
  • Enjoys teamwork
  • Enthusiastic, ambitious and flexible
  • Always up to date on the latest trends
  • Commercial and customer service oriented
  • Committed, reliable, completes tasks quickly and efficiently
Job Responsibility:
  • Advise and welcome customers
  • Guarantee the brand's image
  • Promote customer loyalty
  • Know trends, products and their characteristics
  • Ensure merchandise flow (stocks, deliveries, new arrivals, restocking, etc.)
  • Maintain order and cleanliness in the store
  • Manage fitting room procedures
Employment Type:
Part-time

IT Expert for Data Integration

EU funded project: 'Technical Assistance for Strengthening the Capacity of Minis...
Location:
Türkiye, Ankara
Salary:
Not provided
Agrotec Spa
Expiration Date:
April 30, 2026
Requirements:
  • Bachelor’s degree or higher in Computer Science, Information Systems, Software Engineering, or a related field
  • Minimum 5 years of professional experience in data integration, database management, or health information systems
  • Proven experience in integrating multiple data sources and ensuring interoperability
  • Experience with database design, APIs, and data exchange protocols
  • Strong problem-solving and analytical skills
  • Ability to work with both technical and non-technical stakeholders
  • Excellent command of English (spoken and written)
Employment Type:
Full-time

District Support Pharmacist

We’re building a world of health around every individual — shaping a more connec...
Location:
United States, North Las Vegas
Salary:
60.00 - 76.00 USD / Hour
CVS Health
Expiration Date:
October 18, 2026
Requirements:
  • Active Pharmacist License in the state where the Store is located
  • Active National Provider Identifier (NPI)
  • Not on the DEA Excluded Parties list
  • Ability to travel within a reasonable radius to support market staffing as business needs require
  • Regular and predictable attendance, including nights and weekends
  • Ability to complete required training within designated timeframe
  • Attention and Focus
  • Customer Service and Team Orientation
  • Communication Skills
  • Mathematical Reasoning
Job Responsibility:
  • Traveling the district to fill pharmacist shifts as scheduled by the District Performance Coordinator (DPC); overseeing the pharmacy and serving as the Pharmacy Manager’s proxy during bench shifts without overlap
  • Supporting safe and accurate prescription fulfillment
  • Assuming the Pharmacy Manager’s day-to-day duties when serving as the only or the primary pharmacist-on-duty
  • Contributing to positive patient experiences
  • Proactively offering and delivering immunizations
  • Supporting the effective management of pharmacy inventory
  • Remaining flexible for both scheduling and business needs
  • Maintaining relevant clinical and technical skills
  • Supporting access to care and helping to improve patient outcomes through pharmacist delivered clinical care
What we offer:
  • Affordable medical plan options, a 401(k) plan (including matching company contributions), and an employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off, flexible work schedules, family leave, dependent care resources, colleague assistance programs, and tuition assistance
Employment Type:
Part-time