Member of Technical Staff, Capacity & Efficiency Infrastructure

Microsoft Corporation

Location:
United States, Mountain View

Contract Type:
Not provided

Salary:

119800.00 - 234700.00 USD / Year

Job Description:

Microsoft AI is looking for a Member of Technical Staff – Capacity & Efficiency Infrastructure to help us manage, and improve the efficiency of, our compute fleet. We’re seeking someone who brings an abundance of positive energy, empathy, and kindness to the team every day, in addition to being highly effective. The ideal candidate enjoys building world-class consumer experiences and products in a fast-paced environment.

You will actively contribute to the development of AI models powering our innovative products. Expect to wear multiple hats and work across engineering, research, and everything in between. Your contributions will span model architecture, data curation, training and inference infrastructure, evaluation protocols, alignment and reinforcement learning from human feedback (RLHF), and many other exciting topics at the cutting edge of AI.

Microsoft AI is building the training infrastructure that powers frontier-scale models and advances research toward humanist superintelligence. As a Member of Technical Staff – Capacity & Efficiency, you will contribute to a fast-moving codebase that enables training at an unprecedented scale. This role requires building software and mathematical models that measure how effectively we use our capacity, and then developing tools and techniques to help us improve. You will partner with ML researchers to scale up the latest research recipes, implement new forms of distributed training parallelism, and ensure the reliability and performance of thousands of GPUs across our supercomputing fleet. Profiling, benchmarking, debugging, and fine-grained optimization are core to this role, demanding both engineering rigor and creativity.

Job Responsibility:

  • Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters
  • Build and evolve telemetry systems to provide visibility into infrastructure and ML model performance, utilization, and cost-related metrics
  • Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems
  • Drive architectural improvements across ML services that deliver measurable efficiency gains
  • Build and evolve tools to automatically provide insights and recommendations to improve fleet-wide efficiency
  • Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies
  • Partner with ML researchers and infrastructure engineers to understand their plans and future needs and develop plans to balance growth with efficiency
  • Collaborate with hardware teams to optimize for next-generation accelerators (NVIDIA, MAIA, and beyond)
  • Embody our Culture and Values

Requirements:

  • Bachelor’s Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Deep understanding of the fundamentals of GPU architectures and DL/LLM architectures
  • Deep experience in profiling and analyzing performance in large-scale distributed computing systems
  • Deep experience in profiling and analyzing performance in ML models, especially GenAI models
  • Experience with low-level GPU programming (CUDA, Triton, NCCL) and frameworks such as PyTorch or JAX
  • Experience in leading technical projects and supporting architectural decisions with data
  • Experience building infrastructure for large-scale machine learning or generative AI workloads
  • Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms
  • Track record of contributing to high-performance computing or large-scale AI infrastructure projects

Nice to have:

Bachelor’s Degree in Computer Science or related technical field AND 10+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python OR Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python OR equivalent experience

Additional Information:

Job Posted:
March 26, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Member of Technical Staff, Capacity & Efficiency Infrastructure

Member of Technical Staff, Full Stack - ML Efficiency & Observability

Microsoft AI is looking for a Member of Technical Staff - Full Stack Engineer, M...
Location:
United States, Mountain View
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years experience in business analytics, data science, software development, data modeling or data engineering work
  • OR Master’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ year(s) experience in business analytics, data science, software development, or data engineering work
  • OR equivalent experience.
  • Experience with Capacity Management, Efficiency Management, ML Training and/or Inference
  • Solid expertise in JavaScript / TypeScript, React, HTML, CSS and browser internals
  • Solid understanding of web performance, accessibility, and cross‑browser compatibility
  • Experience with Development & Debugging with dev environments like Visual Studio or Visual Studio Code
  • Software development experience with Generative AI tools
  • Experience in leading technical projects and supporting architectural decisions with data.
Job Responsibility
  • Design and develop features for our capacity management portal
  • Design and develop features to provide visibility into model performance and quality across our fleet
  • Partner with ML researchers and PMs to translate functional requirements into highly functional, intuitive and appealing interfaces
  • Integrate with backend APIs from schedulers to training frameworks to build visibility across the training life cycle
  • Explore, develop, and adapt new innovations to the software development process
  • Contribute to the development of internal tooling and infrastructure
  • Implement best software development practices to ensure code quality. Hold a high quality bar.
  • Embody our culture and values.
  • Fulltime

Tech. Centers&Energy Syst. Ser. Expert

At Vodafone, we’re not just shaping the future of connectivity for our customers...
Location:
Türkiye, İstanbul
Salary:
Not provided
Vodafone
Expiration Date
Until further notice
Requirements
  • Electrical/Mechanical Engineer
  • Good Knowledge and experience on Energy infrastructure and power systems, high-capacity Air Conditioners systems, cabling and layout planning of Technology Center
  • Customer oriented, result oriented, empathetic, good relationship with customers, customer satisfaction oriented
  • Team work, innovative, adaptable, Flexible working (night-weekends-7/24 shift)
  • Min 5 years similar experience
  • Very Good English (at least Upper intermediate level)
  • Mainly GSM/IT environment experience
  • Experience at Telecom Operator or Mission Critical building
  • Candidate home close to Esenyurt Technology Center
  • Flexibility
Job Responsibility
  • Keep Esenyurt Technology Center up and running 7/24 basis
  • Ensure the 7/24 continuity of Esenyurt Technology Center operation
  • Conclude internal and external customer requests in a timely and complete manner within the framework of roles and responsibilities
  • Operate Esenyurt Technology Center in an energy efficient and sustainable manner
  • To apply Technical Infrastructure Maintenance and Operations Processes and defined actions via operation maintenance Subcontractor according to SLA KPIs. (L2 level operation)
  • To apply layout allocations for Network/IT Equipment at Technology Center
  • To make periodical system checks and tests of all infrastructure equipment, support independent audits
  • To fulfil commissioning and integration of new energy systems and cooling equipment.
  • To update & report electrical infrastructure layouts and drawings of Technology Center
  • To warn and inform planning team in case of any capacity shortage and performance degradation at infrastructure equipment in details
What we offer
  • Vflexy: Flexible Benefits Program
  • Hybrid working kit
  • Ergonomic kit allowance
  • Digital meal voucher
  • Flexible transportation allowance.
  • Employee assistance hotline & counselling
  • Comprehensive and flexible private health insurance
  • Discounted price deals for wide range of products & services
  • Fulltime

Data Center Senior Specialist

At Vodafone, we’re not just shaping the future of connectivity for our customers...
Location:
Türkiye, İstanbul
Salary:
Not provided
Vodafone
Expiration Date
Until further notice
Requirements
  • Electrical/Mechanical Engineer
  • Good Knowledge and experience on Energy infrastructure and power systems, high-capacity Air Conditioners systems, cabling and layout planning of Technology Center
  • Customer oriented, result oriented, empathetic, good relationship with customers, customer satisfaction oriented
  • Team work, innovative, adaptable, Flexible working (night-weekends-7/24 shift)
  • Min 3 years similar experience
  • Very Good English (at least Upper intermediate level)
  • Mainly GSM/IT environment experience
  • Experience at Telecom Operator or Mission Critical building
  • Candidate home close to Esenyurt Technology Center
Job Responsibility
  • Keep Esenyurt Technology Center up and running 7/24 basis
  • Ensure the 7/24 continuity of Esenyurt Technology Center operation
  • Conclude internal and external customer requests in a timely and complete manner within the framework of roles and responsibilities
  • Operate Esenyurt Technology Center in an energy efficient and sustainable manner
  • To apply Technical Infrastructure Maintenance and Operations Processes and defined actions via operation maintenance Subcontractor according to SLA KPIs. (L2 level operation)
  • To apply layout allocations for Network/IT Equipment at Technology Center
  • To make periodical system checks and tests of all infrastructure equipment, support independent audits
  • To fulfil commissioning and integration of new energy systems and cooling equipment
  • To update & report electrical infrastructure layouts and drawings of Technology Center
  • To warn and inform planning team in case of any capacity shortage and performance degradation at infrastructure equipment in details
  • Fulltime

Data Center Expert

At Vodafone, we’re not just shaping the future of connectivity for our customers...
Location:
Türkiye, Adana
Salary:
Not provided
Vodafone
Expiration Date
Until further notice
Requirements
  • Electrical/Mechanical Engineer
  • Good Knowledge and experience on Energy infrastructure and power systems, high-capacity Air Conditioners systems, cabling and layout planning of Technology Center
  • Customer oriented, result oriented, empathetic, good relationship with customers, customer satisfaction oriented
  • Team work, innovative, adaptable, Flexible working (night-weekends-7/24 shift)
  • Min 5 years similar experience
  • Very Good English (at least Upper intermediate level)
  • Mainly GSM/IT environment experience
  • Experience at Telecom Operator or Mission Critical building
  • Candidate home close to Adana Technology Center
  • Flexibility
Job Responsibility
  • Keep Adana Technology Center up and running 7/24 basis, and ensure continuity of operation
  • Conclude internal and external customer requests in a timely and complete manner within the framework of roles and responsibilities
  • Operate Adana Technology Center in an energy efficient and sustainable manner
  • Apply Technical Infrastructure Maintenance and Operations Processes and defined actions via operation maintenance Subcontractor according to SLA KPIs. (L2 level operation)
  • Apply layout allocations for Network/IT Equipment at Technology Center
  • Make periodical system checks and test of all infrastructure equipment, support independent audits
  • Fulfil commissioning and integration of new energy systems and cooling equipment
  • Update & report electrical infrastructure layouts and drawings of Technology Center
  • Warn and inform planning team in case of any capacity shortage and performance degradation at infrastructure equipment in details
  • Follow up daily status of all infrastructure equipment in relation with Telecom and IT equipment
  • Fulltime

Associate Research Scientist (Applied AI)

In the context of the International Consortium for Scientific Computing (ICOMP),...
Location:
Italy, Trieste
Salary:
72747.00 EUR / Year
UNESCO
Expiration Date
May 27, 2026
Requirements
  • Advanced university degree (PhD or equivalent) in Physics, Computer Sciences, Mathematics, or related disciplines
  • At least 2 years of relevant professional experience in Applied AI and advanced parallel programming, preferably in an international environment, and familiarity with large-scale computational infrastructures
  • Ability to work quickly and efficiently under pressure, with minimum supervision, and to sustain periods with workload peaks
  • Discretion and capacity to deal efficiently and tactfully with visitors and staff members of different nationalities and cultural backgrounds
  • Excellent Innovational and Technological Awareness skills
  • Documented record of excellence in research, teaching, and organization of advanced training events
  • Strong analytical and organizational skills
  • Demonstrated ability to work effectively in a multidisciplinary and multicultural environment
  • Excellent communication skills
  • Excellent knowledge (spoken and written) of English
Job Responsibility
  • Undertake front line research in Applied AI
  • Contribute to the development and implementation of research projects related to Applied AI, including applying to and managing individual research grants
  • Provide guidance and mentoring to students and post-doctoral fellows and guidance to any Associates, visitors or fellows that collaborate on these research projects and serve as their scientific supervisor as needed
  • Report on the results of the research at major international meetings
  • Establish collaborations with scientists from developing countries and identify their scientific needs in areas related to Applied AI
  • Formulate appropriate recommendations for incorporation within ICTP programmes
  • Organize and contribute to ICTP seminars, schools, workshops, conferences in Applied AI or in closely related fields, either at the ICTP or in developing countries
  • Provide technical expertise to the MHPC Scientific Council, coordinate the MHPC curricula of all courses related to applied AI
  • Assist the ICTP scientists and the ICTP scientific community on issues related to Applied AI and efficient deployment of AI workflows
  • Collaborate with ICTP scientists in the preparation of research grant proposals, providing the relevant input with respect to large scale data analysis and machine learning methods
What we offer
  • 30 days annual leave
  • family allowance
  • medical insurance
  • pension plan
  • Fulltime

Cloud Architect Lead II

We are looking for an experienced Cloud Architect Lead II to strengthen cloud se...
Location:
United States, Providence
Salary:
Not provided
Robert Half
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Information Security, or a related discipline, or equivalent experience
  • At least 10 years of cybersecurity and cloud experience supporting the configuration, protection, and operation of cloud applications and infrastructure
  • Strong hands-on background in Azure scripting and cloud automation with a coding-focused approach to security engineering
  • Demonstrated experience designing secure cloud environments and conducting assessments against recognized compliance and security frameworks
  • Knowledge of Microsoft cloud security technologies, including Microsoft 365 services, Microsoft Defender for Cloud, and identity or endpoint management tools
  • Practical expertise in cloud access governance, privileged account management, and security monitoring within enterprise environments
  • Experience working with container security controls and modern cloud security practices such as secure configuration management and incident response
  • Strong communication, planning, and leadership abilities with the capacity to collaborate across technical teams and support entry-level staff
Job Responsibility
  • Build and refine secure cloud architecture solutions across enterprise platforms with an emphasis on resilient design, strong configuration standards, and scalable security controls
  • Develop and maintain Azure-based automation and scripting solutions that improve cloud security operations, enforcement, and response capabilities
  • Perform detailed security reviews of cloud infrastructure, applications, and platform components, then translate findings into practical remediation plans
  • Partner with engineering, development, and DevOps teams to embed secure deployment and configuration practices throughout the cloud lifecycle
  • Direct cloud-focused incident response activities, support readiness exercises, and help strengthen response procedures for emerging threats
  • Oversee access control and privileged account practices for cloud resources to align with established security frameworks and industry guidance
  • Advise internal stakeholders on cloud security standards, governance expectations, and architectural decisions that reduce organizational risk
  • Support container and platform security initiatives through hands-on administration, monitoring, and control implementation
  • Identify opportunities to improve cloud resource efficiency while balancing performance, security, and cost considerations
  • Mentor less experienced team members by sharing technical guidance, reviewing work, and encouraging sound security engineering practices
What we offer
  • Medical insurance
  • Vision insurance
  • Dental insurance
  • Life and disability insurance
  • 401(k) plan
  • Fulltime

Staff Software Engineer

This SaaS product connects millions of JVM runtimes, collects and aggregates det...
Location:
Serbia, Belgrade
Salary:
Not provided
Azul Systems
Expiration Date
Until further notice
Requirements
  • 10+ years of experience in Java/Kotlin covering technical architecture, algorithms, design, network management, application development, middleware, AWS/GCP, RDBMS, NoSQL, messaging
  • 5+ years of experience in one or more of the following areas: scalable distributed systems, cloud optimizations and costs, monitoring and alerting, reliable and fault-tolerant systems with performance in mind
  • Experience as an architect or technical lead with customer-facing large-scale products
  • Passionate about simplicity and efficiency, hate for complexity
  • Strong technical problem-solver
  • Positive, enjoys collaborating and communicating with others
  • Experienced in communicating and working across functions to drive solutions
  • Holds BS/MS degree in Computer Science, Engineering, Mathematics or a related field or equivalent experience
Job Responsibility
  • Implement new features, fix issues and perform code reviews in Java
  • Participate in designs and architecture decisions
  • Provide unique insights into cloud architecture
  • Translation of complex functional, technical, and business requirements into designs
  • Understanding risk-driven/spiral development approach and enforcing proofs-of-concept and prototypes to validate and compare design alternatives
  • Performing cost/benefit and trade-off analyses of design alternatives
  • Defining high-level development tasks, providing estimates, and identifying skills necessary for implementation
  • Recommending strategies for SaaS monitoring, performance improvements, and capacity planning
  • Being a charismatic team player with exceptional collaboration and communication skills
  • Driving the team's goals & technical direction to pursue opportunities that make the larger organization more efficient
What we offer
  • Equity Program
  • Annual bonus based on company performance
  • Referral Program
  • IT Equipment - MacBook Pro or any other HW according to your preferences
  • Work-life balance - 5 weeks of holidays, 5 sick days, flexible working hours, 100% work from home also possible
  • Offices in Belgrade City Centre - if you prefer
  • Work with top experts worldwide who contribute to the Java ecosystem
  • Fulltime

Staff Software Engineer

This SaaS product connects millions of JVM runtimes, collects and aggregates det...
Location:
Czech Republic, Prague
Salary:
Not provided
Azul Systems
Expiration Date
Until further notice
Requirements
  • 10+ years of experience in Java/Kotlin covering technical architecture, algorithms, design, network management, application development, middleware, AWS/GCP, RDBMS, NoSQL, messaging
  • 5+ years of experience in one or more of the following areas: scalable distributed systems, cloud optimizations and costs, monitoring and alerting, reliable and fault-tolerant systems with performance in mind
  • Experience as an architect or technical lead with customer-facing large-scale products
  • Passionate about simplicity and efficiency, hate for complexity
  • Strong technical problem-solver
  • Positive, enjoys collaborating and communicating with others
  • Experienced in communicating and working across functions to drive solutions
  • Holds BS/MS degree in Computer Science, Engineering, Mathematics or a related field or equivalent experience
Job Responsibility
  • Implement new features, fix issues and perform code reviews in Java
  • Participate in designs and architecture decisions
  • Provide unique insights into cloud architecture
  • Translation of complex functional, technical, and business requirements into designs
  • Understanding risk-driven/spiral development approach and enforcing proofs-of-concept and prototypes to validate and compare design alternatives
  • Performing cost/benefit and trade-off analyses of design alternatives
  • Defining high-level development tasks, providing estimates, and identifying skills necessary for implementation
  • Recommending strategies for SaaS monitoring, performance improvements, and capacity planning
  • Being a charismatic team player with exceptional collaboration and communication skills
  • Driving the team's goals & technical direction to pursue opportunities that make the larger organization more efficient
What we offer
  • Equity Program
  • Annual bonus based on company performance
  • Referral Program
  • IT Equipment - MacBook Pro or any other HW according to your preferences
  • Work-life balance - 5 weeks of holidays, 5 sick days, flexible working hours, 100% work from home also possible
  • Offices in Prague City Centre - if you prefer
  • Work with top experts worldwide who contribute to the Java ecosystem
  • Fulltime