AI/HPC System Performance Engineer

Meta

Location:
United States, Austin

Contract Type:
Not provided

Salary:

219000.00 - 301000.00 USD / Year

Job Description:

Meta's AI training and inference infrastructure is growing exponentially to support an ever-increasing range of AI use cases. This creates a dramatic scaling challenge that our engineers deal with daily. We need to build and evolve the network infrastructure that connects myriad training accelerators, such as GPUs. We also need to ensure that the network runs smoothly and meets the stringent performance and availability requirements of RDMA workloads, which expect a lossless fabric interconnect with minimal latency. To improve the performance of these systems, we constantly look for opportunities across the stack: network fabric, host networking, communication libraries, and scheduling infrastructure.

Job Responsibility:

  • Lead multi-disciplinary teams to develop solutions for large-scale training systems. Assess the trade-offs of various solutions and make pragmatic decisions
  • Ensure timely milestone delivery through teamwork and close collaboration
  • Own the overall performance of the communication system, including performance benchmarking, monitoring, and troubleshooting production issues
  • Define the technical vision and drive a multi-year roadmap to make progress towards the related objectives
  • Work with cross-functional teams and provide guidance on the AI network architecture, including topologies, transport, and congestion control techniques

Requirements:

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Experience with developing, evaluating and debugging host networking protocols such as RDMA
  • 10+ years of experience in designing, deploying and operating networks
  • Experience with triaging performance issues in complex scale-out distributed applications

Nice to have:

  • Experience with developing communication libraries, such as Message Passing Interface, NCCL, and UCX
  • Understanding of AI training workloads and the demands they place on networks
  • Understanding of RDMA congestion control mechanisms on InfiniBand and RoCE networks
  • Understanding of the latest artificial intelligence (AI) technologies
  • Experience with machine learning frameworks such as PyTorch and TensorFlow
  • Experience in developing systems software in languages like C++

What we offer:

  • bonus
  • equity
  • benefits

Additional Information:

Job Posted:
January 23, 2026

Similar Jobs for AI/HPC System Performance Engineer

Sr AI/HPC Applications and Performance Engineer

Sr AI/HPC Applications and Performance Engineer role at Hewlett Packard Enterpri...
Location:
United States
Salary:
161500.00 - 370500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 15+ years' experience
  • Deep expertise in AI and HPC applications and performance engineering including simulation, modeling and emulation capabilities
  • Expertise in large-scale AI and HPC systems
  • Experience architecting, designing, and developing innovative software system design tools and languages
  • Excellent analytical and problem-solving skills
  • Experience in leading overall architecture of software systems for products and solutions
  • Designing and integrating efficient and scalable software systems running on multiple platform types into overall architecture
  • Evaluating and selecting forms and processes for software systems testing and methodology
  • History of innovation with multiple patents or deployed solutions in the field of software design
  • Excellent written and verbal communication skills
Job Responsibility:
  • Develops organization-wide architectures, strategies, and methodologies for software systems design and development across multiple platforms and organizations
  • Identifies and makes informed recommendations regarding new technologies, innovations, and outsourced development partner relationships
  • Reviews, evaluates, and influences designs and project activities for compliance with development guidelines and standards
  • Provides tangible solutions that improve product quality and mitigate failure risk
  • Contributes domain expertise, business acumen, and experience to influence decisions of executive business leadership
  • Brings creativity and innovation to the organization
  • Provides guidance and mentoring to less-experienced team members
  • Acts as an internal authority on software systems design
  • Contributes to the external technical community through whitepapers, patents, or other significant innovations
What we offer:
  • Health & Wellbeing benefits
  • Personal & Professional Development programs
  • Unconditional Inclusion environment
  • Comprehensive benefits suite supporting physical, financial and emotional wellbeing
  • Full-time

Software Engineer - AI/HPC Specialist

We are looking for software engineers to help scale and improve the efficiency o...
Location:
Norway, Oslo
Salary:
Not provided
Meta
Expiration Date:
Until further notice
Requirements:
  • 3+ years of experience developing in C++/C and Python
  • Experience with High Performance Computing/Networking or AI systems applications frameworks
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Specialized experience in one or more of the following machine learning/deep learning domains: Hardware accelerators, AI Infrastructure, or high performance networking
  • Solid experience in debugging of distributed systems, revision control systems, testing, and CI pipelines
Job Responsibility:
  • Work on collective communications stacks to optimise networking operations, leading to improved AI inference and training model performance
  • Drive implementation of latency and bandwidth critical networking operations, as well as out-of-band signalling
  • Debug custom and third party multi-host, accelerator enabled AI platforms
  • Software development using C++/C and Python
  • Work closely with other teams to deliver impact
  • Develop and improve features and innovations
  • Extend and optimize large scale learning collective operations

AI Research Lab Research Associate

We are currently seeking highly qualified interns to accelerate research towards...
Location:
United States, Milpitas
Salary:
43.27 - 93.15 USD / Hour
Hewlett Packard Enterprise
Expiration Date:
May 26, 2026
Requirements:
  • Pursuing PhD degree (or other degree with significant research and innovation experience) in a relevant discipline (e.g. machine learning, computer science, electrical engineering, math, statistics, etc.)
  • Track record of world-class innovative contributions and ideas in machine learning
  • Experience with innovative solution development, such as developing proofs-of-concept, first-of-a-kind solutions, and/or technology transfer
  • Experience in deep learning research
  • Experience in developing deep learning software with high proficiency in data structures and algorithms
  • Strong programming skills and experience with Python, C/C++, and preferably Java
  • Software development experience in Deep Learning, GPU acceleration, and Model Optimization
  • Experience in Deep Learning and Machine Learning frameworks and models like TensorFlow and PyTorch
  • Experience in Transformer Neural Network architectures for Generative AI and natural language processing
  • Experience with Agentic AI and Generative AI workflows (desired)
Job Responsibility:
  • Conduct research and come up with solutions with a fast turnaround time
  • Build the software and applications for Neural Networks and Machine Learning
  • Work with system programming, Deep Learning frameworks and models, GPU acceleration, Model optimization, real-time streaming data, distributed computing, and deployment
  • Provide thought leadership and technical influence both internally and externally to HPE
  • Collaborate with HPE Labs research teams as well as external partners
  • Work in alignment with HPE's broader innovation community.
What we offer:
  • Health & Wellbeing benefits including physical, financial and emotional wellbeing support
  • Personal and professional development programs
  • Unconditional inclusion and flexibility to manage work and personal needs.
  • Full-time

Senior Solutions Architect - Data Infrastructure

NetApp is the intelligent data infrastructure company, turning a world of disrup...
Location:
United States
Salary:
205700.00 - 266200.00 USD / Year
NetApp
Expiration Date:
Until further notice
Requirements:
  • 8+ years in solution architecture, systems engineering, or enterprise pre-sales for storage or data infrastructure platforms, with a strong track record of driving technical wins and customer outcomes
  • Executive Presence & Communication. Exceptional presentation, storytelling, and whiteboarding skills, with the ability to lead technical workshops and executive briefings
  • Technical Depth. Expertise across NFS, SMB, iSCSI, FC, NVMe, and S3; experience with virtualization and container platforms (e.g., VMware, Kubernetes); and strong understanding of security, cyber resilience, and AI-adjacent technologies
  • Hybrid Cloud Knowledge. Practical experience with hyperscaler file and object services, data mobility, and replication strategies
  • Solution Design Skills. Comfortable producing reference architectures and integration plans spanning compute, networking, and storage
Job Responsibility:
  • Own Technical Win Plans. Partner with enterprise sales and field leadership on priority opportunities. Lead discovery, shape solution strategy, differentiate competitively, and drive the technical win for large, complex deals
  • Design End-to-End Architectures. Create scalable, resilient, and future-ready architectures across on-prem, cloud-adjacent, and public cloud environments, aligned to customer requirements for performance, availability, security, and total cost of ownership
  • Act as a Portfolio Evangelist. Represent NetApp’s full data infrastructure vision to customers, partners, and internal stakeholders, connecting portfolio capabilities to real-world customer outcomes
  • Build Trusted Executive Relationships. Develop and sustain deep relationships with customer technical and business leaders, partners, and alliances. Drive engagement across executive, architecture, and engineering communities
  • Generate Pipeline with Marketing. Lead webinars, workshops, and Executive Briefing Center sessions; contribute to blogs and video content; present at NetApp INSIGHT; and support regional demand-generation events to open new workloads and buying centers
  • Mentor and Upskill the Field. Coach Solutions Engineers and partner technical teams on solution domains, reference architectures, and repeatable best practices
  • Stay Ahead of the Market. Track industry trends, competitive dynamics, and portfolio evolution to provide timely guidance to customers, sales leadership, and field teams
What we offer:
  • Volunteer time off: 40 hours of paid volunteer time each year
  • Well-being: Employee Assistance Program, fitness, and mental health resources to help employees be their best
  • Time away: Paid time off for vacation and to recharge
  • Health Insurance
  • Life Insurance
  • Retirement or Pension Plans
  • Paid Time Off
  • Full-time

English instructor

English teaching job in Boulogne-Billancourt (92100), France. Work with 2 childr...
Location:
France, Boulogne-Billancourt
Salary:
13.39 EUR / Hour
Babylangues
Expiration Date:
Until further notice
Requirements:
  • Energetic and creative personality
  • Some childcare experience (babysitting, tutoring, etc.)
  • Native or strong proficiency in the language
What we offer:
  • Paid leave: 1.34€ /h
  • Part-time

Planner II, Raw Material

Plan raw material inventories in accordance with defined policies to support the...
Location:
Costa Rica, Cartago
Salary:
Not provided
Baxter
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Business Administration, Industrial Engineering, or a related field
  • More than 2 years of experience in similar positions
  • English B2 or higher
  • Inventory management; production order flow; part numbers; GME and suppliers End to End of the product
  • Logistics: customs knowledge, routing, Baxter transportation function
  • Financial knowledge
  • Lean Six Sigma
Job Responsibility:
  • Control and administration of inventories
  • Ensure adequate control of activities that may affect the quality of customer service
  • Generate cost-savings projects
  • Coordination of activities with support areas (logistics, warehouse, income, manufacturing, etc.)
  • Generation of reports for analysis of results and to support decision-making
  • Support regulatory processes and various projects of the departments that require it
  • Attend in a timely manner to the different conflicts that may arise in the process of purchasing and receipt of material (rejections, damaged material, losses, unpaid invoices, etc.)
  • Full-time

HR Director

We are seeking an experienced HR Director to partner with technical leaders at F...
Location:
United States, Menlo Park
Salary:
230000.00 - 293000.00 USD / Year
Meta
Expiration Date:
Until further notice
Requirements:
  • 15+ years of experience operating in an HRBP/business-facing role
  • 15+ years of experience as a Human Resource Leader working at both a strategic and executional level
  • Operator and track record of scaling global teams
  • Experience as a people manager and organizational leader, with business acumen and experience understanding strategic organizational issues
  • Experience operating in a matrix organization, and cultivating relationships globally
Job Responsibility:
  • Manage a team of Human Resource Business Partners
  • Partner with business and functional HR leaders globally to develop and lead effective people strategies and programs that enable Meta Facebook to scale effectively globally
  • Lead the delivery of the Company’s people practices including talent assessment and planning, organization design and development, performance management, and leadership development, and contribute as a global HR leader to the ongoing development of these practices
  • Use data insights to make evidence-based people decisions for attraction, performance, and engagement
  • Provide direct support and coaching to the Meta Facebook leadership team, and work closely with each of them and their respective teams as a trusted partner, bringing insight and advice relating to people, teams and the development of their organizations as they scale
  • Together with business leaders create an environment which is open and connected and which expects the highest standard of behavior and ethical conduct throughout our company
  • Provide leadership in development, execution and facilitation of employee relations efforts
  • Lead an HR Business Partner team, developing market leading HR capabilities and careers by creating an environment that stimulates creativity and supports their ongoing development
  • Provide global leadership to deliver against the people needs of the company
  • Work closely on critical cross-functional initiatives with cross-functional leaders of teams
What we offer:
  • bonus
  • equity
  • benefits

Child Outpatient Clinician

Apply your clinical skills and creativity as part of a dedicated, multi-discipli...
Location:
United States, Marshfield; Quincy
Salary:
31.25 - 50.48 USD / Hour
Aspire Health Alliance
Expiration Date:
Until further notice
Requirements:
  • Independent licensure is strongly preferred
  • A Master's Degree in Social Work, Mental Health Counseling, or Marriage and Family Therapy
  • Previous experience working with children and families in an outpatient setting
  • Experience with Electronic Health Records is preferred
  • CANS certification is required and can be obtained through online training
  • Strong organizational and time management skills
  • Proficiency in computer skills and the ability to complete clinical documentation in an electronic health record
Job Responsibility:
  • Develop and manage a diverse caseload of individual and group clients
  • Provide evidence-based therapies to children and youth
  • Actively collaborate within a multidisciplinary team to enhance client care
What we offer:
  • Competitive package of compensation and benefits
  • Monthly training sessions and various educational opportunities
  • Work environment designed to support an optimal work/life balance
  • Full-time