Infrastructure Engineer (Performance Optimization) Job at Randstad (London)

Performance Infrastructure Engineer - Data Center GPU

You will be part of a small, but dedicated team driving discrete GPU products’ p...

Location

United States, Santa Clara

Salary:

192000.00 - 288000.00 USD / Year

AMD

Expiration Date

Until further notice

Requirements

Strong development experience in Python and/or Bash (or equivalent scripting languages)
Experience with GitHub, Jenkins, or similar CI/CD and code review systems
Linux system administration experience preferred
Experience developing automated test infrastructure and orchestrating multisystem workflows is preferred
Ansible experience is a bonus
Strong analytical, problem solving, and debugging skills
Excellent communication skills
Must be a critical thinker and self-starter
Ability to quickly learn and apply new tools, technologies, and frameworks
Networking experience preferred, including common protocols and basic debugging

Job Responsibility

Technical team lead for a team of 5-6 engineers
Assess and understand the current automation and performance analysis infrastructure, identifying strengths, gaps, and opportunities for improvement
Collaborate with internal teams to gather technical requirements and understand evolving needs
Develop a forward-looking plan that balances reusing existing systems with building new infrastructure where appropriate
Design, develop, and maintain automation and performance analysis tooling using Python, Bash, Make, and related technologies
Build and enhance workflow automation solutions using internally developed tools to orchestrate ML workloads
Develop new techniques and tooling to optimize ML workload execution, profiling, and analysis at scale

Performance & Capacity Engineer - Planning Optimization

Meta is seeking a Performance & Capacity Engineer to join the Capacity Engineeri...

Location

United States, Bellevue

Salary:

154000.00 - 217000.00 USD / Year

Meta

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
6+ years of experience coding in any language and designing software systems
4+ years of experience in capacity, performance, software, or reliability engineering
Proven ability to manage ambiguity and experience frequently learning new technical and business concepts

Job Responsibility

Own both the technical and business outcomes of capacity planning for all of Meta: all software products/services and plans for how to scale server and data center resources most efficiently
Use the tools you build to own the business outcomes: develop and analyze a variety of business and technical scenarios to drive the highest levels of executive decision-making around infrastructure/product, up to the CxO level
Partner across the engineering technical landscape to optimize at the intersection of hardware, infrastructure, and software. Work closely with software service owners, Production Engineering, Server Hardware Engineering, Server Supply Chain, Network Engineering, Data Center Design, Operations, and Planning teams to find the optimal ways to scale our infrastructure and place our services
Design and help build scalable, reliable planning systems that connect business strategy with detailed technical execution, including regional and temporal bin-packing, optimal service placement, traffic shifts and service migrations, efficient hardware refresh, etc.
Partner with Finance to balance cost efficiency with technical and product considerations
Greenfield work: Work cross-functionally to define problem statements, collect data, build analytical models and make recommendations to drive change and optimization at the most strategic levels
A lot of other cool work: Identify capacity-related issues proactively and work across technical and business teams to define and implement solutions

What we offer

bonus
equity
benefits

Full-time

Performance & Capacity Engineer - Planning Optimization

Meta is seeking a Performance & Capacity Engineer to join the Capacity Engineeri...

Location

United States, Bellevue

Salary:

184000.00 - 257000.00 USD / Year

Meta

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
8+ years of experience coding in any language and designing software systems
8+ years of experience in capacity, performance, software, or reliability engineering
Proven ability to manage ambiguity and experience frequently learning new technical and business concepts

Job Responsibility

Own both the technical and business outcomes of capacity planning for all of Meta: all software products/services and plans for how to scale server and data center resources most efficiently
Build automated, scalable data and analytics solutions by developing state-of-the-art automation, mathematical optimization, and/or AI models using Meta’s unparalleled data infrastructure
Use the tools you build to own the business outcomes: develop and analyze a variety of business and technical scenarios to drive the highest levels of executive decision-making around infrastructure/product, up to the CxO level
Design and help build scalable, reliable planning systems that connect business strategy with detailed technical execution, including regional and temporal bin-packing, optimal service placement, traffic shifts and service migrations, efficient hardware refresh, etc.
Partner across the engineering technical landscape to optimize at the intersection of hardware, infrastructure, and software. Work closely with software service owners, Production Engineering, Server Hardware Engineering, Server Supply Chain, Network Engineering, Data Center Design, Operations, and Planning teams to find the optimal ways to scale our infrastructure and place our services
Partner with Finance to balance cost efficiency with technical and product considerations
Greenfield work: Work cross-functionally to define problem statements, collect data, build analytical models and make recommendations to drive change and optimization at the most strategic levels
A lot of other cool work: Identify capacity-related issues proactively and work across technical and business teams to define and implement solutions

What we offer

bonus
equity
benefits

Full-time

Performance & Capacity Engineer - Planning Optimization

Meta is seeking a Performance & Capacity Engineer to join the Capacity Engineeri...

Location

United States, Bellevue

Salary:

117000.00 - 181000.00 USD / Year

Meta

Expiration Date

Until further notice

Requirements

Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
Minimum 4 years of experience working with distributed systems at scale
Proficient in at least one coding language and in designing software systems
Desire to learn about capacity planning and optimization
Experience managing ambiguity. Experience learning and applying new business and technical concepts

Job Responsibility

Own both the technical and business outcomes of capacity planning for all of Meta: all software products/services and plans for how to scale server and data center resources most efficiently
Build automated, scalable data and analytics solutions by developing state-of-the-art automation, mathematical optimization, and/or AI models using Meta’s unparalleled data infrastructure
Use the tools you build to own the business outcomes: develop and analyze a variety of business and technical scenarios to drive the highest levels of executive decision-making around infrastructure/product, up to the CxO level
Partner across the engineering technical landscape to optimize at the intersection of hardware, infrastructure, and software. Work closely with software service owners, Production Engineering, Server Hardware Engineering, Server Supply Chain, Network Engineering, Data Center Design, Operations, and Planning teams to find the optimal ways to scale our infrastructure and place our services
Partner with Finance to balance cost efficiency with technical and product considerations
Greenfield work: Work cross-functionally to define problem statements, collect data, build analytical models and make recommendations to drive change and optimization at the most strategic levels
A lot of other cool work: Identify capacity-related issues proactively and work across technical and business teams to define and implement solutions

What we offer

bonus
equity
benefits

Full-time

Performance & Capacity Engineer - Capacity Planning Optimization

Meta is seeking a Performance & Capacity Engineer to join the Capacity Planning ...

Location

United States, Menlo Park

Salary:

219000.00 - 301000.00 USD / Year

Meta

Expiration Date

Until further notice

Requirements

10+ years of experience in performance, capacity, or software engineering
Proficient in Python, C++, or other coding languages and in designing large-scale software systems
Demonstrated success leading large engineering projects and initiatives: defining goals, managing ambiguity, and inspiring and leading other engineers and non-technical contributors
Experience with large-scale technical infrastructure and distributed systems

Job Responsibility

Own infrastructure capacity planning for all of Meta: all software products/services and plans for how to scale server and data center resources most efficiently
Partner across the engineering technical landscape to optimize at the intersection of hardware, infrastructure, and software. Work closely with software service owners, Production Engineering, Server Hardware Engineering, Server Supply Chain, Network Engineering, Data Center Design, Operations, and Planning teams to find the optimal ways to scale our infrastructure and place our services
Design and help build scalable, reliable planning systems that connect business strategy with detailed technical execution, including regional and temporal bin-packing, optimal service placement, traffic shifts and service migrations, efficient hardware refresh, etc.
Effectively lead large engineering efforts while implementing the most complex parts of the system and process design yourself
Partner with Finance and business teams to balance cost efficiency with technical and product considerations
Work cross-functionally to define problem statements, collect data, build software driven models and make recommendations to drive change and optimization at the most strategic levels

What we offer

bonus
equity
benefits

Full-time

Research Engineer / Software Engineer (platform/core infrastructure)

Build the future of offensive security with XBOW. Attackers are already using AI...

Location

United States

Salary:

150000.00 - 350000.00 USD / Year

Xbow

Expiration Date

Until further notice

Requirements

Strong experience building and operating scalable, distributed systems on cloud infrastructure such as AWS or similar
Comfortable working with infrastructure as code (e.g., Terraform, CDK)
A track record of performance tuning across cloud services, databases, and compute layers
Eager to learn new tools, languages, and technologies as needed
A thoughtful communicator who values clarity and simplicity and is comfortable working in a fast-paced startup and navigating ambiguity
Strong problem-solving skills and the ability to work with incomplete information
Curious, practical, and eager to work across layers of the stack when needed
You think proactively about failure modes and bring experience implementing disaster recovery and business continuity plans that keep critical systems running

Job Responsibility

Design and implement infrastructure systems that scale reliably and securely and can be deployed across multiple cloud environments (AWS, Azure, OCI, etc.) and contexts (SaaS, on-prem)
Tune and optimize cloud services across compute, storage, networking, and observability to drive performance, reliability and maintainability of core services
Develop our core services, written in TypeScript, Kotlin and Go
Support large-scale systems with event-driven architectures
Own problems end-to-end, from design through deployment to production support
Navigate ambiguity and help define how we build as much as what we build
Partner closely with other engineers, AI researchers, and security researchers to enable high-quality, high-velocity product development
Design for resilience by implementing disaster recovery and business continuity strategies that ensure uptime, even when things break
Improve how we build, deploy, and monitor services at scale

What we offer

Competitive salary and a generous equity package
Career Growth: Shape your role, lead the function, and grow with the company
Meaningful Work: You will tackle technically complex challenges and play a pivotal role in the growth of our business

Full-time
