
Senior Manager, Performance AI/ML Network Deployment Engineering

AMD

Location:
United States, Santa Clara

Contract Type:
Not provided

Salary:

210400.00 - 315600.00 USD / Year

Job Description:

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering is a leadership position designed to optimize the design, rollout, and post-rollout management of AI/ML fabrics. The candidate will be the technical interface between customers, various internal engineering groups, and field application engineers. The role draws on extensive experience in large network architecture, storage, AI/ML network deployments, and performance tuning. It requires a disciplined approach to system triage, at-scale debug, and infrastructure optimization to ensure robust performance and efficient transitions from GPU production qualification to at-scale datacenter deployment.

Job Responsibility:

  • Collaborate with strategic customers on scalable designs spanning compute, networking, and storage environments, and work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
  • Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based, large-scale AI datacenter infrastructure built on NPI base platform hardware with ROCm, scaling up to pod and cluster level and leveraging best-in-class network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain-specific knowledge to other groups at AMD and share lessons learned to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Requirements:

  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, and storage cluster design, modelling, analytics, performance tuning, convergence, and scalability improvements
  • Preference for candidates with solid, hands-on expertise in at least one of three domains: compute, network, or storage
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers with diverse technical disciplines in settings such as proofs of concept, competitive evaluations, and early field trials
  • Direct experience working with large customers, with the ability to operate with a sense of urgency, take ownership of problems, and resolve them
  • Demonstrated leadership in network architecture, with hands-on experience in RoCEv2 design, VXLAN-EVPN, BGP, and lossless fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on network deployment expertise and a proven track record of delivering large projects on time. Cisco, Juniper, or Arista experience is preferred
  • Direct co-development/deployment experience working with strategic customers/partners to bring solutions to market
  • Excellent communication skills with audiences ranging from engineers to mid-level management to C-level executives
  • Bachelor's or master's degree in computer science, engineering, or a related field
  • This is a senior-level role
  • No recent college graduates will be considered
  • Ability to work well in a geographically dispersed team
  • Certifications in Networking, AI/ML, or Cloud Technologies

Additional Information:

Job Posted:
December 17, 2025

Similar Jobs for Senior Manager, Performance AI/ML Network Deployment Engineering

Senior DevOps Engineer (GCP)

Our client is a global UK-based financial services and investment banking organi...
Salary: Not provided
Company: N-iX
Expiration Date: Until further notice
Requirements:
  • 5+ years of experience in DevOps, Cloud Engineering, or SRE roles
  • Strong hands-on experience with Google Cloud Platform, including: GKE / Kubernetes, Cloud Run, Cloud Functions, Pub/Sub, Cloud Storage, VPC, IAM, networking, security
  • Expertise in Terraform, Helm, or other IaC tools
  • Experience building CI/CD pipelines (GitHub Actions, GitLab CI, CircleCI, Jenkins, etc.)
  • Strong understanding of containerization and orchestration: Docker, Kubernetes
  • Solid experience with monitoring, observability, and logging stacks
  • Familiarity with networking, load balancing, security hardening, and zero-trust principles
  • Experience supporting production systems in high-availability, distributed environments
  • Strong scripting skills (Python, Bash, or similar)
  • Experience working with agile engineering teams
Job Responsibility:
  • Design, implement, and maintain cloud infrastructure on Google Cloud (GKE, Cloud Run, Cloud Functions, Pub/Sub, Cloud Storage)
  • Build and optimize CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar)
  • Develop infrastructure-as-code using Terraform or similar tools
  • Set up and maintain container orchestration (Kubernetes, GKE) and automated deployment workflows
  • Implement monitoring, alerting, and observability using tools such as Prometheus, Grafana, ELK/Elastic, Stackdriver, or OpenTelemetry
  • Ensure compliance with security and governance standards across all environments
  • Collaborate closely with engineering teams to ensure scalable, high-performance deployment architectures
  • Support AI/ML and GenAI workloads (Vertex AI pipelines, model hosting, GPU workloads, inference optimization)
  • Manage environment strategies, release pipelines, configuration management, and secrets management
  • Optimize cloud costs and recommend improvements for performance and reliability
What we offer:
  • Flexible working format - remote, office-based or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

Senior Devops & AI Engineer

This role presents a unique opportunity to contribute to the future of impactful...
Location: India, Hyderabad
Salary: Not provided
Company: Fission Labs
Expiration Date: Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or related field
  • 6+ years of experience in infrastructure management roles, with a focus on cloud platforms (Azure and AWS preferred)
  • Hands-on experience with operations (DevSecOps) principles and best practices
  • Proficiency in scripting languages such as Python, PowerShell, or Bash
  • Excellent communication and collaboration skills
  • In-depth knowledge of Linux operating systems, including CentOS, Ubuntu, and Red Hat, with expertise in shell scripting, package management, and system administration
  • Hands-on experience with a wide range of AWS and Azure services
  • Ability to develop and maintain Infrastructure as Code (IaC) templates using tools such as Terraform or AWS CloudFormation
  • Experience setting up cloud infrastructure stacks, databases, and service endpoints, as well as GPU and CPU resource scaling and optimization
  • Experience with AIOps/MLOps
Job Responsibility:
  • Configure and optimize Linux-based servers for performance, security, and resource utilization, including kernel tuning, file system management, and network configuration
  • Architect cloud solutions leveraging best practices and services offered by AWS and Azure, optimizing for scalability, reliability, and cost-effectiveness
  • Implement and manage hybrid cloud environments, facilitating seamless integration and interoperability between AWS and Azure services
  • Establish version control practices for IAC templates, ensuring traceability, auditability, and reproducibility of infrastructure changes
What we offer:
  • Opportunity to work on impactful technical challenges with global reach
  • Vast opportunities for self-development, including online university access and knowledge sharing opportunities
  • Sponsored Tech Talks & Hackathons to foster innovation and learning
  • Generous benefits packages including health insurance, retirement benefits, flexible work hours, and more
  • Supportive work environment with forums to explore passions beyond work
  • Fulltime

Engineering Director

We are seeking a seasoned Engineering Director who thrives in challenging and fa...
Location: Puerto Rico, Aguadilla
Salary: Not provided
Company: Hewlett Packard Enterprise
Expiration Date: Until further notice
Requirements:
  • Significant work experience as a director or in a similar position working across multiple stakeholder organizations, with 10+ years of people leadership experience specific to software and cloud engineering
  • Solid experience leading software development across storage, networking, on-prem, and SaaS is a must
  • Experience in setting up geographically distributed sites
  • Must have a strong background in software development lifecycle including cloud infrastructure
  • Familiarity with agile methodologies and tools like JIRA
  • Prior experience in cloud product development and deployments
  • End-to-end ownership and accountability
  • Solid understanding of fundamental AI and machine learning concepts, including supervised and unsupervised learning, deep learning, reinforcement learning, natural language processing, computer vision, and statistical modeling
  • Extensive business acumen, technical knowledge, and industry experience encompassing one or more engineering, technology, and product domains
  • Demonstrated ability to drive transformation across a business, with exceptional change-management skills
Job Responsibility:
  • Oversee the Puerto Rico site's daily operations, strategic planning, and cross-functional team leadership for Hybrid Cloud
  • Recruit, mentor, and manage teams of AI/ML engineers, QA Engineers, Design Engineers and innovation specialists to deliver cutting-edge solutions
  • Continuously evaluate new tools, platforms, and frameworks in AI/ML to drive competitive advantage and operational efficiency
  • Ensure alignment with corporate goals while fostering a high-performance culture, operational efficiency, and employee engagement
  • Lead the development and execution of AI/ML strategies that align with business goals and drive innovation across products, services, or operations
  • Create strategic and tactical operations and resource plans, goals, and priorities for assigned organization based on business and technology roadmap and functional objectives
  • Engage with various senior leaders across the organization, program managers, R&D, support, Quality, product managers, technical leaders and executives to communicate program status, escalate issues, and guide and influence strategic decision-making
  • Manage senior relationships and escalated issues with outsourced partners and suppliers, including setting expectations regarding deliverables, product quality, schedules, and costs
  • Ensure that the organization is effectively leveraging outsourced resources
  • Identify opportunities for and drive organizational initiatives and programs to support business process improvements and cost reductions
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Senior Software Engineer - Go (Next-Gen Firewall)

We are seeking a Senior Backend Engineer (Go) to join our core engineering team ...
Location: Vietnam, Ho Chi Minh City
Salary: Not provided
Company: Qualgo
Expiration Date: Until further notice
Requirements:
  • Bachelor's or Master’s degree in Computer Science, Cybersecurity, Network Engineering, or related field
  • Deep understanding of goroutines, channels, memory management, and profiling (pprof)
  • Strong grasp of the OSI model, TCP/IP, DNS, TLS/SSL, VPNs (WireGuard/IPsec), and routing
  • Experience with Docker, Kubernetes, and deploying network appliances on AWS/GCP/Azure
  • Production experience with Kafka, RabbitMQ, or NATS
  • Good English skills (speaking and listening) to communicate with the global teams
  • Hands-on experience with Suricata, Snort, Zeek, or Squid Proxy
  • Familiarity with OPNsense or pfSense architecture is a huge plus
Job Responsibility:
  • Design and implement high-performance Go services that interact with network subsystems (netfilter/nftables) and open-source security engines (Suricata, Squid, Zeek)
  • Design and implement routing functionality on low-resource gateway systems
  • Develop custom plugins or sidecars to ingest, parse, and normalize IDS/IPS alerts (Suricata EVE logs) and Proxy logs for the AI engine
  • Build the "Action Engine" that translates AI threat verdicts into real-time blocking rules (firewall policies, BGP blackholing, or DNS sinkholing)
  • Deeply integrate with OPNsense APIs/plugins to orchestrate policy updates across distributed firewall nodes
  • Architect scalable gRPC and REST APIs to serve as the control plane for thousands of firewall agents
  • Write highly optimized, concurrent Go code to handle high-throughput log ingestion with minimal latency/GC overhead
  • Design distributed locking and consistency mechanisms to ensure firewall policies are synchronized globally across multi-tenant environments
  • Build low-latency pipelines using Kafka or NATS JetStream to stream network telemetry to our AI/ML inference engine
  • Implement WebSocket or HTTP/2 streaming for real-time threat visualization and alerting dashboards
What we offer:
  • Meaningful work & impact
  • Competitive rewards
  • Growth & well-being
  • People & workspace
  • Young & dynamic environment

Senior Technical Program Manager, Infrastructure

Glean is seeking a Senior Infrastructure Technical Program Manager (TPM) to lead...
Location: United States, Palo Alto
Salary: 198000.00 - 235500.00 USD / Year
Company: Glean
Expiration Date: Until further notice
Requirements:
  • BS/MS in Computer Science, Engineering, or a related technical field
  • 8-10+ years of experience in technical program management, infrastructure, or SRE, with at least 3-5 years managing infra or platform-scale programs
  • Proven success delivering cross-functional infrastructure programs in B2B or enterprise environments where scalability, uptime, and performance are critical
  • Experience working closely with Infra, SRE, and ML/AI teams on distributed systems or data infrastructure
  • Strong understanding of cloud infrastructure (AWS, GCP, or Azure) including compute, networking, storage, and orchestration systems
  • Ability to structure complex multi-quarter infrastructure programs with clear milestones and measurable impact
  • Strong written and verbal communication and ability to manage through ambiguity, anticipate scaling challenges, and align teams across priorities
  • Builder mindset with focus on automation, reliability, and efficiency
Job Responsibility:
  • Lead end-to-end infra programs spanning compute, networking, storage, orchestration, and AI workloads
  • Partner with Engineering to define standards for environment provisioning, deployment automation, and configuration governance
  • Develop and operationalize frameworks for runtime health, scaling, and disaster recovery
  • Drive consistency and automation across deployment orchestration systems
  • Establish clear metrics for reliability, performance, and cost efficiency
  • Coordinate cross-team delivery of high-impact programs such as data pipeline scalability, LLM infrastructure expansion, or infra observability improvements
  • Communicate program status and technical risks effectively to leadership and stakeholders
  • Continuously identify process or system bottlenecks, and drive automation to improve speed and reliability of infra operations
What we offer:
  • Medical, Vision, and Dental coverage
  • Generous time-off policy
  • Opportunity to contribute to your 401(k) plan
  • Home office improvement stipend
  • Annual education and wellness stipends
  • Vibrant company culture through regular events
  • Healthy lunches daily
  • Fulltime

Director - AI

The Director - AI in Delivery leads multiple teams and managers, ensuring each gr...
Location: United States
Salary: 168400.00 - 252600.00 USD / Year
Company: 3Cloud
Expiration Date: Until further notice
Requirements:
  • 12+ years of experience delivering solutions in a primary domain such as application development, cloud platform engineering, DevOps, or data and analytics, including significant work at enterprise scale
  • 7+ years of experience in solution architecture and leading development or engineering teams, including multi-team or multi-workstream efforts at program scale
  • Proven ability to oversee delivery of complex, enterprise-scale AI/ML programs
  • Deep expertise across most of the following:
    ◦ Agents & Orchestration – leveraging Semantic Kernel, LangChain, or similar frameworks to design intelligent workflows; applying tool/function calling, planner and agent-based patterns, and workflow automation engines
    ◦ Search & Retrieval – implementing solutions with vector databases such as Azure AI Search, Pinecone, FAISS, or Milvus; applying hybrid search methods and re-ranking strategies, and selecting embedding models for accuracy and performance
    ◦ Evaluation, Quality & LLM Operations – establishing offline and online evaluation approaches; using golden datasets, hallucination and groundedness validation, and toxicity and safety testing; building telemetry and feedback loops
Job Responsibility:
  • Maintain a clear view of each team member's delivery commitments, growth plans, and internal contributions
  • Develop and grow team members through regular coaching, clear expectations, and constructive feedback. Build a culture of trust, inclusion, and collaboration
  • Monitor team health, morale, and workload balance, acting early to address engagement or performance concerns in partnership with practice leadership
  • Maintain regular one-to-one connections with each team member, provide clear feedback, and help them remove blockers related to client work, pursuits, or internal initiatives
  • Oversee staffing and resource alignment across teams, balancing utilization, development goals, client needs, and sustainable workloads
  • Communicate practice priorities and organizational objectives clearly, ensuring teams understand how their work contributes to broader organizational goals
  • Define interview standards and mentorship strategies, training others to apply them consistently and improving hiring outcomes and talent growth at scale
  • Create and maintain development plans for team members, including stretch assignments, shadowing, certifications, and opportunities for external visibility
  • Lead or support performance reviews, promotion recommendations, and compensation input for your teams, using consistent standards that reflect both impact and behavior
  • Represent your teams in leadership forums, communicate expectations and decisions clearly back to the group, and bring forward patterns, risks, and successes that should shape practice strategy
What we offer:
  • Flexible work location with a virtual first approach to work!
  • 401(k) with a match of up to 50% of your contributions, on contributions of up to 6% of eligible pay
  • Generous PTO providing a minimum of 15 days in addition to 9 paid company holidays and 2 floating personal days
  • Two medical plan options to allow you the choice to elect what works best for you!
  • Option for vision and dental coverage
  • 100% employer paid coverage for life and disability insurance
  • Paid leave for birth parents and non-birth parents
  • Option for Healthcare FSA, HSA, and Dependent Care FSA
  • $67.00 monthly tech and home office allowance
  • Utilization and/or discretionary bonus eligibility based on role
  • Fulltime

Bar Staff

Join our food and beverage team on the bar for a career with more fun! No experi...
Location: United Kingdom, Highfield Grange, Clacton-on-Sea
Salary: 12.21 GBP / Hour
Company: Parkdean Resorts
Expiration Date: Until further notice
Requirements:
  • No experience required
  • Pockets full of passion, positivity and Parkdean team spirit
Job Responsibility:
  • Get the bar prepped for an unforgettable service
  • Keep an eye on stock levels and top up supplies whenever needed
  • Serve food and snacks, interacting enthusiastically with customers
  • Serve up drinks with a smile, always following licensing regulations and company standards
  • Handle payments efficiently
What we offer:
  • Employee Assistance Programme with 24/7 confidential helpline for counselling and support
  • 50% discount for you and 25% discount for friends and family when booking holiday with us
  • Team member discount of 30% on food, drinks and leisure activities
  • Discounts on brands like Hello Fresh and local gyms
  • Training and apprenticeship opportunities
  • Chance to develop skills and boost career across 66 parks
  • Parttime

Design Lead

We are seeking a Design Lead to join a team of dedicated product designers and e...
Location: United States, New York; San Francisco
Salary: 175000.00 - 250000.00 USD / Year
Company: Canary Technologies
Expiration Date: Until further notice
Requirements:
  • 8+ years of experience in product design, with a focus on developing software solutions across mobile and web surfaces
  • Proven experience leading impactful design projects, including expertise in user research, UX/UI design, and building scalable design systems
  • Proficiency in design tools such as Figma, Sketch, Adobe Creative Suite, and Miro
  • A portfolio showcasing high-quality, user-centered design work across the product lifecycle is essential
  • Experience mentoring designers and fostering team growth, with the ability to provide constructive feedback and drive high standards
  • Strong communication skills to articulate design decisions and collaborate with cross-functional teams
  • BA/BS in a relevant field or equivalent professional experience
Job Responsibility:
  • Lead Design Execution: Manage the full design lifecycle, including research, UX flows, interaction design, visual design, and prototyping, delivering intuitive and visually compelling experiences
  • Mentor and Develop: Work with a team of designers, fostering their growth and ensuring the delivery of high-quality work. Promote a strong design-first culture
  • Establish Standards: Build and maintain scalable design systems and style guides that ensure consistency and enable innovation across products
  • Conduct Research: Leverage user research and data analysis to inform design decisions, creating user journeys, personas, and actionable insights aligned with product hypotheses
  • Collaborate Cross-Functionally: Partner closely with Product, Engineering, and other teams to deliver solutions that balance user needs, technical feasibility, and business objectives
  • Embrace Iteration: Work in an agile, iterative environment, delivering incremental improvements while progressing toward the ideal design vision
  • Communicate Effectively: Clearly articulate design concepts and strategies to align teams and stakeholders, fostering understanding and enthusiasm
What we offer:
  • Canary Days: As a company, we want to ensure that the team has time to recharge. Each month we provide company-wide days off to ensure there is at least one extended weekend or day off
  • Self Improvement Club: We meet each month and share our personal goals for the month. Each individual is provided a budget towards any purchases that help us achieve these goals
  • Professional Development Chats: We provide a budget to help drive cross-functional professional development conversations across the organization
  • Travel Reimbursement: Team members are able to visit our offices across New York, San Francisco or Dallas when they choose, and are provided a travel stipend for doing so. Spend time working with the team in their office, and use the rest of your time exploring a new city!
  • Personal Travel Reimbursement: If you stay at a hotel that Canary works with, we provide a credit towards your stay
  • Fulltime