Senior Manager, Performance AI/ML Network Deployment Engineering

AMD

Location:
United States, Santa Clara

Contract Type:
Not provided

Salary:

210400.00 - 315600.00 USD / Year

Job Description:

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering is a leadership position designed to optimize the design, rollout, and post-rollout management of AI/ML fabrics. The candidate will be the technical interface between customers and various internal engineering groups and field application engineers. The role draws on extensive experience in large network architecture, storage, AI/ML network deployments, and performance tuning, and requires a disciplined approach to system triage, at-scale debug, and infrastructure optimization to ensure robust performance and efficient transitions from GPU production qualification to at-scale datacenter deployment.

Job Responsibility:

  • Collaborate with strategic customers on scalable designs involving compute, networking, and storage environments; work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
  • Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based large-scale AI datacenter infrastructure built on NPI base platform hardware with ROCm, scaling up to pod and cluster level and leveraging the best in network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain-specific knowledge to other groups at AMD and share lessons learned to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Requirements:

  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, and storage cluster design, modelling, analytics, performance tuning, convergence, and scalability improvements
  • Solid, hands-on expertise in at least one of three domains (compute, network, storage) is preferred
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers across diverse technical disciplines in settings such as proofs of concept, competitive evaluations, and early field trials
  • Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them
  • Demonstrated leadership in network architecture, with hands-on experience in RoCEv2 design, VXLAN-EVPN, BGP, and lossless fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on network deployment expertise and a proven track record of delivering large projects on time. Cisco, Juniper, or Arista experience is preferred
  • Direct co-development and deployment experience working with strategic customers and partners to bring solutions to market
  • Excellent communication skills with audiences ranging from engineers to mid-level management to C-level executives
  • Bachelor's or master's degree in computer science, engineering, or a related subject
  • This is a senior-level role; no recent college graduates will be considered
  • Ability to work well in a geographically dispersed team
  • Certifications in Networking, AI/ML, or Cloud Technologies

Additional Information:

Job Posted:
December 17, 2025

Similar Jobs for Senior Manager, Performance AI/ML Network Deployment Engineering

Senior DevOps Engineer (GCP)

Our client is a global UK-based financial services and investment banking organi...
Salary:
Not provided
N-iX
Expiration Date
Until further notice
Requirements
  • 5+ years of experience in DevOps, Cloud Engineering, or SRE roles
  • Strong hands-on experience with Google Cloud Platform, including: GKE / Kubernetes, Cloud Run, Cloud Functions, Pub/Sub, Cloud Storage, VPC, IAM, networking, security
  • Expertise in Terraform, Helm, or other IaC tools
  • Experience building CI/CD pipelines (GitHub Actions, GitLab CI, CircleCI, Jenkins, etc.)
  • Strong understanding of containerization and orchestration: Docker, Kubernetes
  • Solid experience with monitoring, observability, and logging stacks
  • Familiarity with networking, load balancing, security hardening, and zero-trust principles
  • Experience supporting production systems in high-availability, distributed environments
  • Strong scripting skills (Python, Bash, or similar)
  • Experience working with agile engineering teams
Job Responsibility
  • Design, implement, and maintain cloud infrastructure on Google Cloud (GKE, Cloud Run, Cloud Functions, Pub/Sub, Cloud Storage)
  • Build and optimize CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar)
  • Develop infrastructure-as-code using Terraform or similar tools
  • Set up and maintain container orchestration (Kubernetes, GKE) and automated deployment workflows
  • Implement monitoring, alerting, and observability using tools such as Prometheus, Grafana, ELK/Elastic, Stackdriver, or OpenTelemetry
  • Ensure compliance with security and governance standards across all environments
  • Collaborate closely with engineering teams to ensure scalable, high-performance deployment architectures
  • Support AI/ML and GenAI workloads (Vertex AI pipelines, model hosting, GPU workloads, inference optimization)
  • Manage environment strategies, release pipelines, configuration management, and secrets management
  • Optimize cloud costs and recommend improvements for performance and reliability
What we offer
  • Flexible working format - remote, office-based or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

Senior Devops & AI Engineer

This role presents a unique opportunity to contribute to the future of impactful...
Location:
India, Hyderabad
Salary:
Not provided
Fission Labs
Expiration Date
Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related field
  • 6+ years of experience in infrastructure management roles, with a focus on cloud platforms (Azure and AWS preferred)
  • Hands-on experience with operations (DevSecOps) principles and best practices
  • Proficiency in scripting languages such as Python, PowerShell, or Bash
  • Excellent communication and collaboration skills
  • In-depth knowledge of Linux operating systems, including CentOS, Ubuntu, and Red Hat, with expertise in shell scripting, package management, and system administration
  • Hands-on experience with a wide range of AWS and Azure services
  • Develop and maintain Infrastructure as Code (IAC) templates using tools such as Terraform or AWS CloudFormation
  • Experience setting up cloud infrastructure stack, databases, service endpoints, GPU as well as CPU resource scaling, optimization etc.
  • Should have worked with AIOps/MLOps
Job Responsibility
  • Configure and optimize Linux-based servers for performance, security, and resource utilization, including kernel tuning, file system management, and network configuration
  • Architect cloud solutions leveraging best practices and services offered by AWS and Azure, optimizing for scalability, reliability, and cost-effectiveness
  • Implement and manage hybrid cloud environments, facilitating seamless integration and interoperability between AWS and Azure services
  • Establish version control practices for IAC templates, ensuring traceability, auditability, and reproducibility of infrastructure changes
What we offer
  • Opportunity to work on impactful technical challenges with global reach
  • Vast opportunities for self-development, including online university access and knowledge sharing opportunities
  • Sponsored Tech Talks & Hackathons to foster innovation and learning
  • Generous benefits packages including health insurance, retirement benefits, flexible work hours, and more
  • Supportive work environment with forums to explore passions beyond work
  • Fulltime

Infrastructure Engineer - Network Security

The Network Security team ensures that Campbell’s business operations including ...
Location:
United States, Camden
Salary:
131400.00 - 188900.00 USD / Year
THE VAIL CORPORATION
Expiration Date
Until further notice
Requirements
  • Minimum education required, with specialization as appropriate: Bachelors Degree or equivalent work experience in Information Technology or Information Security
  • 6+ years of experience in IT or Information Security
  • 3+ Years of IT Systems Management (Plan/Build/Run)
  • 3+ years of Firewall policy management (Deployment/Operations) for one or more of the leading NGFW companies (Palo Alto, Fortinet, Checkpoint)
  • Previous experience working in outsourced IT environments
  • Ability to translate business needs into implementation plans and can articulate cost implications of options
  • Extensive knowledge & understanding of NGFWs, SSL VPN, NAC & RBAC, Privileged access (PAM) & SASE solutions
  • Vendor knowledge and past implementation of Cisco, Aruba & Fortinet
  • Extensive experience with firewall and SSL VPN based policy management, both with direct implementation and with guiding principles following a ‘zero-trust’ approach
Job Responsibility
  • Develop, document, communicate, and enforce a network technology standards policy that delivers value and is manageable and scalable
  • Conduct analysis of security requirements and controls to identify gaps and provide recommendations on industry best practices, trends, and technology products
  • Conduct research and make recommendations on network products, services, protocols, and standards in support of network procurement and digital development efforts
  • Lead efforts on Network infrastructure transition following ‘DevSecOps’ principles & framework in alignment with business application and related Enterprise Architectural standards
  • Help build strategic plans by leveraging leading-edge scientific and technological knowledge to drive business strategies as well as enhance the value proposition of IT solutions across cost, stability and security frameworks
  • Participate in and enable successful business projects that have network security dependencies
  • Execute and ensure the successful delivery of IT Network and Network security tech
  • Assist with the design and implementation of short- and long-term strategic plans to ensure network services meet existing and future business requirements
  • Work closely with other groups, including System Administrators, AppOps, Infosec, Incident Response, and Vulnerability Management teams, to ensure corporate compliance and improvements across network infrastructure
  • Provide support for Infosec related project initiatives and CSIRT event responses
What we offer
  • Benefits begin on day one and include medical, dental, short and long-term disability, AD&D, and life insurance (for individual, families, and domestic partners)
  • Employees are eligible for our matching 401(k) plan and can enroll on the first day of employment with immediate vesting
  • Campbell’s offers unlimited sick time along with paid time off and holiday pay
  • If in WHQ – free access to the fitness center
  • Access to on-site day care (operated by Bright Horizons) and company store
  • Giving back to the communities where our employees work and live is very important to Campbell’s
  • Our “Campbell’s Cares” program matches employee donations and/or volunteer activity up to $1,500 annually
  • Campbell’s has a variety of Employee Resource Groups (ERGs) to support employees
  • competitive health, dental, 401k and wellness benefits beginning on the first day of employment
  • Fulltime

Engineering Director

We are seeking a seasoned Engineering Director who thrives in challenging and fa...
Location:
Puerto Rico, Aguadilla
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Significant work experience as a director or similar position working across multiple stakeholder organizations, with at least 10+ years of people leadership experience specific to SW and Cloud engineering
  • Solid experience leading SW development across storage, networking, on-prem, and SaaS is a must
  • Experience in setting up geographically distributed sites
  • Must have a strong background in software development lifecycle including cloud infrastructure
  • Familiarity with agile methodologies and tools like JIRA
  • Prior experience in cloud product development and deployments
  • End-to-end ownership and accountability
  • Solid understanding of fundamental AI and machine learning concepts, including supervised and unsupervised learning, deep learning, reinforcement learning, natural language processing, computer vision, and statistical modeling
  • Extensive business acumen, technical knowledge, and industry experience encompassing one or more engineering, technology, and product domains
  • Demonstrated abilities to drive transformation across a business with exceptional skills in the management of change
Job Responsibility
  • Oversee the Puerto Rico Site daily operations, strategic planning and cross-functional team leadership for Hybrid Cloud
  • Recruit, mentor, and manage teams of AI/ML engineers, QA Engineers, Design Engineers and innovation specialists to deliver cutting-edge solutions
  • Continuously evaluate new tools, platforms, and frameworks in AI/ML to drive competitive advantage and operational efficiency
  • Ensure alignment with corporate goals while fostering a high-performance culture, operational efficiency, and employee engagement
  • Lead the development and execution of AI/ML strategies that align with business goals and drive innovation across products, services, or operations
  • Create strategic and tactical operations and resource plans, goals, and priorities for assigned organization based on business and technology roadmap and functional objectives
  • Engage with various senior leaders across the organization, program managers, R&D, support, Quality, product managers, technical leaders and executives to communicate program status, escalate issues, and guide and influence strategic decision-making
  • Manage senior relationships and escalated issues with outsourced partners and suppliers, including setting expectations regarding deliverables, product quality, schedules, and costs; ensure that the organization is effectively leveraging outsourced resources
  • Identify opportunities for and drive organizational initiatives and programs to support business process improvements and cost reductions
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Senior Product Manager

Senior Product Manager (Data Center AIOps). HPE’s Data Center Networking team is...
Location:
United States, Sunnyvale
Salary:
136500.00 - 276500.00 USD / Year
Hewlett Packard Enterprise
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in computer science, engineering, or a related technical field
  • MBA or advanced degree preferred
  • 6+ years of experience in product management or technical program management for enterprise networking software
  • Experience working with cross-functional teams in agile environments
  • Understanding of data center networking architectures and protocols (EVPN-VXLAN, BGP, LAG, telemetry)
  • Familiarity with network automation tools (Apstra, Ansible, and Terraform) and controller-based design
  • Troubleshooting ability to interpret streaming telemetry, CLI show commands, syslogs, and other operational data sources for root cause analysis
  • Experience with AI/ML-based observability or AIOps tools for network assurance
  • Demonstrated success managing products delivered as SaaS or cloud-native services
  • Exceptional communication skills, both technical and business-oriented, with the ability to influence across leadership and engineering teams
Job Responsibility
  • Define, prioritize, and execute the strategy for the Apstra Data Center Director and Data Center Assurance
  • Partner with cross-functional engineering and design teams to translate customer needs into product specifications and deliverables
  • Analyze operational data from large-scale data center networks to identify root causes and drive product improvements
  • Apply AI/ML techniques to automate fault detection, anomaly correlation, predictions, and intent-based assurance within the solution
  • Work directly with customers, field engineers, and partners to gather feedback and validate product direction
  • Collaborate with marketing and sales enablement to communicate product positioning, competitive differentiation, and go-to-market strategies
  • Manage product performance metrics for deployments, ensuring that solutions effectively address real user scenarios
What we offer
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Senior Software Engineer - Go (Next-Gen Firewall)

We are seeking a Senior Backend Engineer (Go) to join our core engineering team ...
Location:
Vietnam, Ho Chi Minh City
Salary:
Not provided
Qualgo
Expiration Date
Until further notice
Requirements
  • Bachelor's or Master’s degree in Computer Science, Cybersecurity, Network Engineering, or related field
  • Deep understanding of goroutines, channels, memory management, and profiling (pprof)
  • Strong grasp of the OSI model, TCP/IP, DNS, TLS/SSL, VPNs (WireGuard/IPsec), and Routing
  • Experience with Docker, Kubernetes, and deploying network appliances on AWS/GCP/Azure
  • Production experience with Kafka, RabbitMQ, or NATS
  • Good English skills (speaking and listening) to communicate with the global teams
  • Hands-on experience with Suricata, Snort, Zeek, or Squid Proxy
  • Familiarity with OPNsense or pfSense architecture is a huge plus
Job Responsibility
  • Design and implement high-performance Go services that interact with network subsystems (netfilter/nftables) and open-source security engines (Suricata, Squid, Zeek)
  • Design and implement routing functionality on low-resource gateway systems
  • Develop custom plugins or sidecars to ingest, parse, and normalize IDS/IPS alerts (Suricata EVE logs) and Proxy logs for the AI engine
  • Build the "Action Engine" that translates AI threat verdicts into real-time blocking rules (firewall policies, BGP blackholing, or DNS sinkholing)
  • Deeply integrate with OPNsense APIs/plugins to orchestrate policy updates across distributed firewall nodes
  • Architect scalable gRPC and REST APIs to serve as the control plane for thousands of firewall agents
  • Write highly optimized, concurrent Go code to handle high-throughput log ingestion with minimal latency/GC overhead
  • Design distributed locking and consistency mechanisms to ensure firewall policies are synchronized globally across multi-tenant environments
  • Build low-latency pipelines using Kafka or NATS JetStream to stream network telemetry to our AI/ML inference engine
  • Implement WebSocket or HTTP/2 streaming for real-time threat visualization and alerting dashboards
What we offer
  • Meaningful work & impact
  • Competitive rewards
  • Growth & well-being
  • People & workspace
  • Young & dynamic environment

Senior ML Engineer

Join our dynamic team as an AI Developer, where you'll be at the centre of our A...
Salary:
Not provided
ClixLogix
Expiration Date
Until further notice
Requirements
  • A degree in Computer Science, Engineering, or a related field
  • Proven experience (3-4 years) in designing and deploying AI/ML solutions
  • Stellar programming skills, especially in Python
  • Deep knowledge of machine learning, neural networks, and frameworks like TensorFlow and PyTorch
  • Expertise in data handling, preprocessing, and augmentation
  • Familiarity with version control tools and collaborative practices
  • Strong foundation in software engineering, including testing and debugging
  • Exceptional problem-solving and analytical skills
  • Outstanding communication skills for seamless teamwork
Job Responsibility
  • Uphold our company's values and principles, ensuring that all policies and procedures are followed with precision
  • Be a beacon of professionalism, setting an example in day-to-day conduct, interactions, and soft aspects
  • Stay aligned with our Remote work policies, fostering a positive and collaborative work environment
  • Collaborate with cross-functional teams to understand business requirements and translate them into scalable AI/ML solutions
  • Design and architect end-to-end AI/ML pipelines, from data collection to deployment
  • Develop advanced machine learning algorithms to tackle challenges in areas like natural language processing and computer vision
  • Optimize models for performance, accuracy, and efficiency
  • Clean, transform, and enrich raw data for AI/ML models
  • Train, validate, and refine machine learning models using cutting-edge techniques
  • Evaluate model performance across diverse datasets
  • Fulltime

Senior Staff Engineer, Software Engineering

We are seeking a highly accomplished Senior Staff Engineer to join our engineeri...
Location:
United States, Chevy Chase, MD; Palo Alto, CA; Seattle, WA
Salary:
130000.00 - 260000.00 USD / Year
Geico
Expiration Date
Until further notice
Requirements
  • Deep expertise in infrastructure systems, including compute platforms (Kubernetes, Docker, cloud services), networking, and storage
  • Strong database experience across relational databases (PostgreSQL, MySQL) and NoSQL solutions (MongoDB, Cassandra, Redis, DynamoDB)
  • Demonstrated experience applying AI to solve real-world problems in production environments
  • Expert-level proficiency in at least two programming languages (e.g., Python, Java, Go, Rust)
  • Experience designing and building distributed systems at scale
  • Strong understanding of cloud platforms (Azure OR AWS) and infrastructure-as-code practices
  • Hands-on experience with CI/CD pipelines, build systems, and deployment automation (e.g., GitHub Actions, Jenkins, Azure DevOps, ArgoCD)
  • Background in building real-time data processing systems (Kafka, Flink, Spark)
  • Excellent communication skills with the ability to articulate complex technical concepts to diverse audiences
  • Experience working in a platform engineering team, building internal developer platforms or shared infrastructure services
Job Responsibility
  • Define and drive the technical vision for infrastructure and AI-powered systems across the organization
  • Design, architect, and implement highly scalable, fault-tolerant distributed systems
  • Lead technical decision-making on critical projects, balancing short-term needs with long-term sustainability
  • Establish and champion engineering best practices, design patterns, and coding standards
  • Architect and optimize compute infrastructure for performance, reliability, and cost efficiency
  • Design and implement database solutions (relational and NoSQL) that scale to meet business demands
  • Drive cloud infrastructure strategy, including containerization, orchestration, and serverless architectures
  • Ensure system reliability, observability, and operational excellence across all platform components
  • Identify and prioritize opportunities to apply AI/ML to solve high-impact business problems
  • Stay current with emerging AI technologies and evaluate their applicability to business challenges
What we offer
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation, a 401K savings plan vested from day one that offers a 6% match, performance and recognition-based incentives, and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
  • Fulltime