GPU Consultant Job at Linux Recruit

Senior Engineer/MTS SLT Product Development Engineer

We are the New Product Introduction (NPI) test engineering team defining and pro...

Location

Singapore, Singapore

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Bachelor's or Master's in computer engineering or computer science or electrical engineering, or comparable disciplines
Knowledge of or working experience with GPU architecture, x86 architecture, SoC design, and power management features
Experience with SoC (System-on-Chip), firmware, and software interaction
Software programming and scripting proficiency (Java, Shell script, Perl, Ruby, Python)
Proficiency in Windows and Linux operating systems
10 years or more industry experience

Job Responsibility

Accountable for driving and developing SLT solutions that meet business milestones, cost, and quality targets at the system level
Solves complex, novel, and non-recurring problems
Initiates significant changes to existing processes/methods and leads development and implementation
Conduct engineering evaluations and analysis to drive closure of production issues
Develop and architect SLT logging or data collection flow in manufacturing
Influences technical decisions that have a significant impact on final product
Collaborates on, or assumes consultative or leadership responsibilities for, a specific project or product development initiative
May provide technical supervision or mentoring to junior engineers
Raise overall team capabilities in low-level system debug for AMD data center product families

Fulltime

Enterprise Territory Executive

AMD is seeking a high-impact Enterprise Territory Executive to drive growth acro...

Location

United States, Texas

Salary:

224560.00 - 336840.00 USD / Year

AMD

Expiration Date

Until further notice

Requirements

Proven success selling enterprise technology solutions and consistently exceeding revenue objectives
Strong ability to build executive relationships and influence business and technology decision-makers
Strategic mindset with a track record of identifying, developing, and closing complex opportunities
Passion for emerging technologies including AI, cloud computing, data center modernization, and digital transformation
Strong consultative selling skills focused on customer outcomes and business value
Collaborative approach with the ability to work effectively across highly matrixed organizations
Excellent communication, presentation, and relationship-building skills
Willingness and ability to travel approximately 50%
Proven success selling Data Center, AI Infrastructure, Enterprise Compute, Cloud, Storage, Networking, or Commercial Client solutions into enterprise organizations
Demonstrated ability to consistently exceed sales targets and drive business growth

Job Responsibility

Develop and execute territory growth strategies that expand AMD's presence across enterprise customers throughout the Central U.S. region
Identify, qualify, and close opportunities that drive revenue growth, market share expansion, and long-term customer value
Build and maintain a healthy pipeline across Data Center, AI, Cloud, and Commercial Client opportunities
Develop territory plans that align customer priorities with AMD's strategic growth objectives
Build trusted advisor relationships with CIOs, CTOs, Chief Architects, Infrastructure Leaders, Procurement Organizations, and Line-of-Business stakeholders
Understand customer technology roadmaps, business priorities, and modernization initiatives
Position AMD solutions as strategic enablers of AI adoption, cloud transformation, infrastructure modernization, and workforce productivity
Drive adoption of AMD EPYC™ processors across virtualization, cloud, storage, enterprise applications, and high-performance computing workloads
Position AMD Instinct™ accelerators to support AI training, inference, advanced analytics, and emerging enterprise AI initiatives
Engage customers on AI infrastructure strategies, GPU acceleration, software ecosystems, and future workload requirements

What we offer

Benefits offered are described in "AMD benefits at a glance".

Fulltime

Principal Product Development Eng. - System Level Test

We are the New Product Introduction (NPI) test engineering team defining and pro...

Location

Malaysia, Penang

Salary:

Not provided

AMD

Expiration Date

Until further notice

Requirements

Knowledge of or working experience with GPU architecture, x86 architecture, SoC design, and power management features
Experience with SoC (System-on-Chip), firmware, and software interaction
Software programming and scripting proficiency (Java, Shell script, Perl, Ruby, Python)
Proficiency in Windows and Linux operating systems
10 years or more industry experience

Job Responsibility

Accountable for driving and developing SLT solutions that meet business milestones, cost, and quality targets at the system level
Solves complex, novel, and non-recurring problems
Initiates significant changes to existing processes/methods and leads development and implementation
Conduct engineering evaluations and analysis to drive closure of production issues
Develop and architect SLT logging or data collection flow in manufacturing
Influences technical decisions that have a significant impact on final product
Collaborates on, or assumes consultative or leadership responsibilities for, a specific project or product development initiative
May provide technical supervision or mentoring to junior engineers
Raise overall team capabilities in low-level system debug for AMD data center product families

Fulltime

Senior Principal AI Infrastructure Architect

The Senior Principal AI Infrastructure Architect is a highly skilled and advance...

Location

Italy, Milano

Salary:

Not provided

NTT DATA

Expiration Date

Until further notice

Requirements

Significant experience in a consulting, presales or architecture role within a large-scale (preferably multi-national) technology services environment, with a track record of leading AI infrastructure pursuits
Demonstrable experience designing and delivering production AI platforms — from single multi-GPU servers through to multi-rack training clusters and inference factories
Strong working knowledge of the AI hardware vendor landscape (NVIDIA, AMD, Intel, Dell, HPE, Lenovo, Supermicro, Cisco, Pure, VAST, WEKA, DDN, NetApp) and how to position partner ecosystems competitively
Proven ability to translate AI workload requirements (model size, parameter count, sequence length, throughput SLOs, latency targets) into accurate hardware bills of materials and sizing justifications
Significant client engagement and consulting experience, including client needs assessment, change management and the ability to identify whitespace for follow-on AI infrastructure and managed-services work
Significant business development and presales experience on infrastructure-led deals, ideally including sovereign AI, AI Factory or regulated-industry GenAI programmes
Strong understanding of how AI infrastructure integrates with business processes, applications, data platforms and existing enterprise architecture
Bachelor's degree or equivalent in Information Technology, Engineering, Computer Science or a related field
Deep, hands-on knowledge of AI hardware: GPU and accelerator portfolios (NVIDIA Hopper / Blackwell, AMD MI300/MI325, Intel Gaudi 3, emerging custom silicon), host CPU platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), system topologies (HGX, DGX, MGX, OAM) and how each choice maps to specific AI workloads
Strong understanding of AI-class storage: parallel filesystems, all-flash NVMe platforms, S3-class object stores, checkpoint and dataset pipelines and the I/O patterns of large-scale training and inference (VAST, WEKA, DDN EXAScaler, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)

Job Responsibility

Lead the end-to-end design of large, complex AI infrastructure solutions — covering accelerated compute (NVIDIA H100/H200/B200 and GB200 NVL72, AMD Instinct MI300X/MI325X, Intel Gaudi 3), CPU host platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), high-throughput storage tiers and lossless AI fabric — for enterprise, sovereign AI and AI Factory clients
Architect reference designs built on NVIDIA DGX/HGX SuperPOD, Dell AI Factory with NVIDIA, Cisco Nexus HyperFabric AI, HPE / Lenovo / Supermicro accelerated compute and equivalent platforms, balancing single-node performance with cluster-scale efficiency
Size and validate GPU clusters against real workloads — foundation-model pre-training, distributed fine-tuning, RAG, real-time and batch inference — using the right combination of NVLink/NVSwitch domains, InfiniBand NDR/XDR or Ultra Ethernet / NVIDIA Spectrum-X fabrics and tiered NVMe and parallel storage (VAST, WEKA, DDN, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
Define the supporting datacenter design: high-density power (50–140 kW/rack), direct-to-chip and rear-door liquid cooling, structured cabling for AI fabrics and modular deployment models across on-prem, colo and sovereign-cloud footprints
Work closely with the sales team to drive the presales process for AI infrastructure pursuits — client discovery, technical workshops, proposal writing, executive presentations and bid defence
Translate clients' AI ambitions and business outcomes into a hardware and platform roadmap, positioning NTT DATA's end-to-end portfolio — silicon, systems, storage, fabric, MLOps stack and managed services — to land service-led AI solutions
Lead integration of compute, storage, networking, the AI software stack (CUDA, ROCm, Triton, NIM, NVIDIA AI Enterprise, Run:ai, Slurm, Kubernetes / Kubeflow) and managed-service operating models across multiple domains, delivery units and geographies
Build business cases, TCO and unit-economics models (cost per token, cost per training run, GPU-hour economics) and end-to-end transition roadmaps for cloud-to-private AI migrations and sovereign AI deployments
Define architectural principles for AI infrastructure — accelerator utilisation, data gravity, multi-tenancy, model lifecycle, energy efficiency — and apply them to influence architectural outcomes and governance
Develop As-Is, Vision, FMO and To-Be AI platform architectures, identify gaps and develop transition roadmaps

Fulltime

Member of Technical Staff, Microsoft Robotics (Spatial AI)

Microsoft’s Discovery and Quantum (MDQ) division develops and delivers advanced ...

Location

United States, Redmond

Salary:

102100.00 - 202200.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 2+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
OR Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 1+ year(s) data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results) or consulting experience
OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Job Responsibility

Design, develop, and evaluate physical world models that capture 3D spatial structure, object geometry and pose, physics dynamics, material properties, and semantic scene understanding for robotic applications
Build and train world models (e.g., video prediction models, neural physics simulators, 3D generative models, scene graph representations) that predict future states of physical environments conditioned on robot actions, enabling model-based planning and policy learning
Develop spatial AI capabilities including 3D scene reconstruction, object detection and pose estimation, spatial relationship reasoning, occupancy prediction, and dense 3D feature representations for robot perception and planning
Implement and maintain evaluation frameworks for world models and spatial AI systems, including prediction accuracy metrics, planning performance benchmarks, and generalization testing across environments and object categories
Collaborate with robotics researchers, learning engineers, and simulation engineers to integrate world models into robot planning and control pipelines, enabling model-predictive control, imagination-based planning, and data-augmented training
Build data pipelines for training world models, including multi-sensor data fusion (RGB, depth, LiDAR, proprioception), scene annotation, and dataset curation for diverse physical environments and interaction scenarios
Write efficient, readable, extensible code in Python (including PyTorch, JAX, or TensorFlow) for model development, training, and evaluation, leveraging GPU computing infrastructure for large-scale experiments
Contribute to the formulation of the team's world modeling research and development roadmap, identifying high-impact technical directions and collaborating with leadership to prioritize investments
Present research findings and model evaluation results clearly and efficiently to internal stakeholders and external partners, contributing to technical publications, blog posts, and conference presentations
Stay current with state-of-the-art research in world models, spatial AI, 3D vision, neural physics simulation, and foundation models for physical understanding, actively contributing to the body of thought leadership in these areas

Fulltime

Principal Software Consultant - AI/ML Engineer

As an ML Team Lead, you will be responsible for leading the technical direction ...

Location

Pakistan, Lahore, Karachi, Islamabad

Salary:

Not provided

10Pearls

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Data Science, Software Engineering, or a related field
7+ years of professional software engineering experience with at least 5 years of hands-on experience building and deploying ML systems into production
Prior experience as a Tech Lead, Staff Engineer, or hands-on lead for AI/ML engineering teams
Strong expertise in classical machine learning domains such as forecasting, ranking, classification, and optimization
Hands-on experience building modern LLM and agentic AI systems including RAG pipelines, tool-using agents, multi-step workflows, and evaluation systems
Strong proficiency in Python and backend system development
Experience with ML frameworks such as PyTorch or TensorFlow
Strong understanding of scalable distributed systems, APIs, system integration, architecture design, and production engineering practices
Experience operating ML services at scale, including SLO management, monitoring, on-call practices, and incident response
Experience working with Kubernetes-based deployments, CI/CD pipelines, and modern cloud-native engineering practices

Job Responsibility

Lead the technical direction for the team’s ML and LLM systems, including architecture patterns, platform choices, evaluation frameworks, and engineering standards
Stay hands-on by designing and implementing complex ML and agentic AI systems, writing production-grade code, and leading through technical execution
Design, develop, and deploy scalable ML and LLM-powered applications and services in production environments
Build and optimize AI-powered solutions such as RAG systems, multi-step agents, AI assistants, chatbots, forecasting systems, ranking models, classification models, and optimization systems
Drive architecture and design reviews to ensure scalability, reliability, security, and maintainability of AI/ML systems
Own the technical roadmap for ML/LLM initiatives and translate business objectives into execution plans and scalable solutions
Collaborate closely with Product Managers, Engineers, Data Engineers, MLOps Engineers, QA Engineers, and cross-functional stakeholders to deliver business-aligned AI solutions
Establish engineering best practices for prompt engineering, model evaluation, regression testing, observability, and production readiness
Define and implement quality standards, evaluation suites, acceptance metrics, and regression plans for all AI/ML features
Ensure high availability, scalability, and resilience of tier-1 ML services through SLOs, monitoring, incident response, failover strategies, circuit breakers, and multi-zone deployments

Fulltime

Staff/Senior Software Consultant - Azure, AKS, MLflow & Kubeflow

10Pearls is seeking a Staff/Senior MLOps Engineer – Azure ML Platform & LLMOps t...

Location

Pakistan, Karachi; Lahore; Islamabad

Salary:

Not provided

10Pearls

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Engineering, or a related field (preferred)
5+ years of professional experience in MLOps, DevOps, SRE, Platform Engineering, or ML Infrastructure roles
Minimum 3 years of hands-on experience supporting production-grade ML systems and AI platforms
Strong hands-on experience with Microsoft Azure, including Azure Kubernetes Service (AKS), Azure Service Bus, Azure Storage, networking, identity management, and cloud cost optimization
Strong Kubernetes operational expertise including Helm, Ingress Controllers, autoscaling (HPA/VPA/KEDA), GPU scheduling, workload troubleshooting, and large-scale container orchestration
Production experience with MLflow, Kubeflow, or equivalent ML platform tooling for experiment tracking, model registries, and ML pipeline orchestration
Strong expertise in GitLab CI/CD or equivalent CI/CD tooling for automated deployments, validation gates, rollback workflows, and progressive delivery patterns
Hands-on experience with monitoring and observability platforms including Prometheus, Grafana, OpenTelemetry, Azure Monitor, Datadog, New Relic, or Elastic
Experience monitoring ML/LLM systems including latency, model performance, drift, token usage, infrastructure health, and operational costs
Strong proficiency in Python and shell scripting for automation and operational tooling

Job Responsibility

Design and operate end-to-end ML infrastructure on Microsoft Azure, including training environments, model registries, deployment workflows, and scalable inference systems on Azure Kubernetes Service (AKS)
Own and evolve MLflow and Kubeflow platforms, including experiment tracking, model registry management, reproducible training workflows, and pipeline orchestration
Build and maintain robust CI/CD pipelines in GitLab for ML models and AI services, including validation gates, canary deployments, progressive delivery, and automated rollback strategies
Design scalable inference systems using AKS autoscaling, GPU scheduling, Redis caching, asynchronous processing with Azure Service Bus, and cost-aware infrastructure planning
Implement comprehensive monitoring and observability for ML and LLM systems, covering infrastructure metrics, latency, drift detection, token usage, quality metrics, and operational cost tracking
Define and enforce platform-level security controls including IAM policies, secrets management, network segmentation, audit logging, dependency scanning, and model access governance
Build highly available and fault-tolerant ML serving infrastructure with strong focus on scalability, disaster recovery, resilience, and platform reliability
Define and maintain platform SLOs for ML services, including incident response processes, postmortems, and operational improvement initiatives
Partner closely with ML Engineers to productionize new ML models, LLM systems, and agentic AI workflows with safe rollout and evaluation patterns
Optimize infrastructure utilization and operational cost across compute, GPU workloads, and LLM provider usage through batching, caching, autoscaling, and routing strategies

Fulltime

Global Lead Architect – Hybrid Cloud, AI & HPE Platform Delivery

A highly senior, customer-facing architecture and delivery leadership role respo...

Location

Bulgaria, Sofia

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

12–15+ years in enterprise IT, with a strong focus on solution architecture and delivery leadership, hybrid cloud, AI/HPC, and infrastructure platforms
Proven background in professional services / delivery-led roles, not purely presales
Demonstrated experience leading large-scale, multi-technology programs end-to-end
Strong consulting mindset with excellent stakeholder and executive communication skills
Deep expertise in enterprise private cloud platforms and hybrid architectures
Strong understanding of workload migration, interoperability, and governance
AI platform design (GPU-based infrastructure, NVIDIA ecosystem)
HPC cluster architecture, workload schedulers (Slurm, PBS Pro), and performance tuning
Kubernetes ecosystems (OpenShift, Rancher, CNCF stack)

Job Responsibility

Serve as the technical validation authority during early sales cycles
Lead technical governance from opportunity qualification through delivery execution
Own solution integrity across the lifecycle—design, validation, implementation, and optimization
Architect and oversee end-to-end hybrid and private cloud solutions
Drive adoption of cloud-native, automated, and scalable architectures
Lead delivery teams across complex engagements
Act as the lead design authority ensuring delivery success for AI infrastructure and HPC deployments, containerized platforms and cloud-native environments, and enterprise hybrid cloud transformations
Provide hands-on guidance during critical phases (design reviews, PoCs, escalations)
Lead technical due diligence during RFP/RFI responses, solution workshops and discovery sessions, and proof-of-concept engagements
Translate business requirements into deliverable, production-ready architectures

What we offer

Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion

Fulltime
