Senior AI/ML Validation Engineer Job at AMD (Hyderabad)

Senior AI/ML Validation Engineer

AMD

Location:
India, Hyderabad

Category:
IT - Software Development

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking an experienced and versatile professional with expertise in validation strategy, automation, and quality for AI/ML model serving, GPU software stacks, device drivers, firmware, and cross-platform systems (Linux/Windows). You will build test frameworks, drive CI quality gates, perform performance and reliability testing, and lead cross-stack triage to ensure robust releases in a rapidly evolving environment.

Job Responsibilities:

Own end-to-end test strategy for AI/ML workflows (PyTorch, vLLM), GPU runtimes, drivers, and firmware across kernel and user space
Develop scalable automation frameworks spanning unit, integration, HIL (hardware-in-the-loop), system, and end-to-end tests
Implement and maintain CI quality gates (GitHub Actions/Workflows, Jenkins), including automated build, test execution, artifact management, reporting, and flake reduction
Design and execute performance, stress, reliability, soak, and long-haul tests targeting GPU compute, memory, I/O, and serving throughput/latency
Validate cross-platform compatibility (Linux/Windows), covering driver interfaces, kernel interactions, firmware behavior, and runtime stability
Create reproducible environments with containers/orchestration; instrument telemetry and observability for data-driven QA
Apply agentic AI techniques to accelerate test generation, triage, and root cause analysis; integrate intelligent diagnostics into pipelines
Develop rigorous test cases for low-level features (PCIe, DMA, interrupts, memory management), error handling, recovery, and fault injection
Define and track quality KPIs (coverage, defect escape rate, MTTR, performance regressions) and drive continuous improvement
Lead defect triage across hardware, firmware, driver, runtime, and model layers; collaborate with engineering to resolve issues rapidly
Produce comprehensive documentation: test plans, procedures, fixtures, coverage maps, readiness criteria, and retrospectives

Requirements:

8–12 years in QA/Test for systems software or platform engineering, with at least 4 years focused on GPU software, device drivers, or firmware validation
Demonstrable ownership of validation for AI/ML pipelines and serving stacks using PyTorch and at least one modern inference framework (e.g., vLLM), including accuracy baselining and performance regression detection
Proven expertise testing drivers and firmware, with hands-on work in: PCIe fundamentals (link training, BARs, MSI/MSI-X), DMA engines, interrupt handling, and memory models; failure modes, including error injection, recovery paths, power/thermal events, and persistence across reboot/upgrade cycles
Deep proficiency in Linux (kernel/user space) and practical experience with Windows driver ecosystems, including the ability to: read kernel logs and symbols, trace with ftrace/perf/ETW, and perform cross-layer debugging; build custom kernels/modules and analyze crash dumps (kdump, WinDbg)
Strong programming for test automation: Python for framework and orchestration (pytest or equivalent), robust mocking/fixtures, and data-driven test generation; C/C++ for low-level test harnesses, protocol exercisers, and performance micro-benchmarks; Bash/PowerShell for environment setup, CI scripting, and reproducibility
CI/CD mastery with GitHub Actions/Workflows and/or Jenkins: design gated pipelines with parallelization, artifact management, flaky test quarantine, and automated rollback criteria; integrate metrics, alerts, and quality reports; enforce go/no-go release thresholds
Performance testing rigor: methodology for baselining, variance control, and noise isolation; application of statistical techniques (e.g., confidence intervals, A/B comparisons) to detect regressions; GPU-focused profiling and analysis (e.g., perf counters, memory bandwidth, kernel occupancy)
Tooling fluency: gdb, perf, ftrace, valgrind, WinDbg, ETW; log/trace correlation; containerized test environments (Docker) and familiarity with Kubernetes for distributed tests
Exploratory testing mindset: Hypothesis-driven investigation, boundary and adversarial testing, fuzzing (protocol/API), chaos/fault injection, and reverse-engineering of interfaces when documentation is limited
Communication and leadership: clear, concise defect reporting; ability to drive triage across teams, establish and evangelize quality standards, and maintain strong documentation discipline
BS/MS in Computer Science, Computer Engineering, or a related discipline

Nice to have:

Lab ops for QA: rack mounting, server configuration, BMC/IPMI, BIOS/fw updates, network/storage setup, power/thermal profiling
Front-end/UI testing experience for internal tools: ReactJS, web UI automation, accessibility and usability checks
Backend/DB validation: REST/gRPC testing, SQL/NoSQL, schema migrations, data integrity, performance tuning
Observability: Prometheus/Grafana, OpenTelemetry; integrating quality signals and alerts into CI/CD and release gates

What we offer:

AMD benefits at a glance

Additional Information:

Job Posted:
February 13, 2026
