Senior AI/ML Validation Engineer

India, Hyderabad · Job Posted February 13, 2026

Job Description

We are seeking an experienced and versatile professional with expertise in validation strategy, automation, and quality for AI/ML model serving, GPU software stacks, device drivers, firmware, and cross-platform systems (Linux/Windows). You will build test frameworks, drive CI quality gates, perform performance and reliability testing, and lead cross-stack triage to ensure robust releases in a rapidly evolving environment.

Job Responsibility

  • Own end-to-end test strategy for AI/ML workflows (PyTorch, vLLM), GPU runtimes, drivers, and firmware across kernel and user space
  • Develop scalable automation frameworks spanning unit, integration, HIL (hardware-in-the-loop), system, and end-to-end tests
  • Implement and maintain CI quality gates (GitHub Actions/Workflows, Jenkins), including automated build, test execution, artifact management, reporting, and flake reduction
  • Design and execute performance, stress, reliability, soak, and long-haul tests targeting GPU compute, memory, I/O, and serving throughput/latency
  • Validate cross-platform compatibility (Linux/Windows), covering driver interfaces, kernel interactions, firmware behavior, and runtime stability
  • Create reproducible environments with containers/orchestration; instrument telemetry and observability for data-driven QA
  • Apply agentic AI techniques to accelerate test generation, triage, and root cause analysis; integrate intelligent diagnostics into pipelines
  • Develop rigorous test cases for low-level features (PCIe, DMA, interrupts, memory management), error handling, recovery, and fault injection
  • Define and track quality KPIs (coverage, defect escape rate, MTTR, performance regressions) and drive continuous improvement
  • Lead defect triage across hardware, firmware, driver, runtime, and model layers; collaborate with engineering to resolve issues rapidly
  • Produce comprehensive documentation: test plans, procedures, fixtures, coverage maps, readiness criteria, and retrospectives

Requirements

  • 8–12 years in QA/Test for systems software or platform engineering, with at least 4 years focused on GPU software, device drivers, or firmware validation
  • Demonstrable ownership of validation for AI/ML pipelines and serving stacks using PyTorch and at least one modern inference framework (e.g., vLLM), including accuracy baselining and performance regression detection
  • Proven expertise testing drivers and firmware, with hands-on work in PCIe fundamentals (link training, BARs, MSI/MSI-X), DMA engines, interrupt handling, and memory models, and in failure modes: error injection, recovery paths, power/thermal events, and persistence across reboot/upgrade cycles
  • Deep proficiency in Linux (kernel/user space) and practical experience with Windows driver ecosystems; ability to read kernel logs and symbols, trace with ftrace/perf/ETW, perform cross-layer debugging, build custom kernels/modules, and analyze crash dumps (kdump, WinDbg)
  • Strong programming for test automation: Python for framework and orchestration (pytest or equivalent), robust mocking/fixtures, and data-driven test generation; C/C++ for low-level test harnesses, protocol exercisers, and performance micro-benchmarks; Bash/PowerShell for environment setup, CI scripting, and reproducibility
  • CI/CD mastery with GitHub Actions/Workflows and/or Jenkins: design gated pipelines with parallelization, artifact management, flaky test quarantine, and automated rollback criteria; integrate metrics, alerts, and quality reports; enforce go/no-go release thresholds
  • Performance testing rigor: methodology for baselining, variance control, and noise isolation; application of statistical techniques (e.g., confidence intervals, A/B comparisons) to detect regressions; GPU-focused profiling and analysis (e.g., perf counters, memory bandwidth, kernel occupancy)
  • Tooling fluency: gdb, perf, ftrace, valgrind, WinDbg, ETW; log/trace correlation; containerized test environments (Docker) and familiarity with Kubernetes for distributed tests
  • Exploratory testing mindset: Hypothesis-driven investigation, boundary and adversarial testing, fuzzing (protocol/API), chaos/fault injection, and reverse-engineering of interfaces when documentation is limited
  • Communication and leadership: clear, concise defect reporting; ability to drive triage across teams, establish and evangelize quality standards, and maintain strong documentation discipline
  • BS/MS in Computer Science, Computer Engineering, or a related discipline

Nice to have

  • Lab ops for QA: rack mounting, server configuration, BMC/IPMI, BIOS/fw updates, network/storage setup, power/thermal profiling
  • Front-end/UI testing experience for internal tools: ReactJS, web UI automation, accessibility and usability checks
  • Backend/DB validation: REST/gRPC testing, SQL/NoSQL, schema migrations, data integrity, performance tuning
  • Observability: Prometheus/Grafana, OpenTelemetry; integrating quality signals and alerts into CI/CD and release gates

What we offer

AMD benefits at a glance

Similar Jobs for Senior AI/ML Validation Engineer

8 matching positions

Senior AI/ML Engineer, Physical AI Solutions

This role sits at the intersection of cutting-edge AI technology and real-world ...
Location: United States, North Reading
Salary: 219800.00 - 351700.00 USD / Year
Teradyne
Expiration Date: Until further notice
Requirements
  • Deep understanding of sensor technologies (RGBD, LiDAR, ToF, etc.) and the ability to architect solutions that leverage their respective strengths and limitations
  • Experience shipping AI-powered products to market and supporting them in production customer environments
  • Good software engineering fundamentals with experience integrating AI/ML capabilities into existing robotics software stacks
  • Expertise in data science and data engineering
  • Collaborative mindset with experience working in cross-functional teams building complete systems
  • Demonstrated ability to rigorously evaluate ML model performance, establish meaningful metrics, and track quality across training iterations
  • Experience utilizing simulation environments to accelerate engineering and validation workflows
  • Technical expertise: core robotics AI concepts (obstacle detection, object identification and classification, pose estimation); various SLAM approaches and their trade-offs; localization techniques from precision to relative positioning
Job Responsibility
  • Lead the development and deployment of physical AI capabilities that enable robots to navigate, manipulate, and interact with their environments with unprecedented intelligence
  • Help engineer solutions that leverage diverse sensor modalities
  • Be responsible for taking AI innovations from concept through production deployment and ongoing customer support
  • Collaborate closely with R&D teams at Universal Robots and Mobile Industrial Robots, helping shape the future of collaborative robotics and autonomous mobile platforms across the Teradyne Robotics portfolio
What we offer
  • Discretionary bonus(es) based on financial performance
  • Robust health and well-being benefit programs, including medical, dental, vision, Flexible Spending Accounts, retirement savings plans, life and disability insurance, paid vacation & holidays, tuition assistance programs
  • Fulltime

Senior AI/ML Engineer

We are seeking an experienced Senior AI/ML Engineer capable of driving complex A...
Location: India, Mumbai
Salary: Not provided
Wissen
Expiration Date: Until further notice
Requirements
  • Strong proficiency in Python
  • Hands-on experience with ML libraries/frameworks: TensorFlow, PyTorch, Scikit-learn, Hugging Face
  • Machine learning expertise: strong understanding of deep learning, feature engineering, model evaluation techniques, and model optimization
  • NLP & LLM experience: working experience with transformer models and LLM APIs; expertise in prompt engineering, embeddings, and vector databases; familiarity with tools/technologies such as OpenAI / Claude / Llama, FAISS, Pinecone, Weaviate, Chroma
  • MLOps & deployment: experience with Docker, Kubernetes, and CI/CD for ML pipelines; model monitoring and versioning; hands-on with tools like MLflow, Kubeflow, Airflow
  • Cloud platforms: hands-on experience with at least one major cloud platform (AWS, Azure, GCP); experience with cloud AI services such as SageMaker, Azure ML, or Vertex AI
Job Responsibility
  • AI/ML model development: design, build, and optimize machine learning models for real-world use cases; develop models for prediction, classification, recommendation, and NLP tasks; build end-to-end ML pipelines for training, validation, and evaluation
  • Generative AI & LLM development: develop applications using Large Language Models (LLMs); build and maintain RAG (Retrieval-Augmented Generation) pipelines; implement prompt engineering, embeddings, vector search, and semantic retrieval; fine-tune LLMs when required
  • Production deployment: deploy ML models into production using scalable microservices and APIs; build inference pipelines with high performance and low latency; implement monitoring, logging, and model performance optimization
  • Data engineering collaboration: work with data engineering teams to build robust ML training and inference pipelines; ensure clean, high-quality datasets and feature engineering workflows
  • AI platform & architecture: design ML system architecture for high-volume, low-latency environments; integrate AI/ML solutions into enterprise-grade applications
  • Research & innovation: stay updated on the latest advancements in AI, ML, and Generative AI; evaluate and experiment with new frameworks, LLMs, and tools
  • Fulltime

Senior Vehicle Motion Control AI/ML Platform Design Engineer

Location: Canada, Markham
Salary: 115000.00 - 164600.00 USD / Year
General Motors
Expiration Date: Until further notice
Requirements
  • M.S. with 5+ years or Ph.D. with 3+ years of experience in control systems development and/or technical leadership in vehicle controls or related dynamic systems
  • Deep theoretical and practical knowledge of chassis and vehicle dynamics
  • Proficiency in classical control methods such as PID, state feedback, and observers
  • Extensive experience with advanced control strategies, including adaptive control, MPC, learning-based MPC, ML/AI-based control strategies, and other modern control methods
  • Strong expertise in advanced state estimation, observer design, and sensor fusion for vehicle motion states
  • Hands-on experience with Kalman filter variants and with system identification and parameter estimation in vehicle dynamics applications
  • Demonstrated experience using Python for data analysis and applying machine learning or data-driven modeling to control, estimation, or vehicle dynamics problems
  • Ability to translate control, estimation, and learned components into robust, production-quality embedded software using C/C++, MATLAB/Simulink, and code generation workflows
  • Experience with vehicle dynamics simulation tools such as CarSim, CarMaker, or equivalent, and with MIL/SIL/HIL/DiL validation workflows
  • Proficiency with vehicle communication and development tools such as Vehicle SPY, INCA, and CANalyzer
Job Responsibility
  • Lead development of scalable, modular control and estimation strategies and software architectures for AI/ML-enabled vehicle motion control platforms
  • Design and implement advanced control algorithms, including state-space control, observers, estimators, optimal or robust control, model predictive control, and learning-enabled control approaches
  • Integrate AI/ML components such as learned models, estimators, and policies into real-time control loops while maintaining safety, stability, and interpretability
  • Drive end-to-end development and validation through model-based design, simulation, HIL, SIL, DiL, and in-vehicle testing and calibration
  • Lead high-value AI/ML applications in motion control and state estimation, including data-driven modeling, prediction, and learning-based estimation workflows
  • Define and improve data workflows for collection, curation, labeling, feature engineering, and validation using simulation, proving ground, and vehicle data
  • Contribute to process improvement, peer reviews, technical direction, and mentorship of other engineers
  • Create innovative technical solutions and help protect GM intellectual property through patents and publications
What we offer
  • Paid time off including vacation days, holidays, and supplemental benefits for pregnancy, parental and adoption leave
  • Healthcare, dental and vision benefits including health care spending account and wellness incentive
  • Life insurance plans to cover you and your family
  • Company and matching contributions to a Defined Contribution Pension plan to help you save for retirement
  • GM Vehicle Purchase Plan for you, your family, and friends
  • Fulltime

Senior Engineer / Lead Engineer - Virtual Engineering - AI ML

Sponsorship:  GM DOES NOT PROVIDE IMMIGRATION-RELATED SPONSORSHIP FOR THIS ROLE....
Location: India, Bengaluru
Salary: Not provided
General Motors
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Mechanical/Automobile/Production/Mechatronics Engineering or a similar discipline
  • 5+ years of experience in Automotive Manufacturing / Manufacturing Engineering
  • 1+ year of experience implementing AI/ML solutions in automotive use cases
  • Should have executed at least 2 end-to-end projects in the text or image data domain (from problem definition to deployment)
  • Strong programming skills in Python
  • Proficiency with ML/DL frameworks like Scikit-learn, TensorFlow, PyTorch, XGBoost
  • Solid understanding of statistics, probability, and linear algebra
  • Experience in data preprocessing, feature engineering, ETL and Exploratory Data Analysis (EDA)
  • Experience with MLOps platforms (MLflow, Kubeflow, Vertex AI, Azure ML)
  • Knowledge of ML model evaluation
Job Responsibility
  • Collaborate with stakeholders to understand business problems in the Manufacturing Engineering and Operations space and solve them using ML methodologies
  • Design, develop, and fine-tune AI/ML models for classification, regression, clustering, and recommendation systems
  • Work with MLOps tools to automate workflows, CI/CD pipelines, and model monitoring
  • Evaluate, validate, and benchmark model performance using appropriate metrics
  • Deploy AI models into production environments in collaboration with IT/AI teams
  • Establish monitoring and maintenance processes to ensure model accuracy over time
  • Ensure that all AI solutions comply with organizational data security, confidentiality, and regulatory requirements
  • Document workflows, results, and lessons learned for organizational knowledge sharing
  • Stay updated on advancements in ML model evaluation, ML frameworks, end-to-end ML pipelines
  • Fulltime

Senior Engineer / Lead Engineer - Virtual Engineering - AI CAE

Senior Engineer / Lead Engineer – AI CAE will Drive AI innovation in CAE analysi...
Location: India, Bengaluru
Salary: Not provided
General Motors
Expiration Date: Until further notice
Requirements
  • Bachelor's or Master's degree in Mechanical/Automobile/Production/Mechatronics Engineering or a similar discipline.
  • 5+ years of experience in CAE in Automotive Product Development / Manufacturing Engineering.
  • 3+ years of experience implementing AI solutions in CAE.
  • 5+ years of core CAE domain experience (from problem definition to deployment).
  • Strong programming skills in Python, MATLAB, CAE tool-specific APIs (Altair suite, NASTRAN, ANSYS APDL, Abaqus etc.), workflow automation.
  • Experience with ML frameworks like PyTorch, TensorFlow.
  • Understanding of data annotation tools and MLOps workflows.
  • Experience in data handling and feature engineering.
  • Strong problem-solving and analytical mindset.
  • Experience in domain-specific AI use cases (manufacturing, automotive, etc.).
Job Responsibility
  • Collaborate with stakeholders to understand business problems in the CAE domain and translate them into AI solutions.
  • Design, develop, and fine-tune AI/ML models for simulation result prediction and design optimization, and for automating repetitive CAE tasks (meshing, boundary conditions, post-processing).
  • Evaluate, validate, and benchmark model performance using appropriate metrics.
  • Deploy AI models into production environments in collaboration with IT/AI teams.
  • Establish monitoring and maintenance processes to ensure model accuracy over time.
  • Ensure that all AI solutions comply with organizational data security, confidentiality, and regulatory requirements.
  • Document workflows, results, and lessons learned for organisational knowledge sharing.
  • Stay updated on advancements in neural networks, multi-physics simulations, surrogate modelling and physics-informed learning techniques.
  • Fulltime

Senior Engineer / Lead Engineer - Virtual Engineering - AI CFD

Senior Engineer / Lead Engineer – AI CFD will Drive AI innovation in CFD domain....
Location: India, Bengaluru
Salary: Not provided
General Motors
Expiration Date: Until further notice
Requirements
  • Bachelor’s or Master’s degree in Mechanical/Automobile/Production/Mechatronics Engineering or a similar discipline
  • 5+ years of experience in CFD in Automotive Product Development / Manufacturing Engineering
  • 2+ years of experience implementing AI solutions in CFD
  • 5+ years of core CFD domain experience (from problem definition to deployment)
  • Strong programming skills in Python and C++ for automation and solver integration
  • Experience with ML frameworks like PyTorch, TensorFlow
  • Knowledge of surrogate modeling, reduced-order modeling (ROMs), and regression techniques
  • Experience in data handling (large-scale CFD datasets) and feature engineering (feature extraction from flow fields like velocity, pressure, turbulence quantities)
  • Strong problem-solving and analytical mindset
  • Understanding of data annotation tools and MLOps workflows
Job Responsibility
  • Collaborate with stakeholders to understand business challenges in the CFD space and solve them using API based customization and AI methodologies
  • Collect, clean, annotate, and prepare datasets for text analysis and image comparison tasks
  • Design, develop, and fine-tune AI/ML models for automating mesh generation, solver setup, and post-processing of CFD results, and for building optimization pipelines for thermal and fluid systems using AI-assisted approaches
  • Evaluate, validate, and benchmark model performance using appropriate metrics
  • Deploy AI models into production environments in collaboration with IT/AI teams
  • Establish monitoring and maintenance processes to ensure model accuracy over time
  • Ensure that all AI solutions comply with organizational data security, confidentiality, and regulatory requirements
  • Document workflows, results, and lessons learned for organizational knowledge sharing
  • Stay updated on advancements in neural networks, multi-physics simulations, surrogate modelling and physics-informed learning techniques
  • Fulltime

Staff II Software Engineer AI/ML Ops

We're looking for a Lead Data Engineer to design, build, and optimize data pipel...
Location: United States, Pleasanton
Salary: 245000.00 - 307000.00 USD / Year
BlackLine
Expiration Date: Until further notice
Requirements
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
  • Proficiency in containerization technologies (e.g., Docker, Kubernetes)
  • Proficient in scripting languages (e.g., Bash, Python) for automation
  • Experience with workflow orchestration tools (e.g., Apache Airflow)
Job Responsibility
  • Lead data pipeline development: Build and maintain PySpark ETL pipelines with high data quality and performance
  • Manage integrations: Establish robust connections to client data sources via APIs and tools like Fivetran, Plaid, and BlackLine's own internal connector ecosystem
  • Ensure reliability: Monitor pipeline performance, automate testing, and validate data accuracy
  • Optimize for scale: Implement performance improvements (e.g., CDC mechanisms, indexing strategies) for large-scale datasets
  • Collaborate & innovate: Work with business stakeholders to refine data requirements and integrate cutting-edge AI and big data technologies
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
What we offer
  • Short-term and long-term incentive programs
  • Robust offering of benefit and wellness plans
  • Fulltime

Senior AI/ML Developer

A growing mid-market organization is expanding its AI capabilities and is seekin...
Location: United States, Atlanta, GA
Salary: Not provided
Tier4 Group
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in AI engineering, machine learning, or software development
  • Strong programming skills in Python
  • Experience working with modern ML frameworks
  • Experience deploying models into production environments
  • Experience building data pipelines and API-based integrations
  • Ability to work in environments where requirements evolve quickly
Job Responsibility
  • Maintain and optimize existing ML and predictive models
  • Improve model retraining, validation, and monitoring
  • Strengthen modeling pipelines to ensure reliability and repeatability
  • Help design and implement real-world GenAI applications, including Retrieval-Augmented Generation (RAG), prompt orchestration and evaluation, and monitoring and tuning of LLM outputs
  • Integrate vendor-delivered GenAI capabilities into the organization’s data ecosystem
  • Build and maintain data and AI pipelines
  • Integrate models with enterprise systems and APIs
  • Support CI/CD processes and model versioning
  • Ensure solutions can scale in production environments
  • Partner with teams such as Actuarial and Underwriting
What we offer
  • Base salary plus yearly bonus and long-term incentive plan
  • Fulltime