Senior AI/ML Validation Engineer

AMD

Location:
India, Hyderabad

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking an experienced and versatile professional with expertise in validation strategy, automation, and quality for AI/ML model serving, GPU software stacks, device drivers, firmware, and cross-platform systems (Linux/Windows). You will build test frameworks, drive CI quality gates, perform performance and reliability testing, and lead cross-stack triage to ensure robust releases in a rapidly evolving environment.

Job Responsibility:

  • Own end-to-end test strategy for AI/ML workflows (PyTorch, vLLM), GPU runtimes, drivers, and firmware across kernel and user space
  • Develop scalable automation frameworks spanning unit, integration, HIL (hardware-in-the-loop), system, and end-to-end tests
  • Implement and maintain CI quality gates (GitHub Actions/Workflows, Jenkins), including automated build, test execution, artifact management, reporting, and flake reduction
  • Design and execute performance, stress, reliability, soak, and long-haul tests targeting GPU compute, memory, I/O, and serving throughput/latency
  • Validate cross-platform compatibility (Linux/Windows), covering driver interfaces, kernel interactions, firmware behavior, and runtime stability
  • Create reproducible environments with containers/orchestration; instrument telemetry and observability for data-driven QA
  • Apply agentic AI techniques to accelerate test generation, triage, and root cause analysis; integrate intelligent diagnostics into pipelines
  • Develop rigorous test cases for low-level features (PCIe, DMA, interrupts, memory management), error handling, recovery, and fault injection
  • Define and track quality KPIs (coverage, defect escape rate, MTTR, performance regressions) and drive continuous improvement
  • Lead defect triage across hardware, firmware, driver, runtime, and model layers; collaborate with engineering to resolve issues rapidly
  • Produce comprehensive documentation: test plans, procedures, fixtures, coverage maps, readiness criteria, and retrospectives

Requirements:

  • 8–12 years in QA/Test for systems software or platform engineering, with at least 4 years focused on GPU software, device drivers, or firmware validation
  • Demonstrable ownership of validation for AI/ML pipelines and serving stacks using PyTorch and at least one modern inference framework (e.g., vLLM), including accuracy baselining and performance regression detection
  • Proven expertise testing drivers and firmware with hands-on work in: PCIe fundamentals (link training, BARs, MSI/MSI-X), DMA engines, interrupt handling, and memory models; failure modes (error injection, recovery paths, power/thermal events, and persistence across reboot/upgrade cycles)
  • Deep proficiency in Linux (kernel/user space) and practical experience with Windows driver ecosystems, including the ability to read kernel logs and symbols, trace with ftrace/perf/ETW, perform cross-layer debugging, build custom kernels/modules, and analyze crash dumps (kdump, WinDbg)
  • Strong programming for test automation: Python for framework and orchestration (pytest or equivalent), robust mocking/fixtures, and data-driven test generation; C/C++ for low-level test harnesses, protocol exercisers, and performance micro-benchmarks; Bash/PowerShell for environment setup, CI scripting, and reproducibility
  • CI/CD mastery with GitHub Actions/Workflows and/or Jenkins: design gated pipelines with parallelization, artifact management, flaky test quarantine, and automated rollback criteria; integrate metrics, alerts, and quality reports; enforce go/no-go release thresholds
  • Performance testing rigor: methodology for baselining, variance control, and noise isolation; application of statistical techniques (e.g., confidence intervals, A/B comparisons) to detect regressions; GPU-focused profiling and analysis (e.g., perf counters, memory bandwidth, kernel occupancy)
  • Tooling fluency: gdb, perf, ftrace, valgrind, WinDbg, ETW; log/trace correlation; containerized test environments (Docker) and familiarity with Kubernetes for distributed tests
  • Exploratory testing mindset: Hypothesis-driven investigation, boundary and adversarial testing, fuzzing (protocol/API), chaos/fault injection, and reverse-engineering of interfaces when documentation is limited
  • Communication and leadership: clear, concise defect reporting; ability to drive triage across teams; establish and evangelize quality standards; maintain strong documentation discipline
  • BS/MS in Computer Science/Computer Engineering, or related discipline

Nice to have:

  • Lab ops for QA: rack mounting, server configuration, BMC/IPMI, BIOS/fw updates, network/storage setup, power/thermal profiling
  • Front-end/UI testing experience for internal tools: ReactJS, web UI automation, accessibility and usability checks
  • Backend/DB validation: REST/gRPC testing, SQL/NoSQL, schema migrations, data integrity, performance tuning
  • Observability: Prometheus/Grafana, OpenTelemetry; integrating quality signals and alerts into CI/CD and release gates

What we offer:

AMD benefits at a glance

Additional Information:

Job Posted:
February 13, 2026

Similar Jobs for Senior AI/ML Validation Engineer

Model Validation Senior Analyst

This position is part of the Artificial Intelligence (AI) Review and Challenge G...
Location:
India, Mumbai
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 3-6 years' experience in related field
  • Advanced degree (Master's and above) required in mathematics, statistics, computer science, engineering, data science, AI/ML
  • Experience/familiarity with AI/ML applications in cybersecurity, chatbot, natural language processing, image/voice recognition, robotic process automation
  • In-depth technical knowledge of common AI/ML techniques
  • Strong understanding of risks associated with AI/ML and corresponding mitigants
  • Ability to collaborate with peers and stakeholders with various backgrounds
  • Ability to effectively explain technical terms to audiences with different levels of technical knowledge
  • Self-motivated and detail oriented
  • Proficiency in programs such as R and Python and common AI/ML packages
Job Responsibility:
  • Provide independent review and effective challenge on the soundness and fit-for-purpose of AI/ML non-model objects used in Citi
  • Manage AI/ML risk across all life-cycle activities including initial review, ongoing monitoring, and periodic reviews
  • Conduct analysis and prepare detailed technical documentation reports sufficient to meet regulatory guidelines and exceed industry standards
  • Identify weaknesses and limitations of AI/ML objects and inform stakeholders of their risk profile and recommend compensating controls
  • Communicate results to diverse audiences such as AI/ML object owners and developers and senior management
  • Manage stakeholder interactions with AI/ML object developers and owners across the review lifecycle
  • Provide guidance to junior reviewers as and when necessary
  • Contribute to strategic, cross-functional initiatives within the model risk management organization
  • Appropriately assess risk when business decisions are made
What we offer:
  • Access to telehealth options, health advocates, confidential counseling
  • Expanded Paid Parental Leave Policy
  • Programs to manage financial well-being and help plan for the future
  • Access to learning and development resources
  • Generous paid time off packages
  • Resources and tools to volunteer in communities
  • Fulltime

Senior Software Engineer in Test

Axon is on a mission to protect life. As part of that mission, Axon Assistant is...
Location:
United Kingdom, London
Salary:
Not provided
Axon
Expiration Date:
Until further notice
Requirements:
  • 7+ years of experience in test automation, software engineering, or SDET roles
  • Advanced proficiency in Python, Java, C#, JavaScript, or Go
  • Strong experience building and scaling test automation frameworks and developer-focused tools
  • Deep understanding of distributed systems, API testing, and CI/CD pipelines
  • Hands-on experience testing AI/ML-powered systems, real-time services, or multi-modal UIs
  • Track record of owning quality strategy and delivery for mission-critical software in production
Job Responsibility:
  • Architect and implement automation frameworks, test strategies, and quality infrastructure across web, mobile, and on-device platforms
  • Design scalable validation systems for real-time voice interaction, AI/LLM-driven features, and distributed cloud services
  • Partner with engineers to shape code for testability and embed quality early in the development process
  • Lead cross-functional quality initiatives to improve CI/CD pipelines, observability, and release readiness
  • Drive performance, load, and resilience testing, especially for latency-sensitive, real-time systems
  • Mentor other SDETs and developers in automation strategy, debugging, and risk mitigation
  • Own root cause analysis for complex, system-level issues — using telemetry, tracing, and logs
  • Contribute to documentation of tools, architecture, and best practices that scale across teams
What we offer:
  • Competitive base salary and RSUs
  • Comprehensive pension plan with matching contribution
  • Private health insurance & cash plans
  • 30 days paid holiday + UK public holidays
  • Enhanced maternity/paternity leave
  • GymPass subscription
  • Life assurance & income protection
  • Career growth support and wellness resources
  • Fulltime

Senior Data Engineer

The Data Engineer is responsible for designing, building, and maintaining robust...
Location:
Germany, Berlin
Salary:
Not provided
ib vogt GmbH
Expiration Date:
Until further notice
Requirements:
  • Degree in Computer Science, Data Engineering, or related field
  • 5+ years of experience in data engineering or similar roles; experience in renewable energy, engineering, or asset-heavy industries is a plus
  • Strong experience with modern data stack (e.g., PowerPlatform, Azure Data Factory, Databricks, Airflow, dbt, Synapse, Snowflake, BigQuery, etc.)
  • Proficiency in Python and SQL for data transformation and automation
  • Experience with APIs, message queues (Kafka, Event Hub), data streaming and knowledge of data lakehouse and data warehouse architectures
  • Familiarity with CI/CD pipelines, DevOps practices, and containerization (Docker, Kubernetes)
  • Understanding of cloud environments (preferably Microsoft Azure, PowerPlatform)
  • Strong analytical mindset and problem-solving attitude paired with a structured, detail-oriented, and documentation-driven work style
  • Team-oriented approach and excellent communication skills in English
Job Responsibility:
  • Design, implement, and maintain efficient ETL/ELT data pipelines connecting internal systems (M365, Sharepoint, ERP, CRM, SCADA, O&M, etc.) and external data sources
  • Integrate structured and unstructured data from multiple sources into the central data lake / warehouse / Dataverse
  • Build data models and transformation workflows to support analytics, reporting, and AI/ML use cases
  • Implement data quality checks, validation rules, and metadata management according to the company’s data governance framework
  • Automate workflows, optimize performance, and ensure scalability of data pipelines and processing infrastructure
  • Work closely with Data Scientists, Software Engineers, and Domain Experts to deliver reliable datasets for Digital Twin and AI applications
  • Maintain clear documentation of data flows, schemas, and operational processes
What we offer:
  • Competitive remuneration and motivating benefits
  • Opportunity to shape the data foundation of ib vogt’s digital transformation journey
  • Work on cutting-edge data platforms supporting real-world renewable energy assets
  • A truly international working environment with colleagues from all over the world
  • An open-minded, collaborative, dynamic, and highly motivated team
  • Fulltime

Senior Engineer, SDET

The Sr Software Development Engineer in Test (SDET) is responsible for developin...
Location:
United States, Bellevue
Salary:
Not provided
T-Mobile
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Software Engineering, Computer Science or a related field with 5 years of relevant work experience
  • Proficiency in 3 or more development languages such as C#, Java, JavaScript
  • Proficiency in 3 or more automation tools such as Selenium, TestNG, Postman
  • Experience with Test Driven Development and Behavior Driven Development methodologies
  • Experience using CA Service Virtualization or equivalent
  • Proficiency in agile project management systems
  • Experience in designing and developing applications for Unix or Windows environments, mobile platforms, or multi-tiered applications
  • Knowledge of AI and machine learning concepts including model evaluation and validation techniques
  • Knowledge of version control systems like Git
  • Strong problem-solving and analytical skills
Job Responsibility:
  • Design and implement robust testing strategies for Generative AI solutions
  • Test functionality, performance accuracy, and production sampling
  • Collaborate with data scientists and ML engineers
  • Develop and maintain automated test suites for AI/ML pipelines and APIs
  • Validate AI generative solutions for bias, fairness, and ethical compliance
  • Provide internal training on Continuous Testing
  • Contribute to the evolution of testing practices and quality standards
  • Lead defect management and test strategies
  • Clarify test requirements and provide estimates for tasks
What we offer:
  • Medical, dental, and vision insurance
  • flexible spending account
  • 401(k) plan
  • employee stock grants
  • employee stock purchase plan
  • paid parental and family leave
  • family building benefits
  • childcare subsidy
  • short-term and long-term disability insurance
  • life and accident insurance
  • Fulltime

Senior ML Data Engineer

As a Senior Data Engineer, you will play a pivotal role in our AI/ML workstream,...
Location:
Poland, Warsaw
Salary:
Not provided
Awin Global
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in data science, data engineering, or computer science with a focus on math and statistics (Master’s degree preferred)
  • At least 5 years of experience as an AI/ML data engineer undertaking the above tasks and accountabilities
  • Strong foundation in computer science principles and statistical methods
  • Strong experience with cloud technology (AWS or Azure)
  • Strong experience creating data ingestion pipelines and ETL processes
  • Strong knowledge of big data tools such as Spark, Databricks, and Python
  • Strong understanding of common machine learning techniques and frameworks (e.g. mlflow)
  • Strong knowledge of natural language processing (NLP) concepts
  • Strong knowledge of scrum practices and agile mindset
  • Strong analytical and problem-solving skills with attention to data quality and accuracy
Job Responsibility:
  • Design and maintain scalable data pipelines and storage systems for both agentic and traditional ML workloads
  • Productionise LLM- and agent-based workflows, ensuring reliability, observability, and performance
  • Build and maintain feature stores, vector/embedding stores, and core data assets for ML
  • Develop and manage end-to-end traditional ML pipelines: data prep, training, validation, deployment, and monitoring
  • Implement data quality checks, drift detection, and automated retraining processes
  • Optimise cost, latency, and performance across all AI/ML infrastructure
  • Collaborate with data scientists and engineers to deliver production-ready ML and AI systems
  • Ensure AI/ML systems meet governance, security, and compliance requirements
  • Mentor teams and drive innovation across both agentic and classical ML engineering practices
  • Participate in team meetings and contribute to project planning and strategy discussions
What we offer:
  • Flexi-Week and Work-Life Balance: We prioritise your mental health and well-being, offering you a flexible four-day Flexi-Week at full pay and with no reduction to your annual holiday allowance. We also offer a variety of different paid special leaves as well as volunteer days
  • Remote Working Allowance: You will receive a monthly allowance to cover part of your running costs. In addition, we will support you in setting up your remote workspace appropriately
  • Pension: Awin offers access to an additional pension insurance to all employees in Germany
  • Flexi-Office: We offer an international culture and flexibility through our Flexi-Office and hybrid/remote work possibilities to work across Awin regions
  • Development: We’ve built our extensive training suite Awin Academy to cover a wide range of skills that nurture you professionally and personally, with trainings conveniently packaged together to support your overall development
  • Appreciation: Thank and reward colleagues by sending them a voucher through our peer-to-peer program

Senior Software Engineer – AI

NStarX is seeking a highly skilled Senior Software Engineer – AI with a strong f...
Location:
India, Hyderabad
Salary:
Not provided
NStarX
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field (PhD is a plus)
  • 9+ years of experience in AI/ML engineering or related roles
  • 3+ years of experience in Generative AI with team leadership responsibilities
  • Proven track record of production-grade ML and GenAI model development and deployment
  • Programming: Python (preferred)
  • GenAI Frameworks: Hugging Face Transformers, Diffusers, LangChain, TGI
  • Serving & Inference: FastAPI, gRPC, NVIDIA Triton, TorchServe
  • Cloud Platforms: AWS (SageMaker, EKS), GCP (Vertex AI, GKE), Azure (Azure ML, AKS)
  • MLOps & DevOps: Kubeflow, MLflow, GitHub Actions, Jenkins, Helm, Terraform
  • Optimization Techniques: Model quantization, distillation, pipeline and tensor parallelism
Job Responsibility:
  • Design, develop, and deploy machine learning models and AI algorithms to address complex business challenges
  • Lead and mentor a team of AI/ML engineers, ensuring quality and scalability in solution design and implementation
  • Collaborate closely with cross-functional teams including data scientists, software engineers, product managers, and UX designers
  • Lead the development and deployment of Generative AI applications across text, code, image, and audio modalities using state-of-the-art LLMs
  • Design and implement CI/CD pipelines for the GenAI model lifecycle including training, validation, packaging, and deployment
  • Apply best practices for model performance tuning, cost optimization, and scalable deployment in cloud and hybrid environments
  • Develop prompt engineering, fine-tuning strategies (LoRA, QLoRA, PEFT), and evaluation protocols tailored to business use cases
  • Stay current with emerging trends in AI, ML, and Generative AI and drive adoption across teams
  • Document processes, model architectures, and deployment strategies for traceability and knowledge sharing
  • Work closely with cross-functional teams to gather requirements and deliver high-quality solutions
What we offer:
  • Competitive salary aligned with market standards
  • Opportunities for professional development and skill enhancement
  • A collaborative and innovative work environment
  • Fulltime

Senior Software Engineer, Backend

As a Senior Software Engineer, Backend specializing in database architecture and...
Location:
United States, San Francisco
Salary:
150000.00 - 240000.00 USD / Year
Chef Robotics
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 7+ years of professional experience in backend development roles with demonstrated leadership experience
  • Expert knowledge of relational databases (MySQL, PostgreSQL) including schema design, optimization, and administration
  • Strong proficiency with Python and JavaScript/TypeScript with advanced software engineering skills
  • Extensive experience leading projects with at least two web frameworks: Flask, FastAPI, Django, Node.js, or Next.js
  • Proven experience designing and implementing RESTful and GraphQL APIs at scale
  • Advanced understanding of containerization (Docker) and orchestration (Kubernetes) technologies
  • Experience with cloud infrastructure and deployment (AWS, GCP, or Azure) in production environments
  • Proven experience leading complex backend projects and mentoring junior engineers
  • Understanding of data requirements for robotics or automation systems
Job Responsibility:
  • Lead the design, implementation, and optimization of database schemas to support robot operations, telemetry, recipe management, and system analytics
  • Develop robust data migration strategies and version control for database schema evolution
  • Implement efficient query optimization and indexing strategies to support high-throughput robot operations
  • Establish data integrity protocols and backup systems to ensure operational continuity across customer deployments
  • Create scalable data access layers that balance security, performance, and maintainability
  • Mentor team members on database design patterns and optimization techniques
  • Lead the development and maintenance of scalable APIs to serve robot control systems, dashboards, and monitoring tools
  • Design and implement secure authentication and authorization mechanisms across backend services
  • Develop robust middleware for processing and validating data between robotics subsystems
  • Create service interfaces that enable efficient communication between robotics components and cloud services
What we offer:
  • medical, dental, and vision insurance
  • commuter benefits
  • flexible paid time off (PTO)
  • catered lunch
  • 401(k) matching
  • early-stage equity
  • Fulltime