Product Development Engineer - AI Systems Job at AMD (Austin)

AI Systems Engineer – AI Model (Training & Inference)

The AMD AI Group is looking for a Senior Software Development Engineer to own th...

Location

Canada, Markham

Salary:

106,400.00 - 159,600.00 CAD / Year

AMD

Expiration Date

Until further notice

Requirements

Industry experience shipping production AI/ML infrastructure, with hands-on work spanning both training and inference.
Bachelor’s, Master’s, or Ph.D. degree in Computer/Software Engineering, Computer Science, or a related technical discipline

Job Responsibility

Enable and optimize large-scale model training (LLMs, VLMs, MoE architectures) on AMD Instinct GPU clusters, ensuring correctness, reproducibility, and competitive throughput.
Build and maintain training infrastructure: job orchestration, distributed checkpointing, data loading pipelines, and storage optimization for Kubernetes clusters with thousands of GPUs.
Debug and resolve training-specific issues, including gradient norm explosions, non-deterministic behavior across GPU generations, and compute-communication overlap in distributed training (FSDP, DeepSpeed, Megatron-LM).
Optimize RCCL collective communication patterns for training workloads, including all-reduce, all-gather, and reduce-scatter across multi-node topologies.
Develop monitoring, alerting, and compliance infrastructure to ensure training cluster health, data security, and SLA adherence at scale.
Design and build end-to-end validation and testing infrastructure using proxy workloads, synthetic benchmarks, and configurable workload generators to systematically validate platform readiness across AMD Instinct GPU generations.
Write and optimize high-performance GPU kernels (GEMM, attention, quantized matmul, GPTQ/AWQ) in HIP, Triton, and MLIR targeting AMD Instinct architectures, with demonstrated ability to outperform open-source baselines.
Drive end-to-end inference enablement on new AMD GPU silicon: be among the first to get frontier models running on each new Instinct generation, creating reproducible guides and reference implementations.
Optimize inference serving frameworks (vLLM, SGLang, TorchServe) for AMD GPUs: batching strategies, KV-cache management, speculative decoding, and continuous batching for production throughput/latency targets.
Develop novel approaches to inference acceleration, including bio-inspired algorithms, SLM-assisted batching, and custom scheduling strategies that exploit AMD hardware characteristics.

Fulltime

Staff AI Engineer - Agentic AI Systems

As a Staff AI Engineer, you will play a key role in designing and delivering hig...

Location

India, Bengaluru, Karnataka | Hyderabad, Telangana | Pune, Maharashtra

Salary:

Not provided

Teradata

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science or equivalent from a recognized institution
8+ years of experience in backend services, distributed systems, or data platform development
Strong proficiency in Java, Go, or Python for service development
Deep understanding of design principles, distributed system patterns, and service architecture
Hands-on experience designing and developing RESTful APIs
Experience with SQL and NoSQL databases and data modelling
Strong debugging, problem solving, and troubleshooting skills
Experience with modern containerization and orchestration tools such as Kubernetes
Knowledge of public cloud platforms
Experience with AI productivity tools (e.g., GitHub Copilot)

Job Responsibility

Design, architect, develop, and maintain high-quality systems, services, and applications with an emphasis on scalability, reliability, and performance
Collaborate with cross-functional engineers and product partners to shape architecture and consistently deliver end-to-end features
Build and integrate robust RESTful APIs, ensuring security, data consistency, and maintainability
Work with SQL and NoSQL databases to implement efficient data models and service access patterns
Apply and experiment with AI/ML technologies, including agentic AI and large language models (LLMs)
Use AI-powered engineering tools to improve development quality, speed, and productivity
Mentor engineers, supporting them in technical planning, implementation, and best practices
Identify and resolve system performance bottlenecks, optimizing code, architecture, and infrastructure
Write unit and integration tests and participate in code reviews to uphold engineering excellence
Investigate production issues and deliver timely, effective solutions

Fulltime

Full Stack Engineer (AI & Agentic AI Systems)

The Full Stack Engineer (AI & Agentic AI Systems) is a strategic professional wh...

Location

India, Pune; Chennai

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

2-5 years in an Apps Development role
Demonstrated execution capabilities
Strong analytical and quantitative skills
Data-driven and results-oriented
Experience running high-traffic, distributed, cloud-based services
Experience effecting large-scale cultural change
Experience leading infrastructure programs
Skilled at working with third-party service providers
Excellent written and oral communication skills
Bachelor’s/University degree or equivalent experience

Job Responsibility

Design and deliver end‑to‑end solutions spanning architecture, system design, low‑level design, and high‑quality coding across modern full‑stack environments
Build responsive, modular UI applications using React, integrating complex AI-driven workflows and real‑time interactions
Develop scalable, high‑performance backend services in Java / Python, implementing resilient APIs, event‑driven patterns, and microservices architectures
Engineer AI‑powered features leveraging Google Gemini LLM, Vertex AI, ADK, vector databases, A2A, RAG pipelines, MCP, context engineering, and advanced prompt engineering techniques
Implement secure, well‑structured REST and GraphQL APIs, ensuring reliability, versioning discipline, and clean integration patterns across platforms
Optimize system performance and scalability, applying profiling, load‑testing insights, caching strategies, and distributed system tuning
Drive robust CI/CD practices, integrating automated testing, code quality gates, containerization, and cloud‑native deployment pipelines
Partner with QE to build and maintain automated test suites (UI, API, integration, and performance), improving release quality and reducing regression risk
Identify, diagnose, and remediate performance bottlenecks, vulnerabilities found in penetration testing, and production issues with precision and root‑cause clarity
Collaborate cross‑functionally with AI scientists, architects, and product teams to translate business challenges into production‑ready, intelligent agentic systems

Fulltime

Full Stack Engineer (AI & Agentic AI Systems)

The Full Stack Engineer (AI & Agentic AI Systems) is a strategic professional wh...

Location

India, Pune; Chennai

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

8+ years in a product development/product management environment
Strong analytical and quantitative skills
Data-driven and results-oriented
Experience delivering with an agile methodology
Experience effecting large-scale cultural change
Experience leading infrastructure programs
Skilled at working with third-party service providers
Excellent written and oral communication skills
Bachelor’s/University degree or equivalent experience
Strong expertise in SQL (Oracle, PostgreSQL)

Job Responsibility

Design and deliver end‑to‑end solutions spanning architecture, system design, low‑level design, and high‑quality coding across modern full‑stack environments
Build responsive, modular UI applications using React, integrating complex AI-driven workflows and real‑time interactions
Develop scalable, high‑performance backend services in Java / Python, implementing resilient APIs, event‑driven patterns, and microservices architectures
Engineer AI‑powered features leveraging Google Gemini LLM, Vertex AI, ADK, vector databases, A2A, RAG pipelines, MCP, context engineering, and advanced prompt engineering techniques
Implement secure, well‑structured REST and GraphQL APIs, ensuring reliability, versioning discipline, and clean integration patterns across platforms
Optimize system performance and scalability, applying profiling, load‑testing insights, caching strategies, and distributed system tuning
Drive robust CI/CD practices, integrating automated testing, code quality gates, containerization, and cloud‑native deployment pipelines
Partner with QE to build and maintain automated test suites (UI, API, integration, and performance), improving release quality and reducing regression risk
Identify, diagnose, and remediate performance bottlenecks, vulnerabilities found in penetration testing, and production issues with precision and root‑cause clarity
Collaborate cross‑functionally with AI scientists, architects, and product teams to translate business challenges into production‑ready, intelligent agentic systems

Fulltime

HPC & AI Systems Engineer for Integrated Systems Test

HPC & AI Systems Engineer for Integrated Systems Test role at Hewlett Packard En...

Location

Puerto Rico, Aguadilla

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Bachelor's or master's degree in Computer Engineering, Computer Science, Electrical Engineering, Information Systems, or equivalent
Minimum 4 years of experience
Experience with certification testing and results submission to OS vendors for Linux (Red Hat, SLES, Ubuntu, etc.), Windows Server, Windows Client, and VMware (ESXi)
Experience installing and working with Linux, Windows, and VMware operating systems
Experience with programming and scripting languages (Python, PowerShell, Perl, Linux shell, Java) and databases (MySQL, Microsoft SQL Server)
Understanding of Redfish commands, RESTful APIs, and the JSON format
Knowledge of creating and using Docker containers and VMs
Experience configuring storage (internal/external storage, file systems, and RAID/non-RAID settings) and networking (iSCSI, FCoE, IPs, VLANs, bonding, jumbo frames, LAGs)
Knowledge of networking concepts such as NIC teaming, VLANs, IPv4, IPv6
Excellent written and verbal communication skills in English

Job Responsibility

Work with Program & Product Management, technical leads, and product development teams to obtain product feature requirements
Design and implement new test features in existing and new test cases
Analyze, debug and provide feedback/resolution on issues uncovered by test team prior to submission of results to OS vendors for approval
Implement software solutions for multiple test programs/projects with internal and outsourced development partners
Review and evaluate the implementation and use of test automation and test tools
Plan, develop, and implement software tools for testing and evaluating current and next-generation HPE HPC products
Debug and analyze issues to a successful resolution
Perform testing in local and remote labs
Drive automated test execution with test engineers at global locations
Provide training and guidance to test teams both onshore and offshore

What we offer

Health & Wellbeing benefits
Personal & Professional Development programs
Unconditional Inclusion environment
Comprehensive suite of benefits that supports physical, financial and emotional wellbeing

Fulltime

AI Systems Engineer

We are hiring founding AI Systems Engineers to help build that machinery. This r...

Location

Canada, Kitchener

Salary:

111,000.00 - 133,500.00 CAD / Year

Dialpad

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Engineering, or equivalent related experience
2 to 6 years of professional software engineering experience, with a proven track record of shipping production infrastructure or real systems that matter
Experience in writing solid, maintainable production code and applying strong software engineering fundamentals to solve complex debugging challenges
Experience in operating within ambiguous, cross-functional environments where requirements evolve and interfaces are real
Expertise in building for reproducibility, operability, and rollout safety, focusing on the quality of change rather than just local implementation

Job Responsibility

Help design, build, and improve the systems that connect AI capability development to production reality
Improving how model and capability artifacts are packaged, versioned, promoted, and rolled back
Building or improving deployment and release pathways for AI-backed services
Enabling shadow-serving, staged rollout, and candidate-versus-incumbent comparison
Strengthening runtime behavior, observability, and debugging for model-backed systems
Building or automating evaluation systems that make release decisions evidence-based
Reducing bespoke coordination and strengthening the shared rails used by multiple AI teams

What we offer

Competitive salary
Comprehensive benefits
Real opportunities for growth
Cutting-edge AI tools
Robust training program
Inclusive office environment
Great Place to Work culture

Fulltime

Software Product Engineer, AI Platforms

Carex is partnering with a FinTech industry partner to hire a Software Product E...

Location

United States, Madison

Salary:

Not provided

Carex Consulting Group

Expiration Date

Until further notice

Requirements

Strong experience building modern web applications with React and TypeScript
Full-stack development experience with modern APIs and backend services
A builder mindset with a passion for creating, experimenting, and solving problems
Demonstrated ownership and initiative through professional projects, startups, side projects, GitHub activity, or other hands-on work
Ability to thrive in a fast-paced, high-expectation startup environment
Strong communication skills and product-oriented thinking
Adaptability, curiosity, and enthusiasm for learning quickly

Job Responsibility

Own and drive development for key product areas within the platform
Build full-stack features using TypeScript, React/Next.js, Node.js, PostgreSQL, and tRPC
Collaborate closely with engineering, product, and design teams in rapid iteration cycles
Help evolve and scale platform architecture as products and systems grow
Partner directly with users to understand workflows and improve customer experiences
Contribute to AI-powered workflow automation and intelligent tooling initiatives
Help elevate engineering standards through strong ownership, communication, and execution
Operate with high autonomy while maintaining transparency and accountability
Support future growth initiatives and potentially expand into people leadership responsibilities

Fulltime

Staff AI Product Engineer

Staff AI Product Engineer (Prague/Brno/Remote EU) - Product Engineering at Produ...

Location

Czech Republic, Prague

Salary:

Not provided

ProductBoard

Expiration Date

Until further notice

Requirements

6+ years of professional software engineering experience, with a proven track record of shipping scalable production systems
Hands-on experience with LLMs in real products — including prompt design, context management, evaluation, and understanding real-world limitations (hallucinations, latency, cost, reliability)
Strong backend development skills in Python, Kotlin, or Java, with experience designing and evolving service-level logic and infrastructure
Proven ability to own features end-to-end across backend and frontend systems
You think like a builder — ideally a former founding engineer or founder — with a strong instinct for shipping what truly moves the product and the company forward
You’re drawn to agent-native architecture and believe this is where software is heading. You’re excited to rethink codebases, APIs, and documentation so AI agents can reliably operate and scale
You use AI tools daily (e.g., Cursor, Claude Code) and treat them as a core part of your engineering workflow — constantly experimenting, refining prompts, and pushing them to meaningfully increase your speed and output
Comfortable working in distributed systems and event-driven architectures (e.g., queues, async processing, service-to-service communication)
Extra credit: Experience building agentic systems is highly valued. Alternatively, deep hands-on work with advanced LLM workflows (tool use, multi-step reasoning, memory, orchestration) and a clear understanding of their trade-offs — along with the drive and technical maturity to quickly evolve toward agentic architectures

Job Responsibility

Build AI-first product features
Design and implement agent workflows that go beyond chat: multi-step reasoning, tool use, autonomous task execution, and human-gated checkpoints
Make our codebase AI-ready: define clear module boundaries, improve API contracts, add semantic context, and build the structured documentation that makes AI agents more effective across every repo
Optionally join a team building LLM-powered workflows that process and categorize unstructured feedback from multiple sources, turning it into actionable user insights
Drive discovery and experimentation across different domains by building MVPs and POCs, discussing findings with engineers and product teams, and shaping execution plans
Collaborate closely with product managers and designers to shape what we build, not just how we build it
Act as a knowledge multiplier, sharing what you learn across and beyond your team to raise the bar for everyone

What we offer

Stock options
MacBook + 34″ monitor
Budget for online courses, books, and conferences
5 weeks of vacation + 9 sick days
Volunteer Days
Carrot Fertility Benefits
Free snacks, drinks, and yummy catered lunches
MultiSport card to access sports facilities
Flexible working hours and home office
Parental benefits

Fulltime
