Senior AI Models GPU Deployment Software Engineer

India, Bangalore · Job Posted April 15, 2026

Job Description

Join AMD and help bring cutting-edge AI models to life on AMD GPUs! We’re looking for someone excited about AI and high-performance computing. In this role, you’ll work with the latest hardware and software technologies to make AI models run faster and more efficiently. You’ll be part of a collaborative team that values learning and innovation.

Job Responsibility

  • Help run and improve AI models (such as chatbots, vision, and multimodal systems) on AMD GPUs
  • Work with popular AI tools like PyTorch and TensorFlow to make them faster on AMD GPUs
  • Collaborate with open-source communities to share improvements
  • Apply good coding practices to build reliable and efficient software

Requirements

  • Basic understanding of GPU computing (HIP, CUDA, or OpenCL is a plus)
  • Interest in computer architecture and how hardware works
  • Familiarity with AI concepts (Natural Language Processing, Vision, Audio, Recommendations)
  • Programming skills in C++, Python, or similar languages
  • Ability to debug and test your code
  • Bachelor’s degree in Computer Science, Computer Engineering, or a related field

Similar Jobs for Senior AI Models GPU Deployment Software Engineer

8 matching positions

Senior Manager, Performance AI/ML Network Deployment Engineering

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering i...
Location: United States, Santa Clara
Salary: 210400.00 - 315600.00 USD / Year
Company: AMD
Expiration Date: Until further notice
Requirements
  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, storage cluster design, modelling, analytics, performance tuning, convergence, scalability improvements
  • Solid, hands-on expertise in at least one of three domains (compute, network, storage) is preferred
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers across diverse technical disciplines through Proofs of Concept, competitive evaluations, Early Field Trials, etc.
  • Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them
  • Demonstrated leadership in network architecture, with hands-on experience in RoCEv2 design, VXLAN-EVPN, BGP, and lossless fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on network deployment expertise and a proven track record of delivering large projects on time. Cisco, Juniper, or Arista experience is preferred
  • Direct co-development and deployment experience working with strategic customers and partners to bring solutions to market
  • Excellent communication skills with audiences ranging from engineers to mid-level management to C-level executives
Job Responsibility
  • Collaborate with strategic customers on scalable designs involving compute, networking, and storage environments; work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
  • Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based large scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain-specific knowledge to other groups at AMD and share lessons learned to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Senior Software Engineer, AI Platform and Enablement

We're building a next-generation AI-powered platform and web application for cre...
Location: United States, San Francisco
Salary: 180000.00 - 286000.00 USD / Year
Company: Descript
Expiration Date: Until further notice
Requirements
  • Experience in deploying and managing AI models in production
  • Experience with tools for large-volume data pipelines, such as Spark, Flume, and Dask
  • Familiarity with cloud platforms (AWS, Google Cloud, Azure) and container technologies (Docker, Kubernetes)
  • Knowledge of DevOps and MLOps best practices
  • Strong problem-solving abilities and excellent communication skills
Job Responsibility
  • Build, maintain, and standardize third-party model integrations, including consulting for other engineering teams with AI model integration needs
  • Design, implement, and maintain our AI infrastructure supporting our machine learning life cycle, including data ingestion pipelines, training developer experience and infrastructure, evaluation frameworks, and deployments / GPU infrastructure
  • Collaborate with Product Managers, Research Engineers, and AI Researchers to understand their infrastructure needs and ensure our AI systems are robust, scalable, and efficient
  • Optimize and scale our models and algorithms for efficient inference
  • Deploy, monitor, and manage AI models in production
What we offer
  • Generous healthcare package
  • 401k matching program
  • Catered lunches
  • Flexible vacation time
Employment type: Full-time

Senior Principal AI Infrastructure Architect

The Senior Principal AI Infrastructure Architect is a highly skilled and advance...
Location: Italy, Milano
Salary: Not provided
Company: NTT DATA
Expiration Date: Until further notice
Requirements
  • Significant experience in a consulting, presales or architecture role within a large-scale (preferably multi-national) technology services environment, with a track record of leading AI infrastructure pursuits
  • Demonstrable experience designing and delivering production AI platforms — from single multi-GPU servers through to multi-rack training clusters and inference factories
  • Strong working knowledge of the AI hardware vendor landscape (NVIDIA, AMD, Intel, Dell, HPE, Lenovo, Supermicro, Cisco, Pure, VAST, WEKA, DDN, NetApp) and how to position partner ecosystems competitively
  • Proven ability to translate AI workload requirements (model size, parameter count, sequence length, throughput SLOs, latency targets) into accurate hardware bills of materials and sizing justifications
  • Significant client engagement and consulting experience, including client needs assessment, change management and the ability to identify whitespace for follow-on AI infrastructure and managed-services work
  • Significant business development and presales experience on infrastructure-led deals, ideally including sovereign AI, AI Factory or regulated-industry GenAI programmes
  • Strong understanding of how AI infrastructure integrates with business processes, applications, data platforms and existing enterprise architecture
  • Bachelor's degree or equivalent in Information Technology, Engineering, Computer Science or a related field
  • Deep, hands-on knowledge of AI hardware: GPU and accelerator portfolios (NVIDIA Hopper / Blackwell, AMD MI300/MI325, Intel Gaudi 3, emerging custom silicon), host CPU platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), system topologies (HGX, DGX, MGX, OAM) and how each choice maps to specific AI workloads
  • Strong understanding of AI-class storage: parallel filesystems, all-flash NVMe platforms, S3-class object stores, checkpoint and dataset pipelines and the I/O patterns of large-scale training and inference (VAST, WEKA, DDN EXAScaler, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
Job Responsibility
  • Lead the end-to-end design of large, complex AI infrastructure solutions — covering accelerated compute (NVIDIA H100/H200/B200 and GB200 NVL72, AMD Instinct MI300X/MI325X, Intel Gaudi 3), CPU host platforms (Intel Xeon, AMD EPYC, NVIDIA Grace), high-throughput storage tiers and lossless AI fabric — for enterprise, sovereign AI and AI Factory clients
  • Architect reference designs built on NVIDIA DGX/HGX SuperPOD, Dell AI Factory with NVIDIA, Cisco Nexus HyperFabric AI, HPE / Lenovo / Supermicro accelerated compute and equivalent platforms, balancing single-node performance with cluster-scale efficiency
  • Size and validate GPU clusters against real workloads — foundation-model pre-training, distributed fine-tuning, RAG, real-time and batch inference — using the right combination of NVLink/NVSwitch domains, InfiniBand NDR/XDR or Ultra Ethernet / NVIDIA Spectrum-X fabrics and tiered NVMe and parallel storage (VAST, WEKA, DDN, Pure FlashBlade, NetApp ONTAP AI, Dell PowerScale)
  • Define the supporting datacenter design: high-density power (50–140 kW/rack), direct-to-chip and rear-door liquid cooling, structured cabling for AI fabrics and modular deployment models across on-prem, colo and sovereign-cloud footprints
  • Work closely with the sales team to drive the presales process for AI infrastructure pursuits — client discovery, technical workshops, proposal writing, executive presentations and bid defence
  • Translate clients' AI ambitions and business outcomes into a hardware and platform roadmap, positioning NTT DATA's end-to-end portfolio — silicon, systems, storage, fabric, MLOps stack and managed services — to land service-led AI solutions
  • Lead integration of compute, storage, networking, the AI software stack (CUDA, ROCm, Triton, NIM, NVIDIA AI Enterprise, Run:ai, Slurm, Kubernetes / Kubeflow) and managed-service operating models across multiple domains, delivery units and geographies
  • Build business cases, TCO and unit-economics models (cost per token, cost per training run, GPU-hour economics) and end-to-end transition roadmaps for cloud-to-private AI migrations and sovereign AI deployments
  • Define architectural principles for AI infrastructure — accelerator utilisation, data gravity, multi-tenancy, model lifecycle, energy efficiency — and apply them to influence architectural outcomes and governance
  • Develop As-Is, Vision, FMO and To-Be AI platform architectures, identify gaps and develop transition roadmaps
Employment type: Full-time

Senior Software Engineer- AI

Are you looking for an opportunity to work with the latest Azure offerings and p...
Location: India, Bangalore
Salary: Not provided
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in Software Development
  • Strong programming expertise in one or more languages such as Python, Go, Java, or C#, with experience designing production-grade services and APIs
  • Experience building AI-powered applications, including integrating LLMs, implementing agent or Copilot workflows, and orchestrating multi-step AI interactions
  • Hands-on experience with LLM application frameworks and orchestration tools such as Semantic Kernel, LangChain, or similar agent frameworks
  • Familiarity with retrieval-augmented generation (RAG) architectures, vector databases, embeddings, and semantic search systems
  • Experience evaluating and improving model performance through prompt design, evaluation frameworks, fine-tuning, or feedback loops
  • Solid understanding of distributed systems concepts including scalability, reliability, observability, caching, and asynchronous processing
  • Experience deploying and operating AI workloads in cloud environments (preferably Azure), including containerized services and GPU-enabled infrastructure
  • Understanding of Responsible AI practices, including model governance, safety, privacy, and evaluation of AI behaviour in production systems
  • Ability to work across product, research, and engineering teams to translate product scenarios into scalable AI system architectures
Job Responsibility
  • Design, build, and operate scalable AI systems that power intelligent product experiences, including Copilot and agent-driven workflows
  • Architect and implement backend services that support multi-step AI interactions, including orchestration pipelines, context management, memory/state persistence, and tool execution
  • Integrate large language models (LLMs), APIs, and internal services to enable context-aware, human-in-the-loop experiences across customer scenarios
  • Build and maintain data and inference pipelines that support model training, fine-tuning, evaluation, and real-time inference across diverse data sources
  • Evaluate, benchmark, and tune AI/ML models (LLMs and traditional models) to meet product requirements for accuracy, latency, reliability, and safety
  • Implement robust retrieval, grounding, and knowledge integration mechanisms (e.g., RAG systems, semantic indexing, vector search) to power intelligent applications
  • Collaborate with product managers, software engineers, and researchers to translate product vision into production-ready AI capabilities and measurable outcomes
  • Ensure reliability, observability, and governance of AI systems, including monitoring model performance, data quality, and responsible AI practices
  • Build reusable platforms, APIs, and tools that enable teams to rapidly develop AI-powered features and self-service intelligent applications
Employment type: Full-time

Distinguished Engineer

As a Distinguished Engineer at Capital One, you will be a part of a community of...
Location: United States, San Jose, California; McLean, Virginia; New York, New York; San Francisco, California
Salary: Not provided
Company: Capital One
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree
  • At least 7 years of experience in software engineering
Job Responsibility
  • Articulate and evangelize a bold technical vision for your domain
  • Decompose complex problems into practical and operational solutions
  • Ensure the quality of technical design and implementation
  • Serve as an authoritative expert on non-functional system characteristics, such as performance, scalability and operability
  • Continue learning and injecting advanced technical knowledge into our community
  • Handle several projects simultaneously, balancing your time to maximize impact
  • Act as a role model and mentor within the tech community, helping to coach and strengthen the technical expertise and know-how of our engineering and product community
  • Design and drive the long-term technical roadmap for our Foundation Model Hosting platform, ensuring high throughput, ultra-low latency, and optimal GPU utilization across massive, multi-tenant workloads
  • Lead performance engineering across both the platform and model layers
  • Pioneer the implementation of advanced techniques such as speculative decoding, continuous batching, KV-cache optimization (PagedAttention), and custom quantization strategies (FP8, INT4, AWQ)
What we offer
  • Performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI)
  • A comprehensive, competitive, and inclusive set of health, financial, and other benefits
Employment type: Full-time

Senior AI Software Development Engineer

We are currently seeking a senior, experienced AI Software Engineer to join our ...
Location: Romania, Iasi; Brasov; Bucharest
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Demonstrated experience delivering complex AI or software systems and influencing technical direction within a team
  • Strong understanding of AI/ML concepts and techniques, including deep learning, supervised and unsupervised learning, reinforcement learning, and probabilistic graphical models
  • Familiarity with popular ML frameworks and libraries, such as TensorFlow, PyTorch, Keras, and Scikit-learn
  • Proficient in programming languages such as Python, C++, and Java, with a strong focus on maintainable, high-quality production code
  • Familiarity with AMD's hardware (GPU, CPU, and APU) and software (ROCm, OpenCL, HIP) platforms is a plus, but not required
  • Strong analytical, problem-solving, and critical-thinking skills, with the ability to balance hands-on development with broader technical ownership
  • Excellent written and verbal communication skills, with the ability to effectively communicate complex concepts to a diverse audience
  • Bachelor’s or Master’s degree in Computer Science, Computer/Software Engineering or related technical discipline
Job Responsibility
  • Serve as a senior technical contributor, helping define system architecture, development standards, and best practices
  • Provide mentorship and technical guidance to other engineers through design discussions, code reviews, and knowledge sharing
  • Assist in the development of artificial intelligence models, algorithms, and systems tailored to specific project goals and requirements
  • Collaborate effectively with cross-functional teams, including product managers, researchers, hardware engineers, and software developers to support the development of comprehensive AI solutions
  • Learn and adapt to new techniques and methodologies to enhance product performance and develop new features
  • Optimize machine learning models for efficient deployment on AMD hardware and software platforms
  • Contribute to the process of monitoring the performance of deployed models, maintenance and updates, and troubleshooting any related issues
  • Stay current on the latest advancements in the fields of AI and machine learning, collaborating closely with colleagues to foster a culture of innovation
Employment type: Full-time

Senior Software Engineer, CoreAI Workload Engines

The CoreAI Workloads team builds the foundational inference engines and APIs tha...
Location: United States, Redmond
Salary: 119800.00 USD / Year
Company: Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field and 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
  • Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
  • Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
  • Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
  • Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
  • Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
  • Strong collaboration and communication skills, with the ability to work across organizational boundaries.
Job Responsibility
  • Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
  • Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
  • Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
  • Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
  • Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
  • Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
  • Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
  • Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA/InfiniBand-class fabrics such as RoCE over IB) for distributed inference, without owning low-level kernel/driver enablement.
  • Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
  • Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.
What we offer
  • Benefits and other compensation
Employment type: Full-time

Senior AI Models MAD - Model Automation and Dashboarding Engineer

AMD is looking for a skilled and motivated software engineer to join the Model A...
Location: India, Hyderabad
Salary: Not provided
Company: AMD
Expiration Date: Until further notice
Requirements
  • Undergraduate and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field
  • Strong C/C++/Python programming and software design skills, including debugging, performance analysis, and test design
  • Experience in test automation, CI/CD, and Linux scripting
  • Knowledge of GPU computing (HIP, CUDA, OpenCL)
  • Knowledge of Docker, Kubernetes, or Ansible for testing and deploying AI models and services at scale
  • Proficiency with version control (GitHub), testing strategies, code reviews, and collaborative software development
  • Strong written and verbal communication skills with a proactive approach to defining and driving development efforts
Job Responsibility
  • Enable and optimize key AI models (LLM, Vision, MultiModal, etc.) on AMD GPUs
  • Optimize AI frameworks like PyTorch, TensorFlow, etc., on AMD GPUs in upstream open-source repositories
  • Collaborate with internal GPU library teams and open-source framework maintainers to analyze, optimize, and integrate code changes upstream
  • Build and maintain automated functional and performance testing pipelines for AI models across ROCm-supported hardware using scalable tools
  • Develop tools and automation for continuous benchmarking and regression tracking across hardware generations and ROCm releases
  • Build and maintain real-time dashboards that report relevant performance, accuracy, and reliability metrics
  • Support public-facing MAD GitHub repositories and Docker releases, enabling the community to run and validate models on ROCm
  • Contribute to the design of portable, easy-to-use Python interfaces that support multi-node profiling, distributed workloads, and containerized deployments