
Architecture Intern - Inference


Etched

Location:
San Jose, United States

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking a talented Architecture intern to join our team and contribute to the design of next-generation AI accelerators. This role focuses on developing and optimizing compute architectures that deliver exceptional performance and efficiency for transformer workloads. You will work on cutting-edge architectural problems and performance modeling over the course of your internship.

Job Responsibility:

  • Support porting state-of-the-art models to our architecture
  • Help build programming abstractions and testing capabilities to rapidly iterate on model porting
  • Assist in building, enhancing, and scaling Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling
  • Contribute to optimizing routing and communication layers using Sohu’s collectives
  • Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues
  • Develop and leverage a deep understanding of Sohu to co-design both HW instructions and model architecture operations to maximize model performance
  • Implement high-performance software components for the Model Toolkit

Requirements:

  • Progress towards a Bachelor’s, Master’s, or PhD degree in computer science, computer engineering, or a related field
  • Proficiency in C++ or Rust
  • Understanding of performance-sensitive or complex distributed software systems, e.g. Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand)
  • Familiarity with PyTorch or JAX
  • Experience porting applications to non-standard accelerator hardware or platforms
  • Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.)

Nice to have:

  • Experience building low-latency, high-performance applications using both kernel-level and user-space networking stacks
  • Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns
  • Solid grasp of Transformer architectures, particularly Mixture-of-Experts (MoE)
  • Experience building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths

What we offer:
  • 12-week paid internship
  • Generous housing support for those relocating
  • Daily lunch and dinner in our office
  • Direct mentorship from industry leaders and world-class engineers
  • Opportunity to work on one of the most important problems of our time

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
On-site work


Similar Jobs for Architecture Intern - Inference

ML Engineer - Inference Serving

Luma’s mission is to build multimodal AI to expand human imagination and capabil...
Location:
Palo Alto, United States; London, United Kingdom
Salary:
187500.00 - 395000.00 USD / Year
Luma AI
Expiration Date:
Until further notice
Requirements:
  • Strong Python and system architecture skills
  • Experience with model deployment using PyTorch, Huggingface, vLLM, SGLang, tensorRT-LLM, or similar
  • Experience with queues, scheduling, traffic-control, fleet management at scale
  • Experience with Linux, Docker, and Kubernetes
  • Python
  • Redis
  • S3-compatible Storage
  • Model serving (one of: PyTorch, vLLM, SGLang, Huggingface)
  • Understanding of large-scale orchestration, deployment, scheduling (via Kubernetes or similar)
Job Responsibility:
  • Ship new model architectures by integrating them into our inference engine
  • Collaborate closely across research, engineering and infrastructure to streamline and optimize model efficiency and deployments
  • Build internal tooling to measure, profile, and track the lifetime of inference jobs and workflows
  • Automate, test and maintain our inference services to ensure maximum uptime and reliability
  • Optimize deployment workflows to scale across thousands of machines
  • Manage and optimize our inference workloads across different clusters & hardware providers
  • Build sophisticated scheduling systems to optimally leverage our expensive GPU resources while meeting internal SLOs
  • Build and maintain CI/CD pipelines for processing/optimizing model checkpoints, platform components, and SDKs for internal teams to integrate into our products/internal tooling
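The scheduling bullet above is about dispatching work onto scarce GPUs while meeting SLOs. A minimal sketch of one common policy, earliest-deadline-first (names are illustrative, not Luma's actual system):

```python
import heapq

# Minimal SLO-aware scheduler sketch: pending requests carry a deadline, and
# the scheduler always dispatches the request whose SLO expires soonest.

class Scheduler:
    def __init__(self):
        self._queue = []  # min-heap ordered by deadline

    def submit(self, request_id: str, deadline: float) -> None:
        heapq.heappush(self._queue, (deadline, request_id))

    def next_request(self) -> str:
        """Pop the request with the tightest SLO deadline."""
        _, request_id = heapq.heappop(self._queue)
        return request_id

sched = Scheduler()
sched.submit("batch-job", deadline=60.0)   # lenient offline SLO
sched.submit("chat-turn", deadline=2.0)    # tight interactive SLO
sched.submit("summarize", deadline=10.0)
print(sched.next_request())  # chat-turn
```

Real schedulers layer batching, preemption, and fleet-level placement on top of a core priority rule like this one.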

AI / ML Engineer, Software Engineering

iCapital is seeking an experienced and forward-thinking AI/ML Engineer Vice Pres...
Location:
New York, United States
Salary:
180000.00 - 220000.00 USD / Year
iCapital Network
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in software engineering, with at least 2+ years focused on AI/ML systems
  • Proven experience in building and deploying ML models in production environments
  • Hands-on experience with AI agent frameworks (e.g., LangChain, Semantic Kernel, AutoGen, or custom-built systems)
  • Strong understanding of the ML lifecycle, including data pipelines, model training, evaluation, deployment, and monitoring
  • Familiar with MLOps tools such as MLflow, Kubeflow, or SageMaker
  • Deep understanding of LLM orchestration, prompt engineering, tool use, and memory architectures
  • Familiar with various LLM inference engines such as vLLM or SGLang
  • Experience in integrating agents with APIs, databases, and external systems
  • Familiar with retrieval-augmented generation (RAG), vector databases, and knowledge graphs
  • Experience deploying AI systems in cloud environments (AWS, GCP, Azure) and utilizing containerization tools (Docker, Kubernetes)
Job Responsibility:
  • Design, build, and optimize scalable AI/ML infrastructure and services powering intelligent features across our platform
  • Lead the development of AI agents capable of autonomous decision-making, task execution, and multi-step reasoning across internal and customer-facing applications
  • Architect and implement modular agent frameworks by integrating tools, APIs, and memory systems for dynamic and context-aware behavior
  • Collaborate with product, data, and infrastructure teams to embed AI capabilities into production systems
  • Drive the architecture and development of ML pipelines, model serving frameworks, and real-time inference systems
  • Evaluate and integrate state-of-the-art AI tools and frameworks to accelerate development and deployment
  • Provide technical mentorship and guidance to engineers, contributing to team growth and best practices
  • Partner with Data Science teams to operationalize models, ensuring a smooth transition from experimentation to production
  • Contribute to technical roadmaps and help define long-term AI/ML platform and agent strategy
  • Optimize agent performance for latency, reliability, and safety in production environments
What we offer:
  • Equity for all full-time employees
  • Annual performance bonus
  • Employer matched retirement plan
  • Generously subsidized healthcare with 100% employer paid dental, vision, telemedicine, and virtual mental health counseling
  • Parental leave
  • Unlimited paid time off (PTO)

Cloud Solution Architect - AI/ML

We are looking for a Cloud Solution Architect (CSA) who is passionate about driv...
Location:
London, United Kingdom
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science, Information Technology, Engineering, Business, or related field AND experience in cloud/infrastructure technologies, information technology (IT) consulting/support, systems administration, network operations, software development/support, technology solutions, practice development, architecture, and/or consulting
  • OR equivalent experience
  • This role requires UK Security Clearance, therefore candidates will need to either have existing security clearance or meet the minimum criteria to apply for security clearance.
  • Demonstrated experience working in a customer-facing role (internal and/or external)
  • Demonstrated experience working on technical projects
  • Technical Certification in Cloud (e.g., Azure, Amazon Web Services, Google, security certifications)
  • Experience and expertise in one or more of the following areas: Azure AI Foundry (Models, Agent Service, Semantic Kernel, Search, ML, SDK)
  • AppPlat/Containers/Serverless (App Service, AKS, ACA, ARO, Functions)
  • DevOps (CI/CD, Azure DevOps, DevSecOps)
  • GitHub (Copilot, Enterprise, Adv Security, Actions, Codespaces)
Job Responsibility:
  • Understand customers’ Business and IT priorities and translate them into AI, ML, and cloud engineering architectures, spanning platform engineering, cloud‑native apps, inference pipelines, data workflows, and low‑code extensibility.
  • Act as a highly technical partner, leading customers through architecture reviews, proofs‑of‑concept, and MVP builds, including environment setup, model orchestration, retrieval pipelines, CI/CD automation, and deployment readiness.
  • Implement secure, performant solutions that meet production standards across performance, reliability, maintainability, observability, and Responsible AI requirements.
  • Deliver engineering-focused workshops, deep-dives, and readiness sessions; guide customers on ML engineering patterns, prompt engineering, data preparation, RAG design, deployment pipelines, and cloud development best practices
  • Accelerate customer success by diagnosing and resolving technical blockers in application development, ML workflows, model deployment, and cloud infrastructure, driving adoption of Azure AI, Foundry, and cloud services
  • Use engineering knowledge to propose architecture improvements, performance optimisations, and scalable solution patterns
  • Stay current with the latest Azure AI, OpenAI, Foundry, HuggingFace, GitHub, and cloud-native capabilities; be a practitioner in Python, .NET, JavaScript/Node, or equivalent enterprise stacks
  • Contribute reusable assets, patterns, sample architectures, code accelerators, and internal IP to scale technical impact across the CSA community

Lead Machine Learning Engineer

Machine Learning Engineers specializing in Inference Optimization focus on maxim...
Location:
Singapore
Salary:
Not provided
Thoughtworks
Expiration Date:
Until further notice
Requirements:
  • Deep practical expertise in model and runtime optimization techniques (quantization, pruning, distillation, batching, caching)
  • Proven experience optimizing inference workloads using frameworks such as vLLM, NVIDIA Triton/Dynamo
  • Strong proficiency in deep learning frameworks (e.g. PyTorch, TensorFlow) with production deployment experience
  • Ability to diagnose and optimize performance using profiling tools (e.g. Nsight, PyTorch/TensorFlow profilers)
  • Solid understanding of GPU and accelerator architectures, and experience tuning workloads for cost and performance efficiency
  • Experience designing and benchmarking scalable inference systems across heterogeneous environments (GPU clusters, serverless, edge)
  • Familiarity with observability stacks, telemetry, and cost instrumentation for AI workloads
  • Demonstrated ability to lead small-to-medium engineering teams or technical workstreams
  • Skilled at balancing hands-on delivery with architectural oversight and mentorship
  • Strong communication and stakeholder engagement skills and are able to connect low-level optimizations with business impact
Job Responsibility:
  • Lead the design and implementation of advanced model optimization pipelines, including quantization, pruning, and distillation
  • Architect and tune inference runtimes and serving frameworks to achieve optimal performance across deployments
  • Guide teams in implementing high-throughput serving strategies (continuous batching, KV caching, speculative decoding, asynchronous scheduling)
  • Develop benchmarks and performance dashboards to measure and communicate system-level efficiency improvements (throughput, latency, GPU utilization, cost)
  • Evaluate trade-offs across accuracy, performance, and cost, and design architectures to meet target SLAs across varied hardware environments (cloud, on-prem, edge)
  • Collaborate with infrastructure, MLOps, and product teams to embed inference optimization into production workflows and platform designs
  • Provide technical leadership and mentorship to engineers, fostering a culture of experimentation, rigor, and continuous performance improvement
  • Contribute to the development of internal frameworks, reference architectures, and playbooks for scalable and cost-efficient inference
  • Engage with clients to translate optimization outcomes into business value and articulate the ROI of technical improvements
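Of the optimization techniques this role names, quantization is the most self-contained to illustrate. A minimal sketch of symmetric int8 post-training quantization in plain Python (real pipelines add per-channel scales, zero points, and calibration data):

```python
# Symmetric int8 quantization: map floats to [-127, 127] with one shared
# scale, then recover approximate floats by multiplying back. The roundtrip
# error of any value is bounded by half a quantization step (scale / 2).

def quantize_int8(values):
    """Quantize a list of floats to int8 codes plus a single symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.03, 1.0]          # stand-in for a weight tensor
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2 + 1e-12        # error bounded by half a step
```

The payoff in serving is memory traffic: int8 weights move 4x fewer bytes than fp32, which directly lifts throughput on bandwidth-bound inference kernels.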
What we offer:
  • Learning & Development
  • Interactive tools
  • Numerous development programs
  • Teammates who want to help you grow
  • Empowering our employees in their career journeys

Lead Product Manager, AI Storage Solutions

As a Lead Product Manager, AI Storage Solutions you will define the strategic vi...
Location:
Milpitas, United States
Salary:
194425.00 - 275414.00 USD / Year
Sandisk
Expiration Date:
May 06, 2026
Requirements:
  • Bachelor’s degree in Electrical Engineering, Computer Science, Engineering, or related field
  • Minimum of 8+ years of experience in product management
  • Minimum of 2 years experience in AI Architectures for datacenters or on-device
  • Strong desire to take ownership of the full product lifecycle
  • Proven track record of managing and launching successful emerging technology products
  • Deep understanding of flash memory technology and AI storage solutions, including 3D stacking, TSV architectures, and NAND-based high-bandwidth architectures
  • Knowledge of the AI software stack is an added advantage
Job Responsibility:
  • Define the strategic vision, roadmap, and execution plan for next‑generation memory solutions that enable cutting‑edge AI, Machine Learning, and High‑Performance Computing ecosystems
  • Serve as the connective tissue across engineering, marketing, operations, and hyperscale customers—driving competitive differentiation and business growth for high‑performance memory products
  • Own the multi‑year product roadmap, aligning with technology inflection points such as AI Storage Solutions, and emerging AI Storage Solutions architectures
  • Translate market dynamics, customer signals, and competitive insights into product requirements (MRD/PRD)
  • Lead deep technical engagements with hyperscalers and chipmakers (e.g., NVIDIA, AMD), turning customer performance, latency, and capacity needs into engineering deliverables
  • Position product solutions for AI inference, KV-cache, and memory-centric architectures, informed by industry trends such as inference context memory and KV-tiering
  • Build business cases including TAM/SAM/SOM models, pricing strategies, investment requirements, and ROI modeling
  • Guide lifecycle execution from concept through development via cross-functional program reviews (Stage Gate)
  • Partner closely with ASIC, firmware, NAND, system, and quality teams to drive technical readiness and performance targets
  • Manage supplier relationships and work with supply chain to ensure cost, yield, and delivery success
What we offer:
  • Short-Term Incentive (STI) Plan
  • Long-Term Incentive (LTI) program (restricted stock units (RSUs) or cash equivalents)
  • RSU awards for eligible new hires
  • Paid vacation time
  • Paid sick leave
  • Medical/dental/vision insurance
  • Life, accident and disability insurance
  • Tax-advantaged flexible spending and health savings accounts
  • Employee assistance program
  • Other voluntary benefit programs such as supplemental life and AD&D, legal plan, pet insurance, critical illness, accident and hospital indemnity

Research Scientist Intern, Edge Compute Architectures

Meta Reality Labs Research is looking for emerging scientists and researchers wi...
Location:
Burlingame, United States
Salary:
7313.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has or is in the process of obtaining a PhD in the fields of Electrical Engineering, Computer Engineering, Computer Science, or related field
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
  • 2+ years of research experience in one or more of the following: machine learning accelerators or inference engines for deep neural networks (DNN), computer architecture for mobile or embedded systems (ARM/RISC-V ISA), deployment of computer vision algorithms on embedded systems
  • Experience with deep learning frameworks (PyTorch, TensorFlow, etc.)
  • Proficient with Python/C/C++
Job Responsibility:
  • Support architecture studies, performance analysis, and design studies
  • Participate in research activities focused on edge compute architecture

GenAI GTM Representative

Fireworks is hiring a GenAI GTM Representative to drive adoption of Fireworks wi...
Location:
United States (New York; San Mateo; Redwood City; Remote)
Salary:
100000.00 - 120000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • 2+ years of experience in sales, business development, customer success, or product roles focused on technical buyers
  • Thrive in fast-moving environments and love engaging with startup founders and builders
  • Can understand and explain technical topics like LLM inference, API performance, model tuning, or GPU architecture
  • Are proactive, persistent, and creative in breaking into accounts and building relationships
  • Have strong communication skills and can earn trust with both engineers and executives
  • Want to be on the front lines of the GenAI movement and work closely with companies shaping the future
Job Responsibility:
  • Own a portfolio of GenAI-native startup accounts and drive adoption across key use cases
  • Actively identify, engage, and qualify new high-potential prospects through warm outreach, events, and network-driven channels
  • Lead technical and product conversations with founders, ML engineers, and infra leads
  • Help customers get live quickly by coordinating onboarding, benchmarking, and integration efforts
  • Work closely with internal teams to shape tailored solutions, POCs, and fine-tuning approaches
  • Translate customer feedback into actionable insights for the product and engineering teams
  • Track account health and usage data to identify expansion opportunities
  • Be a visible representative of Fireworks in the GenAI ecosystem — including attending demo days, meetups, and online communities
What we offer:
  • Competitive base salary
  • Strong equity and long-term upside
  • Comprehensive benefits package
  • Meaningful equity in a fast-growing startup

Senior GPU Engineer

We are seeking an expert Senior GPU Engineer to join our AI Infrastructure team....
Location:
Beijing, China
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years of experience in systems programming, HPC, or GPU software development, with extensive hands-on CUDA/C++ kernel development
  • Expertise in the CUDA programming model and NVIDIA GPU architectures (specifically Ampere/Hopper)
  • Deep understanding of the memory hierarchy (Shared Memory, L2 cache, Registers), warp-level primitives, occupancy optimization, and bank conflict resolution
  • Familiarity with advanced hardware features: Tensor Cores, TMA (Tensor Memory Accelerator), and asynchronous copy
  • Proven ability to navigate and modify complex, large-scale codebases (e.g., PyTorch internals, Linux kernel)
  • Experience with build and binding ecosystems: CMake, pybind11, and CI/CD for GPU workloads
  • Mastery of NVIDIA Nsight Systems/Compute
  • Ability to mathematically reason about performance using the Roofline Model, memory bandwidth utilization, and compute throughput
Job Responsibility:
  • Custom Operator Development: Design and implement highly optimized GPU kernels (CUDA/Triton) for critical deep learning operations (e.g., FlashAttention, GEMM, LayerNorm) to outperform standard libraries
  • Inference Engine Architecture: Contribute to the development of our high-performance inference engine, focusing on graph optimizations, operator fusion, and dynamic memory management (e.g., KV Cache optimization)
  • Performance Optimization: Deeply analyze and profile model performance using tools like Nsight Systems/Compute. Identify bottlenecks in memory bandwidth, instruction throughput, and kernel launch overheads
  • Model Acceleration: Implement advanced acceleration techniques such as Quantization (INT8, FP8, AWQ), Kernel Fusion, and continuous batching
  • Distributed Computing: Optimize communication primitives (NCCL) to enable efficient multi-GPU and multi-node inference (Tensor Parallelism, Pipeline Parallelism)
  • Hardware Adaptation: Ensure the software stack fully utilizes modern GPU architecture features (e.g., NVIDIA Hopper/Ampere Tensor Cores, Asynchronous Copy)