
Architecture Intern - Inference


Etched

Location:
San Jose, United States

Contract Type:
Not provided

Salary:
Not provided

Job Description:

We are seeking a talented Architecture intern to join our team and contribute to the design of next-generation AI accelerators. This role focuses on developing and optimizing compute architectures that deliver exceptional performance and efficiency for transformer workloads. You will work on cutting-edge architectural problems and performance modeling over the course of your internship.

Job Responsibility:

  • Support porting state-of-the-art models to our architecture
  • Help build programming abstractions and testing capabilities to rapidly iterate on model porting
  • Assist in building, enhancing, and scaling Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling
  • Contribute to optimizing routing and communication layers using Sohu’s collectives
  • Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues
  • Develop and leverage a deep understanding of Sohu to co-design both HW instructions and model architecture operations to maximize model performance
  • Implement high-performance software components for the Model Toolkit

Requirements:

  • Progress towards a Bachelor’s, Master’s, or PhD degree in computer science, computer engineering, or a related field
  • Proficiency in C++ or Rust
  • Understanding of performance-sensitive or complex distributed software systems, e.g. Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand)
  • Familiarity with PyTorch or JAX
  • Experience porting applications to non-standard accelerator hardware or platforms
  • Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.)

Nice to have:

  • Experience building low-latency, high-performance applications using both kernel-level and user-space networking stacks
  • Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns
  • Solid grasp of Transformer architectures, particularly Mixture-of-Experts (MoE)
  • Experience building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths

What we offer:
  • 12-week paid internship
  • Generous housing support for those relocating
  • Daily lunch and dinner in our office
  • Direct mentorship from industry leaders and world-class engineers
  • Opportunity to work on one of the most important problems of our time

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
On-site work


Similar Jobs for Architecture Intern - Inference

ML Engineer - Inference Serving

Luma’s mission is to build multimodal AI to expand human imagination and capabil...
Location:
Palo Alto, United States; London, United Kingdom
Salary:
187500.00 - 395000.00 USD / Year
Luma AI
Expiration Date:
Until further notice
Requirements:
  • Strong Python and system architecture skills
  • Experience with model deployment using PyTorch, Huggingface, vLLM, SGLang, tensorRT-LLM, or similar
  • Experience with queues, scheduling, traffic-control, fleet management at scale
  • Experience with Linux, Docker, and Kubernetes
  • Python
  • Redis
  • S3-compatible Storage
  • Model serving (one of: PyTorch, vLLM, SGLang, Huggingface)
  • Understanding of large-scale orchestration, deployment, scheduling (via Kubernetes or similar)
Job Responsibility:
  • Ship new model architectures by integrating them into our inference engine
  • Collaborate closely across research, engineering and infrastructure to streamline and optimize model efficiency and deployments
  • Build internal tooling to measure, profile, and track the lifetime of inference jobs and workflows
  • Automate, test and maintain our inference services to ensure maximum uptime and reliability
  • Optimize deployment workflows to scale across thousands of machines
  • Manage and optimize our inference workloads across different clusters & hardware providers
  • Build sophisticated scheduling systems to optimally leverage our expensive GPU resources while meeting internal SLOs
  • Build and maintain CI/CD pipelines for processing/optimizing model checkpoints, platform components, and SDKs for internal teams to integrate into our products/internal tooling
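The scheduling bullet above is about dispatching work onto scarce GPUs while meeting SLOs. A minimal sketch of one common policy, earliest-deadline-first (names are illustrative, not Luma's actual system):

```python
import heapq

# Minimal SLO-aware scheduler sketch: pending requests carry a deadline, and
# the scheduler always dispatches the request whose SLO expires soonest.

class Scheduler:
    def __init__(self):
        self._queue = []  # min-heap ordered by deadline

    def submit(self, request_id: str, deadline: float) -> None:
        heapq.heappush(self._queue, (deadline, request_id))

    def next_request(self) -> str:
        """Pop the request with the tightest SLO deadline."""
        _, request_id = heapq.heappop(self._queue)
        return request_id

sched = Scheduler()
sched.submit("batch-job", deadline=60.0)   # lenient offline SLO
sched.submit("chat-turn", deadline=2.0)    # tight interactive SLO
sched.submit("summarize", deadline=10.0)
print(sched.next_request())  # chat-turn
```

Real schedulers layer batching, preemption, and fleet-level placement on top of a core priority rule like this one.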

AI / ML Engineer, Software Engineering

iCapital is seeking an experienced and forward-thinking AI/ML Engineer Vice Pres...
Location:
New York, United States
Salary:
180000.00 - 220000.00 USD / Year
iCapital Network
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in software engineering, with at least 2+ years focused on AI/ML systems
  • Proven experience in building and deploying ML models in production environments
  • Hands-on experience with AI agent frameworks (e.g., LangChain, Semantic Kernel, AutoGen, or custom-built systems)
  • Strong understanding of the ML lifecycle, including data pipelines, model training, evaluation, deployment, and monitoring
  • Familiar with MLOps tools such as MLflow, Kubeflow, or SageMaker
  • Deep understanding of LLM orchestration, prompt engineering, tool use, and memory architectures
  • Familiar with various LLM inference engines such as vLLM or SGLang
  • Experience in integrating agents with APIs, databases, and external systems
  • Familiar with retrieval-augmented generation (RAG), vector databases, and knowledge graphs
  • Experience deploying AI systems in cloud environments (AWS, GCP, Azure) and utilizing containerization tools (Docker, Kubernetes)
Job Responsibility:
  • Design, build, and optimize scalable AI/ML infrastructure and services powering intelligent features across our platform
  • Lead the development of AI agents capable of autonomous decision-making, task execution, and multi-step reasoning across internal and customer-facing applications
  • Architect and implement modular agent frameworks by integrating tools, APIs, and memory systems for dynamic and context-aware behavior
  • Collaborate with product, data, and infrastructure teams to embed AI capabilities into production systems
  • Drive the architecture and development of ML pipelines, model serving frameworks, and real-time inference systems
  • Evaluate and integrate state-of-the-art AI tools and frameworks to accelerate development and deployment
  • Provide technical mentorship and guidance to engineers, contributing to team growth and best practices
  • Partner with Data Science teams to operationalize models, ensuring a smooth transition from experimentation to production
  • Contribute to technical roadmaps and help define long-term AI/ML platform and agent strategy
  • Optimize agent performance for latency, reliability, and safety in production environments
What we offer:
  • Equity for all full-time employees
  • Annual performance bonus
  • Employer matched retirement plan
  • Generously subsidized healthcare with 100% employer paid dental, vision, telemedicine, and virtual mental health counseling
  • Parental leave
  • Unlimited paid time off (PTO)

Cloud Solution Architect - AI/ML

We are looking for a Cloud Solution Architect (CSA) who is passionate about driv...
Location:
London, United Kingdom
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science, Information Technology, Engineering, Business, or related field AND experience in cloud/infrastructure technologies, information technology (IT) consulting/support, systems administration, network operations, software development/support, technology solutions, practice development, architecture, and/or consulting
  • OR equivalent experience
  • This role requires UK Security Clearance, therefore candidates will need to either have existing security clearance or meet the minimum criteria to apply for security clearance.
  • Demonstrated experience working in a customer-facing role (internal and/or external)
  • Demonstrated experience working on technical projects
  • Technical Certification in Cloud (e.g., Azure, Amazon Web Services, Google, security certifications)
  • Experience and expertise in one or more of the following areas: Azure AI Foundry (Models, Agent Service, Semantic Kernel, Search, ML, SDK)
  • AppPlat/Containers/Serverless (App Service, AKS, ACA, ARO, Functions)
  • DevOps (CI/CD, Azure DevOps, DevSecOps)
  • GitHub (Copilot, Enterprise, Adv Security, Actions, Codespaces)
Job Responsibility:
  • Understand customers’ Business and IT priorities and translate them into AI, ML, and cloud engineering architectures, spanning platform engineering, cloud‑native apps, inference pipelines, data workflows, and low‑code extensibility.
  • Act as a highly technical partner, leading customers through architecture reviews, proofs‑of‑concept, and MVP builds, including environment setup, model orchestration, retrieval pipelines, CI/CD automation, and deployment readiness.
  • Implement secure, performant solutions that meet production standards across performance, reliability, maintainability, observability, and Responsible AI requirements.
  • Deliver engineering-focused workshops, deep-dives, and readiness sessions; guide customers on ML engineering patterns, prompt engineering, data preparation, RAG design, deployment pipelines, and cloud development best practices
  • Accelerate customer success by diagnosing and resolving technical blockers in application development, ML workflows, model deployment, and cloud infrastructure, driving adoption of Azure AI, Foundry, and cloud services
  • Use engineering knowledge to propose architecture improvements, performance optimisations, and scalable solution patterns
  • Stay current with the latest Azure AI, OpenAI, Foundry, HuggingFace, GitHub, and cloud-native capabilities; be a practitioner in Python, .NET, JavaScript/Node, or equivalent enterprise stacks
  • Contribute reusable assets, patterns, sample architectures, code accelerators, and internal IP to scale technical impact across the CSA community

Lead Machine Learning Engineer

Machine Learning Engineers specializing in Inference Optimization focus on maxim...
Location:
Singapore
Salary:
Not provided
Thoughtworks
Expiration Date:
Until further notice
Requirements:
  • Deep practical expertise in model and runtime optimization techniques (quantization, pruning, distillation, batching, caching)
  • Proven experience optimizing inference workloads using frameworks such as vLLM, NVIDIA Triton/Dynamo
  • Strong proficiency in deep learning frameworks (e.g. PyTorch, TensorFlow) with production deployment experience
  • Ability to diagnose and optimize performance using profiling tools (e.g. Nsight, PyTorch/TensorFlow profilers)
  • Solid understanding of GPU and accelerator architectures, and experience tuning workloads for cost and performance efficiency
  • Experience designing and benchmarking scalable inference systems across heterogeneous environments (GPU clusters, serverless, edge)
  • Familiarity with observability stacks, telemetry, and cost instrumentation for AI workloads
  • Demonstrated ability to lead small-to-medium engineering teams or technical workstreams
  • Skilled at balancing hands-on delivery with architectural oversight and mentorship
  • Strong communication and stakeholder engagement skills and are able to connect low-level optimizations with business impact
Job Responsibility:
  • Lead the design and implementation of advanced model optimization pipelines, including quantization, pruning, and distillation
  • Architect and tune inference runtimes and serving frameworks to achieve optimal performance across deployments
  • Guide teams in implementing high-throughput serving strategies (continuous batching, KV caching, speculative decoding, asynchronous scheduling)
  • Develop benchmarks and performance dashboards to measure and communicate system-level efficiency improvements (throughput, latency, GPU utilization, cost)
  • Evaluate trade-offs across accuracy, performance, and cost, and design architectures to meet target SLAs across varied hardware environments (cloud, on-prem, edge)
  • Collaborate with infrastructure, MLOps, and product teams to embed inference optimization into production workflows and platform designs
  • Provide technical leadership and mentorship to engineers, fostering a culture of experimentation, rigor, and continuous performance improvement
  • Contribute to the development of internal frameworks, reference architectures, and playbooks for scalable and cost-efficient inference
  • Engage with clients to translate optimization outcomes into business value and articulate the ROI of technical improvements
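Of the optimization techniques this role names, quantization is the most self-contained to illustrate. A minimal sketch of symmetric int8 post-training quantization in plain Python (real pipelines add per-channel scales, zero points, and calibration data):

```python
# Symmetric int8 quantization: map floats to [-127, 127] with one shared
# scale, then recover approximate floats by multiplying back. The roundtrip
# error of any value is bounded by half a quantization step (scale / 2).

def quantize_int8(values):
    """Quantize a list of floats to int8 codes plus a single symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.03, 1.0]          # stand-in for a weight tensor
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2 + 1e-12        # error bounded by half a step
```

The payoff in serving is memory traffic: int8 weights move 4x fewer bytes than fp32, which directly lifts throughput on bandwidth-bound inference kernels.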
What we offer:
  • Learning & Development
  • Interactive tools
  • Numerous development programs
  • Teammates who want to help you grow
  • Empowering our employees in their career journeys

Lead Product Manager, AI Storage Solutions

As a Lead Product Manager, AI Storage Solutions you will define the strategic vi...
Location:
Milpitas, United States
Salary:
194425.00 - 275414.00 USD / Year
Sandisk
Expiration Date:
May 06, 2026
Requirements:
  • Bachelor’s degree in Electrical Engineering, Computer Science, Engineering, or related field
  • Minimum of 8+ years of experience in product management
  • Minimum of 2 years experience in AI Architectures for datacenters or on-device
  • Strong desire to take ownership of the full product lifecycle
  • Proven track record of managing and launching successful emerging technology products
  • Deep understanding of flash memory technology and AI storage solutions, including 3D stacking, TSV architectures, and NAND-based high-bandwidth architectures
  • Knowledge of the AI software stack is an added advantage
Job Responsibility:
  • Define the strategic vision, roadmap, and execution plan for next‑generation memory solutions that enable cutting‑edge AI, Machine Learning, and High‑Performance Computing ecosystems
  • Serve as the connective tissue across engineering, marketing, operations, and hyperscale customers—driving competitive differentiation and business growth for high‑performance memory products
  • Own the multi‑year product roadmap, aligning with technology inflection points such as AI Storage Solutions, and emerging AI Storage Solutions architectures
  • Translate market dynamics, customer signals, and competitive insights into product requirements (MRD/PRD)
  • Lead deep technical engagements with hyperscalers and chipmakers (e.g., NVIDIA, AMD), turning customer performance, latency, and capacity needs into engineering deliverables
  • Position product solutions for AI inference, KV-cache, and memory-centric architectures, informed by industry trends such as inference context memory and KV-tiering
  • Build business cases including TAM/SAM/SOM models, pricing strategies, investment requirements, and ROI modeling
  • Guide lifecycle execution from concept through development via cross-functional program reviews (Stage Gate)
  • Partner closely with ASIC, firmware, NAND, system, and quality teams to drive technical readiness and performance targets
  • Manage supplier relationships and work with supply chain to ensure cost, yield, and delivery success
What we offer:
  • Short-Term Incentive (STI) Plan
  • Long-Term Incentive (LTI) program (restricted stock units (RSUs) or cash equivalents)
  • RSU awards for eligible new hires
  • Paid vacation time
  • Paid sick leave
  • Medical/dental/vision insurance
  • Life, accident and disability insurance
  • Tax-advantaged flexible spending and health savings accounts
  • Employee assistance program
  • Other voluntary benefit programs such as supplemental life and AD&D, legal plan, pet insurance, critical illness, accident and hospital indemnity

Research Scientist Intern, Edge Compute Architectures

Meta Reality Labs Research is looking for emerging scientists and researchers wi...
Location:
Burlingame, United States
Salary:
7313.00 - 12134.00 USD / Month
Meta
Expiration Date:
Until further notice
Requirements:
  • Currently has or is in the process of obtaining a PhD in the fields of Electrical Engineering, Computer Engineering, Computer Science, or related field
  • Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
  • 2+ years of research experience in one or more of the following: machine learning accelerators or inference engines for deep neural networks (DNN), computer architecture for mobile or embedded systems (ARM/RISC-V ISA), deployment of computer vision algorithms on embedded systems
  • Experience with deep learning frameworks (PyTorch, TensorFlow, etc.)
  • Proficient with Python/C/C++
Job Responsibility:
  • Support architecture studies, performance analysis, and design studies
  • Participate in research activities focused on edge compute architecture

GenAI GTM Representative

Fireworks is hiring a GenAI GTM Representative to drive adoption of Fireworks wi...
Location:
United States (New York; San Mateo; Redwood City; Remote)
Salary:
100000.00 - 120000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • 2+ years of experience in sales, business development, customer success, or product roles focused on technical buyers
  • Thrive in fast-moving environments and love engaging with startup founders and builders
  • Can understand and explain technical topics like LLM inference, API performance, model tuning, or GPU architecture
  • Are proactive, persistent, and creative in breaking into accounts and building relationships
  • Have strong communication skills and can earn trust with both engineers and executives
  • Want to be on the front lines of the GenAI movement and work closely with companies shaping the future
Job Responsibility:
  • Own a portfolio of GenAI-native startup accounts and drive adoption across key use cases
  • Actively identify, engage, and qualify new high-potential prospects through warm outreach, events, and network-driven channels
  • Lead technical and product conversations with founders, ML engineers, and infra leads
  • Help customers get live quickly by coordinating onboarding, benchmarking, and integration efforts
  • Work closely with internal teams to shape tailored solutions, POCs, and fine-tuning approaches
  • Translate customer feedback into actionable insights for the product and engineering teams
  • Track account health and usage data to identify expansion opportunities
  • Be a visible representative of Fireworks in the GenAI ecosystem — including attending demo days, meetups, and online communities
What we offer:
  • Competitive base salary
  • Strong equity and long-term upside
  • Comprehensive benefits package
  • Meaningful equity in a fast-growing startup

Senior GPU Engineer

We are seeking an expert Senior GPU Engineer to join our AI Infrastructure team....
Location:
Beijing, China
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years of experience in systems programming, HPC, or GPU software development, with extensive hands-on CUDA/C++ kernel development
  • Expertise in the CUDA programming model and NVIDIA GPU architectures (specifically Ampere/Hopper)
  • Deep understanding of the memory hierarchy (Shared Memory, L2 cache, Registers), warp-level primitives, occupancy optimization, and bank conflict resolution
  • Familiarity with advanced hardware features: Tensor Cores, TMA (Tensor Memory Accelerator), and asynchronous copy
  • Proven ability to navigate and modify complex, large-scale codebases (e.g., PyTorch internals, Linux kernel)
  • Experience with build and binding ecosystems: CMake, pybind11, and CI/CD for GPU workloads
  • Mastery of NVIDIA Nsight Systems/Compute
  • Ability to mathematically reason about performance using the Roofline Model, memory bandwidth utilization, and compute throughput
Job Responsibility:
  • Custom Operator Development: Design and implement highly optimized GPU kernels (CUDA/Triton) for critical deep learning operations (e.g., FlashAttention, GEMM, LayerNorm) to outperform standard libraries
  • Inference Engine Architecture: Contribute to the development of our high-performance inference engine, focusing on graph optimizations, operator fusion, and dynamic memory management (e.g., KV Cache optimization)
  • Performance Optimization: Deeply analyze and profile model performance using tools like Nsight Systems/Compute. Identify bottlenecks in memory bandwidth, instruction throughput, and kernel launch overheads
  • Model Acceleration: Implement advanced acceleration techniques such as Quantization (INT8, FP8, AWQ), Kernel Fusion, and continuous batching
  • Distributed Computing: Optimize communication primitives (NCCL) to enable efficient multi-GPU and multi-node inference (Tensor Parallelism, Pipeline Parallelism)
  • Hardware Adaptation: Ensure the software stack fully utilizes modern GPU architecture features (e.g., NVIDIA Hopper/Ampere Tensor Cores, Asynchronous Copy)