Software Engineer, AI Infrastructure Job at Fireworks AI (New York, NY)

Senior Software Engineer and Principal Software Engineer - PowerPoint AI Team

The PowerPoint team is embarking on an exciting new chapter - evolving a product...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
8+ years of experience in backend service engineering, including work on high-scale infrastructures
Proficiency in one or more systems programming languages such as C#, C++
1+ years of experience in software engineering, designing and developing systems (and APIs) that deploy and integrate with AI models
2+ years of experience working with rich telemetry, making data driven decisions, and carrying out rapid experimentation
2+ years of experience building software for scale, performance, and reliability
Academic or industry experience building, fine-tuning, or deploying models (any category), or building eval-driven systems that use them

Job Responsibility

Lead design and delivery of complex, scalable AI features ensuring resilience and exceptional user experience
Drive technical strategy and architecture decisions across multiple services, influencing partner teams and aligning with compliance and security requirements
Champion modern engineering practices, including AI-driven approaches, automation, and cloud-native patterns, across the full development lifecycle
Mentor and guide engineers, fostering technical excellence and continuous improvement in security, reliability, and performance
Collaborate cross-org to solve challenging technical problems, streamline processes, and reduce operational costs while improving live-site health
Design and implement scalable backend services optimized for machine learning workflows and large language model integration
Develop and maintain evaluation-driven systems that leverage text and multimodal inputs (e.g., images) to power visual-creation experiences
Build and optimize APIs and infrastructure to support high-performance model inference and experimentation at scale
Collaborate with product, ML, and design teams to integrate models into user-facing features, ensuring seamless functionality and performance
Conduct model evaluations and experiments, analyze results, and iterate on improvements to enhance accuracy and user experience

Fulltime

Senior Software Engineer - AI Infrastructure (Scheduler) - CoreAI

The AI Platform organization builds the end-to-end Azure AI stack, from the infr...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C++, C#, Java, Scala, Rust, Go, TypeScript OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Work on the design and development of the core AI Infrastructure distributed and in-cluster services that support large scale AI training and inferencing
Develop, test, and maintain control plane services written in C#, hosted on Service Fabric or Kubernetes (AKS) clusters
Enhance systems and applications to ensure high stability, efficiency, and maintainability, low latency, and tight cloud security
Provide operational support and DRI (on-call) responsibilities for the service
Develop and foster a deep understanding of the machine learning concepts, use cases, and relevant services used by our customers
Collaborate closely with service engineers, product managers, and internal applied research and data science teams within Microsoft to build better solutions together
Provide vision, expertise, and technical leadership to other team members
Help to grow talent in these areas
Embody our culture and values

Fulltime

Software Engineer - AI Infrastructure

We’re looking for a software engineer to join our Infrastructure team—building a...

Location

United States, New York City

Salary:

135000.00 - 280000.00 USD / Year

Assembled

Expiration Date

Until further notice

Requirements

Have 6+ years of engineering experience, with past ownership of high-scale, production-critical infrastructure
Have experience with distributed systems and container orchestration (especially Kubernetes)
Have worked with AI/ML platforms or are excited to build foundational infrastructure for LLM-based applications
Thrive in fast-paced environments with shifting requirements and ambiguous problem spaces
Are motivated by impact, enjoy deep technical challenges, and want to work cross-functionally across security, AI, and product
Have strong familiarity with one or more parts of our tech stack:
Cloud provider: AWS
Orchestration: Kubernetes + Karpenter
LLM integration: Experience with OpenAI, Anthropic, or open-source model serving (e.g., vLLM, HuggingFace TGI, Ray Serve)
Prompt & embedding infrastructure: Vector databases (e.g., Pinecone, Weaviate, PGVector), semantic search, prompt templating systems
Datastores: Postgres + PgBouncer, Snowflake, Redis

Job Responsibility

Agent service reliability and scaling: We manage and scale the infrastructure that serves LLM-powered agents across chat, email, and voice. This includes selecting inference strategies, integrating with model providers (e.g. OpenAI, Anthropic), and dynamically routing traffic for performance and cost efficiency
Prompt and embedding storage systems: Assist relies heavily on dynamically generated prompts and semantic search across support content. The team owns highly-available, fast-access storage and indexing layers optimized for real-time AI interactions
Privacy and security: Enterprises expect strict guardrails around AI use. We’re building systems like network-level intrusion detection (IDS/IPS), audit logging, and LLM usage policy enforcement to meet these expectations and unlock new sales channels
Observability and usage analytics: We operate systems that surface key metrics—token usage, latency, cost per response, and quality signals—so the Assist team can continuously improve Assist’s performance and accuracy
AI-powered developer tools: We are beginning to explore and evangelize the use of AI to accelerate internal engineering workflows—through internal chat agents, pair programming tools, and intelligent automation for deployment, debugging, and on-call. Our goal is to empower engineers across the company to build faster and more confidently with AI

What we offer

Generous medical, dental, and vision benefits
Paid company holidays, sick time, and unlimited time off
Monthly credits to spend on each: professional development, general wellness, Assembled customers, and commuting
Paid parental leave
Hybrid work model with catered lunches every day (M-F), snacks, and beverages in our SF & NY offices
401(k) plan enrollment
Stock options are provided as part of the compensation package

Fulltime

Staff Infrastructure Software Engineer - AI Platform

We are currently seeking a Staff Software Engineer to join the AI Platform team ...

Location

United Kingdom, Edinburgh

Salary:

Not provided

Addepar

Expiration Date

Until further notice

Requirements

Extensive experience as a Software/Backend Engineer, with a track record of taking on increasing responsibility
Experience across the full product lifecycle: designing, implementing, shipping, scaling, operationalizing, and maintaining technology/SaaS products
Exceptional programming skills and fundamentals in Python/Go/Java, with a proven track record of building large-scale production systems
Proficiency with diverse compute environments, including microservices (K8s), Databricks, and serverless architectures (e.g. AWS Lambda)
Demonstrable experience leading initiatives with infrastructure-as-code tools such as Terraform in complex, multi-account environments
Proficiency with comprehensive monitoring and alerting stacks (e.g. Prometheus/Grafana/Sentry/cloud-native tools), with a focus on observability strategy
Excellent interpersonal and communication skills to effectively collaborate with multi-functional teams, articulate complex technical concepts, and influence outcomes

Job Responsibility

Design and build the production runtime for LLM-based agents and products, creating the services and infrastructure that serve autonomous agents
Develop deep application-level knowledge to proactively inform and influence requirements, constraints and best practices for implementing composable, complex AI systems
Lead the design, implementation, and automation of production infrastructure on a variety of cloud environments (Kubernetes/Databricks), to enable us to ship and scale AI features instantly
Evangelize and promote disciplined engineering best practices to enforce strong production hygiene and culture
Initiate and lead collaborations with cross-functional teams to identify and resolve complex application or infrastructure issues, serving as a technical subject matter expert
Architect, build, and maintain advanced, automated CI/CD pipelines (e.g., using Jenkins, ArgoCD, AWS CodeBuild/Pipeline, GitHub Actions, or similar), establishing best practices for deployment strategies (e.g., blue/green, canary)
Develop systems and best practices for monitoring, alerting, and troubleshooting of our probabilistic and AI-driven systems and broader software stack

Staff Infrastructure Software Engineer, Enterprise AI

Scale GP is building the next generation of enterprise-grade Generative AI produ...

Location

United States, New York; San Francisco

Salary:

216200.00 - 270250.00 USD / Year

Scale

Expiration Date

Until further notice

Requirements

Proven experience in a senior role
5+ years of full-time software engineering experience
Deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm Charts), container orchestration (e.g., Kubernetes) and observability platforms (e.g., Datadog, Prometheus, Grafana)
Extensive experience with at least one major cloud provider (AWS, Azure, or GCP)
Strong knowledge of security and compliance in enterprise environments, with a focus on access management, data isolation, and customer-specific VPC setups
Proficiency in Python or JavaScript/TypeScript, and SQL

Job Responsibility

Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers
Lead the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies
Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response
Drive developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization
Solve the toughest engineering problems related to multi-tenancy, data isolation, and high-performance inference at a massive scale, taking end-to-end ownership across the full product lifecycle

What we offer

Comprehensive health, dental and vision coverage
Retirement benefits
A learning and development stipend
Generous PTO
Equity-based compensation
Additional benefits such as a commuter stipend

Fulltime

Senior Software Engineer, Data Infrastructure & AI

Fullstory Anywhere is one of Fullstory's three primary product verticals, and it...

Location

United States, Atlanta

Salary:

160000.00 - 170000.00 USD / Year

Fullstory

Expiration Date

Until further notice

Requirements

Significant experience building and operating high-throughput data pipelines (batch and/or streaming) in a major cloud platform, including work with cloud data warehouses like BigQuery, Snowflake, or Databricks.
Proficiency in Go, Python, Java or a similar language.
Hands-on experience with data transformation tooling such as dbt, with a strong understanding of data modeling and pipeline observability.
Familiarity with LLM integration patterns and evaluation approaches (e.g., LangSmith, Vertex AI, or comparable frameworks), or demonstrated ability to ramp quickly in applied AI.
A track record of owning major system areas end-to-end: driving architectural decisions, maintaining production health, and improving reliability over time.

Job Responsibility

Maintain, extend, and scale Go microservices that transform and deliver Fullstory session data into customer warehouses and power the team's MCP server that enables AI agent integrations.
Develop and maintain dbt models and pipeline orchestration to ensure timely, fault-tolerant data migrations across hundreds of customer destinations.
Define evaluation frameworks for LLM outputs using tools like LangSmith and Vertex AI, ensuring AI-powered customer agents produce accurate, useful results.
Investigate and resolve production incidents across the data pipeline, implementing systemic fixes that prevent entire classes of failure from recurring.
Write technical design documents that drive consensus on architectural changes, proactively surfacing scaling bottlenecks, edge cases, and cross-team dependencies.
Demonstrate sound technical judgment by de-risking work through spikes, taking on tech debt deliberately, and knowing when to escalate versus dig in.

What we offer

Flexibility and Connection: flexible PTO policy, annual company-wide closure
Benefits: paid parental leave; bereavement leave, including miscarriage/pregnancy loss
Learning opportunities: annual learning subsidy
Productivity support: monthly productivity stipend

Fulltime

Lead Software Engineer, Backend (AI Infrastructure & Tooling)

Do you love building and pioneering in the technology space? Do you enjoy solvin...

Location

United States, Plano

Salary:

179400.00 - 204700.00 USD / Year

Capital One

Expiration Date

Until further notice

Requirements

Bachelor’s Degree
At least 4 years of professional software engineering experience (Internship experience does not apply)
At least 1 year experience with cloud computing (AWS, Microsoft Azure, Google Cloud)

Job Responsibility

Lead a portfolio of diverse technology projects and a team of developers with deep experience in distributed microservices and full stack systems to create solutions that help meet regulatory needs for the company
Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, mentoring other members of the engineering community, and, from time to time, coding or evaluating code
Collaborate with digital product managers and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
Utilize programming languages like Java, Python, SQL, Node, Go, and Scala, Open Source RDBMS and NoSQL databases, Container Orchestration services including Docker and Kubernetes, and a variety of AWS tools and services

What we offer

performance based incentive compensation, which may include cash bonus(es) and/or long term incentives (LTI)
comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being

Fulltime

Senior Software Engineer - Data Platform, AI Infrastructure

We are building a large-scale, productized data platform that powers critical in...

Location

United States, Redmond

Salary:

119800.00 - 234700.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Strong programming experience in Python
Experience building and operating large-scale distributed systems
Hands-on experience with:
Backend services or APIs (e.g., FastAPI, Flask, or similar)
Cloud-based infrastructure (Azure, AWS, or GCP)
Monitoring and observability systems (metrics, logging, alerting)
Experience designing systems with reliability, scalability, and operational clarity in mind
Proven ability to own and deliver production systems end-to-end
Ability to break down ambiguous problems, ask the right questions, and execute effectively

Job Responsibility

Design, build, and operate core components of a distributed data platform, including:
Orchestration systems (e.g., Airflow or equivalent)
Backend services and APIs (Python/FastAPI or similar)
Monitoring, alerting, and reliability systems
Own the end-to-end lifecycle of platform components - from design through deployment, scaling, and maintenance
Ensure systems meet requirements for availability, performance, and data reliability at large scale
Define and enforce standardized patterns for infrastructure, deployment, and observability across the platform
Partner with data engineering teams to enable efficient, reliable data processing workflows
Diagnose and resolve complex issues in distributed systems, including performance bottlenecks and failure modes
Contribute to infrastructure-as-code and deployment systems to support reproducibility and operational excellence
Drive continuous improvements in system robustness, cost efficiency, and operational clarity

What we offer

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Fulltime
