Senior Data Engineer - AI Infrastructure

United States, Redmond · Employment contract · 119800.00 - 234700.00 USD / Year · Job Posted May 03, 2026

Job Description

We are building a large-scale data platform that transforms raw system logs into high-quality, structured datasets used for experimentation and analytics. The platform processes terabytes to petabytes of data daily and serves as a foundational asset for multiple teams. This Senior Data Engineer - AI Infrastructure role focuses on designing and implementing data pipelines, ensuring correctness, and building scalable data models. You will work closely with data scientists and platform engineers to ensure that data is accurate, reliable, and usable for downstream decision-making. We are looking for engineers who care deeply about data correctness, understand how systems behave at scale, and can translate complex data into well-structured, reliable datasets.

Job Responsibility

  • Design and implement large-scale data pipelines using PySpark and distributed processing frameworks
  • Build and maintain data models that accurately represent underlying system behavior and business logic
  • Ensure high standards of data correctness, completeness, and consistency across datasets
  • Develop validation, monitoring, and alerting mechanisms to detect data quality issues
  • Partner with data scientists to support experimentation and analytics use cases
  • Collaborate with platform engineers to ensure efficient data ingestion, processing, and storage
  • Optimize pipelines for performance, scalability, and cost efficiency
  • Define and enforce best practices for schema design, data transformations, and pipeline reliability

Requirements

  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 3+ years experience in business analytics, data science, software development, data modeling, or data engineering OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years experience in business analytics, data science, software development, data modeling, or data engineering OR equivalent experience.
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role.
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice to have

  • Experience with Azure technologies such as ADLS Gen2 (Blob Storage), Synapse Spark, and Azure Data Explorer (ADX)
  • Experience working with structured and semi-structured data (e.g., JSON logs)
  • Familiarity with experimentation and analytics workflows
  • Experience with orchestration tools (e.g., Airflow)
  • Exposure to privacy, compliance, and secure data handling practices
  • 5+ years of experience in data engineering or software engineering with a strong focus on data systems
  • Strong experience with PySpark or similar distributed data processing frameworks
  • Experience building and operating large-scale data pipelines
  • Strong understanding of data modeling and schema design
  • Experience ensuring data quality and correctness in production systems
  • Proficiency in Python
  • Experience working with cloud-based data platforms (Azure, AWS, or GCP)
  • Ability to reason about data at scale, including performance and failure modes

Similar Jobs for Senior Data Engineer - AI Infrastructure

8 matching positions

Senior Software Engineer, Data Infrastructure & AI

Fullstory Anywhere is one of Fullstory's three primary product verticals, and it...
Location: United States, Atlanta
Salary: 160000.00 - 170000.00 USD / Year
Company: Fullstory (fullstory.com)
Expiration Date: Until further notice
Requirements
  • Significant experience building and operating high-throughput data pipelines (batch and/or streaming) in a major cloud platform, including work with cloud data warehouses like BigQuery, Snowflake, or Databricks.
  • Proficiency in Go, Python, Java or a similar language.
  • Hands-on experience with data transformation tooling such as dbt, with a strong understanding of data modeling and pipeline observability.
  • Familiarity with LLM integration patterns and evaluation approaches (e.g., LangSmith, Vertex AI, or comparable frameworks), or demonstrated ability to ramp quickly in applied AI.
  • A track record of owning major system areas end-to-end: driving architectural decisions, maintaining production health, and improving reliability over time.
Job Responsibility
  • Maintain, extend, and scale Go microservices that transform and deliver Fullstory session data into customer warehouses and power the team's MCP server that enables AI agent integrations.
  • Develop and maintain dbt models and pipeline orchestration to ensure timely, fault-tolerant data migrations across hundreds of customer destinations.
  • Define evaluation frameworks for LLM outputs using tools like LangSmith and Vertex AI, ensuring AI-powered customer agents produce accurate, useful results.
  • Investigate and resolve production incidents across the data pipeline, implementing systemic fixes that prevent entire classes of failure from recurring.
  • Write technical design documents that drive consensus on architectural changes, proactively surfacing scaling bottlenecks, edge cases, and cross-team dependencies.
  • Demonstrate sound technical judgment by de-risking work through spikes, taking on tech debt deliberately, and knowing when to escalate versus dig in.
What we offer
  • Flexibility and Connection: flexible PTO policy, annual company-wide closure
  • Benefits: paid parental leave; bereavement leave, including miscarriage/pregnancy loss
  • Learning opportunities: annual learning subsidy
  • Productivity support: monthly productivity stipend
  • Fulltime

Senior Software Engineer - Data Platform, AI Infrastructure

We are building a large-scale, productized data platform that powers critical in...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
  • Strong programming experience in Python
  • Experience building and operating large-scale distributed systems
  • Hands-on experience with backend services or APIs (e.g., FastAPI, Flask, or similar); cloud-based infrastructure (Azure, AWS, or GCP); and monitoring and observability systems (metrics, logging, alerting)
  • Experience designing systems with reliability, scalability, and operational clarity in mind
  • Proven ability to own and deliver production systems end-to-end
  • Ability to break down ambiguous problems, ask the right questions, and execute effectively
Job Responsibility
  • Design, build, and operate core components of a distributed data platform, including orchestration systems (e.g., Airflow or equivalent); backend services and APIs (Python/FastAPI or similar); and monitoring, alerting, and reliability systems
  • Own the end-to-end lifecycle of platform components - from design through deployment, scaling, and maintenance
  • Ensure systems meet requirements for availability, performance, and data reliability at large scale
  • Define and enforce standardized patterns for infrastructure, deployment, and observability across the platform
  • Partner with data engineering teams to enable efficient, reliable data processing workflows
  • Diagnose and resolve complex issues in distributed systems, including performance bottlenecks and failure modes
  • Contribute to infrastructure-as-code and deployment systems to support reproducibility and operational excellence
  • Drive continuous improvements in system robustness, cost efficiency, and operational clarity
What we offer
  • Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
  • Fulltime

Senior Data Engineer - AI and Analytics

We're building a world of health around every individual — shaping a more connec...
Location: United States, Buffalo Grove
Salary: 101970.00 - 203940.00 USD / Year
Company: CVS Health (https://www.cvshealth.com/)
Expiration Date: June 24, 2026
Requirements
  • 3-5+ years of experience with SQL, NoSQL
  • 3-5+ years of experience with Python
  • 3+ years of experience with Data warehouses (such as data modeling and technical architectures) and infrastructure components
  • 3+ years of experience with ETL/ELT, and building high-volume data pipelines
  • 3+ years of experience with reporting/analytic tools
  • 3+ years of experience with Query optimization, data structures, transformation, metadata, dependency, and workload management
  • 3+ years of experience with Big data and cloud architecture
  • 3+ years of hands-on experience building modern data pipelines within a major cloud platform (preferably GCP, open to AWS or Azure)
  • 3+ years of experience with deployment/scaling of apps in containerized environments (e.g., Kubernetes, AKS)
  • 3+ years of experience with real-time and streaming technology (e.g., Azure Event Hubs, Azure Functions, Kafka, Spark Streaming)
Job Responsibility
  • Design, develop, and maintain optimal data pipelines to assemble large and intricate datasets
  • Cater to the business requirements of various CVS lines of business
  • Collaborate closely with teams to craft tools to provide actionable insights and integrate them with consumer touchpoints
  • Solve problems associated with large scale complex, structured and unstructured data
What we offer
  • Medical, dental, and vision coverage
  • paid time off
  • retirement savings options
  • wellness programs
  • bonus, commission or short-term incentive program
  • Fulltime

Senior AI Infrastructure Engineer - Training Platform

As a Software Engineer on the Machine Learning Infrastructure team, you will bui...
Location: United States, San Francisco; Seattle; New York
Salary: 216000.00 - 270000.00 USD / Year
Company: Scale (scale.com)
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in backend or infrastructure engineering, with at least 2 years focused on orchestrating ML workloads at scale (100+ GPU nodes)
  • Strong programming skills in one or more languages (e.g. Python, Go, Rust, C++)
  • Experience with complex compute management systems that cover queueing, quotas, preemption, and gang scheduling
  • Experience with distributed training infrastructure, such as EFA, Infiniband, and topology-aware scheduling
  • Experience with distributed storage systems (e.g. Lustre, S3) as they relate to training throughput
  • Expert-level knowledge of Kubernetes internals (Custom Resources, Operators, Admission Controllers) and how they interact with device plugins for specialized hardware
  • Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform)
  • Proven ability to solve complex problems and work independently in fast-moving environments
Job Responsibility
  • Architect and scale a multi-tenant orchestration layer that abstracts away the complexity of GPU clusters, ensuring high utilization and seamless job recovery
  • Design and implement scheduling primitives to optimize the lifecycle of training jobs
  • Develop deep observability and automated health-checking into the training stack to proactively identify and isolate hardware failures
  • Evaluate and integrate emerging technologies in the CNCF and AI ecosystem (e.g. Ray, Kueue), making data-driven build vs. buy decisions that balance velocity with long-term maintainability
  • Work closely with Finance and Procurement teams to drive our capacity planning process
  • Participate in our team's on call process to ensure the availability of our services
  • Own projects end-to-end, from requirements, scoping, design, to implementation, in a highly collaborative and cross-functional environment
What we offer
  • Comprehensive health, dental and vision coverage
  • retirement benefits
  • a learning and development stipend
  • generous PTO
  • commuter stipend (may be eligible)
  • Fulltime

Senior AI Infrastructure Engineer

Together AI is building the AI Acceleration Cloud, an end-to-end platform for th...
Location: Netherlands, Amsterdam
Salary: Not provided
Company: Together AI (together.ai)
Expiration Date: Until further notice
Requirements
  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • 5+ years experience writing high-performance, well-tested, production quality code
  • Demonstrated experience with building and operating high-performance and/or globally distributed micro-service architectures across one or more cloud providers (AWS, Azure, GCP)
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
  • Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
Job Responsibility
  • Design, build, and maintain performant, secure, and highly-available backend services/operators that run in our data centers and automate hardware management, such as Infiniband partitioning, in-DC parallel storage provisioning, and VM provisioning
  • Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs
  • Work on a global multi-exabyte high-performance object store, serving massive datasets for pretraining
  • Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining
  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance
  • Fulltime

Senior Data Engineer (with AI)

Provectus is a leading AI consultancy and AWS Premier Consulting Partner with 15...
Location: Poland, Wroclaw Metropolitan Area
Salary: Not provided
Company: Provectus (provectus.com)
Expiration Date: Until further notice
Requirements
  • 6+ years of hands-on engineering experience with production systems
  • Full-stack mindset, comfortable across AI, Backend development, Data, and cloud infrastructure
  • Autonomous working style
  • Experience adopting AI tools in day-to-day workflows (e.g. Claude Code, GitHub Copilot, or similar)
  • Strong sense of ownership and proactivity
  • Openness to broadening skills into adjacent areas
  • B2+ English, comfortable collaborating across distributed, multicultural teams
  • Strong Python and SQL skills and solid software engineering fundamentals
  • Hands-on experience with Apache Spark for large-scale data processing
  • Proficiency with cloud data warehouse technologies: Snowflake, Redshift, or ClickHouse
Job Responsibility
  • Design, build, and maintain robust data pipelines and ML systems for production environments
  • Develop and deploy ML and LLM-based solutions addressing real client business challenges
  • Build and maintain ETL/ELT workflows using modern orchestration and distributed computing tools
  • Implement MLOps practices: CI/CD, automated testing, model monitoring, and experiment tracking
  • Architect and implement cloud-native data and AI/ML solutions, primarily on AWS
  • Collaborate closely with Data Scientists, AI/ML Engineers, Backend Engineers, and client stakeholders
  • Participate in code reviews, contribute to technical documentation, and share knowledge within the team
  • Engage in client-facing discussions to understand requirements and propose technical solutions
What we offer
  • Impactful work: projects span GenAI, MLOps, and NextGen data platforms for global enterprises across multiple industries
  • Senior-calibre peers: collaborate with top ML and Data professionals across North America, LATAM, and EMEA
  • Career growth: a clear path toward Tech Lead if you have the ambition — we actively develop our engineers
  • Recognised expertise: AWS Premier Consulting Partner featured in Forrester’s AI Technical Services Landscape

Senior Software Engineer - AI Infrastructure (Scheduler) - CoreAI

The AI Platform organization builds the end-to-end Azure AI stack, from the infr...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C++, C#, Java, Scala, Rust, Go, TypeScript OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Work on the design and development of the core AI Infrastructure distributed and in-cluster services that support large scale AI training and inferencing
  • Develop, test, and maintain control plane services written in C#, hosted on Service Fabric or Kubernetes (AKS) clusters
  • Enhance systems and applications to ensure high stability, efficiency and maintainability, low latency, tight cloud security
  • Provide operational support and DRI (on-call) responsibilities for the service
  • Develop and foster a deep understanding of the machine learning concepts, use cases, and relevant services used by our customers
  • Collaborate closely with service engineers, product managers, and internal applied research and data science teams within Microsoft to build better solutions together
  • Provide vision, expertise, and technical leadership to other team members
  • Help to grow talent in these areas
  • Embody our culture and values
  • Fulltime

Senior Software Engineer, AI Data Platform (CoreAI)

Join Microsoft’s CoreAI – AI Platform team in Bay Area/Redmond to build the AI D...
Location: United States, Redmond
Salary: 119800.00 - 234700.00 USD / Year
Company: Microsoft Corporation (https://www.microsoft.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years
Job Responsibility
  • Design and build scalable data pipelines and services to automate the dataset lifecycle (ingestion, registration, validation, PII handling, discovery, sharing, lineage), including intelligent agent-driven automation for key stages
  • Develop secure and reliable infrastructure for data access, entitlement management, and operational support across global time zones
  • Implement governance and compliance tooling to ensure data integrity, auditability, and adherence to regulatory standards
  • Create user-facing tools and APIs that make datasets easily discoverable and reusable
  • Contribute to strategic extensions such as continuous feedback loops, human-in-the-loop workflows, and data intelligence services for internal and external stakeholders
  • Collaborate with cross-org partners to align priorities and deliver company-wide impact
  • Fulltime