Senior Software Engineer - Real-Time Workflows & ML Serving

Microsoft Corporation

Location:
India, Bangalore

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Modern ads platforms run on always-on, real-time data: streaming events, feature computation, near-real-time aggregations, and low-latency serving to power ML models that operate at massive scale under strict freshness, cost, and reliability requirements. Microsoft Ads builds and operates large-scale, latency-sensitive systems that serve billions of requests.

We are looking for a Senior Software Engineer who is hands-on with production coding and system design to build the real-time data pipelines and feature/embedding materialization systems that feed online stores/caches and integrate tightly with ML inference serving.

This role is ideal for engineers who enjoy building robust streaming and ETL systems (correctness, idempotency, backfills, late data), owning SLOs with strong observability and operational maturity, and optimizing end-to-end performance and cost across compute, storage, and serving integrations. Primary success metrics are freshness, correctness, latency, reliability, and cost in production.

Job Responsibility:

  • Design and implement real-time streaming ETL / feature pipelines (e.g., Flink or Spark Structured Streaming) that meet strict freshness and correctness constraints
  • Build and operate reliable messaging and ingestion with Kafka/Pulsar (partitioning strategy, retries, ordering guarantees, DLQs, backpressure handling)
  • Own data contracts between producers, pipelines, and consumers: schema evolution, versioning, compatibility, validation, and safe rollout
  • Implement production-grade backfill/replay workflows
  • Define and meet SLOs using OpenTelemetry/Prometheus/Grafana for metrics, tracing, dashboards, alerting, and incident response readiness
  • Integrate pipelines with online stores/caches and ML consumers (feature stores, embedding pipelines, LLM API calls, online/offline consistency patterns)
  • Partner with applied scientists on feature/embedding definitions, validation, and end-to-end quality measurement
  • Optimize end-to-end performance and efficiency: CPU/memory/I/O, serialization, caching, network overhead, concurrency, and pipeline compute cost
  • Contribute to serving/inference integrations where needed (e.g., Triton/ONNX Runtime/TensorRT) including batching and latency/cost tradeoffs
  • Ship safely with CI/CD, automated testing (unit/integration/data quality), and operational playbooks/runbooks

Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Electrical/Computer Engineering, or a related field, with 6+ years of related experience
  • Strong programming skills in C++, C#, or Python (at least one required)
  • Hands-on experience in one or more of the following: building and operating streaming data pipelines in production (Flink or Spark Structured Streaming); distributed systems engineering with strong reliability and operational rigor; messaging systems such as Kafka/Pulsar
  • Experience operating services with Kubernetes/containers and production readiness practices (deployments, scaling, rollbacks)
  • Experience with observability stacks such as OpenTelemetry, Prometheus, Grafana

Nice to have:

  • Experience with feature stores, embedding pipelines, and online/offline consistency (freshness guarantees, correctness validation)
  • Experience with data lakehouse/table formats and optimizations (e.g., partitioning, compaction, and incremental processing)
  • Experience with GPU inference serving (Triton, ONNX Runtime/TensorRT) and performance techniques (batching, request shaping, tail-latency reduction)
  • Background in cost/performance modeling, capacity planning, and reliability improvements for high-scale data platforms
  • Experience in Ads/search/recommendations or other high-scale systems where freshness, latency, and cost are important

Additional Information:

Job Posted:
February 14, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Senior Software Engineer - Real-Time Workflows & ML Serving

Senior Software Engineer, Backend

As a Senior Software Engineer, Backend specializing in database architecture and...
Location:
United States, San Francisco
Salary:
150000.00 - 240000.00 USD / Year
Chef Robotics
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 7+ years of professional experience in backend development roles with demonstrated leadership experience
  • Expert knowledge of relational databases (MySQL, PostgreSQL) including schema design, optimization, and administration
  • Strong proficiency with Python and JavaScript/TypeScript with advanced software engineering skills
  • Extensive experience leading projects with at least two web frameworks: Flask, FastAPI, Django, Node.js, or Next.js
  • Proven experience designing and implementing RESTful and GraphQL APIs at scale
  • Advanced understanding of containerization (Docker) and orchestration (Kubernetes) technologies
  • Experience with cloud infrastructure and deployment (AWS, GCP, or Azure) in production environments
  • Proven experience leading complex backend projects and mentoring junior engineers
  • Understanding of data requirements for robotics or automation systems
Job Responsibility:
  • Lead the design, implementation, and optimization of database schemas to support robot operations, telemetry, recipe management, and system analytics
  • Develop robust data migration strategies and version control for database schema evolution
  • Implement efficient query optimization and indexing strategies to support high-throughput robot operations
  • Establish data integrity protocols and backup systems to ensure operational continuity across customer deployments
  • Create scalable data access layers that balance security, performance, and maintainability
  • Mentor team members on database design patterns and optimization techniques
  • Lead the development and maintenance of scalable APIs to serve robot control systems, dashboards, and monitoring tools
  • Design and implement secure authentication and authorization mechanisms across backend services
  • Develop robust middleware for processing and validating data between robotics subsystems
  • Create service interfaces that enable efficient communication between robotics components and cloud services
What we offer:
  • medical, dental, and vision insurance
  • commuter benefits
  • flexible paid time off (PTO)
  • catered lunch
  • 401(k) matching
  • early-stage equity
  • Fulltime

Senior ML Platform Engineer

At WHOOP, we're on a mission to unlock human performance and healthspan. WHOOP e...
Location:
United States, Boston
Salary:
150000.00 - 210000.00 USD / Year
Whoop
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years of experience in software engineering with a focus on ML infrastructure, cloud platforms, or MLOps
  • Strong programming skills in Python, with experience in building distributed systems and REST/gRPC APIs
  • Deep knowledge of cloud-native services and infrastructure-as-code (e.g., AWS CDK, Terraform, CloudFormation)
  • Hands-on experience with model deployment platforms such as AWS SageMaker, Vertex AI, or Kubernetes-based serving stacks
  • Proficiency in ML lifecycle tools (MLflow, Weights & Biases, BentoML) and containerization strategies (Docker, Kubernetes)
  • Understanding of data engineering and ingestion pipelines, with ability to interface with data lakes, feature stores, and streaming systems
  • Proven ability to work cross-functionally with Data Science, Data Platform, and Software Engineering teams, influencing decisions and driving alignment
  • Passion for AI and automation to solve real-world problems and improve operational workflows
Job Responsibility:
  • Architect, build, own, and operate scalable ML infrastructure in cloud environments (e.g., AWS), optimizing for speed, observability, cost, and reproducibility
  • Create, support, and maintain core MLOps infrastructure (e.g., MLflow, feature store, experiment tracking, model registry), ensuring reliability, scalability, and long-term sustainability
  • Develop, evolve, and operate MLOps platforms and frameworks that standardize model deployment, versioning, drift detection, and lifecycle management at scale
  • Implement and continuously maintain end-to-end CI/CD pipelines for ML models using orchestration tools (e.g., Prefect, Airflow, Argo Workflows), ensuring robust testing, reproducibility, and traceability
  • Partner closely with Data Science, Sensor Intelligence, and Data Platform teams to operationalize and support model development, deployment, and monitoring workflows
  • Build, manage, and maintain both real-time and batch inference infrastructure, supporting diverse use cases from physiological analytics to personalized feedback loops for WHOOP members
  • Design, implement, and own automated observability tooling (e.g., for model latency, data drift, accuracy degradation), integrating metrics, logging, and alerting with existing platforms
  • Leverage AI-powered tools and automation to reduce operational overhead, enhance developer productivity, and accelerate model release cycles
  • Contribute to and maintain internal platform documentation, SDKs, and training materials, enabling self-service capabilities for model deployment and experimentation
  • Continuously evaluate and integrate emerging technologies and deployment strategies, influencing WHOOP’s roadmap for AI-driven platform efficiency, reliability, and scale
What we offer:
  • equity
  • benefits
  • Fulltime

Senior Software Engineer, ML Platform

We’re looking for a software engineer to join Parafin’s Infrastructure team and ...
Location:
United States, San Francisco
Salary:
230000.00 - 265000.00 USD / Year
Parafin
Expiration Date:
Until further notice
Requirements:
  • 5+ years of software engineering experience, including experience on ML platform/MLOps systems (training, deployment, and/or feature pipelines)
  • Strong Python; solid software design and testing fundamentals
  • Proficiency with SQL; hands-on Spark/PySpark experience
  • Knowledge of ML fundamentals—probability & statistics, supervised vs. unsupervised learning, bias/variance & regularization, feature engineering, model evaluation metrics, validation strategies, and production concerns like drift, stability, and monitoring
  • Expertise with modern data/ML stacks—AWS, Databricks (workflows, lakehouse, MLflow/registry, Model Serving), and Airflow (or equivalent orchestration)
  • Experience building real-time systems (service design, caching, rate limiting, backpressure) and batch pipelines at scale
  • Practical knowledge of feature-store concepts (offline/online stores, backfills, point-in-time correctness), model registries, experiment tracking, and evaluation frameworks
  • Strong problem-solving skills and a proactive attitude toward ownership and platform health
Job Responsibility:
  • Turn notebooks into software: decompose data scientist training/inference notebooks into reusable, tested components (libraries, pipelines, templates) with clear interfaces and documentation
  • Create developer-friendly ML abstractions: build SDKs, CLIs, and templates that make it simple to define features, train/evaluate models, and deploy to batch or real-time targets with minimal boilerplate
  • Build our real-time ML inference platform: stand up and scale low-latency model serving
  • Expand batch ML inference: improve scheduling, parallelism, cost controls, observability, and failure/rollback for large-scale batch scoring and post-processing
  • Own and expand the feature store: design offline/online feature definitions, high read/write throughput, and consistent offline/online semantics
What we offer:
  • Equity grant
  • Medical, dental & vision insurance
  • Work from home flexibility
  • Unlimited PTO
  • Commuter benefits
  • Free lunches
  • Paid parental leave
  • 401(k)
  • Employee assistance program
  • Fulltime

Senior Manager, AI Platform Engineering

Socure is building the identity trust infrastructure for the digital economy — v...
Location:
United States
Salary:
190000.00 - 210000.00 USD / Year
Socure
Expiration Date:
Until further notice
Requirements:
  • 8+ years of professional software engineering experience, including time spent building or operating large-scale ML, data, or distributed systems platforms
  • 3+ years of engineering leadership experience managing multiple teams or engineering managers
  • Strong technical background in ML infrastructure, data engineering, and/or cloud-native distributed systems
  • Demonstrated experience delivering complex, cross-functional platform initiatives
  • Excellent communication and stakeholder management skills, with the ability to translate between technical detail and business priorities
  • Experience working in fast-paced, iterative environments using modern development practices
Job Responsibility:
  • Develop and own the roadmap for Socure’s AI/ML platform, including data and feature engineering workflows, training infrastructure, experimentation tooling, model deployment/serving, monitoring, and governance
  • Define architecture and standards that create clear, scalable, and secure paths for building and operating AI systems
  • Assess technology options and drive consolidation across the company to reduce fragmentation and improve consistency across the ML toolchain
  • Partner with Data Science, Engineering, Product, and Sales-Enablement teams to develop AI infrastructure that delights customers
  • Lead the design and operation of the end-to-end ML lifecycle: data ingestion, feature engineering, experimentation, training, model registry, deployment, and continuous monitoring
  • Partner closely with Data Science to enable fast, reproducible experimentation and reduce operational friction
  • Ensure the platform delivers reliability, traceability, observability, and performance for both batch and real-time model workloads
  • Guide the team to deliver high-quality platform capabilities with predictable timelines and strong technical rigor
  • Remove cross-team bottlenecks, align dependencies, and ensure seamless execution across Data, Infrastructure, and Product
  • Establish SLAs, operational standards, and production-readiness guidelines for ML pipelines and serving systems
What we offer:
  • Offers Equity
  • Offers Bonus
  • benefits
  • Fulltime

Senior AI Software Engineer

DefineX is a next-generation consulting house and venture builder, helping finan...
Location:
Turkey, Istanbul
Salary:
Not provided
DefineX
Expiration Date:
Until further notice
Requirements:
  • BS, MS, or PhD in Computer Science/Engineering, Mathematics, or a related field
  • 5+ years of hands-on experience in Java-based software development, ideally within enterprise or banking environments
  • Solid understanding of machine learning concepts and model lifecycles, with hands-on experience integrating ML models into production systems
  • Experience bridging Python-based ML workflows and Java services (e.g. consuming serialized models, model-serving APIs, or hybrid architectures)
  • Practical experience with LLMs, NLP technologies, and predictive modeling techniques
  • Strong proficiency in Java, with working knowledge of Python for AI/ML integration scenarios
  • Familiarity with NoSQL databases (HBase, Elasticsearch, Couchbase, etc.)
  • Experience designing and operating microservices architecture using Kubernetes and/or OpenShift
  • Strong understanding of software architecture, data structures, data modeling, and RESTful web services
  • Experience with containerization, CI/CD, and version control (Docker, Kubernetes, Git)
Job Responsibility:
  • Design and develop Java-based AI services that integrate machine learning and deep learning models into enterprise systems, including the consumption of Python-trained models (e.g. serialized models such as pickle) within Java-driven architectures
  • Build end-to-end ML model lifecycles covering model integration, versioning, deployment, monitoring, and retraining triggers in production environments
  • Develop prototypes across AI use cases, focusing on production readiness rather than experimentation only
  • Collaborate with data scientists and ML engineers to operationalize models by exposing them via APIs or embedding them into Java-based microservices
  • Build scalable systems and pipelines for high-throughput data processing and real-time or near real-time inference
  • Work closely with business and technical stakeholders to translate business problems into robust AI-enabled software solutions
  • Contribute to long-term AI platform evolution while delivering incremental, high-impact milestones
What we offer:
  • Growth and Development: Be part of a growing global team of professionals with training and support to help you grow
  • Every DefineXer has a Growth Coach to accelerate their growth through feedback
  • Independence and Ownership: Blur in creative and challenging business and technology transformation projects
  • Time Off: 20 vacation days per annum
  • We love to Give Back: You will get certain hours a year to volunteer and organize office volunteer programs with local NGOs
  • Health and Wellness: Competitive private health and life insurance coverage
  • Fulltime

Senior Software Platform Engineer

Solvd Inc. is a rapidly growing AI-native consulting and technology services fir...
Location:
Brazil
Salary:
Not provided
Solvd Inc.
Expiration Date:
Until further notice
Requirements:
  • 7+ years of professional experience in software engineering and infrastructure engineering
  • Extensive experience building and maintaining AI/ML infrastructure in production, including model, deployment, and lifecycle management
  • Strong knowledge of AWS and infrastructure-as-code frameworks, ideally with CDK
  • Expert-level coding skills in TypeScript and Python building robust APIs and backend services
  • Production-level experience with Databricks MLflow, including model registration, versioning, asset bundles, and model serving workflows
  • Expert-level understanding of containerization (Docker)
  • Proven ability to design reliable, secure, and scalable infrastructure for both real-time and batch ML workloads
  • Ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members
  • Strong collaboration skills and the ability to partner effectively with cross-functional teams
Job Responsibility:
  • Design, implement, and maintain a cloud-native platform to support AI and data workloads, with a focus on AI and data platforms such as Databricks and AWS Bedrock
  • Build and manage scalable data pipelines to ingest, transform, and serve data for ML and analytics
  • Develop infrastructure-as-code using tools like CloudFormation and AWS CDK to ensure repeatable and secure deployments
  • Collaborate with AI engineers, data engineers, and platform teams to improve the performance, reliability, and cost-efficiency of AI models in production
  • Drive best practices for observability, including monitoring, alerting, and logging for AI platforms
  • Contribute to the design and evolution of our AI platform to support new ML frameworks, workflows, and data types
  • Stay current with new tools and technologies to recommend improvements to architecture and operations
  • Integrate AI models and large language models (LLMs) into production systems to enable use cases using architectures like retrieval-augmented generation (RAG)
What we offer:
  • Shape real-world AI-driven projects across key industries, working with clients from startup innovation to enterprise transformation
  • Be part of a global team with equal opportunities for collaboration across continents and cultures
  • Thrive in an inclusive environment that prioritizes continuous learning, innovation, and ethical AI standards
  • Fulltime

Senior Software Engineer, AI

Credit Genie is a mobile-first financial wellness platform designed to help indi...
Location:
United States, Pittsburgh
Salary:
150000.00 - 250000.00 USD / Year
Credit Genie
Expiration Date:
Until further notice
Requirements:
  • A Software Engineer with 5+ years of industry experience
  • Strong foundations in multiple programming languages (Python, Java, TypeScript, etc.)
  • Hands-on experience with cloud platforms (AWS, GCP, or Azure)
  • Experienced at designing and implementing distributed, production-grade systems
  • Comfortable with system design, APIs, version control, Infrastructure as Code, and testing
  • Collaborative and excited by fast-moving, problem-solving environments
  • Prior exposure to Machine Learning and AI concepts, tools, or frameworks (e.g., LLMs, vector databases, specialized model serving)
Job Responsibility:
  • Lead the design and implementation of highly available, scalable backend services and APIs that serve and integrate our AI models and applications into production systems
  • Architect and optimize the services and data pipelines essential for deploying, monitoring, and maintaining real-time AI inferencing and retrieval at scale
  • Collaborate with AI and ML Engineers to improve model deployment, monitoring, and experimentation workflows (AIOps)
  • Drive technical excellence, setting high standards for code quality, system reliability, and performance
  • Mentor and guide other engineers on best practices for building robust backend systems in an AI-focused environment
  • Have fun working on hard and highly impactful problems
What we offer:
  • Offers Equity
  • Offers Bonus
  • Comprehensive medical, vision, and dental coverage
  • 401(k) retirement plan with company match
  • Short & long term disability insurance
  • Life insurance
  • Flexible PTO
  • 100% company-paid medical, dental, and vision coverage for you and your dependents on your first day of employment
  • Receive up to $100 per month in fitness reimbursement or enjoy a complimentary full membership to LifeTime Fitness or Equinox
  • 401(k) with a 3.5% match and immediate vesting
  • Fulltime

Big Data/Java Application Developer

The Big Data/Java Application Developer is an intermediate level position respon...
Location:
Canada, Mississauga
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 5+ years of relevant experience
  • Experience in systems analysis and programming of software applications
  • Experience in managing and implementing successful projects
  • Working knowledge of consulting/project management techniques/methods
  • Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
  • Hands-on relevant experience in Angular, HTML, CSS, Java, Spring Boot, Oracle, and NoSQL, or in designing, developing, and optimizing scalable distributed data processing pipelines using Apache Spark and Scala
  • Proficiency in Functional Programming: High proficiency in Scala-based functional programming for developing robust and efficient data processing pipelines.
  • Proficiency in Big Data Technologies: Strong experience with Apache Spark, Hadoop ecosystem tools such as Hive, HDFS, and YARN.
  • Programming and Scripting: Advanced knowledge of Scala and a good understanding of Python for data engineering tasks.
  • Data Modeling and ETL Processes: Solid understanding of data modeling principles and ETL processes in big data environments.
Job Responsibility:
  • Conduct tasks related to feasibility studies, time and cost estimates, IT planning, risk technology, applications development, model development, and establish and implement new or revised applications systems and programs to meet specific business needs or user areas
  • Monitor and control all phases of development process and analysis, design, construction, testing, and implementation as well as provide user and operational support on applications to business users
  • Utilize in-depth specialty knowledge of applications development to analyze complex problems/issues, provide evaluation of business process, system process, and industry standards, and make evaluative judgement
  • Recommend and develop security measures in post implementation analysis of business usage to ensure successful system design and functionality
  • Consult with users/clients and other technology groups on issues, recommend advanced programming solutions, and install and assist customer exposure systems
  • Ensure essential procedures are followed and help define operating standards and processes
  • Serve as advisor or coach to new or lower level analysts
  • Has the ability to operate with a limited level of direct supervision
  • Can exercise independence of judgement and autonomy
  • Experience managing a data-focused product, ML platform, and/or UI/UX
  • Fulltime