Senior Software Engineer - Kubernetes AI

Microsoft Corporation

Location:
United States, Multiple Locations

Contract Type:
Not provided

Salary:

119800.00 - 234700.00 USD / Year

Job Description:

We are seeking experienced engineers to help build cloud‑native, open‑source AI frameworks and platforms that power AI/ML training, fine‑tuning, inference, and agentic applications at scale. This role focuses on designing and implementing Kubernetes‑native abstractions and operators that make advanced AI workloads reliable, scalable, and easy for developers to consume across cloud and hybrid environments. You will contribute to and help lead work in upstream open‑source communities while shaping and building production‑grade AI platforms used by internal teams and external customers.

Job Responsibility:

  • Design, implement, and maintain Kubernetes operators and controllers for AI/ML workloads
  • Partner with product managers, business stakeholders, and users to understand user pain points deeply and create innovative solutions that delight your customers in an agile development environment
  • Contribute to applicable upstream open-source projects
  • Write technical design documents and participate in architecture reviews
  • Mentor team members and external contributors through code reviews
  • Debug and optimize distributed AI systems running at scale
  • Strive for excellence in everything you do: culture, collaboration, process, tools, design, engineering practices, customer experience, performance, security, etc.

Requirements:

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, or Python, OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check

Nice to have:

  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, or Python, OR Bachelor's Degree in Computer Science or related technical field AND 8+ years of such experience, OR equivalent experience
  • Hands‑on experience building or operating AI/ML training, fine-tuning, and inference platforms in cloud‑native environments
  • Proficiency with Go and/or Python for building platform components, Kubernetes operators/controllers, and integrations in production environments
  • Demonstrated experience contributing to or maintaining open‑source software, especially in the Kubernetes, AI/ML, or cloud‑native ecosystem

Additional Information:

Job Posted:
February 16, 2026

Employment Type:
Full-time
Work Type:
Remote work

Similar Jobs for Senior Software Engineer - Kubernetes AI

Senior Golang Software Engineer

We are Citi’s Application, Platform and Engineering team, shaping tech direction...
Location:
United Kingdom, London
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Fluency in Golang
  • Experience designing control and sandboxing systems for AI experimentation
  • Experience maintaining and/or contributing to bug bounty and responsible disclosure programs
  • Understanding of language models and transformers
  • Rich understanding of vector stores and search algorithms
  • Large-scale ETL development
  • Direct engineering experience of high performance, large-scale ML systems
  • Hands-on MLOps experience with appreciation of end-to-end CI/CD process
  • Experience supporting fast-paced startup engineering teams
  • Contributor to open source with methods of creating APIs and ML/Ops automation
Job Responsibility:
  • Lead the 0-1 build of multiple AI products
  • Design and build high-quality, highly reliable products with user experience at the center
  • Be responsible for engineering innovative, best-in-class AI platforms for the bank
  • Create firsts in the Generative AI space for Citi as part of the team that defines the strategic direction for the bank
  • Continually iterate and scale Generative AI products while listening to the needs of customers
  • Mentor and nurture other engineers to help them grow their skills and expertise
What we offer:
  • 27 days annual leave plus bank holidays
  • Discretionary annual performance-related bonus
  • Private medical care and life insurance
  • Employee assistance program
  • Pension plan
  • Paid parental leave
  • Special discounts for employees, family, and friends
Employment Type:
Full-time

Senior Software Engineer - Build AI Tools

This role sits within the newly formed GenAI Security team, which is responsible...
Location:
United Kingdom, Belfast
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • Highly motivated self-starter with excellent interpersonal and problem-solving skills
  • Bachelor’s degree or equivalent work experience
  • Good oral and written communication skills
  • Significant relevant industry work experience
  • Experience of the full lifecycle of design, implementation and running of enterprise software solutions involving cross functional team collaboration
  • Expertise in a major programming language such as Python and/or Go, and associated tooling (Git, Maven, IDEs, Jenkins, Bitbucket, etc.)
  • Expertise in designing and implementing secure APIs and libraries
  • Experience in Generative AI, LLM frameworks, LLM prompt engineering and/or adversarial testing is a bonus
  • Experience with Cyber engineering and Operations, which could include DevSecOps or MLSecOps
  • Experience contributing to the architecture and design (architecture, design patterns, reliability, scaling) of new and current systems
Job Responsibility:
  • Designing, developing, optimizing, and enhancing a GenAI prompt security platform to protect firm AI/LLM-based applications from adversarial attacks and prompt injections
  • Building and automating a security testing framework to validate protection mechanisms for various LLM use cases
  • Owning solutions that are expected to operate and perform at scale across the organisation
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation, across different time zones
What we offer:
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment Type:
Full-time

Senior Software Engineer, Observability

The Observability team at Airtable ensures that engineers have the tools they ne...
Location:
United States, San Francisco; New York; Seattle
Salary:
196000.00 - 270000.00 USD / Year
Airtable
Expiration Date:
Until further notice
Requirements:
  • 6+ years of software engineering experience
  • 3+ years focused on observability or infrastructure at scale
  • Demonstrated success implementing and running production-grade logging, metrics, or tracing systems
  • Proficiency in distributed systems concepts, data streaming pipelines, and container orchestration (Kubernetes)
  • Deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse
  • Comfort with at least one programming language (e.g., Go, Python, Java) to build and maintain observability tooling
  • Experience mentoring engineers and collaborating across multiple teams
  • Strong communication skills
  • Eagerness to own high-impact initiatives
  • Proven ability to balance short-term fixes with long-term strategic vision
Job Responsibility:
  • Architect and scale core observability systems
  • Lead the design and evolution of logging, metrics, and tracing pipelines
  • Evaluate and integrate new technologies (e.g., OpenTelemetry, ClickHouse, ELK stack)
  • Guide and mentor a growing team of infrastructure engineers
  • Define and uphold coding standards and operational excellence
  • Partner with Deploy Infrastructure, Service Orchestration, and Product teams
  • Align infrastructure decisions with business goals
  • Own end-to-end reliability for observability tools and establish SLAs, SLOs, and error budgets
  • Optimize performance and cost of large-scale data pipelines
  • Shape the observability roadmap
What we offer:
  • Opportunity to receive benefits
  • Restricted stock units
  • May include incentive compensation
  • Comprehensive benefit offerings
Employment Type:
Full-time

Senior AI Engineer

We are seeking a Senior AI Engineer (L4, Individual Contributor) to design, buil...
Location:
India, Chennai
Salary:
Not provided
Arcadia
Expiration Date:
Until further notice
Requirements:
  • 12+ years of professional software engineering experience
  • 3+ years in AI/ML development
  • Strong expertise in Python, PyTorch/TensorFlow, scikit-learn, and ML tooling (MLflow, LangChain)
  • Proficiency with SQL, cloud services (AWS), containers (Docker, Kubernetes), and distributed systems
  • Understanding of modern AI research (LLMs, diffusion models, transformers)
  • Experience deploying ML models in production with CI/CD
  • Strong analytical skills, ability to balance speed and rigor in experimentation
  • A passion for sustainability and the clean-energy mission
  • Experience building agentic pipelines with the latest models from Anthropic, Google, OpenAI, and more
Job Responsibility:
  • Integrate with LLMs and be an expert in prompt engineering to derive the right results from the models with limited hallucination
  • Design and train ML/AI models (forecasting, NLP, graph learning, generative AI) to improve data quality, cost effectiveness, and system scalability
  • Deploy and optimize models for large-scale production workloads using Python-based services in AWS/Kubernetes environments
  • Build robust, automated data pipelines and ML Ops workflows for continuous training and deployment
  • Research and experiment with modern AI methods (transformers, foundation models, reinforcement learning) and adapt them to energy-sector challenges, including but not limited to utility statements
  • Drive performance improvements in model accuracy, latency, and cost efficiency
  • Collaborate with Product, SRE, and Analytics teams to deliver AI-enabled features across Arcadia’s platform
  • Write clean, maintainable code, contribute to architecture reviews, and mentor junior engineers
  • Build true agentic workflows with multi-step processing incorporating RAG pipelines and MCPs
What we offer:
  • Competitive compensation and employee stock options
  • Hybrid/remote-first working model (India-based role, with global collaboration)
  • Flexible leave policy
  • Comprehensive medical insurance (self + family members)
  • Annual performance cycle + quarterly recognition awards
  • A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation
Employment Type:
Full-time

Senior Software Engineer

As a Full-Stack Software Engineer in the Archer AI team, you will design, develo...
Location:
United States, San Jose
Salary:
134400.00 - 168000.00 USD / Year
Archer Aviation
Expiration Date:
Until further notice
Requirements:
  • B.S. or M.S. degree in Computer Science, Software Engineering, or related field
  • 5+ years of professional software engineering experience
  • Strong proficiency in JavaScript/TypeScript and frameworks such as React, Angular, or Vue for frontend development
  • Strong backend development experience with Node.js, Python, Java, or Go
  • Knowledge of containerization and orchestration (Docker, Kubernetes)
  • Experience building and consuming RESTful APIs and/or GraphQL
  • Familiarity with databases (SQL and NoSQL)
  • Understanding of software engineering best practices including CI/CD, version control (Git), testing, and code quality
  • Ability to work across the full stack and quickly adapt to new technologies
Job Responsibility:
  • Designing, developing, testing, and deploying full-stack web applications
  • Building clean, responsive, and scalable user interfaces
  • Developing backend services, APIs, and data pipelines to support applications
  • Collaborating with cross-functional teams to gather requirements, define technical solutions, and deliver impactful software
  • Writing clean, maintainable, and well-documented code
  • Ensuring performance, security, and scalability of systems
  • Participating in code reviews, architecture discussions, and mentoring junior engineers
  • Staying current with modern frameworks, tools, and best practices in full-stack development
Employment Type:
Full-time

Senior Software Engineer, Backend (AI Agent)

At Cresta, the AI Agent team is on a mission to create state-of-the-art AI Agent...
Location:
Canada, Toronto
Salary:
Not provided
Cresta
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science or a related field
  • 5+ years of experience in backend system architecture, cloud services, or related technology fields
  • Proficient in designing and maintaining clear and robust APIs with a strong understanding of protocols including gRPC, REST
  • Previous experience working with Virtual Agent or AI Agent systems
  • Experience in high-performance database schema design and query optimization, including knowledge of SQL and NoSQL databases
  • Experience in containerized application deployment using Kubernetes and Docker in microservices architectures
  • Experience with cloud environments such as AWS, Azure, or Google Cloud, with a strong understanding of cloud security and compliance standards
Job Responsibility:
  • Design, develop, and maintain scalable and robust backend architectures for Cresta’s AI Agent solutions and proprietary models
  • Collaborate with cross-functional teams, including frontend engineers and machine learning engineers, to ensure seamless integration of AI Agents into Cresta’s customer solutions
  • Lead initiatives to enhance system scalability and reliability in production environments, focusing on backend services that support AI functionalities
  • Drive efforts to optimize server response times, process large volumes of data efficiently, and maintain high system availability
  • Innovate and implement security measures, cost-reduction strategies, and performance improvements in backend systems supporting AI Agents
What we offer:
  • We offer Cresta employees a variety of medical, dental, and vision plans, designed to fit you and your family’s needs
  • Paid parental leave to support you and your family
  • Monthly Health & Wellness allowance
  • Work from home office stipend to help you succeed in a remote environment
  • Lunch reimbursement for in-office employees
  • PTO: 3 weeks in Canada

Senior Software Engineer, Forward Deployed

As a Senior Software Engineer, Forward Deployed Engineer (FDE) you'll work direc...
Location:
United States, Austin; New York; San Francisco Bay Area; Washington DC–Baltimore
Salary:
165000.00 - 266000.00 USD / Year
Invisible Technologies
Expiration Date:
Until further notice
Requirements:
  • 6+ years of software engineering experience, including significant time spent building data, ML, or backend systems
  • Deep proficiency in Python with hands-on experience using Hugging Face, LangChain, OpenAI, Pinecone, and related ecosystems
  • Skilled in full-stack and API-based deployment patterns, including Docker, FastAPI, Kubernetes, and cloud environments (GCP, AWS)
  • Experienced with workflow orchestration libraries, pub/sub systems (Kafka), and schema governance
  • Expertise in data governance and operations, including Unity Catalog and policy management, cluster/job orchestration, data contracts and quality enforcement, Delta/ETL pipelines, and replay processes
  • Strong product and system design instincts — you understand business needs and how to translate them into technical architecture
  • Experience building usable systems from messy data and ambiguous requirements
  • Excellent communication and client-facing skills: you’ve led conversations with technical and non-technical stakeholders alike
  • Proven experience owning projects from scoping through deployment in ambiguous, high-stakes environments
Job Responsibility:
  • Collaborate with delivery leaders to scope technical solutions to operational problems
  • Identify workflow optimizations through deep engagement with customer problems and build them into stable, scalable solutions
  • Design and implement AI-powered workflows using LLMs, embedding models, retrieval systems, and automation tools
  • Translate messy real-world constraints (e.g., inconsistent data, latency requirements) into elegant engineering solutions
  • Iterate quickly based on real-time feedback from operators and clients
  • Build reusable tooling and infrastructure that accelerates future deployments
What we offer:
  • Bonuses and equity are included in offers above entry level
Employment Type:
Full-time

Senior Software Engineer II

As a Senior Software Engineer II, you will collaborate closely with QA and the b...
Location:
Vietnam, Ho Chi Minh City
Salary:
Not provided
Axon
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in software engineering, with a strong focus on backend development and cloud-based systems
  • Proven experience in building test automation frameworks for complex, large-scale systems in cloud environments
  • Proficiency in programming languages such as Go, C#, Java, or similar
  • Deep understanding of distributed systems and cloud-native technologies (e.g., Kubernetes, Terraform, Kafka)
  • Experience using AI tools to improve test automation, software quality, and development pipelines
  • Strong communication skills to collaborate with cross-functional teams and articulate technical concepts effectively
Job Responsibility:
  • Collaborate closely with QA and the broader engineering teams to develop a scalable test automation system for cloud environments
  • Drive the technical strategy for testing across DEMS, making it easier for developers to write tests and improve overall software quality
  • Utilize your backend engineering expertise to make architectural decisions, conduct code reviews, and contribute to continuous improvements in our development practices
  • Mentor junior engineers and help shape the technical direction of the team
What we offer:
  • Medical, Dental and Vision Insurance
  • Robust Paid Time Off policy
  • Bonuses
  • Lunch allowance
  • Cell phone stipend
  • Free LinkedIn Learning account or Udemy account
  • Access to 24/7 online emotional and mental support
  • Gym membership
  • Free parking
  • Stocked fridges and pantries - free coffee, cold beverages, snacks
Employment Type:
Full-time