Senior Software Engineer - Kubernetes AI

United States, Multiple Locations · 119800.00 - 234700.00 USD / Year · Job Posted February 16, 2026

Job Description

We are seeking experienced engineers to help build cloud‑native, open‑source AI frameworks and platforms that power AI/ML training, fine‑tuning, inference, and agentic applications at scale. This role focuses on designing and implementing Kubernetes‑native abstractions and operators that make advanced AI workloads reliable, scalable, and easy for developers to consume across cloud and hybrid environments. You will contribute to and help lead work in upstream open‑source communities while shaping and building production‑grade AI platforms used by internal teams and external customers.

Job Responsibility

  • Design, implement, and maintain Kubernetes operators and controllers for AI/ML workloads
  • Partner with product managers, business stakeholders, and users to understand user pain points deeply and create innovative solutions that delight your customers in an agile development environment
  • Contribute to applicable upstream open-source projects
  • Write technical design documents and participate in architecture reviews
  • Mentor team members and external contributors through code reviews
  • Debug and optimize distributed AI systems running at scale
  • Strive for excellence in everything you do: culture, collaboration, process, tools, design, engineering practices, customer experience, performance, security, etc.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, or Python
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check

Nice to have

  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, or Python
  • OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, or Python
  • OR equivalent experience
  • Hands‑on experience building or operating AI/ML training, fine-tuning, and inference platforms in cloud‑native environments
  • Proficiency with Go and/or Python for building platform components, Kubernetes operators/controllers, and integrations in production environments
  • Demonstrated experience contributing to or maintaining open‑source software, especially in the Kubernetes, AI/ML, or cloud‑native ecosystem

Similar Jobs for Senior Software Engineer - Kubernetes AI

8 matching positions

Senior Software Engineer - Build AI Tools

This role sits within the newly formed GenAI Security team, which is responsible...
Location: United Kingdom, Belfast
Salary: Not provided
Company: Citi (https://www.citi.com/)
Expiration Date: Until further notice
Requirements
  • Highly motivated self-starter with excellent interpersonal and problem-solving skills
  • Bachelor’s degree or equivalent work experience
  • Good oral and written communication skills
  • Significant relevant industry work experience
  • Experience of the full lifecycle of design, implementation and running of enterprise software solutions involving cross functional team collaboration
  • Expertise in a major programming language such as Python and/or Go, and associated tooling (Git, Maven, IDEs, Jenkins, Bitbucket, etc.)
  • Expertise in designing and implementing secure APIs and libraries
  • Experience in Generative AI, LLM frameworks, LLM prompt engineering and/or adversarial testing is a bonus
  • Experience with Cyber engineering and Operations, which could include DevSecOps or MLSecOps
  • Experience contributing to the architecture and design (architecture, design patterns, reliability, scaling) of new and current systems
Job Responsibility
  • Designing, developing, optimizing, and enhancing a GenAI prompt security platform to protect firm AI/LLM-based applications from adversarial attacks and prompt injections
  • Building and automating a security testing framework to validate protection mechanisms for various LLM use cases
  • Owning solutions that are expected to operate and perform at scale across the organisation
  • Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation, across different time zones
What we offer
  • 27 days annual leave (plus bank holidays)
  • A discretionary annual performance-related bonus
  • Private Medical Care & Life Insurance
  • Employee Assistance Program
  • Pension Plan
  • Paid Parental Leave
  • Special discounts for employees, family, and friends
  • Access to an array of learning and development resources
Employment Type: Full-time

Senior Software Engineer, Observability

The Observability team at Airtable ensures that engineers have the tools they ne...
Location: United States, San Francisco; New York; Seattle
Salary: 196000.00 - 270000.00 USD / Year
Company: Airtable (airtable.com)
Expiration Date: Until further notice
Requirements
  • 6+ years of software engineering experience
  • 3+ years focused on observability or infrastructure at scale
  • Demonstrated success implementing and running production-grade logging, metrics, or tracing systems
  • Proficiency in distributed systems concepts, data streaming pipelines, and container orchestration (Kubernetes)
  • Deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse
  • Comfort with at least one programming language (e.g., Go, Python, Java) to build and maintain observability tooling
  • Experience mentoring engineers and collaborating across multiple teams
  • Strong communication skills
  • Eagerness to own high-impact initiatives
  • Proven ability to balance short-term fixes with long-term strategic vision
Job Responsibility
  • Architect and scale core observability systems
  • Lead the design and evolution of logging, metrics, and tracing pipelines
  • Evaluate and integrate new technologies (e.g., OpenTelemetry, ClickHouse, ELK stack)
  • Guide and mentor a growing team of infrastructure engineers
  • Define and uphold coding standards and operational excellence
  • Partner with Deploy Infrastructure, Service Orchestration, and Product teams
  • Align infrastructure decisions with business goals
  • Own end-to-end reliability for observability tools and establish SLAs, SLOs, and error budgets
  • Optimize performance and cost of large-scale data pipelines
  • Shape the observability roadmap
What we offer
  • Opportunity to receive benefits
  • Restricted stock units
  • May include incentive compensation
  • Comprehensive benefit offerings
Employment Type: Full-time

Senior AI Engineer

We are seeking a Senior AI Engineer (L4, Individual Contributor) to design, buil...
Location: India, Chennai
Salary: Not provided
Company: Arcadia (arcadia.com)
Expiration Date: Until further notice
Requirements
  • 12+ years of professional software engineering experience
  • 3+ years in AI/ML development
  • Strong expertise in Python, PyTorch/TensorFlow, scikit-learn, and ML tooling (MLflow, LangChain)
  • Proficiency with SQL, cloud services (AWS), containers (Docker, Kubernetes), and distributed systems
  • Understanding of modern AI research (LLMs, diffusion models, transformers)
  • Experience deploying ML models in production with CI/CD
  • Strong analytical skills, ability to balance speed and rigor in experimentation
  • A passion for sustainability and the clean-energy mission
  • Experience building agentic pipelines with the latest models from Anthropic, Google, OpenAI, and more
Job Responsibility
  • Integrate with LLMs and be an expert in prompt engineering to derive the right results from the models with limited hallucination
  • Design and train ML/AI models (forecasting, NLP, graph learning, generative AI) to improve data quality, cost effectiveness, and system scalability
  • Deploy and optimize models for large-scale production workloads using Python-based services in AWS/Kubernetes environments
  • Build robust, automated data pipelines and ML Ops workflows for continuous training and deployment
  • Research and experiment with modern AI methods (transformers, foundation models, reinforcement learning) and adapt them to energy-sector challenges, including but not limited to utility statements
  • Drive performance improvements in model accuracy, latency, and cost efficiency
  • Collaborate with Product, SRE, and Analytics teams to deliver AI-enabled features across Arcadia’s platform
  • Write clean, maintainable code, contribute to architecture reviews, and mentor junior engineers
  • Build true agentic workflows with multi-step processing incorporating RAG pipelines and MCPs
What we offer
  • Competitive compensation and employee stock options
  • Hybrid/remote-first working model (India-based role, with global collaboration)
  • Flexible leave policy
  • Comprehensive medical insurance (self + family members)
  • Annual performance cycle + quarterly recognition awards
  • A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation
Employment Type: Full-time

Senior Software Engineer

As a Full-Stack Software Engineer in the Archer AI team, you will design, develo...
Location: United States, San Jose
Salary: 134400.00 - 168000.00 USD / Year
Company: Archer Aviation (archer.com)
Expiration Date: Until further notice
Requirements
  • B.S. or M.S. degree in Computer Science, Software Engineering, or related field
  • 5+ years of professional software engineering experience
  • Strong proficiency in JavaScript/TypeScript and frameworks such as React, Angular, or Vue for frontend development
  • Strong backend development experience with Node.js, Python, Java, or Go
  • Knowledge of containerization and orchestration (Docker, Kubernetes)
  • Experience building and consuming RESTful APIs and/or GraphQL
  • Familiarity with databases (SQL and NoSQL)
  • Understanding of software engineering best practices including CI/CD, version control (Git), testing, and code quality
  • Ability to work across the full stack and quickly adapt to new technologies
Job Responsibility
  • Designing, developing, testing, and deploying full-stack web applications
  • Building clean, responsive, and scalable user interfaces
  • Developing backend services, APIs, and data pipelines to support applications
  • Collaborating with cross-functional teams to gather requirements, define technical solutions, and deliver impactful software
  • Writing clean, maintainable, and well-documented code
  • Ensuring performance, security, and scalability of systems
  • Participating in code reviews, architecture discussions, and mentoring junior engineers
  • Staying current with modern frameworks, tools, and best practices in full-stack development
Employment Type: Full-time

Senior Software Engineer, Backend (AI Agent)

At Cresta, the AI Agent team is on a mission to create state-of-the-art AI Agent...
Location: Canada, Toronto
Salary: Not provided
Company: Cresta (cresta.com)
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science or a related field
  • 5+ years of experience in backend system architecture, cloud services, or related technology fields
  • Proficient in designing and maintaining clear and robust APIs, with a strong understanding of protocols including gRPC and REST
  • Previous experience working with Virtual Agent or AI Agent systems
  • Experience in high-performance database schema design and query optimization, including knowledge of SQL and NoSQL databases
  • Experience in containerized application deployment using Kubernetes and Docker in microservices architectures
  • Experience with cloud environments such as AWS, Azure, or Google Cloud, with a strong understanding of cloud security and compliance standards
Job Responsibility
  • Design, develop, and maintain scalable and robust backend architectures for Cresta’s AI Agent solutions and proprietary models
  • Collaborate with cross-functional teams, including frontend engineers and machine learning engineers, to ensure seamless integration of AI Agents into Cresta’s customer solutions
  • Lead initiatives to enhance system scalability and reliability in production environments, focusing on backend services that support AI functionalities
  • Drive efforts to optimize server response times, process large volumes of data efficiently, and maintain high system availability
  • Innovate and implement security measures, cost-reduction strategies, and performance improvements in backend systems supporting AI Agents
What we offer
  • A variety of medical, dental, and vision plans, designed to fit your and your family’s needs
  • Paid parental leave to support you and your family
  • Monthly Health & Wellness allowance
  • Work from home office stipend to help you succeed in a remote environment
  • Lunch reimbursement for in-office employees
  • PTO: 3 weeks in Canada

Senior Software Engineer, Forward Deployed

As a Senior Software Engineer, Forward Deployed Engineer (FDE) you'll work direc...
Location: United States, Austin; New York; San Francisco Bay Area; Washington DC–Baltimore
Salary: 165000.00 - 266000.00 USD / Year
Company: Invisible Technologies (invisible.co)
Expiration Date: Until further notice
Requirements
  • 6+ years of software engineering experience, including significant time spent building data, ML, or backend systems
  • Deep proficiency in Python with hands-on experience using Hugging Face, LangChain, OpenAI, Pinecone, and related ecosystems
  • Skilled in full-stack and API-based deployment patterns, including Docker, FastAPI, Kubernetes, and cloud environments (GCP, AWS)
  • Experienced with workflow orchestration libraries, pub/sub systems (Kafka), and schema governance
  • Expertise in data governance and operations, including Unity Catalog and policy management, cluster/job orchestration, data contracts and quality enforcement, Delta/ETL pipelines, and replay processes
  • Strong product and system design instincts — you understand business needs and how to translate them into technical architecture
  • Experience building usable systems from messy data and ambiguous requirements
  • Excellent communication and client-facing skills; you’ve led conversations with technical and non-technical stakeholders alike
  • Proven experience owning projects from scoping through deployment in ambiguous, high-stakes environments
Job Responsibility
  • Collaborate with delivery leaders to scope technical solutions to operational problems
  • Identify workflow optimizations through deep engagement with customer problems and build them into stable, scalable solutions
  • Design and implement AI-powered workflows using LLMs, embedding models, retrieval systems, and automation tools
  • Translate messy real-world constraints (e.g., inconsistent data, latency requirements) into elegant engineering solutions
  • Iterate quickly based on real-time feedback from operators and clients
  • Build reusable tooling and infrastructure that accelerates future deployments
What we offer
  • Bonuses and equity are included in offers above entry level
Employment Type: Full-time

Senior Software Engineer, Backend

As a Senior Software Engineer, Backend specializing in database architecture and...
Location: United States, San Francisco
Salary: 150000.00 - 240000.00 USD / Year
Company: Chef Robotics (chefrobotics.ai)
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 7+ years of professional experience in backend development roles with demonstrated leadership experience
  • Expert knowledge of relational databases (MySQL, PostgreSQL) including schema design, optimization, and administration
  • Strong proficiency with Python and JavaScript/TypeScript with advanced software engineering skills
  • Extensive experience leading projects with at least two web frameworks: Flask, FastAPI, Django, Node.js, or Next.js
  • Proven experience designing and implementing RESTful and GraphQL APIs at scale
  • Advanced understanding of containerization (Docker) and orchestration (Kubernetes) technologies
  • Experience with cloud infrastructure and deployment (AWS, GCP, or Azure) in production environments
  • Proven experience leading complex backend projects and mentoring junior engineers
  • Understanding of data requirements for robotics or automation systems
Job Responsibility
  • Lead the design, implementation, and optimization of database schemas to support robot operations, telemetry, recipe management, and system analytics
  • Develop robust data migration strategies and version control for database schema evolution
  • Implement efficient query optimization and indexing strategies to support high-throughput robot operations
  • Establish data integrity protocols and backup systems to ensure operational continuity across customer deployments
  • Create scalable data access layers that balance security, performance, and maintainability
  • Mentor team members on database design patterns and optimization techniques
  • Lead the development and maintenance of scalable APIs to serve robot control systems, dashboards, and monitoring tools
  • Design and implement secure authentication and authorization mechanisms across backend services
  • Develop robust middleware for processing and validating data between robotics subsystems
  • Create service interfaces that enable efficient communication between robotics components and cloud services
What we offer
  • medical, dental, and vision insurance
  • commuter benefits
  • flexible paid time off (PTO)
  • catered lunch
  • 401(k) matching
  • early-stage equity
Employment Type: Full-time

Senior Software Engineer, Generalist

As a Senior Software Engineer, Generalist at Chef Robotics, you'll play a pivota...
Location: United States, San Francisco
Salary: 150000.00 - 240000.00 USD / Year
Company: Chef Robotics (chefrobotics.ai)
Expiration Date: Until further notice
Requirements
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 7+ years of professional experience in software development with demonstrated full-stack capabilities
  • Proven experience in software development, with a focus on autonomous systems, robotics, or related fields
  • Strong proficiency in programming languages such as Python and JavaScript/TypeScript, with emphasis on object-oriented design and development
  • Experience with software development tools and frameworks commonly used in robotics and autonomous systems (e.g., ROS, OpenCV, TensorFlow, etc.)
  • Familiarity with sensor fusion techniques, perception algorithms, and other technologies relevant to autonomous robotics operations
  • Advanced understanding of cloud infrastructure and deployment (AWS, GCP, or Azure)
  • Experience with containerization (Docker) and orchestration (Kubernetes) technologies
  • Proven experience leading complex, multi-disciplinary software projects from conception to deployment
  • Strong background in system architecture design and cross-functional technical decision-making
Job Responsibility
  • Collaborate with robotics engineers, hardware engineers, and other software engineers across the tech stack to design, develop, and deploy software solutions for food automation robots
  • Participate in all phases of the software development lifecycle, including requirements gathering, design, implementation, testing, deployment, and maintenance
  • Develop robust, scalable, and maintainable software systems that meet the unique challenges of commercial food production environments
  • Implement algorithms for perception, manipulation, motion planning, and control to enable autonomous food preparation behavior
  • Work across frontend dashboards, backend APIs, and cloud infrastructure to build comprehensive solutions that integrate with robotics hardware and AI systems
  • Design and optimize database schemas to support robot operations, telemetry, recipe management, and system analytics
  • Implement efficient data pipelines between on-device robotics systems and cloud services
  • Create data access layers and APIs that enable seamless integration across multiple subsystems
  • Develop real-time data processing systems for robotics telemetry and performance monitoring
  • Establish data integrity protocols and backup systems across distributed robotics deployments
What we offer
  • medical, dental, and vision insurance
  • commuter benefits
  • flexible paid time off (PTO)
  • catered lunch
  • 401(k) matching
  • early-stage equity
Employment Type: Full-time