Senior Software Engineer, Cloud Platform Job at Chef Robotics (San Francisco)

Software Engineer II or Senior Software Engineer - Simulation Platform

The AI Frameworks team at Microsoft develops AI software that enables running AI...

Location

United States, Redmond

Salary:

100600.00 - 199000.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C++, C, or Python OR equivalent experience
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Job Responsibility

Developing a hardware simulator for next-generation AI chips
Technical contribution to design, implementation, verification, and documentation of code, ensuring on-time deliveries of simulator releases used daily by partner teams (C++ and Python)
Collaborate broadly across multiple disciplines and with various partner teams, from hardware designers to AI model developers
Identify requirements, scope solutions, estimate work, schedule deliverables

Fulltime

Software Engineer II and Senior Software Engineer - Microsoft Security - Platform Team

We have multiple positions open for Software Engineers and Senior Software Engin...

Location

Israel, Tel Aviv, Herzliya

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

B.Sc. or M.Sc. in computer science, software engineering, or equivalent experience
3+ years of professional hands-on software development experience, primarily focused on developing and designing backend services in cloud or on-premises environments
Experience working with Kubernetes and Containers
Experience in working with cloud infrastructure and services

Job Responsibility

Contribute to business-critical initiatives in Microsoft Security
Apply deep technical skills and quickly adapt to new areas
Improve the end-to-end lifecycle of services
Analyze complex system behavior and apply modern engineering practices to streamline deployments and reduce costs
Work on high-end technologies and collaborate across disciplines to deliver impactful features
Collaborate with multiple teams across Microsoft to deliver key customer solutions and support technology

Fulltime

Senior Cloud Platform Software Engineer

We are seeking a Senior Cloud Platform Software Engineer to join our team and be...

Location

United Kingdom, London

Salary:

Not provided

Zenobē

Expiration Date

Until further notice

Requirements

Strong hands-on experience with AWS services such as EC2, S3, IAM, RDS, Control Tower etc.
Hands-on experience and daily management of Kafka
A working knowledge of Kubernetes
Proficiency in Terraform for managing cloud infrastructure at scale
Familiarity with monitoring/logging tools (e.g., Prometheus, Grafana, ELK, CloudWatch)
Strong automation skills (e.g., Ansible, GitHub Actions) for reliability and operational tasks
Solid understanding and practical experience with GitOps principles and tools, CI/CD pipelines and DevOps best practices
Proficient with version control using Git and collaboration via Git-based workflows
Excellent communication skills, able to present technical information clearly to non-technical stakeholders
Experience mentoring junior engineers and leading others by example

Job Responsibility

Designing, implementing, and managing scalable, secure, and highly available cloud infrastructure
Helping to develop our AWS cloud architecture using automation and DevOps practices
Collaborating closely with development teams to troubleshoot complex issues, optimise performance, and enforce compliance with industry standards
Evaluating emerging cloud technologies to align with business goals and drive innovation
Mentoring other engineers, helping your team grow, and taking on some team and project leadership activities
Being a go-to person when another team or another Cloud team member is facing an unknown issue with a production or pre-production workload
Planning, leading and executing on our ideas for a more reliable and scalable usage of AWS
Collaborate across teams to deliver scalable, real-time and batch data pipelines that support our products and analytics
Support and mentor teammates, sharing knowledge and reviewing designs and code
Contribute to the architecture and evolution of our data platform

What we offer

Up to 33% annual bonus
25 days holiday, increasing with length of service up to 30 days, plus bank holidays
Private Medical Insurance
£1,500 training budget per year
EV Salary Sacrifice Scheme
Pension scheme, up to 8% matched contributions
Enhanced parental leave
Cash back health plan

Senior Staff Engineer Software (Cloud Platform, Production & Reliability – Machine Identity Security)

The Production Engineering team is responsible for building, scaling, and operat...

Location

United States, Santa Clara

Salary:

126000.00 - 203500.00 USD / Year

Palo Alto Networks

Expiration Date

Until further notice

Requirements

5+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering (SRE)
Strong experience designing and operating cloud infrastructure on AWS, Azure, or GCP
Deep expertise managing and scaling Kubernetes environments (EKS, AKS, or GKE)
Strong experience with Infrastructure as Code tools (Terraform, Ansible, or Pulumi)
Proven experience designing and maintaining complex CI/CD systems (Jenkins, GitLab CI, ArgoCD, GitHub Actions)
Strong programming/scripting skills (Python, Go, or similar) for automation and tooling
Experience operating in high-scale, 24/7 production environments with ownership of incident response and reliability
Solid understanding of Linux systems and networking fundamentals (DNS, TCP/IP, load balancing, VPC, mTLS)
Strong problem-solving skills and ability to work across teams

Job Responsibility

Design, build, and evolve highly available cloud infrastructure platforms with a focus on scalability, resilience, and reliability
Lead improvements across production systems, including performance, availability, and incident response
Drive and standardize Infrastructure as Code (IaC) practices to improve consistency and reduce operational overhead
Design and optimize CI/CD pipelines to support fast, secure, and reliable software delivery at scale
Partner with development teams to improve system reliability, observability, and cloud-native design patterns
Define and implement monitoring, alerting, and observability strategies across distributed systems
Lead incident response efforts, including root cause analysis and long-term remediation strategies
Identify and eliminate operational toil through automation and system improvements
Mentor engineers and contribute to raising the bar for production engineering practices

What we offer

restricted stock units
bonus

Fulltime

Senior+ Software Engineer - Cloud Availability Platform Engineering (Observability)

We are looking for a highly skilled engineer with deep expertise in building and...

Location

United States, San Francisco

Salary:

166000.00 - 201000.00 USD / Year

Crusoe

Expiration Date

Until further notice

Requirements

7+ years of experience in infrastructure or platform engineering, with a focus on observability and monitoring systems
Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex), logging pipelines (Fluent Bit, Vector, Loki, ELK/Opensearch), and tracing platforms (Jaeger, Tempo, OpenTelemetry)
Strong programming skills in Go or Python for automation, operators, and custom integrations
Experience running observability platforms on Kubernetes and operating them at scale across multi-datacenter environments
Proven ability to design, optimize, and scale telemetry pipelines handling high cardinality and high throughput data
Solid understanding of distributed systems, performance engineering, and debugging complex workloads
Strong collaboration skills and the ability to influence engineering teams to adopt observability best practices

Job Responsibility

Designing and operating scalable observability systems (metrics, logging, tracing) across multi-datacenter Kubernetes environments
Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization
Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry
Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/Opensearch stacks
Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrating with service meshes, load balancers, and APIs
Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams
Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)
Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls
Partnering with engineering teams to embed observability into applications, services, and infrastructure

What we offer

Restricted Stock Units in a fast-growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement

Fulltime

Senior Software Engineer - Together Cloud Platform

About the Role: Together AI is building the AI Acceleration Cloud, an end-to-end...

Location

United States, San Francisco

Salary:

160000.00 - 230000.00 USD / Year

Together AI

Expiration Date

Until further notice

Requirements

5+ years of demonstrated experience in building large scale, fault tolerant, distributed systems and API microservices
Experience designing, analyzing and improving efficiency, scalability, and stability of various system resources
Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
Demonstrated experience with building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
Experience developing against and managing a relational database, such as PostgreSQL
Expert-level programming skills in one or more programming languages (Golang preferred)
Proficiency in version control practices and integrating IaC with CI/CD pipelines
Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience

Job Responsibility

Identify, design, and develop foundational backend services that power Together’s cloud platform
Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
Partner with product teams to understand functional requirements and deliver solutions that meet business needs
Write clear, well-tested, and maintainable software and IaC for both new and existing systems
Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance
Participate in an on-call rotation to address critical incidents when necessary

What we offer

competitive compensation
startup equity
health insurance
flexibility in terms of remote work

Fulltime

Senior Software Engineer II - Cloud Compute Platform

As a Software Engineer on the Compute Platform team, you will be a key technical...

Location

United States

Salary:

197400.00 - 232000.00 USD / Year

Confluent

Expiration Date

Until further notice

Requirements

8+ years of experience delivering scalable software solutions
Proven track record of leading the delivery of large-scale, highly available, low-latency systems
Deep expertise in Kubernetes including controller development, operator patterns, and multi-cluster architectures
Strong proficiency in Go with experience building production-grade distributed systems
Experience with multi-tenant platform architectures and security isolation patterns
Familiarity with gRPC, Protobuf, and API design for internal platform services
Experience with observability tools and operational excellence practices
Experience with multi-cloud environments (AWS, GCP, Azure) and cloud-provider integrations
Track record of providing technical leadership and mentorship
Track record of working collaboratively across teams including product management, SRE, and other engineering teams

Job Responsibility

Drive the overall technical charter for the Compute Platform, including multi-cluster orchestration, workload placement, and security architecture
Design and implement platform APIs and Kubernetes operators using Go to support evolving workload requirements
Work closely with product management and engineering leadership to build and drive the roadmap for Confluent's Compute Platform, enabling new business opportunities across Confluent
Deliver high-impact initiatives in areas such as workload scheduling, disruption management, network isolation, rolling update strategies, and cross-cluster resource management
Lead technical design reviews and drive architectural decisions across organizational boundaries
Mentor and grow other engineers on the team through code reviews, pairing, and technical guidance
Own operational aspects including availability, reliability, performance monitoring, emergency response, and disaster recovery for our global compute infrastructure

What we offer

Remote-First Work
Robust Insurance Benefits
Flexible Time Away
The Best Teammates
Experience Ambassadors
Open and Honest Culture
Well-Being and Growth
Offers Equity

Fulltime

Senior Principal Software Engineer (Cloud Infrastructure and Platform Engineering)

Your Career At Palo Alto Networks, Secure Cloud and AI infrastructure is the fou...

Location

United States, Santa Clara

Salary:

Not provided

Palo Alto Networks

Expiration Date

Until further notice

Requirements

BS, MS, or PhD in Computer Science or a related technical field, or equivalent experience
9+ years of relevant software engineering experience, with a proven track record of technical leadership and innovation
Demonstrated experience defining and leading large-scale, cross-organizational technical initiatives from concept to completion
Experience building and scaling platforms that serve thousands of engineers in complex environments
Strong foundation in application and infrastructure security, including secrets management, supply chain security, and secure-by-default platform design
Recognized expertise in developer platforms, cloud-native infrastructure, container orchestration technologies (e.g., Kubernetes), and CI/CD
Deep proficiency with a major cloud platform (GCP preferred), including IAM, managed databases, networking, and Workload Identity
Experience designing and maintaining Infrastructure as Code (e.g., Terraform) at scale, including module architecture and state management
Expertise in authentication/authorization systems: OAuth 2.0, OIDC, token lifecycle management, and zero-trust patterns
Hands-on experience applying AI/ML/GenAI to solve complex software engineering problems

Job Responsibility

Define the Vision: Architect and own the technical roadmap for AI-enhanced developer tools and infrastructure in CIPE at Palo Alto Networks
Evaluate and Execute Solutions: Lead the design and implementation of novel systems that leverage Large Language Models (LLMs), static/dynamic analysis, and machine learning to create a world-class, intelligent developer experience
Drive Organization-Wide Impact: You are a builder, so you won't just stop at ideation. Beyond concepts, ensure your builds show step-change improvements in key engineering metrics, including code velocity, review cycle time, test effectiveness, incident reduction, and overall feature launches
Lead Cross-Functional Initiatives: Spearhead complex, cross-functional projects that require influencing and aligning multiple engineering organizations and their leadership
Enable Secure Innovation: Develop foundational AI platforms that empower teams to prototype, deploy, and scale threat-intelligent cloud features, embedding Palo Alto Networks' security natively
Serve as Technical Authority: Act as the go-to expert on AI-augmented cloud platforms, mentoring senior engineers and infusing industry-leading practices into our high-stakes ecosystem
Innovate at Enterprise Scale: Address intricate challenges in multi-cloud environments (AWS, Azure, GCP, and OCI) supporting thousands of microservices, secure workloads, and global threat detection pipelines

What we offer

restricted stock units
bonus

Fulltime
