Platform Engineer - Compute Job at Feedzai

Advanced Platform Engineer - Compute

Feedzai is the world’s first RiskOps platform for financial risk management, and...

Location

Portugal

Salary:

Not provided

Feedzai

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Information Systems, or the equivalent combination of education, experience, and training
6+ years of hands-on experience in platform engineering, DevOps, or cloud infrastructure
Strong programming skills in Go, Java, or similar, with a track record of designing and delivering maintainable systems
Deep experience with container technologies and orchestration (Docker, Kubernetes), including operator development or ecosystem tooling
Proven experience with CI/CD (e.g. Jenkins, GitLab) and GitOps (e.g. FluxCD, Argo CD)
Substantial experience with at least one major cloud provider (AWS, GCP, Azure) and familiarity with cloud-native patterns
Strong experience with monitoring and observability (e.g. Grafana, Prometheus) and using data to drive reliability and performance
Solid experience with Infrastructure-as-Code (e.g. Terraform, Crossplane) and platform lifecycle management
Track record of leading projects, driving technical decisions, and mentoring others
Self-driven, collaborative, and motivated to improve how we build and run the platform

Job Responsibility

Lead the design, implementation, and evolution of Kubernetes Operators and platform services, including deployment, monitoring, and operations
Drive development in Go or similar languages, setting standards and best practices for the team
Own and evolve automation for cloud infrastructure and incident response, and champion self-healing and reliability improvements
Define and improve playbooks, runbooks, and alerting strategies to streamline response and reduce toil
Own and advance the product deployment pipeline and GitOps practices (e.g. FluxCD, Argo CD)
Lead or coordinate incident response, root cause analysis, and post-incident reviews; drive preventive measures
Work with AI-assisted development tools (e.g. Cursor) as part of your daily workflow to ship faster and iterate effectively
Own and extend Infrastructure as Code (IaC) and platform lifecycle (monitoring, alerting, security, cost, configuration, backup) in production
Contribute to developer experience and internal platform capabilities so product teams can ship with less friction

Fulltime

Software Engineer, Compute Platform

We are seeking talented distributed systems engineers who are passionate about b...

Location

United States, Foster City

Salary:

130000.00 - 290000.00 USD / Year

Replit

Expiration Date

Until further notice

Requirements

Distributed systems: Track record of working with platform-as-a-service, distributed storage, or information retrieval systems. Experience in designing scalable architectures and optimizing systems for latency or cost
Problem-solving mindset: Ability to approach complex challenges pragmatically and devise effective solutions
Self-directed and autonomous: Able to work independently, set priorities, and drive projects forward
Versatility and flexibility: Able to wear multiple hats and tackle a wide range of challenges
Continuous learning and adaptability: Passionate about staying up-to-date with industry trends and expanding your skill set

Job Responsibility

Expand Replit's cloud infrastructure offerings: Launch new cloud products to be used by Replit Agent to build complex apps
Enhance reliability and scalability: Identify bottlenecks, optimize critical paths, and implement robust monitoring and alerting systems
Improve utilization of cloud infrastructure: Analyze our infrastructure costs and identify opportunities for optimization

What we offer

Competitive Salary & Equity
401(k) Program with a 4% match
Health, Dental, Vision and Life Insurance
Short Term and Long Term Disability
Paid Parental, Medical, Caregiver Leave
Commuter Benefits
Monthly Wellness Stipend
Autonomous Work Environment
In Office Set-Up Reimbursement
Flexible Time Off (FTO) + Holidays

Fulltime

Senior Software Engineer - Compute Platform

We are seeking a strong Senior Engineer to contribute to the design, development...

Location

India, Bangalore

Salary:

Not provided

Uber

Expiration Date

Until further notice

Requirements

8+ years of software engineering experience, including expertise in distributed systems or infrastructure engineering
Bachelor's degree in Computer Science or related field
Experience in Golang, Java, Python, C/C++
Background in large-scale backend infrastructure
Knowledge of cluster management solutions such as Mesos or Kubernetes
Understanding of container technologies such as Docker or containerd
Knowledge of operating systems and the Linux kernel

Job Responsibility

Design, build, and enhance core components of Uber’s Kubernetes-based Compute Platform, focusing on reliability, scalability, and global availability
Implement and optimize Kubernetes controllers, operators, CRDs, and multi-cluster management features to support diverse workloads across on-prem and cloud environments
Work on runtime systems—containerd, Docker, CRI-O—improving image lifecycle, sandboxing, security, and end-to-end pod execution performance
Develop and evolve the infrastructure abstraction layers and APIs that enable developers to deploy, manage, and scale stateful, batch, and mission-critical services with minimal operational overhead
Lead technical initiatives around scheduling, autoscaling, resource management, and workload placement to improve cluster efficiency and ensure high availability
Collaborate with cross-functional teams including Networking, Storage, ML Infra, Developer Productivity, and Data Platform to build solutions and elevate the overall developer experience
Debug, troubleshoot, and resolve complex issues across Linux systems, container runtimes, Kubernetes control plane, and distributed compute workflows
Contribute to architectural discussions, influence long-term design decisions, and help maintain a high technical bar within the Compute Platform team

Sr Staff Software Engineer - Compute Platform

We are seeking a highly experienced Senior Staff Engineer to lead the technical ...

Location

United States, Sunnyvale

Salary:

267000.00 - 297000.00 USD / Year

Uber

Expiration Date

Until further notice

Requirements

10+ years of software engineering experience, including expertise in distributed systems or infrastructure engineering
Deep expertise in Kubernetes internals, container runtimes, and cloud-native compute platforms
Strong background in containerization, resource scheduling, and cluster management at scale
Hands-on experience with performance tuning, reliability engineering, and cost optimization in compute environments
Excellent leadership, communication, and organizational skills, with a track record of building and mentoring high-performing teams
Strong coding proficiency in one or more languages such as Go, Java, or Python
Demonstrated ability to drive cross-functional technical initiatives and deliver impactful results

Job Responsibility

Own the technical vision, architecture, and strategy for the global compute platform org
Define and execute the roadmap for our compute platform, focusing on scalability, performance, and efficiency
Drive architectural decisions and set technical direction for compute scheduling, resource allocation, and container orchestration systems
Ensure high availability and reliability of the compute platform through best-in-class observability, automation, and incident response practices
Drive adoption of best practices in scalability, availability, and security for multi-tenant compute environments
Evaluate emerging technologies in cloud-native ecosystems and guide their integration into the platform
Partner with product and infrastructure teams to deliver high-impact, cross-organizational initiatives
Mentor and coach engineers, helping grow their technical depth and leadership skills
Influence company-wide engineering standards and practices

What we offer

Eligible to participate in Uber's bonus program
May be offered an equity award & other types of comp
All full-time employees are eligible to participate in a 401(k) plan
Eligible for various benefits

Fulltime

Senior Software Engineer II - Cloud Compute Platform

As a Software Engineer on the Compute Platform team, you will be a key technical...

Location

United States

Salary:

197400.00 - 232000.00 USD / Year

Confluent

Expiration Date

Until further notice

Requirements

8+ years of experience delivering scalable software solutions
Proven track record of leading the delivery of large-scale, highly available, low-latency systems
Deep expertise in Kubernetes including controller development, operator patterns, and multi-cluster architectures
Strong proficiency in Go with experience building production-grade distributed systems
Experience with multi-tenant platform architectures and security isolation patterns
Familiarity with gRPC, Protobuf, and API design for internal platform services
Experience with observability tools and operational excellence practices
Experience with multi-cloud environments (AWS, GCP, Azure) and cloud-provider integrations
Track record of providing technical leadership and mentorship
Track record of working collaboratively across teams including product management, SRE, and other engineering teams

Job Responsibility

Drive the overall technical charter for the Compute Platform, including multi-cluster orchestration, workload placement, and security architecture
Design and implement platform APIs and Kubernetes operators using Go to support evolving workload requirements
Work closely with product management and engineering leadership to build and drive the roadmap for Confluent's Compute Platform, enabling new business opportunities across Confluent
Deliver high-impact initiatives in areas such as workload scheduling, disruption management, network isolation, rolling update strategies, and cross-cluster resource management
Lead technical design reviews and drive architectural decisions across organizational boundaries
Mentor and grow other engineers on the team through code reviews, pairing, and technical guidance
Own operational aspects including availability, reliability, performance monitoring, emergency response, and disaster recovery for our global compute infrastructure

What we offer

Remote-First Work
Robust Insurance Benefits
Flexible Time Away
The Best Teammates
Experience Ambassadors
Open and Honest Culture
Well-Being and Growth
Offers Equity

Fulltime

Site Reliability Engineer Platform Engineer

Join a mission-driven, national financial services organization at the heart of ...

Location

United States, Reston

Salary:

Not provided

Tier4 Group

Expiration Date

Until further notice

Requirements

5+ years of hands-on experience operating and managing Kubernetes and OpenShift clusters
Strong experience with Microsoft Azure (compute, networking, storage, and data services)
Proven skills in automation and Infrastructure-as-Code (Terraform, Ansible, GitOps)
Proficiency with observability tooling (Datadog, Prometheus, Grafana)
Scripting/coding ability in Bash, Python, or Go

Job Responsibility

Operate, tune, and optimize OpenShift/Kubernetes clusters (scheduling, ingress, upgrades, quotas, policies)
Stand up and/or refine observability (Datadog, Prometheus, Grafana)—dashboards, alerts, SLOs, runbooks
Map current hybrid topology and critical delivery pipelines; identify toil and prioritize automation (Terraform/Ansible)
Begin supporting Azure environments (compute, networking, storage, data services) used by analytics teams
Drive GitOps-first workflows; harden CI/CD with ArgoCD/Jenkins/GitHub Actions and policy-as-code guardrails
Implement or enhance platform services (Vault, Kafka/AMQ, ingress, service mesh) for dev and data teams
Lead incident response and postmortems; institutionalize RCA, blameless learning, and continuous improvement

Fulltime

Senior ML Platform Engineer, AI Platform

We are seeking a skilled and passionate ML Platform Engineer to join our team an...

Location

Singapore, Singapore

Salary:

Not provided

Airwallex

Expiration Date

Until further notice

Requirements

5+ years in backend software development, with at least 2 years focused on AI/ML Platform or MLOps infrastructure
Deep expertise in MLOps practices, including automated deployment pipelines, model optimization, and production lifecycle management
Proven experience designing and implementing low-latency model serving solutions
Proficiency in Python
Skill in writing high-quality, maintainable code
Experience in the design and development of large-scale distributed, high-concurrency, low-latency inference, high-availability systems
Excellent communication and mentoring abilities
A relevant degree in Computer Science, Mathematics, or related fields

Job Responsibility

Platform Development: Design, build, and maintain the end-to-end MLOps platform using Kubernetes and Cloud Services
Infrastructure as Code (IaC): Use Terraform or similar tools to manage, provision, and scale all ML-related infrastructure securely and efficiently
Pipeline Automation: Implement and optimize CI/CD/CT (Continuous Integration, Delivery, Training) pipelines to automate model training, testing, packaging, and deployment using tools like Argo and Kubeflow Pipelines
Serving Infrastructure: Build highly available, low-latency, and high-throughput model serving infrastructure
Observability: Implement robust monitoring, alerting, and logging solutions to track infrastructure health, model performance, and data/model drift
Tooling & Support: Evaluate, integrate, and support ML tools such as Feature Stores and distributed model training pipelines
Security & Compliance: Ensure platform security, implement RBAC (Role-Based Access Control), and manage secrets for sensitive data and production environments
Collaboration: Work closely with Data Scientists and ML Engineers to understand their needs and provide technical guidance on best practices for scaling their models

Fulltime

Platform Engineer - Data Science Platform

Location

United States, Columbus, OH or Dallas, TX or Minneapolis, MN

Salary:

Not provided

Robert Half

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science or a related field, or equivalent practical experience
5+ years of experience supporting Data Science infrastructure
5+ years of hands-on experience with AWS-hosted Data Lake, Data Science, or AI/ML platforms
5+ years of working knowledge of Kubernetes
AWS services such as SageMaker, Glue, Lambda, Athena
CI/CD tools such as Azure DevOps
Infrastructure as Code tools such as Terraform
Container technologies including Docker and Amazon ECR
Security tools such as AQUA and Kenna
Experience producing technical documentation and written solutions

Job Responsibility

Support and maintain ongoing Data Science infrastructure operations
Design, build, and deploy AWS environments using automated CI/CD pipelines
Manage and scale large, secure cloud environments to support current and future Data Science initiatives
Implement, own, and improve the image management lifecycle process
Assist with the setup and ongoing management of AWS accounts dedicated to the Data Science platform
Develop and maintain infrastructure pipelines using CI/CD tools (e.g., Azure DevOps)
Build and manage environments using Infrastructure as Code (IaC) tools such as Terraform
Develop scripts and applications using programming languages such as Python
Manage and support database technologies including Athena, Oracle, MySQL, and PostgreSQL
Leverage AWS services to enable Data Lake, Data Science, and AI/ML workloads

What we offer

medical
vision
dental
life and disability insurance
401(k) plan

Fulltime
