Distributed Systems Cluster Security Software Engineering Lead

Cerebras Systems

Location:
United States, Sunnyvale

Contract Type:
Not provided

Salary:

140000.00 - 240000.00 USD / Year

Job Description:

In this role, you will be the security czar for Cerebras’s AI cluster product. These AI clusters have hundreds of wafer-scale accelerator systems, thousands of high-end servers, and several thousand networking ports, including switches, plus network-attached storage, all in a large-scale datacenter. You will ensure that Cerebras’s large-scale AI clusters are secured through first-principles, best-practice, security-first engineering. A Cerebras cluster involves complex HW components, networking, and a vertically integrated cluster management software stack, all the way from a bare-metal deployment that brings up an operational cluster to a suite of cluster management software that enables multi-tenant, higher-level training and inference services to be hosted on such large clusters. Your role will be to ensure both the end-to-end security and the privacy of such cluster use cases. You will develop security engineering solutions that have the necessary network access controls, user access controls, and a world-class multi-tenancy solution.

Job Responsibility:

  • Be the primary engineering face and owner of cluster security
  • Provide strong technical leadership in cluster security for the company
  • Actively work with corporate security and customers to identify and define the security enhancements needed
  • Build engineering-driven software that provides guardrails, detection solutions, and response tools for vulnerabilities at all layers of the vertical stack (including HW and SW)
  • Collaborate cross-functionally, both vertically and horizontally, to ensure the end-to-end cluster software is secure
  • Develop, maintain, and execute the roadmap for the cluster security product
  • Build an outstanding engineering team to deliver a world-class security solution

Requirements:

  • 3+ years in a demonstrated engineering leadership/management role in distributed systems security
  • Proven track record of delivering products and of launching and deploying secure distributed solutions to customers
  • Excellent communication, articulation, and collaboration skills, and the ability to act as a stakeholder
  • Ability to make tough decisions using data and trade-off analysis
  • Outstanding sense for product and user journeys; out-of-the-box thinker
  • Outstanding roadmap and schedule execution skills under tight timelines and budgets
  • Strong background in multi-tenancy of large-scale clusters is necessary
  • Strong technical experience in computer and cluster networks is necessary
  • Strong technical background in distributed systems software development (K8s and its ecosystem) is preferred
  • Technical experience with bare metal cluster management software and related monitoring is preferred

What we offer:

  • Build a breakthrough AI platform beyond the constraints of the GPU
  • Publish and open source your cutting-edge AI research
  • Work on one of the fastest AI supercomputers in the world
  • Enjoy job stability with startup vitality
  • Enjoy a simple, non-corporate work culture that respects individual beliefs

Additional Information:

Job Posted:
February 17, 2026

Similar Jobs for Distributed Systems Cluster Security Software Engineering Lead

Lead Data Engineer

Sparteo is an independent suite of AI-powered advertising technologies built on ...
Salary:
Not provided
Sparteo
Expiration Date:
Until further notice
Requirements:
  • Proficiency in distributed data systems
  • Proficient in clustering, various table types, and data types
  • Strong understanding of materialized views concepts
  • Skilled in designing table sorting keys
  • Solid programming skills in Python, Java, or Scala
  • Expertise in database technologies (SQL, NoSQL)
  • You are comfortable using AI-assisted development tools (e.g., GitHub Copilot, Tabnine)
  • Proven experience leading data teams in fast-paced environments
  • Ability to mentor junior engineers and foster a culture of growth and collaboration
  • Data-driven decision-making abilities aligned with Sparteo's focus on results and improvement
Job Responsibility:
  • Data Infrastructure Design and Optimization
  • Lead the design, implementation, and optimization of data architectures to support massive data pipelines
  • Ensure the scalability, security, and performance of the data infrastructure
  • Collaborate with software engineers and data scientists to integrate AI-driven models into data workflows
  • Leadership and Team Management
  • Manage and mentor a team of 2 data engineers, fostering a culture of continuous improvement
  • Oversee project execution and delegate responsibilities within the team
  • Guide technical decisions and promote best practices in data engineering
  • Collaboration and Cross-Functional Engagement
  • Work closely with product managers, developers, and analytics teams to define data needs and ensure alignment with business objectives
What we offer:
  • A convivial and flexible working environment, with our telecommuting culture integrated into the company's organization
  • A friendly and small-sized team that you can find in our offices near Lille or in Paris
  • Social gatherings and company events organized throughout the year
  • Sparteo is experiencing significant growth both in terms of business and workforce, especially internationally
  • Additional benefits include an advantageous compensation system with non-taxable and non-mandatory overtime hours, as well as a Swile restaurant ticket card
  • Fulltime

Windows System Engineer with vRealize / vROPS

Installation and administration of Windows Server infrastructure, VMWare Aria (v...
Location:
Romania, Bucharest
Salary:
Not provided
Inetum
Expiration Date:
Until further notice
Requirements:
  • Install and configure tools and software related to Windows Operating System
  • Create or perform annotations to the installation manuals and procedures
  • Review existing procedures and suggest improvements
  • Perform functional tests of the components that have just been installed, based on written test cases that are present in existing procedures
  • Update and communicate all elements needed by the Technical Lead and support him / her in building the monitoring reports
  • Keep up to date the job descriptions, operating procedures and the documentation related to your day-to-day activities
  • Occasionally provide on-call support outside business hours and document these activities in the client tools
  • Demonstrated previous experience with Windows Server installation, configuration and troubleshooting
  • Administration of Windows Fileserver Cluster
  • Management of Active Directory
Job Responsibility:
  • Installation and administration of Windows Server infrastructure, VMWare Aria (vRealize / vROPS), MSSQL Cluster and Active Directory
  • Design, build, operate, and improve workflows to automate server deployments, maintenance, and operations
  • Transform traditional infrastructure into a cloud-ready environment
  • Support the integration of the cloud as a service into the infrastructure team's portfolio
  • Comprehensive standardization and automation of administration and configurations
  • Processing and rectification of faults and processing of service requests as 2nd level support as part of incident management
  • Classification and analysis of the causes of errors in the context of problem management
  • Participation in the configuration and asset management process for the configuration items in the area of responsibility
  • Monitoring and compliance with applicable service level agreements and processes in this area
  • Software distribution and update of system infrastructure with patches and security updates
What we offer:
  • Full access to foreign language learning platform
  • Personalized access to tech learning platforms
  • Tailored workshops and trainings to sustain your growth
  • Medical subscription
  • Meal tickets
  • Monthly budget to allocate on flexible benefit platform
  • Access to 7 Card services
  • Wellbeing activities and gatherings
  • Fulltime

Platform Engineer

At evroc, we are building a secure, sovereign, and sustainable hyperscale cloud ...
Location:
Sweden, Stockholm
Salary:
Not provided
Evroc
Expiration Date:
Until further notice
Requirements:
  • Proficiency in distributed systems and Linux systems engineering
  • Strong understanding of various infrastructure technologies, including virtualization, containerization, and cloud computing
  • Coding in programming languages such as (but not exclusively) Golang or Rust
  • Experience in building and enhancing compute, storage, and data platforms with exposure to open source products like Kubernetes, Knative, Ceph, Rook and the like
  • Hands-on with infrastructure-as-code tools and automation, such as Terraform, Ansible, or Helm
  • Familiarity with software build processes and secure supply systems, like OpenSSF
  • Strong problem-solving and communication skills to effectively address complex platform engineering challenges
  • Applicants must possess a valid work permit.
Job Responsibility:
  • Build and design the foundational infrastructure for other engineering teams and customers to build on
  • Create Infrastructure-as-Code deployments and large scale cluster configurations for managing our networking, storage, and compute resources
  • Seamlessly integrate and upkeep open-source components within our evolving tech stack
  • Team up with fellow engineers to craft tailored solutions meeting our unique challenges
  • Forge and refine tools that power team efficiency - this includes CI/CD, local development setups, build toolchains, and essential infrastructure
  • Plot the roadmap for software component development, aligning with team priorities and vision
  • Lead the charge in defining and achieving our technical benchmarks.
What we offer:
  • We offer a competitive salary and an equity package to attract the best
  • At evroc, diversity is our strength. We champion an inclusive environment where every background - ethnicity, age, gender identity, beliefs, and culture - is celebrated.
  • Fulltime

Software Engineer

The Software Engineering team delivers next-generation application enhancements ...
Location:
India, Bangalore
Salary:
Not provided
Dell
Expiration Date:
February 28, 2026
Requirements:
  • 4 - 20 years of experience in systems programming and distributed systems fundamentals (concurrency, networking, storage, consistency, fault tolerance)
  • Proficiency in at least one of C/C++, Java, or Python; willingness to learn across the stack
  • Experience with Linux or BSD development and debugging (e.g., performance, strace/dtrace/eBPF, tcpdump)
  • Ability to write clean, testable code; familiarity with unit/integration/system testing and CI/CD
  • Must have experience designing subsystems, leading cross-team feature delivery, setting quality bars, improving observability and performance, and driving root-cause and reliability initiatives with clear communication, collaboration, and a bias for action
Job Responsibility:
  • Own problems end-to-end across design, implementation, testing, deployment, and supportability—within a cluster storage system
  • Build and harden distributed services: durability, consistency, replication, data paths, metadata, control planes, scheduling, placement, and lifecycle management
  • Optimize performance across compute, memory, IO, networking (including RDMA), and storage media (NVMe/SSD/HDD/AFA); drive latency and throughput improvements with data-driven profiling
  • Advance reliability through observability, telemetry, failure injection, chaos testing, and automated remediation; raise the bar on serviceability and supportability
  • Collaborate in scrum teams; write clear design docs, PRDs, and RFCs; perform code reviews and mentor peers
  • Raise product quality via automated tests, CI/CD pipelines, build hygiene, and release engineering
What we offer:
  • Comprehensive Healthcare Programs
  • Award Winning Financial Wellness Tools and Resources
  • Generous Leave of Absence for New Parents and Caregivers
  • Industry Leading Wellness Platform
  • Employee Assistance Program

Senior Engineering Manager

Atlassians can choose where they work – whether in an office, from home, or a co...
Location:
India, Bengaluru
Salary:
Not provided
Atlassian
Expiration Date:
Until further notice
Requirements:
  • A track record building and leading high-performing software development teams in a technical capacity
  • Experience supporting the growth and development of team members including performance management
  • Strong organizational, contribution, communication and project management skills
  • The ability to drive technical excellence, pushing innovation and quality
  • You’re able to spar with senior engineers on systems design, pulling from your background as a hands-on engineer
  • Familiarity with agile software development methodologies
  • A strong customer mindset, and a passion to help your team better understand and support the needs of their customers
  • The ability to pivot from the "big picture" and zoom in on the detail, as required
  • Experience with large scale distributed systems and microservices at scale using cloud-provider-based infrastructure
  • Proficiency in containerized workloads and cluster management software like Kubernetes
Job Responsibility:
  • Lead, hire and grow a team of high performing engineers including technical leaders
  • Work with leaders across the organization and Principal Engineers/Architects to guide the technical roadmap for scaling and evolving the services
  • Accountable for reliability, security, performance and scale of all the services you own
  • Work with teams across the company to drive adoption of services you own
  • Drive cultural change through technical excellence, quality and efficiency
  • Support teams in driving large projects with complex dependencies and multiple stakeholders
  • Partner across engineering teams to tackle company-wide initiatives spanning multiple projects
  • Help uplift Atlassian’s cloud security, reliability and compliance footprint
What we offer:
  • health coverage
  • paid volunteer days
  • wellness resources
  • Fulltime

Senior Engineering Manager, Micros Foundations

Atlassians can choose where they work – whether in an office, from home, or a co...
Location:
India, Bengaluru
Salary:
Not provided
Atlassian
Expiration Date:
Until further notice
Requirements:
  • A track record building and leading high-performing software development teams in a technical capacity
  • Experience supporting the growth and development of team members including performance management
  • Strong organizational, contribution, communication and project management skills
  • The ability to drive technical excellence, pushing innovation and quality
  • You’re able to spar with senior engineers on systems design, pulling from your background as a hands-on engineer
  • Familiarity with agile software development methodologies
  • A strong customer mindset, and a passion to help your team better understand and support the needs of their customers
  • The ability to pivot from the 'big picture' and zoom in on the detail, as required
  • Experience with large scale distributed systems and microservices at scale using cloud-provider-based infrastructure
  • Proficiency in containerized workloads and cluster management software like Kubernetes
Job Responsibility:
  • Lead, hire and grow a team of high performing engineers including technical leaders
  • Work with leaders across the organization and Principal Engineers/Architects to guide the technical roadmap for scaling and evolving the services
  • Accountable for reliability, security, performance and scale of all the services you own
  • Work with teams across the company to drive adoption of services you own
  • Drive cultural change through technical excellence, quality and efficiency
  • Support teams in driving large projects with complex dependencies and multiple stakeholders
  • Partner across engineering teams to tackle company-wide initiatives spanning multiple projects
  • Help uplift Atlassian’s cloud security, reliability and compliance footprint
What we offer:
  • health coverage
  • paid volunteer days
  • wellness resources
  • Fulltime

Principal Software Engineering Manager

The HPC/AI (High-Performance Computing and Artificial Intelligence) organization...
Location:
United States, Multiple Locations
Salary:
139900.00 - 274800.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years people management experience
  • 10+ years of professional software design and development experience in large-scale distributed systems
  • Experience building and operating networking infrastructure for hyperscale datacenters or AI clusters
  • Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, MRC, NVLink)
  • In-depth understanding of networking protocols (e.g., Ethernet, TCP/IP, RDMA, gRPC) and distributed systems
  • Familiarity with network virtualization, software-defined networking (SDN), or network performance tuning
  • Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and how they interact with networking infrastructure
Job Responsibility:
  • Hire, manage, and grow a high-performing team of software engineers, fostering a culture of excellence, inclusion, and innovation
  • Lead the design and development of large-scale distributed systems and services that power Azure’s AI infrastructure
  • Drive engineering planning and execution while ensuring alignment with organizational OKRs and long-term strategy
  • Establish lean, scalable, and efficient processes that promote innovation and engineering rigor
  • Deliver best-in-class engineering by ensuring services and components are modular, secure, reliable, diagnosable, observable, and reusable
  • Improve test coverage, automation, and integration testing to proactively identify and resolve reliability gaps
  • Ensure live-site reliability and service health through robust monitoring, telemetry, and automation
  • Collaborate across Microsoft and partner organizations to deliver cohesive, end-to-end infrastructure solutions
  • Apply data-driven insights to optimize performance, scalability, and customer satisfaction
  • Champion Microsoft’s culture by modeling, coaching, and caring—nurturing diversity, inclusion, and continuous growth for your team and peers
  • Fulltime

Senior Software Engineer II - Cloud Compute Platform

As a Software Engineer on the Compute Platform team, you will be a key technical...
Location:
United States
Salary:
197400.00 - 232000.00 USD / Year
Confluent
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience delivering scalable software solutions
  • Proven track record of leading the delivery of large-scale, highly available, low-latency systems
  • Deep expertise in Kubernetes including controller development, operator patterns, and multi-cluster architectures
  • Strong proficiency in Go with experience building production-grade distributed systems
  • Experience with multi-tenant platform architectures and security isolation patterns
  • Familiarity with gRPC, Protobuf, and API design for internal platform services
  • Experience with observability tools and operational excellence practices
  • Experience with multi-cloud environments (AWS, GCP, Azure) and cloud-provider integrations
  • Track record of providing technical leadership and mentorship
  • Track record of working collaboratively across teams including product management, SRE, and other engineering teams
Job Responsibility:
  • Drive the overall technical charter for the Compute Platform, including multi-cluster orchestration, workload placement, and security architecture
  • Design and implement platform APIs and Kubernetes operators using Go to support evolving workload requirements
  • Work closely with product management and engineering leadership to build and drive the roadmap for Confluent's Compute Platform, enabling new business opportunities across Confluent
  • Deliver high-impact initiatives in areas such as workload scheduling, disruption management, network isolation, rolling update strategies, and cross-cluster resource management
  • Lead technical design reviews and drive architectural decisions across organizational boundaries
  • Mentor and grow other engineers on the team through code reviews, pairing, and technical guidance
  • Own operational aspects including availability, reliability, performance monitoring, emergency response, and disaster recovery for our global compute infrastructure
What we offer:
  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth
  • Offers Equity
  • Fulltime