Distributed Systems Cluster Security Software Engineering Lead Job at Cerebras Systems (Sunnyvale)

Big Data / PySpark Engineering Lead - Vice President

The Applications Development Technology Lead Analyst is a senior level position ...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Highly experienced and skilled technical lead with 12+ years of experience in software building and platform engineering
Experience in Data Engineering, focused on Big Data ecosystems
Knowledge of Hadoop, YARN, Hive, Impala, Spark, and Spark SQL, with extensive experience developing high-volume data processing pipelines
Expert-level programming skills and hands-on experience in Python
Familiarity with data formats like Avro, Parquet, CSV, JSON
Hands-on experience in writing SQL queries
Highly experienced with Unix-based operating systems and shell scripting
Experience with source code management tools such as Bitbucket and Git
Proficiency and hands-on experience with Big Data technologies: Hadoop, Spark, Hive, Kafka, and NoSQL databases (MongoDB, HBase)
Experience working with query engines like Trino, Presto, Starburst

Job Responsibility

Design and implement scalable, fault-tolerant batch and real-time data processing pipelines
Develop robust data models and schema designs optimized for both performance and storage efficiency
Evaluate and integrate emerging tools and frameworks (e.g., Spark, Flink, Kafka) into the existing stack
Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
Legacy Systems Decommissioning: Lead the strategic migration of data and logic from legacy platforms (e.g. on-premises SQL Servers) to a modern Data Lakehouse environment
ETL/ELT Transformation: Re-engineer existing stored procedures and complex legacy ETL jobs into scalable, distributed processing frameworks using Spark (Python) and Starburst/Trino
Validation & Parity Testing: Design and implement automated frameworks for Data Parity Testing to ensure 100% accuracy and consistency between legacy outputs and new big data results
Schema Evolution: Map and transform rigid, legacy relational schemas into flexible, high-performance formats optimized for the cloud (e.g., Parquet, Avro, or Iceberg)
Phased Cutover Management: Orchestrate a phased migration strategy (Parallel Run, Shadow Execution) to ensure zero downtime for downstream business applications and reporting tools

Fulltime

Principal Software Engineering Manager

The HPC/AI (High-Performance Computing and Artificial Intelligence) organization...

Location

United States, Multiple Locations

Salary:

139900.00 - 274800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
4+ years people management experience
10+ years of professional software design and development experience in large-scale distributed systems
Experience building and operating networking infrastructure for hyperscale datacenters or AI clusters
Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, MRC, NVLink)
In-depth understanding of networking protocols (e.g., Ethernet, TCP/IP, RDMA, gRPC) and distributed systems
Familiarity with network virtualization, software-defined networking (SDN), or network performance tuning
Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and how they interact with networking infrastructure

Job Responsibility

Hire, manage, and grow a high-performing team of software engineers, fostering a culture of excellence, inclusion, and innovation
Lead the design and development of large-scale distributed systems and services that power Azure’s AI infrastructure
Drive engineering planning and execution while ensuring alignment with organizational OKRs and long-term strategy
Establish lean, scalable, and efficient processes that promote innovation and engineering rigor
Deliver best-in-class engineering by ensuring services and components are modular, secure, reliable, diagnosable, observable, and reusable
Improve test coverage, automation, and integration testing to proactively identify and resolve reliability gaps
Ensure live-site reliability and service health through robust monitoring, telemetry, and automation
Collaborate across Microsoft and partner organizations to deliver cohesive, end-to-end infrastructure solutions
Apply data-driven insights to optimize performance, scalability, and customer satisfaction
Champion Microsoft’s culture by modeling, coaching, and caring—nurturing diversity, inclusion, and continuous growth for your team and peers

Fulltime

Senior Database Administrator

We are seeking a highly accomplished Senior Database Administrator to take full ...

Location

Canada, Toronto

Salary:

Not provided

Randstad

Expiration Date

August 27, 2026

Requirements

10+ years of progressive professional experience in enterprise-level database administration, database design, software implementation, maintenance, and multi-tiered vendor support
5+ years of verified, very strong experience writing PL/SQL configurations, constructing complex data transformation routines (ETL), and applying rigorous database query optimization techniques
Direct, hands-on administrative expertise installing, migrating, and configuring enterprise database engines across both Oracle Database and Microsoft SQL Server environments
Practical infrastructure engineering experience configuring and maintaining high-availability technologies, specifically Oracle RAC, GoldenGate, and Data Guard
Proven experience translating on-premise database workloads into production-ready Microsoft Azure and Oracle Cloud Infrastructure (OCI) platforms
High proficiency leveraging industry-standard administrative interfaces and query utilities, explicitly TOAD, Oracle Enterprise Manager (OEM), and DB Artisan
Solid practical background executing database tasks under formalized Incident, Problem, Change, Configuration, and Capacity Management workflows
Outstanding analytical, decision-making, and verbal/written communication skills, with an established track record of tracking data dependencies and meeting strict targets under compressed timelines

Job Responsibility

Enterprise DBMS Administration: Lead the installation, multi-node configuration, maintenance, technical cloning, and complex structural upgrade of enterprise relational database environments
Advanced Database Design & Modeling: Author low-level physical database designs, logical storage models, and distributed data layouts to align with enterprise architecture principles
High-Availability & Replication Engineering: Implement, patch, and manage robust disaster recovery and data clustering matrices using Oracle Real Application Clusters (RAC), GoldenGate, and Data Guard technologies
Complex Programmable Logic & ETL: Design, develop, and maintain high-performance PL/SQL data extraction procedures, database triggers, custom functions, and modular packages
Performance Tuning & Diagnostic Analysis: Run deep-dive system query optimizations, execute schema indexing tuning, audit execution plans, and eliminate wait-state bottlenecks across Oracle and Microsoft SQL Server systems
Cloud Migration & Hybrid Integration: Architect and execute structured data migrations from legacy on-premise iron configurations into Microsoft Azure and Oracle Cloud Infrastructure (OCI) platforms
Proactive Capacity & Infrastructure Planning: Formulate comprehensive hardware and storage projections, compute scaling factors, and manage structural capacity planning parameters
Security Remediation & Patch Management: Research, evaluate, and apply critical technology stack patch sets, maintenance packs, and cybersecurity vulnerability remediations
Operations, Backup, & Recovery Governance: Enforce rigid database rules, oversee enterprise-level automated backup and point-in-time recovery workflows, and manage environment status logs via ITIL procedures (Incident, Problem, Change, and Configuration management)
Cross-Functional Collaboration & Technical Mentorship: Utilize advanced information retrieval packages and administration interfaces (TOAD, Oracle Enterprise Manager - OEM, and DB Artisan) to triage technical problems alongside Cloud Architects, Middleware Engineers (WebLogic), and Project Managers

Fulltime

Staff Software Architect – DevOps

As a Continuous Integration Developer (Architect Level), you will define the arc...

Location

Canada, Markham; Oshawa

Salary:

137300.00 - 203000.00 USD / Year

General Motors

Expiration Date

Until further notice

Requirements

Bachelor's or Master's degree in Engineering, Computer Science, or related field
10+ years of relevant experience in CI/CD, DevOps, automation, or software architecture
Expertise in CI/CD architecture for large distributed systems
Deep Kubernetes knowledge including cluster operations
Strong Terraform/IaC experience with production-grade module design
Experience with embedded or simulation workflows (vECUs, FMUs, SIL)
Cloud architecture experience across AWS, Azure, or GCP
Knowledge of security frameworks such as SLSA or OPA

Job Responsibility

Architect enterprise-scale CI/CD systems for embedded, simulation, and cloud workloads
Define branching, release, and quality-gate standards across multiple engineering domains
Design GitOps-based delivery for multi-cluster Kubernetes environments
Lead progressive delivery strategy including automated rollback and analysis
Establish IaC architecture using Terraform modules and multi-environment patterns
Integrate security and compliance into pipelines and artifact flows
Define observability standards for pipelines, simulation workloads, and runtime systems
Mentor DevOps and simulation developers across the organization

What we offer

Paid time off including vacation days, holidays, and supplemental benefits for pregnancy, parental and adoption leave
Healthcare, dental, and vision benefits
Life insurance plans to cover you and your family
Company and matching contributions to a Defined Contribution Pension plan to help you save for retirement
GM Vehicle Purchase Plan for you, your family and friends

Fulltime

Systems Engineer 3

The Principal Systems Engineer functions as a senior technical authority respons...

Location

United States, Annapolis Junction

Salary:

190000.00 - 225000.00 USD / Year

Columbia Technology Partners

Expiration Date

Until further notice

Requirements

U.S. Citizenship is required for all applicants
All applicants and employees are subject to random drug testing in accordance with Executive Order 12564
Employment is contingent upon successful completion of a security background investigation and polygraph
DOD 8570 Certification
This position requires an active Security Clearance with appropriate Polygraph
Minimum of twenty (20) years of Systems Engineering experience supporting programs of comparable size, scope, and complexity
Bachelor's degree in Systems Engineering, Computer Science, Information Technology, or a related technical discipline
Five (5) additional years of relevant Systems Engineering experience may substitute for a degree
Demonstrated experience integrating, sustaining, or supporting large-scale analytic, SIGINT, or mission processing systems (e.g., XKEYSCORE or equivalent) in high-availability, secure environments

Job Responsibility

Architect, integrate, test, deploy, and sustain packet-based hardware and software systems utilizing COTS and GOTS solutions in large-scale distributed environments supporting XKEYSCORE or similar mission platforms
Lead end-to-end systems integration across compute, storage, network, and software components to ensure interoperability, scalability, resilience, and optimized analytic performance
Develop and execute system integration, validation, verification, and deployment strategies to ensure operational readiness and compliance with technical and mission requirements
Design, maintain, and enhance Bash and Perl automation scripts to support system provisioning, configuration management, and lifecycle maintenance of core servers and clustered computing environments
Drive automation initiatives and process improvements to improve system reliability, efficiency, and long-term maintainability across distributed infrastructures
Provide senior-level technical leadership and advanced troubleshooting support, including system upgrades, patching, performance tuning, and lifecycle sustainment in operational settings
Deliver technical training and mentorship to administrators, operators, and engineering teams while enforcing security, compliance, and configuration management requirements for classified systems

What we offer

Medical: 3 superior plans
Vision + Dental: free to you + paid in full by CTP
Retirement: 401k - 6% company contribution
PTO + Leave: customizable leave plans, Jury Duty, Bereavement + Military Leave
Career Growth: Up to $10,000 for approved career-related learning, training, education, and/or tuition
Life and AD&D Insurance/Short-Term & Long-Term Disability: at zero cost to you
Profit Sharing Bonus: End of year cash
Referral Bonus Program: bonuses range from $7,000-$20,000

Fulltime

Platform Engineer

We are seeking a highly progressive Platform Engineer specializing in AI infrast...

Location

Canada, Vancouver

Salary:

43.79 - 58.39 USD / Hour

Randstad

Expiration Date

July 25, 2026

Requirements

3-5 years of dedicated cloud platform engineering or SRE experience working with high-volume distributed systems natively in AWS and Azure
Elite proficiency with Terraform, with an emphasis on creating modular, reusable code structures and multi-environment pipelines
Coding proficiency in Python or Go, with a solid history of integrating with complex REST/JSON APIs
Strong operational working knowledge of GitLab CI/CD, Docker containerization, and cloud orchestration layers
Proven, hands-on exposure to AI/LLM development concepts (advanced prompting, tool/skill integration, and Retrieval-Augmented Generation [RAG])
Extensive experience leveraging AI and Agentic Coding tools to accelerate software delivery and maintain platform scripts

Job Responsibility

Build integration patterns, API mediation layers, and approval workflows supporting autonomous AI agent tool execution and runtime function calling
Integrate advanced distributed telemetry for agent runs (execution traces, evaluation metrics, latency logs, and token cost analytics)
Establish runtime safety controls for AI applications, embedding automated rollback scripts, cost control ceilings, and master kill-switches
Build and scale highly secure, automated multi-cloud landing zones (AWS and Azure) utilizing reusable Terraform modules
Construct and maintain robust GitLab CI/CD pipelines, package registries, and automated infrastructure release strategies
Implement strict automated infrastructure guardrails using Open Policy Agent (OPA), Conftest, or Azure Policies to guarantee security without breaking developer velocity
Embed least-privilege access, zero-trust network segmentation, private endpoints, KMS encryption keys, and advanced secrets management
Champion Site Reliability Engineering standards by managing Service Level Objectives (SLOs), calculating error budgets, configuring autoscaling matrices, and leading chaos engineering simulations
Apply cloud financial management protocols (structured resource tagging, budget alarms, anomaly detection, and cluster right-sizing)
Author clear, accessible developer guides and self-service templates that streamline the adoption of core AI platform features

What we offer

Pioneering Technical Landscape
Elite Multi-Cloud Exposure
High Extensibility Indicators
Premier Workspace

Fulltime

Java Developer

As a Programmer, the beneficiary's job duties will include: • Write, analyze, re...

Location

United States, Melissa

Salary:

59.20 - 59.50 USD / Hour

SAR Tech

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Information Technology, Engineering, or a closely related field
Related experience in Java development, enterprise application development, and backend systems implementation
Strong experience working with Java, Spring Boot, Hibernate, RESTful APIs, and Microservices architecture
Strong knowledge of SQL, database design, query optimization, and experience working with relational databases such as Oracle, MySQL, or PostgreSQL
Experience with cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP)
Knowledge of software development lifecycle (SDLC), Agile/Scrum methodologies, and version control systems such as Git
Understanding of application security, authentication, authorization, and secure coding best practices
Strong analytical, troubleshooting, problem-solving, organizational, communication, and teamwork skills

Job Responsibility

Write, analyze, review, and rewrite program code using Java and Spring Boot frameworks, working from specifications drawn up by architects and technical leads to build and maintain backend microservices and RESTful APIs
Correct errors by making appropriate changes and rechecking program logic, reviewing application logs and distributed traces to identify root causes and validating fixes to ensure desired results are produced
Perform revision, repair, and expansion of existing programs to increase operating efficiency or adapt to new requirements, including refactoring service logic, optimizing database queries, and implementing caching strategies to reduce API response times
Write, update, and maintain computer programs to handle specific jobs such as storing, locating, and retrieving data including code that interfaces with Google Cloud Platform services (AlloyDB, BigQuery) for transactional and analytical workloads
Consult with managerial, engineering, and technical personnel to clarify program intent, identify problems, and suggest changes, including participating in sprint planning, design reviews, and architectural discussions with cross-functional teams
Conduct trial runs of programs and software applications to ensure they will produce the desired results, including writing and executing unit tests, integration tests, and regression test suites validated through CI/CD pipeline runs
Prepare and maintain workflow diagrams and logical operation documents that describe system data flows, API contracts, and microservice interaction patterns, converting them into coded implementations in Java
Compile and write documentation of program development and subsequent revisions in Confluence, inserting inline code comments so others can understand program logic, integration patterns, and configuration details
Design and implement PETE API endpoints to collect structured log data from microservices and store it in BigQuery, enabling centralized monitoring, reporting, and operational observability
Integrate Akeyless Vault within application code to securely retrieve database passwords, API tokens, and service authentication keys, eliminating hard-coded credentials across distributed microservices

What we offer

Medical Insurance
401(k) Retirement Plan

Fulltime

Technical Lead

At Spectro Cloud, we are in search of a talented individual to become an integra...

Location

India, Bengaluru

Salary:

Not provided

Spectro Cloud

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science or related technical field
7+ years of software development experience (or 5+ years with a Master's degree)
Expert-level proficiency in Go
Familiarity with Java or similar modern languages
Deep knowledge of Kubernetes architecture, operators, controllers, and custom resources
Hands-on experience with edge computing platforms and infrastructure management patterns
Experience with edge networking technologies, SDN, and distributed network architectures
Understanding of networking protocols, load balancing, and service mesh technologies for edge deployments
Experience with edge infrastructure provisioning, local cluster management, and edge-to-cloud connectivity
Strong architectural and design skills for distributed systems and microservices

Job Responsibility

Designing, optimizing, and streamlining GoLang-based microservices that serve as the foundation of our platform
Ensuring the seamless operation of our platform through a combination of automation, scripting, and rigorous testing
Producing clean and efficient code
Working closely with cross-functional teams to create scalable, dependable, and secure solutions
Staying current with industry trends and emerging technologies

Fulltime
