Software Architect, Reliability Engineering Job at Stytch

Principal Software Engineering Architect

Step into a role where your ideas spark innovation and your impact is demonstrat...

Location

United States, Redmond

Salary:

142,800.00 - 274,800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
5+ years of experience designing and operating large-scale enterprise services, including production systems
Experience building and operating large-scale infrastructure and network management systems
Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, ARM, CloudFormation) to automate deployment and configuration
Experience designing resilient, secure, and highly available architectures in cloud or hybrid environments
Experience applying AI/ML or generative AI technologies (e.g., LLMs) to real-world engineering problems
Experience building solutions from concept to production
Experience improving monitoring, observability, and incident response for mission-critical systems

Job Responsibility

Partner with stakeholders to define user requirements across key scenarios, with an emphasis on AI-driven operations, intelligent automation, and agent-enabled user experiences
Lead the identification of dependencies and drive the development of design documents for a product, application, service, or platform, incorporating AI-first and agentic architectures that enable autonomous operations and continuous optimization
Mentor others to write and review high-quality, maintainable, and extensible code, while embedding AI-assisted development practices and enabling engineers to effectively leverage copilots and intelligent agents
Collaborate with cross-functional teams to drive project plans, release plans, and execution, integrating AI-powered insights and agent-driven workflows to accelerate delivery and improve decision making
Take end-to-end ownership of services as a Designated Responsible Individual (DRI), including on-call responsibilities, while advancing autonomous operations through agent-based monitoring, incident detection, and response to improve reliability and resilience
Continuously learn and apply new technologies and best practices to improve availability, scalability, and operational excellence, driving adoption of AI-driven observability, predictive insights, and self-healing systems at scale
Embody our culture and values.

Full-time

Staff Engineer, Software Reliability Engineering

We are seeking a Staff Engineer to join our dynamic team in Bengaluru, India. In...

Location

India, Bengaluru

Salary:

Not provided

Sandisk

Expiration Date

Until further notice

Requirements

Bachelor's degree in CSE or ECE or EEE, Software Engineering, or related field
Master's degree preferred
5 years of software development experience in Python scripting and test case development
Advanced proficiency in programming languages such as Java, Python, or C++
Proficient in version control systems, preferably GitHub
Solid understanding of software architecture and design patterns
Experience with API development and integration
Strong skills in performance optimization and debugging
Experience with Agile methodologies and full software development lifecycle
Excellent problem-solving and analytical skills

Job Responsibility

Architect, design, and implement a high-performance, scalable test suite for reliability testing
Collaborate with cross-functional teams to define and implement new features and products
Lead code reviews and provide mentorship to junior developers
Optimize test performance and ensure high-quality, efficient code
Troubleshoot and resolve complex technical issues
Stay current with emerging technologies and industry trends, recommending improvements to our technology stack
Contribute to the development of technical standards and best practices
Participate in Agile ceremonies and help drive continuous improvement in our development processes

Full-time

Engineering Manager - Observability & Reliability Engineering Obsession

We are looking for an Engineering Manager to join the OREO (Observability Reliab...

Location

France, Paris

Salary:

Not provided

Doctolib

Expiration Date

Until further notice

Requirements

5+ years of software engineering or SRE experience, with a strong technical background in cloud-native environments (preferably AWS, GCP, and/or Kubernetes-based)
3+ years of engineering management experience, leading technical teams (ideally SRE, platform, or infrastructure teams)
Deep understanding of observability tooling and architecture (Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Prometheus, Thanos, Datadog)
Experience with infrastructure as code (Terraform, OpenTofu) and secrets management systems (Vault, AWS Secrets Manager)
Proven ability to balance technical depth with people leadership, able to mentor engineers, review technical designs, and guide architectural decisions

Job Responsibility

Lead, coach, and grow a team of Site Reliability Engineers, supporting their technical development and career progression
Create a culture of operational excellence, continuous improvement, and psychological safety within the team
Conduct regular 1:1s, performance reviews, and career development conversations
Recruit, onboard, and retain top SRE talent aligned with Doctolib's mission and values
Partner with SREs and senior engineers to define and evolve the observability strategy across the platform, focusing on logging, metrics, tracing, and alerting
Own the strategy and evolution of critical transversal services including HashiCorp Vault and Terraform Enterprise
Drive prioritization and roadmap planning for large-scale reliability and observability initiatives
Ensure alignment between team objectives and broader engineering and business goals
Advocate for and allocate resources toward reducing technical debt and improving developer experience
Own the team's on-call experience and contribute to the incident response processes, ensuring sustainable practices and continuous improvement

What we offer

Free comprehensive health insurance for you and your children
Parent Care Program: receive one additional month of leave on top of the legal parental leave
Free mental health and coaching services through our partner Moka.care
For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
Work Council subsidy to refund part of sport club membership or creative class
Up to 14 days of RTT
Lunch voucher with Swile card

Full-time

Engineering Manager - Observability & Reliability Engineering Obsession

We are looking for an Engineering Manager to join the OREO (Observability Reliab...

Location

Germany, Berlin

Salary:

Not provided

Doctolib

Expiration Date

Until further notice

Requirements

5+ years of software engineering or SRE experience, with a strong technical background in cloud-native environments (preferably AWS, GCP, and/or Kubernetes-based)
3+ years of engineering management experience, leading technical teams (ideally SRE, platform, or infrastructure teams)
Deep understanding of observability tooling and architecture (Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Prometheus, Thanos, Datadog)
Experience with infrastructure as code (Terraform, OpenTofu) and secrets management systems (Vault, AWS Secrets Manager)
Proven ability to balance technical depth with people leadership, able to mentor engineers, review technical designs, and guide architectural decisions

Job Responsibility

Lead, coach, and grow a team of Site Reliability Engineers, supporting their technical development and career progression
Create a culture of operational excellence, continuous improvement, and psychological safety within the team
Conduct regular 1:1s, performance reviews, and career development conversations
Recruit, onboard, and retain top SRE talent aligned with Doctolib's mission and values
Partner with SREs and senior engineers to define and evolve the observability strategy across the platform, focusing on logging, metrics, tracing, and alerting
Own the strategy and evolution of critical transversal services including HashiCorp Vault and Terraform Enterprise
Drive prioritization and roadmap planning for large-scale reliability and observability initiatives
Ensure alignment between team objectives and broader engineering and business goals
Advocate for and allocate resources toward reducing technical debt and improving developer experience
Own the team's on-call experience and contribute to the incident response processes, ensuring sustainable practices and continuous improvement

What we offer

Free comprehensive health insurance for you and your children
Parent Care Program: receive one additional month of leave on top of the legal parental leave
Free mental health and coaching services through our partner Moka.care
For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
Work Council subsidy to refund part of sport club membership or creative class
Up to 14 days of RTT
Lunch voucher with Swile card

Full-time

Lead Systems Software Architect

Roku is changing how the world watches TV. Roku is the #1 TV streaming platform ...

Location

United States, Austin

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

15+ years of industry experience in embedded systems-level software development
Strong experience with embedded Linux or Android-based systems
Proficiency in one or more systems programming languages such as C/C++ (Rust or similar is a plus)
Deep understanding of ARM-based SoCs, multimedia pipelines, and system constraints
Experience with DRM, content protection, secure boot
Experience collaborating with SoC vendors and ODM/OEM partners
Experience with NPU/DSP/AI accelerator blocks on embedded SoCs
Ability to build or integrate end-to-end flows where AI is in the loop
Proficient in using AI tools for debugging, code review, test selection, and log analysis
Strong communication skills

Job Responsibility

Own complex features or subsystems end-to-end, from design and implementation through bring-up, validation, and production support
Translate product and business goals into concrete designs, tasks, and implementation plans
Design, implement, and maintain core platform software for Roku device programs and platforms
Contribute to and influence hardware–software partitioning, platform APIs, and integration patterns
Drive and model best practices for coding standards, code reviews, testing strategies, and CI/CD
Implement and optimize video/audio pipelines, codecs, and rendering paths
Contribute to end-to-end multimedia system design for TVs and streaming devices
Define and help maintain benchmarks and test scenarios for media, graphics, and system behavior
Implement and maintain secure boot, DRM integrations, and content protection features
Lead the product evaluation and enablement of candidate SoCs and companion chipsets

What we offer

Global access to mental health and financial wellness support and resources
Healthcare (medical, dental, and vision)
Life, accident, disability, commuter, and retirement options (401(k)/pension)
Time off in accordance with local leave policies

Full-time

Digital Software Engineering Lead Analyst – Vice President

The Digital S/W Engineer Lead Analyst is a lead-level professional role. This in...

Location

India, Pune

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

7+ years of progressive software development experience, demonstrating expert-level proficiency in JavaScript and Java frameworks (e.g., React.js, Spring Boot), and databases (e.g., Oracle, MongoDB, PostgreSQL)
Expert in Modern Application Architecture: Mastery of modern application architecture principles, including microservices, event-driven architectures, serverless, and cloud-native patterns
Deep expertise in Data Structures, Algorithms, and Object-Oriented Design Principles with Java
Proven leadership in leveraging and integrating Artificial Intelligence (AI) and Machine Learning (ML) tools to optimize development workflows, enhance code quality, and drive intelligent features
Extensive experience with Microservices frameworks (e.g., Spring Boot, Quarkus), Event-Driven Services (e.g., Kafka, RabbitMQ), and advanced Cloud-Native Application Development (AWS, Azure, GCP)
Multiple years of experience leading the design and implementation of Service-Oriented and Microservices architectures, including advanced REST, GraphQL, and gRPC implementations
Full Stack Architecture & Leadership: Demonstrated ability to architect, design, develop, and maintain complex, enterprise-grade full-stack solutions, encompassing both front-end and back-end components of robust web applications, with an emphasis on scalability and performance
Front-End Expertise: Expert-level proficiency in designing and developing highly intuitive, performant, and accessible user interfaces using cutting-edge JavaScript frameworks (e.g., React, Angular, Vue), advanced HTML5, and CSS (e.g., SASS/LESS, CSS-in-JS)
Back-End Mastery: Extensive experience in architecting and developing scalable server-side logic and sophisticated APIs using languages such as Java, Python, or similar, with a focus on high-throughput and low-latency systems
Advanced Database & Data Architecture Expertise: Comprehensive knowledge of SQL and PL/SQL, with a deep understanding of Relational Database Management Systems (RDBMS), particularly Oracle, including advanced database design, performance tuning, data warehousing, and NoSQL databases

Job Responsibility

Strategic Technical Leadership: Provide expert guidance and strategic oversight across the entire software development lifecycle, partnering continuously with senior stakeholders to align technical solutions with business objectives
Architectural Stewardship: Lead the design and evolution of robust, scalable, and secure enterprise applications, defining architectural patterns and ensuring adherence to best practices in cutting-edge technologies and software design patterns
Team & Project Leadership: Drive complex engineering initiatives within Agile delivery teams, fostering a culture of collaboration, excellence, and continuous improvement. Lead sprint goal achievement, oversee code quality, and actively participate in and lead broader Citi technical communities and advanced Agile/Scrum processes
Mentorship & Coaching: Act as a technical mentor and coach for junior and intermediate engineers, fostering their growth, critical thinking, and advanced problem-solving capabilities
Advanced Problem Solving & Troubleshooting: Exhibit mastery in analyzing and resolving intricate coding, application performance, and design challenges. Lead cross-functional efforts to diagnose and troubleshoot complex system issues
Proactive Root Cause Analysis: Spearhead thorough investigations to identify systemic root causes of development and performance bottlenecks, leading the implementation of comprehensive, long-term defect resolutions and preventative measures
Technical Vision & Acumen: Demonstrate a profound and forward-looking understanding of technical requirements, emerging trends, and their strategic implications for solutions under development, ensuring future-proof designs
Containerization, Orchestration & Cloud Strategy: Drive the strategic adoption and optimization of Docker for application containerization, Kubernetes for efficient service orchestration, and other cloud-native technologies to build resilient and scalable infrastructure
Communication, Risk & Stakeholder Management: Master effective communication of progress, proactively anticipate and mitigate technical and project bottlenecks, provide expert escalation management, and adeptly identify, assess, track, and manage issues and risks at strategic and operational levels
Process and System Optimization: Champion and lead initiatives to streamline, automate, and eliminate redundant processes within architecture, build, delivery, production operations, and across various business areas, driving significant efficiency gains and innovation

Full-time

Director, Site Reliability Engineering

As our Director of Infrastructure platform, you will be a key driver of Doctolib...

Location

France, Paris

Salary:

Not provided

Doctolib

Expiration Date

Until further notice

Requirements

12+ years in software engineering, including 6+ years leading large (30+) distributed, international platform or infrastructure teams
Proven experience driving platform-as-a-product transformations and modularizing large monolithic architectures at scale
Demonstrated ability to architect, deliver, and operate secure, reliable, and scalable developer platforms in SaaS, multi-product, or regulated environments
Strong process orientation: experience implementing OKRs, robust monitoring/observability, and best-in-class incident management
Measurable impact on developer productivity, platform adoption, reliability, and cost-efficiency
Effective communicator and influencer, with the ability to align and inspire cross-functional stakeholders
Experience leading change and building high-performing, people-first engineering cultures
Fluent in English and comfortable in fast-paced, international environments

Job Responsibility

Lead and scale a high-performing infrastructure organization of 30+ engineers across Infrastructure, Automation, SRE, and Database teams, while maintaining strong engagement and fostering a culture of excellence and ownership
Own the infrastructure platform strategy and roadmap that enables Doctolib's modularization journey, delivers on company OKRs, and ensures predictable execution across all infrastructure and automation initiatives
Champion platform-as-a-product by building self-service capabilities (infrastructure provisioning, CI/CD, observability, database management) that transform developer experience and unlock team autonomy across the engineering organization
Be the guardian of quality and reliability by establishing world-class incident management, driving measurable improvements in availability and performance, and ensuring infrastructure components operate at the highest standards of security and resilience
Accelerate engineering velocity by reducing platform friction, enabling faster modularization, and leveraging AI-augmented development tools to multiply productivity across feature teams
Drive the infrastructure transformation from monolith-supporting infrastructure to a modular, multi-service platform architecture - enabling international expansion, product velocity, and operational excellence at scale
Act as a senior technical leader within the Platform organization and broader Tech leadership team, bringing strong technical opinions and challenging architectural decisions while clearly articulating how infrastructure investments contribute to company strategy and business outcomes

What we offer

Free comprehensive health insurance for you and your children
Parent Care Program: receive additional leave on top of the legal parental leave
Free mental health and coaching services through our partner Moka.care
For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
Work from abroad for up to 10 days per year thanks to our flexibility days policy
Work Council subsidy to refund part of sport club membership or creative class
Up to 14 days of RTT
Lunch voucher with Swile card

Full-time

Principal AI Software Architect

Do you want to be at the forefront of innovating the latest hardware designs to ...

Location

United States, Redmond

Salary:

139,900.00 - 274,800.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, PyTorch, CUDA/Triton
Ability to meet Microsoft, customer and/or government security screening requirements
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Job Responsibility

Leads by example across teams and mentors others to produce extensible, maintainable, well-tested, secure, and performant code used across products that adheres to design specifications
Leads efforts to continuously improve code performance, testability, maintainability, effectiveness, and cost, while learning about and accounting for relevant trade-offs
Identifies best practices and coding patterns (e.g., leveraging state-of-the-art generative artificial intelligence [GenAI], approaches to source code organization, naming conventions) and provides deep expertise in the coding and validation strategy
Creates and applies metrics to drive code quality and stability, appropriate coding patterns, and best practices
Identifies and anticipates blockers or unknowns during the development process, escalates them, communicates how they will impact timelines, and then leads efforts to identify and implement strategies and/or opportunities to address them
Reviews product code and test code to ensure it meets team standards, contains the correct test coverage, and is appropriate for the product or solution area
Brings insight to code reviews to help improve code quality, coaching and providing feedback to develop other engineers' skills
Conducts code reviews in a timely fashion that helps accelerate the pace of development on the team. Considers diagnosability, reliability, testability, and maintainability when reviewing code, and understands when code is ready to be shared or delivered
Applies and reviews for coding patterns, security risks, compliance issues, and best practices in code reviews, providing feedback on code to drive adherence to best practices
Uses automated source code analysis tools that are incorporated into the build/development process

Full-time
