DevOps - Platform and Reliability Engineer Job at Cognitive Space (Houston)

Platform Engineer DevOps

We are looking for an experienced Platform Engineer DevOps to ensure that the fo...

Location

France , Paris

Salary:

Not provided

cozycozy

Expiration Date

Until further notice

Requirements

5+ years of hands-on experience in Platform Engineering, Infrastructure or DevOps
Expertise in operating and scaling Kubernetes and Docker in production environments
Proven experience managing hybrid cloud / on-premises infrastructure for high-traffic applications
A strong background in designing and implementing robust CI/CD pipelines (GitLab CI, Jenkins, etc.)
Experience with Infrastructure as Code (Terraform, Ansible, etc.)
Experience with monitoring, alerting, and reliability practices (SRE principles)
The mindset to mentor and guide other engineers, fostering a culture of automation and operational excellence
Excellent communication skills in English
The demonstrated ability to drive complex projects

Job Responsibility

Implement, maintain and secure infrastructure (cloud, bare-metal, Kubernetes clusters)
Automate environment configuration using Infrastructure as Code (e.g., Terraform, Ansible) and adhere to GitOps principles
Implement full-stack observability (metrics, logs, traces), sophisticated alerting, and participate in the incident management lifecycle
Ensure compliance with Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all managed services
Implement and manage secrets management systems
Contribute to the design and evolution of hybrid infrastructure
Define, lead, and maintain engineering standards for security, reliability, and technology selection across the organization, supporting the Head of Engineering in defining the platform roadmap
Drive continuous improvement initiatives for cloud cost optimization, scalability, performance, and platform security posture
Maintain comprehensive, up-to-date documentation and best practices to foster self-service and cross-team enablement
Design, implement, and maintain CI/CD pipelines (using GitLab CI, GitHub, and/or Jenkins) tailored for microservice architectures built with Node.js

What we offer

Competitive salary
Stock options
Alan health insurance
Swile card
Unlimited coffee, tea, snacks, and drinks in the office

Senior Software Engineer – DevOps Platform

We’re looking for a Senior Software Engineer to join our DevOps team, where you ...

Location

United States , Palo Alto; New York City

Salary:

172000.00 - 228000.00 USD / Year

Wealthfront

Expiration Date

Until further notice

Requirements

Extensive experience with running and troubleshooting modern Linux systems and services in production
6+ years of experience developing reliable production-grade software in Java, Go, or other similar languages
Proficiency with at least one automation technology such as Terraform, Chef, or Puppet
Successfully designed and deployed mission-critical complex distributed systems
Excellent critical thinking and communication skills with a desire to both learn from and educate your peers
A BS or MS in Computer Science or an Engineering field, or equivalent professional experience

Job Responsibility

Maintain our core infrastructure by writing software to automate application deployment, configure our infrastructure, and manage critical services such as our databases
Ensure that mission critical services operate reliably by triaging and fixing operational issues as an on-call engineer, participating in post-mortems, and implementing improvements to prevent future issues
Design, implement, and deploy internal tools and services to accelerate productivity of the wider Engineering team and enable direct ownership of operations
Help manage our server hardware in our physical data centers, which may occasionally include travel to our Bay Area or New Jersey data centers for onsite projects
Be involved in key decisions regarding the evolution of our infrastructure
Mentor junior members of the team

What we offer

medical
vision
dental
401K plan
generous time off
parental leave
wellness reimbursements
professional development
employee investing discount

Fulltime

Platform Engineer

Motorica is at a breakthrough moment. We’ve built a generative AI animation plat...

Location

Sweden , Stockholm

Salary:

Not provided

Motorica

Expiration Date

Until further notice

Requirements

Proven experience in Platform Engineering, SRE, or DevOps, ideally in high-growth or AI/ML-heavy environments
Strong grasp of CI/CD systems, cloud infrastructure (AWS/GCP), and containerization (Docker/Kubernetes)
Familiarity with observability, monitoring, and incident response best practices
Security mindset with hands-on experience in audits, compliance (ISO 27001, SOC2, etc.), and vulnerability management
Strong communication skills: you’ll be interfacing with developers daily and need to translate infrastructure into clarity, not complexity
A proactive, solution-oriented mindset: you anticipate friction before others feel it

Job Responsibility

Provide common infrastructure guidance, reusable patterns, and automated tooling to engineering teams
Own the “paved road” for developers, reducing friction and cognitive load
Champion and implement security best practices across the entire platform
Play a key role in achieving ISO 27001 certification through technical implementation and evidence gathering
Build and operate a highly reliable and cost-efficient platform, with particular focus on optimizing GPU-heavy AI/ML workloads
Manage CI/CD systems (GitHub Actions, GitLab CI) and track key metrics like build times, deployment frequency, and failure rates
Oversee cloud environments (AWS, GCP), including health, security, and cost reporting
Lead security scans, audits, and vulnerability remediation
Maintain observability stack (Prometheus, Grafana, Datadog, GCP Logging), ensuring meaningful dashboards and alerts
Act as point-of-contact for ML Research team’s infra requests (GPU access, specialized pipelines)

What we offer

Stock Options program
Retirement Plan
Health Benefits (5000 SEK/year)
Life Insurance / Health Insurance / Injury Insurance
Competitive compensation

Fulltime

Platform Engineering Manager

As an Engineering Manager on the Platform Engineering team at Arrive Logistics, ...

Location

United States , Austin

Salary:

Not provided

Arrive Logistics

Expiration Date

Until further notice

Requirements

5+ years of engineering experience, with significant time spent in systems, software, platform, site reliability, or DevOps engineering
1+ year of people management or team leadership experience, including performance and career development
Demonstrated ability to pragmatically balance business priorities with technical constraints
Experience driving initiatives, while holding teams accountable for results and ensuring continuous improvement
Strong analytical, problem-solving, and decision-making skills
Significant experience building or operating cloud applications on a major provider such as Azure (preferred), AWS, or GCP
Expertise or significant experience with containerized workloads and Kubernetes or similar orchestration systems

Job Responsibility

Lead and manage a Platform Engineering team, ensuring team members’ performance aligns with overall department goals by holding direct reports accountable to performance expectations
Coach, mentor, and advocate for your engineers, growing a stronger team through stronger individuals
Responsible for the delivery of high-quality, scalable platform solutions to production on a regular basis, owning the quality, documentation, tests, and communication across teams
Manage the team’s day-to-day with a high level of awareness of what your team/teams are doing to remove blockers and ensure efforts are aligned with the priorities
Keep your team informed of the priorities and vision that are driving the organization, making sure they understand the what and the why, to advance the organization’s goals
Work with the Director of Engineering, peers, and Product partners to scope, plan, and execute platform initiatives with impeccable communication around estimates and risk
Advance the organization’s goals within your team and ensure success in hitting them
Drive the development of custom tools and automation that hide Kubernetes complexity, streamline onboarding, remove manual toil, and improve mean time to detection and resolution
Oversee the design of robust CI/CD pipelines for applications and infrastructure
Partner with stakeholders across the organization to understand and drive requirements for the internal developer platform, assessing benefit and risk analysis, and advising on the best course of action

What we offer

Take advantage of our comprehensive benefits package, including medical, dental, vision, life, disability, and supplemental coverage
Invest in your future with our matching 401(k) program
Build relationships and find your home at Arrive through our Employee Resource Groups
Enjoy office wide engagement activities, team events, happy hours and more
Leave the suit and tie at home: our dress code is casual
Work in the booming city of Austin, TX – we are in a convenient location close to the airport and downtown
Park your car for free on site
Start your morning with a specialty drink from our fully stocked coffee bar, Broker’s Brew
Sweat it out with the team at our onsite gym

Fulltime

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...

Location

India , Bangalore

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
Minimum 2 years of experience managing or leading cloud operations teams
Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
Familiarity with modern CI/CD automation and tools
Excellent communication, stakeholder management, and team-building skills
Experience scaling SRE practices in high-growth or large-scale environments
Ability to balance long-term reliability initiatives with short-term delivery needs.

Job Responsibility

Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
Guide the team in building automation, improving observability, and improving operational efficiency of our cloud infrastructure
Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
Define and track key reliability metrics, and report on team performance and system health to leadership
Contribute to hiring, onboarding, and career development for SREs.

What we offer

Health & Wellbeing benefits for physical, financial, and emotional wellbeing
Personal & Professional Development programs
Unconditional inclusion in the workplace.

Fulltime

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...

Location

United States , Deerfield

Salary:

96000.00 - 132000.00 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science, IT, or related field (or equivalent experience)
Prior experience in Site Reliability Engineering and cloud-based infrastructure management
Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
Azure administration and operations experience, with certifications a plus
Knowledge of related technologies, including cloud, encryption, and security protocols
Systems administration experience in Windows and Linux environments
Proven problem-solving skills and experience with scripting and automation tools
Ability to create accurate documentation and reports, with excellent communication skills
Applicants must be authorized to work for any employer in the U.S.
Unable to sponsor or take over sponsorship of an employment visa at this time.

Job Responsibility

Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
Manage and administer Azure resources, including virtual machines, databases, and networking components
Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in the cloud
Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
Define and refine Operations SLAs to maintain a high level of customer satisfaction
Establish non-functional requirements to meet SLAs
Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
Define key performance indicators that can be monitored, measured, and used to derive opportunities
Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes.

What we offer

Support for Parents
Continuing Education/Professional Development
Employee Health & Well-Being Benefits
Paid Time Off
2 Days a Year to Volunteer
Medical and dental coverage starting day one
Insurance coverage for basic life, accident, short-term and long-term disability
Business travel accident insurance
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan

Fulltime

Site Reliability Engineer

As a highly skilled Site Reliability Engineer (SRE), you will contribute to buil...

Location

United States , New York City; San Francisco

Salary:

160000.00 - 300000.00 USD / Year

Hebbia

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
5+ years software development experience at a venture-backed startup or top technology firm
Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role
Strong expertise in managing CI/CD pipelines and deployment automation
Proficiency in cloud platforms such as AWS, Azure, or Google Cloud (we are an AWS shop)
Solid understanding of containerization and orchestration technologies such as Docker and Kubernetes
Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, or similar
Knowledge of infrastructure-as-code (IaC) tools such as Terraform or CloudFormation
Familiarity with security best practices and tools for infrastructure and application security
Excellent problem-solving skills and the ability to troubleshoot complex issues

Job Responsibility

Assist in managing deployment pipelines to facilitate smooth and efficient software releases
Help implement and maintain observability solutions for monitoring system performance and reliability
Support local development environments to optimize developer workflows
Work with development teams to ensure infrastructure aligns with project requirements
Contribute to improving the security of our infrastructure by assisting with proactive measures and audits
Assist in developing and maintaining automation scripts and tools to enhance operational efficiency
Help troubleshoot and resolve infrastructure and application issues to minimize downtime and maintain smooth operations
Participate in evaluating and integrating new technologies to enhance the scalability, reliability, and security of our infrastructure

What we offer

PTO: Unlimited
Insurance: Medical + Dental + Vision + 401K
Eats: Catered lunch daily + DoorDash dinner credit if you ever need to stay late
Parental leave policy: 3 months non-birthing parent, 4 months for birthing parent
Fertility benefits: $15k lifetime benefit
New hire equity grant: competitive equity package with unmatched upside potential

Fulltime

Senior Platform Engineer - AWS

We’re currently looking for a skilled and enthusiastic Senior Platform Engineer ...

Location

Germany , Hamburg or Berlin

Salary:

73000.00 - 90000.00 EUR / Year

About You

Expiration Date

Until further notice

Requirements

5+ years of professional experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE), with a significant focus on cloud infrastructure
Fluency in scripting languages (e.g., Python, Go, Bash) for system automation, tooling development, and operational tasks
Deep expertise in managing and scaling production workloads within a major public cloud provider (e.g., AWS, Azure, or GCP), including strong familiarity with core services like Compute, Networking, Identity & Access Management (IAM), and Managed Database
Proven mastery of Infrastructure-as-Code (IaC) using AWS CloudFormation and/or Terraform in complex, multi-account environments
Demonstrated experience designing, implementing, and maintaining robust CI/CD pipelines
Solid knowledge of monitoring and logging solutions
Excellent communication and documentation skills, with the ability to articulate complex technical issues to technical stakeholders

Job Responsibility

Own and evolve the Commerce Cloud’s AWS infrastructure through the application of Infrastructure-as-Code (IaC) principles to ensure scalability, high availability, and cost efficiency
Design, implement, and optimize CI/CD pipelines and operational workflows utilizing tools such as GitLab CI, AWS CloudFormation, and Terraform
Establish and enforce comprehensive, high-quality documentation for all infrastructure, operational playbooks, and critical architecture decisions
Act as a subject matter expert and trusted advisor, partnering with application development teams to architect and provision infrastructure that meets their specific workload requirements
Drive collaborative efforts with GCP Platform Engineers on cross-cloud initiatives and work closely with Information Security Engineers to design and implement security controls and governance policies
Spearhead the evaluation and adoption of emerging cloud and platform technologies, continuously seeking opportunities to improve platform performance and developer experience

What we offer

Hybrid working
Sports courses
Free access to code.talks
Exclusive employee discounts
Free drinks
Language courses
Laracasts account for free
Company parties
Help in the relocation process
Mobility subsidy

Fulltime

DevOps - Platform and Reliability Engineer

Cognitive Space

Location:
United States , Houston

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Job Description:

Job Responsibility:

Requirements:

Additional Information:

Job Posted:
December 31, 2025

