DevOps - Platform and Reliability Engineer

Cognitive Space

Location:
United States, Houston

Contract Type:
Not provided

Salary:

100000.00 - 150000.00 USD / Year

Job Description:

We are looking for a highly skilled DevOps Engineer who thrives in a dynamic, fast-paced environment to join our forward-thinking team. You will play a key role in advancing our CNTIENT platform with state-of-the-art machine learning algorithms and models. A successful DevOps engineer in this role will fulfill both the Site Reliability Engineer (SRE) requirements of a cloud-based SaaS company and the Platform Engineering principles of developer enablement.

Job Responsibility:

  • Manage and administer our AWS cloud infrastructure across multiple accounts, leveraging Terraform and AWS best practices
  • Oversee and maintain multiple EKS clusters, utilizing ArgoCD and Helm for deployments
  • Lead efforts in logging, monitoring, and alerting to ensure system reliability and performance
  • Own and optimize GitLab CI/CD pipelines, collaborating closely with developers to meet evolving needs
  • Stay engaged with new greenfield initiatives, providing input and expertise in solution architecture
  • When required, build and deliver on-premises versions of our products for government customers

Requirements:

  • US citizenship or permanent residency (Green Card)
  • Bachelor’s degree in a relevant field: Computer Science, Engineering, etc. or equivalent work experience
  • 2-4 years of professional experience as an SRE, Platform Engineer, or Cloud Engineer, with an emphasis on security best practices
  • Hands-on experience running and scaling apps with Kubernetes and Docker
  • Strong knowledge of AWS (certification preferred)
  • Experience managing infrastructure as code with Terraform
  • Familiarity with GitOps workflows and packaging apps using Helm
  • Skills in observability tools like Grafana, Prometheus, and Loki
  • Proficiency in Python and Bash for automation and scripting and experience working with PostgreSQL databases
  • Familiarity with government security and compliance frameworks and applying DevSecOps best practices for secure infrastructure and deployments

What we offer:

  • Equity in the form of options
  • Flexible Time-Off policy and company holidays
  • Cost-effective health care, dental, and vision with company contributions
  • 401(k) plan with company match
  • Life insurance
  • Short-term and long-term disability

Additional Information:

Job Posted:
December 31, 2025

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for DevOps - Platform and Reliability Engineer

Platform Engineer DevOps

We are looking for an experienced Platform Engineer DevOps to ensure that the fo...
Location:
France, Paris
Salary:
Not provided
cozycozy
Expiration Date:
Until further notice
Requirements:
  • 5+ years of hands-on experience in Platform Engineering, Infrastructure or DevOps
  • Expertise in operating and scaling Kubernetes and Docker in production environments
  • Proven experience managing hybrid cloud / on-premises infrastructure for high-traffic applications
  • A strong background in designing and implementing robust CI/CD pipelines (GitLab CI, Jenkins, etc.)
  • Experience with Infrastructure as Code (Terraform, Ansible, etc.)
  • Experience with monitoring, alerting, and reliability practices (SRE principles)
  • The mindset to mentor and guide other engineers, fostering a culture of automation and operational excellence
  • Excellent communication skills in English
  • The demonstrated ability to drive complex projects
Job Responsibility:
  • Implement, maintain and secure infrastructure (cloud, bare-metal, Kubernetes clusters)
  • Automate environment configuration using Infrastructure as Code (e.g., Terraform, Ansible) and adhere to GitOps principles
  • Implement full-stack observability (metrics, logs, traces), sophisticated alerting, and participate in the incident management lifecycle
  • Ensure compliance with Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all managed services
  • Implement and manage secrets management systems
  • Contribute to the design and evolution of hybrid infrastructure
  • Define, lead, and maintain engineering standards for security, reliability, and technology selection across the organization, supporting the Head of Engineering in defining the platform roadmap
  • Drive continuous improvement initiatives for cloud cost optimization, scalability, performance, and platform security posture
  • Maintain comprehensive, up-to-date documentation and best practices to foster self-service and cross-team enablement
  • Design, implement, and maintain CI/CD pipelines (using GitLab CI, GitHub, and/or Jenkins) tailored for microservice architectures built with Node.js
What we offer:
  • Competitive salary
  • Stock options
  • Alan health insurance
  • Swile card
  • Unlimited coffee, tea, snacks, and drinks in the office

Senior Software Engineer – DevOps Platform

We’re looking for a Senior Software Engineer to join our DevOps team, where you ...
Location:
United States, Palo Alto; New York City
Salary:
172000.00 - 228000.00 USD / Year
Wealthfront
Expiration Date:
Until further notice
Requirements:
  • Extensive experience with running and troubleshooting modern Linux systems and services in production
  • 6+ years of experience developing reliable production-grade software in Java, Go, or other similar languages
  • Proficiency with at least one automation technology such as Terraform, Chef, or Puppet
  • Successfully designed and deployed mission-critical complex distributed systems
  • Excellent critical thinking and communications skills with a desire to both learn from and educate your peers
  • A BS or MS in Computer Science or an Engineering field, or equivalent professional experience
Job Responsibility:
  • Maintain our core infrastructure by writing software to automate application deployment, configure our infrastructure, and manage critical services such as our databases
  • Ensure that mission critical services operate reliably by triaging and fixing operational issues as an on-call engineer, participating in post-mortems, and implementing improvements to prevent future issues
  • Design, implement, and deploy internal tools and services to accelerate productivity of the wider Engineering team and enable direct ownership of operations
  • Help manage our server hardware in our physical data centers which may occasionally include travel to our Bay Area or New Jersey data centers for onsite projects
  • Be involved in key decisions regarding the evolution of our infrastructure
  • Mentor junior members of the team
What we offer:
  • medical
  • vision
  • dental
  • 401K plan
  • generous time off
  • parental leave
  • wellness reimbursements
  • professional development
  • employee investing discount
  • Fulltime

Platform Engineer

Motorica is at a breakthrough moment. We’ve built a generative AI animation plat...
Location:
Sweden, Stockholm
Salary:
Not provided
Motorica
Expiration Date:
Until further notice
Requirements:
  • Proven experience in Platform Engineering, SRE, or DevOps, ideally in high-growth or AI/ML-heavy environments
  • Strong grasp of CI/CD systems, cloud infrastructure (AWS/GCP), and containerization (Docker/Kubernetes)
  • Familiarity with observability, monitoring, and incident response best practices
  • Security mindset with hands-on experience in audits, compliance (ISO 27001, SOC2, etc.), and vulnerability management
  • Strong communication skills: you’ll be interfacing with developers daily and need to translate infrastructure into clarity, not complexity
  • A proactive, solution-oriented mindset: you anticipate friction before others feel it
Job Responsibility:
  • Provide common infrastructure guidance, reusable patterns, and automated tooling to engineering teams
  • Own the “paved road” for developers, reducing friction and cognitive load
  • Champion and implement security best practices across the entire platform
  • Play a key role in achieving ISO 27001 certification through technical implementation and evidence gathering
  • Build and operate a highly reliable and cost-efficient platform, with particular focus on optimizing GPU-heavy AI/ML workloads
  • Manage CI/CD systems (GitHub Actions, GitLab CI) and track key metrics like build times, deployment frequency, and failure rates
  • Oversee cloud environments (AWS, GCP), including health, security, and cost reporting
  • Lead security scans, audits, and vulnerability remediation
  • Maintain observability stack (Prometheus, Grafana, Datadog, GCP Logging), ensuring meaningful dashboards and alerts
  • Act as point-of-contact for ML Research team’s infra requests (GPU access, specialized pipelines)
What we offer:
  • Stock Options program
  • Retirement Plan
  • Health Benefits (5000 SEK/year)
  • Life Insurance / Health Insurance / Injury Insurance
  • Competitive compensation
  • Fulltime

Platform Engineering Manager

As an Engineering Manager on the Platform Engineering team at Arrive Logistics, ...
Location:
United States, Austin
Salary:
Not provided
Arrive Logistics
Expiration Date:
Until further notice
Requirements:
  • 5+ years of engineering experience, with significant time spent in systems, software, platform, site reliability, or DevOps engineering
  • 1+ year of people management or team leadership experience, including performance and career development
  • Demonstrated ability to pragmatically balance business priorities with technical constraints
  • Experience driving initiatives, while holding teams accountable for results and ensuring continuous improvement
  • Strong analytical, problem-solving, and decision-making skills
  • Significant experience building or operating cloud applications on a major provider such as Azure (preferred), AWS, or GCP
  • Expertise or significant experience with containerized workloads and Kubernetes or similar orchestration systems
Job Responsibility:
  • Lead and manage a Platform Engineering team, ensuring team members’ performance aligns with overall department goals by holding direct reports accountable to performance expectations
  • Coach, mentor, and advocate for your engineers, growing a stronger team through stronger individuals
  • Responsible for the delivery of high-quality, scalable platform solutions to production on a regular basis, owning the quality, documentation, tests, and communication across teams
  • Manage the team’s day-to-day with a high level of awareness of what your team/teams are doing to remove blockers and ensure efforts are aligned with the priorities
  • Keep your team informed of the priorities and vision that are driving the organization, making sure they understand the what and the why, to advance the organization’s goals
  • Work with the Director of Engineering, peers, and Product partners to scope, plan, and execute platform initiatives with impeccable communication around estimates and risk
  • Advance the organization’s goals within your team and ensure success in hitting them
  • Drive the development of custom tools and automation that hide Kubernetes complexity, streamline onboarding, remove manual toil, and improve mean time to detection and resolution
  • Oversee the design of robust CI/CD pipelines for applications and infrastructure
  • Partner with stakeholders across the organization to understand and drive requirements for the internal developer platform, assessing benefit and risk analysis, and advising on the best course of action
What we offer:
  • Take advantage of our comprehensive benefits package, including medical, dental, vision, life, disability, and supplemental coverage
  • Invest in your future with our matching 401(k) program
  • Build relationships and find your home at Arrive through our Employee Resource Groups
  • Enjoy office wide engagement activities, team events, happy hours and more
  • Leave the suit and tie at home: our dress code is casual
  • Work in the booming city of Austin, TX – we are in a convenient location close to the airport and downtown
  • Park your car for free on site
  • Start your morning with a specialty drink from our fully stocked coffee bar, Broker’s Brew
  • Sweat it out with the team at our onsite gym
  • Fulltime

Site Reliability Engineering Manager

Hewlett Packard Enterprise (HPE) is looking for a Site Reliability Engineering M...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 7–10 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Minimum 2 years of experience managing or leading cloud operations teams
  • Deep understanding of cloud platforms (AWS, GCP, or Azure) and cloud-native architectures
  • Hands-on experience with Kubernetes, containers, infrastructure as code (e.g., Terraform), and configuration management tools
  • Strong foundation in observability (monitoring, logging, tracing), automation using Python, and incident response
  • Familiarity with modern CI/CD automation and tools
  • Excellent communication, stakeholder management, and team-building skills
  • Experience scaling SRE practices in high-growth or large-scale environments
  • Ability to balance long-term reliability initiatives with short-term delivery needs.
Job Responsibility:
  • Lead and mentor a team of Site Reliability Engineers, supporting their growth, performance, and well-being
  • Own the reliability strategy for SASE cloud infrastructure systems, including incident management, SLIs/SLOs, and capacity planning
  • Partner with Engineering, Product, and Security teams to design and deliver highly available, scalable, and resilient cloud-native services
  • Guide the team in building automation, improving observability, and improving the operational efficiency of our cloud infrastructure
  • Drive adoption of best practices in monitoring, alerting, on-call operations, and runbook development
  • Build and maintain a strong engineering culture based on ownership, collaboration, and continuous learning
  • Define and track key reliability metrics, and report on team performance and system health to leadership
  • Contribute to hiring, onboarding, and career development for SREs.
What we offer:
  • Health & Wellbeing benefits for physical, financial, and emotional wellbeing
  • Personal & Professional Development programs
  • Unconditional inclusion in the workplace.
  • Fulltime

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...
Location:
United States, Deerfield
Salary:
96000.00 - 132000.00 USD / Year
Baxter
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in computer science, IT, or related field (or equivalent experience)
  • Prior experience in Site Reliability Engineering and cloud-based infrastructure management
  • Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
  • Azure administration and operations experience, with certifications a plus
  • Knowledge of related technologies, including cloud, encryption, and security protocols
  • Systems administration experience in Windows and Linux environments
  • Proven problem-solving skills and experience with scripting and automation tools
  • Ability to create accurate documentation and reports, with excellent communication skills
  • Applicants must be authorized to work for any employer in the U.S.
  • Unable to sponsor or take over sponsorship of an employment visa at this time.
Job Responsibility:
  • Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
  • Manage and administer Azure resources, including virtual machines, databases, and networking components
  • Define and document operating procedures to ensure required security, privacy and other compliance standards are maintained for digital solutions deployed in cloud
  • Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
  • Define and refine Operations SLAs to maintain high level of Customer Satisfaction
  • Establish non-functional requirements to meet SLAs
  • Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
  • Define key performance indicators that can be monitored, measured, and used to derive opportunities
  • Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
  • Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes.
What we offer:
  • Support for Parents
  • Continuing Education/Professional Development
  • Employee Health & Well-Being Benefits
  • Paid Time Off
  • 2 Days a Year to Volunteer
  • Medical and dental coverage starting day one
  • Insurance coverage for basic life, accident, short-term and long-term disability
  • Business travel accident insurance
  • Employee Stock Purchase Plan (ESPP)
  • 401(k) Retirement Savings Plan
  • Fulltime

Site Reliability Engineer

As a highly skilled Site Reliability Engineer (SRE), you will contribute to buil...
Location:
United States, New York City; San Francisco
Salary:
160000.00 - 300000.00 USD / Year
Hebbia
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years software development experience at a venture-backed startup or top technology firm
  • Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role
  • Strong expertise in managing CI/CD pipelines and deployment automation
  • Proficiency in cloud platforms such as AWS, Azure, or Google Cloud (we are an AWS shop)
  • Solid understanding of containerization and orchestration technologies such as Docker and Kubernetes
  • Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, or similar
  • Knowledge of infrastructure-as-code (IaC) tools such as Terraform or CloudFormation
  • Familiarity with security best practices and tools for infrastructure and application security
  • Excellent problem-solving skills and the ability to troubleshoot complex issues
Job Responsibility:
  • Assist in managing deployment pipelines to facilitate smooth and efficient software releases
  • Help implement and maintain observability solutions for monitoring system performance and reliability
  • Support local development environments to optimize developer workflows
  • Work with development teams to ensure infrastructure aligns with project requirements
  • Contribute to improving the security of our infrastructure by assisting with proactive measures and audits
  • Assist in developing and maintaining automation scripts and tools to enhance operational efficiency
  • Help troubleshoot and resolve infrastructure and application issues to minimize downtime and maintain smooth operations
  • Participate in evaluating and integrating new technologies to enhance the scalability, reliability, and security of our infrastructure
What we offer:
  • PTO: Unlimited
  • Insurance: Medical + Dental + Vision + 401K
  • Eats: Catered lunch daily + DoorDash dinner credit if you ever need to stay late
  • Parental leave policy: 3 months non-birthing parent, 4 months for birthing parent
  • Fertility benefits: $15k lifetime benefit
  • New hire equity grant: competitive equity package with unmatched upside potential
  • Fulltime

Senior Platform Engineer - AWS

We’re currently looking for a skilled and enthusiastic Senior Platform Engineer ...
Location:
Germany, Hamburg or Berlin
Salary:
73000.00 - 90000.00 EUR / Year
About You
Expiration Date:
Until further notice
Requirements:
  • 5+ years of professional experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE), with a significant focus on cloud infrastructure
  • Fluency in scripting languages (e.g., Python, Go, Bash) for system automation, tooling development, and operational tasks
  • Deep expertise in managing and scaling production workloads within a major public cloud provider (e.g., AWS, Azure, or GCP), including strong familiarity with core services like Compute, Networking, Identity & Access Management (IAM), and Managed Database
  • Proven mastery of Infrastructure-as-Code (IaC) using AWS CloudFormation and/or Terraform in complex, multi-account environments
  • Demonstrated experience designing, implementing, and maintaining robust CI/CD pipelines
  • Solid knowledge of monitoring and logging solutions
  • Excellent communication and documentation skills, with the ability to articulate complex technical issues to technical stakeholders
Job Responsibility:
  • Own and evolve the Commerce Cloud’s AWS infrastructure through the application of Infrastructure-as-Code (IaC) principles to ensure scalability, high availability, and cost efficiency
  • Design, implement, and optimize CI/CD pipelines and operational workflows utilizing tools such as GitLab CI, AWS CloudFormation, and Terraform
  • Establish and enforce comprehensive, high-quality documentation for all infrastructure, operational playbooks, and critical architecture decisions
  • Act as a subject matter expert and trusted advisor, partnering with application development teams to architect and provision infrastructure that meets their specific workload requirements
  • Drive collaborative efforts with GCP Platform Engineers on cross-cloud initiatives and work closely with Information Security Engineers to design and implement security controls and governance policies
  • Spearhead the evaluation and adoption of emerging cloud and platform technologies, continuously seeking opportunities to improve platform performance and developer experience
What we offer:
  • Hybrid working
  • Sports courses
  • Free access to code.talks
  • Exclusive employee discounts
  • Free drinks
  • Language courses
  • Free Laracasts account
  • Company parties
  • Help in the relocation process
  • Mobility subsidy
  • Fulltime