Senior Infrastructure & Platform Engineer

Finland, Helsinki · Employment contract · Job Posted February 14, 2026

Job Description

You will turn innovative research and bespoke tooling into secure, scalable, observable production systems. You’ll help establish the platform standards (AWS foundations, batch and workflow orchestration, CI/CD, IaC, security, and observability) that let our scientists and engineers ship reliable, low‑latency InSAR pipelines and APIs. Much of our development environment is in AWS, but some of our offerings must be deployed on-premise for production use, so you will learn to prepare deployments outside of AWS and weigh cloud infrastructure knowledge against on-premise needs to serve our customers. You’ll balance rigor with pragmatism, helping the team raise the reliability bar without slowing scientific iteration, and you’ll help take systems that already work for our team and make them reliable, secure, and scalable for broader enterprise and customer use.

Job Responsibility

  • Lead and evolve cloud foundations in AWS: multi‑account setup and guardrails (Organizations, IAM/SCP/SSO), secure networking, encryption (KMS), secrets, artifact governance
  • Support on-premise deployments when needed, working closely with other engineering teams
  • Choose the right compute & orchestration for the service/product needs we have
  • Codify everything: build reusable Terraform/Terragrunt/CDK modules, drift control, and environment promotion. Automate when possible
  • Harden CI/CD: SBOMs, image signing, policy gates, progressive delivery, fast rollback
  • Observability that matters: metrics/logs/traces (CloudWatch and/or Datadog), SLOs/error budgets, alerting, incident response with blameless postmortems
  • Security at speed: vulnerability management, supply‑chain hardening, least‑privilege by default, data‑access boundaries
  • Cost & performance: capacity planning, spot strategies, storage patterns for large rasters (S3, EFS/FSx), data‑locality aware processing
  • Partner to productize: help package Python algorithms into cloud‑native services and simple job specs; provide “golden‑path” templates for the team
  • Amplify with AI‑assist: accelerate coding, test generation, docs/runbooks—aligned with ICEYE’s AI‑productivity direction

Requirements

  • Deep AWS experience operating production systems at scale
  • Strong with Python for automation/tooling (reading service code as needed)
  • Batch/workflow orchestration
  • Containers at scale (Docker, ECR), CI/CD, artifact integrity and rollout strategies
  • IaC (Terraform/Terragrunt or CDK) and Git‑driven ops
  • Production observability and SRE practices (SLOs, incident response), CloudWatch/Datadog
  • Security fundamentals: IAM, network segmentation, encryption, vulnerability management
  • Kubernetes or equivalent
  • LLM IDE Tooling proficiency & curiosity (e.g. Cursor, Claude, Copilot)

Nice to have

  • Experience with on-premise deployment
  • Argo Workflows and Hera
  • GPU scheduling
  • FSx for Lustre
  • Postgres/PostGIS
  • OPA/Gatekeeper
  • Geospatial/EO exposure (SAR stacks, STAC, GDAL, xarray/dask)
  • Go familiarity to interface with services written elsewhere at ICEYE (nice to have only)

What we offer

  • Occupational healthcare, occupational and accident insurance
  • A yearly benefit budget to spend as you wish (e.g. on sport, transport, bike benefit, wellness, lunch)
  • Phone subscription with iPhone of choice
  • Relocation support (e.g. flight tickets, accommodation, relocation agency support)
  • Time for self-development, research, training, conferences, or certification schemes
  • Inspiring, collaborative offices and silent workspaces that enable you to focus
  • A wide variety of the best coffee, tea, snacks, and sweets to accompany your daily space mission

Similar Jobs for Senior Infrastructure & Platform Engineer

8 matching positions

Senior Systems Engineer - Infrastructure & Platform Reliability

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serv...
Location: United States, San Francisco; San Jose
Salary: 206000.00 - 310000.00 USD / Year
Company: Lambda
Expiration Date: Until further notice
Requirements
  • Have a keen interest in system design and in architecting for performance and scalability, plus experience with multiple cloud infrastructure platforms (AWS, GCP, Azure, etc.)
  • Think carefully about systems: edge cases, failure modes, behaviors, and specific implementations
  • Know and prefer configuration management systems and toolchains (Chef, Ansible, Terraform, GitHub Actions, etc.)
  • Have solid programming skills: Python, Go, etc.
  • Have an urge to collaborate and communicate asynchronously, combined with a desire to record and document issues and solutions
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can’t help but fix it
  • Have an urge for delivering quickly and effectively, and iterating fast
Job Responsibility
  • Design, write, and deliver software and services to improve the availability, scalability, reliability, and efficiency of Lambda’s internal IT systems and platforms
  • Solve problems relating to mission critical services and build automation to prevent problem recurrence with the goal of automating response to all non-exceptional events
  • Work with Lambda Engineering and internal teams to influence and create new designs, architectures, standards, and methods for large-scale distributed systems
  • Engage in service capacity planning and demand forecasting, software performance analysis, and system tuning
  • Be an excellent communicator, producing documentation and related artifacts for the systems you are responsible for
What we offer
  • Generous cash & equity compensation
  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401k Plan with 2% company match (USA employees)
  • Flexible paid time off plan that we all actually use
  • Fulltime

Senior AIOps Engineer (Platform & Infrastructure)

Groupon is moving beyond "experimenting" with AI to running it at massive scale....
Location: Prague; Warsaw; Valencia; Madrid
Salary: Not provided
Company: Groupon
Expiration Date: Until further notice
Requirements
  • 5+ years in Platform Engineering, SRE, or DevOps within a cloud-native environment
  • Deep experience managing stateful and stateless workloads (Helm, Istio, Docker)
  • Hands-on experience deploying and operating AI/ML tools or data-intensive systems in production
  • Strong skills in Python or Go to build custom API wrappers and automate operational tasks
  • Expertise in Prometheus, Grafana, and ELK stack to ensure end-to-end observability of complex AI requests
Job Responsibility
  • Architect the AI Stack: Design and operate core infrastructure on Kubernetes, including Vector Databases, LLM Gateways (LiteLLM), and workflow automation tools (n8n)
  • Enable at Scale: Drive AI adoption by creating self-service "Golden Paths" using Terraform and Helm, allowing engineering teams to deploy RAG pipelines with one click
  • Operational Excellence: Implement centralized observability, tracing (Langfuse), and governance to ensure our AI systems are reliable, auditable, and secure
  • Fiscal Discipline: Own the "AI Bill"—monitoring token usage and latency to optimize spend while maintaining high performance
What we offer
  • End-to-end Ownership: Real authority to standardize how a global company builds with AI
  • Career Growth: This is a high-visibility role within a new, strategic team with potential for leadership progression

Senior Software Engineer - Platform Infrastructure

We are seeking a Senior Software Engineer II to architect, build, and operate se...
Location: United States
Salary: 192200.00 - 225810.00 USD / Year
Company: Confluent
Expiration Date: Until further notice
Requirements
  • 6+ years of experience in software engineering, SRE, or security engineering roles, with significant experience operating security platform services
  • Strong backend software development experience (Go, Java, Rust, Python)
  • Expertise with distributed systems, cloud infrastructure (AWS, GCP, Azure), Kubernetes, service mesh, and container orchestration
  • Strong understanding of security domains: IAM, OAuth2, OIDC, PKI, secrets management, policy engines, audit pipelines, zero trust architecture
  • Experience building highly reliable, observable, and resilient production systems
  • Operational expertise: SLOs, SLIs, error budgets, on-call leadership, incident management
  • Strong collaboration skills to drive alignment across engineering, security, and compliance stakeholders
  • Excellent communication skills with ability to influence technical and business leaders
  • BS, MS, or PhD in computer science or a related field, or equivalent work experience
Job Responsibility
  • Architect, design, and develop platform services with a strong focus on scalability, security, and developer experience
  • Lead operational design for reliability: build comprehensive observability, monitoring, and incident response automation into security-critical services
  • Build automation and tooling to drive self-healing systems, proactive risk detection, failure recovery, and continuous resilience testing
  • Collaborate with compliance, governance, and risk teams to translate regulatory and policy requirements into scalable technical controls
  • Lead technical design reviews, security architecture reviews, and incident postmortems for platform-level incidents
  • Mentor engineers across multiple disciplines on both security and operational best practices
  • Own end-to-end delivery of services: from initial design and development through deployment, production hardening, and lifecycle maintenance
What we offer
  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth
  • Offers Equity
  • Fulltime

Senior Platform Engineer - Cloud & Infrastructure

Architect the Infrastructure of MLOps. This is a unique hybrid role where you wo...
Location: Germany, Munich
Salary: Not provided
Company: ZenML
Expiration Date: Until further notice
Requirements
  • Deep knowledge of Kubernetes (CKA level)
  • Experience with Docker, Terraform, Helm
  • Proficiency in Python and likely Go
  • Experience with AWS (EKS), GCP (GKE), Azure (AKS)
  • Experience with PostgreSQL, SQLModel, FastAPI
  • Infrastructure as Code (IaC) mastery
  • Ability to write production-quality code
  • Customer empathy and communication skills
  • Problem-solving skills for complex deployments
Job Responsibility
  • Build 'Infra-Heavy' Product Features like native schedulers and workload managers
  • Own the ZenML Pro (SaaS) Infrastructure ensuring resilience, scalability, and security
  • Enterprise Architecture & PoCs for complex customer deployments
  • Developer Experience by abstracting Kubernetes complexity from Data Scientists
What we offer
  • Inspiring international team
  • Genuine connection & lots of fun with team events
  • Annual company offsite
  • Office in the heart of Munich
  • Flexible hours & trust-based work
  • Remote-friendly culture
  • Competitive compensation
  • Fulltime

Senior ML Infrastructure Engineer, Inference Platform

About the Team: The ML Inference Platform is part of the AV ML Infrastructure or...
Location: United States, Austin, Texas; Mountain View, California; Sunnyvale, California
Salary: 155420.00 USD / Year
Company: General Motors
Expiration Date: Until further notice
Requirements
  • 5+ years of industry experience, with focus on machine learning systems or high performance backend services
  • Expertise in Python, C++, or other relevant coding languages
  • Expertise in ML inference and model serving frameworks (Triton, Ray Serve, vLLM, etc.)
  • Strong communication skills and a proven ability to drive cross-functional initiatives
  • Ability to thrive in a dynamic, multi-tasking environment with ever-evolving priorities
Job Responsibility
  • Design and implement core platform backend software components
  • Collaborate with ML engineers and researchers to understand critical workflows, parse them to platform requirements, and deliver incremental value
  • Lead technical decision-making on model serving strategies, orchestration, caching, model versioning, and auto-scaling mechanisms for highly optimized use of accelerators
  • Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization of inference services
  • Proactively research and integrate state-of-the-art model serving frameworks, hardware accelerators, and distributed computing techniques
  • Lead technical initiatives across GM’s ML ecosystem
  • Raise the engineering bar through technical leadership, establishing best practices
  • Contribute to open source projects and represent GM in relevant communities
What we offer
  • medical
  • dental
  • vision
  • Health Savings Account
  • Flexible Spending Accounts
  • retirement savings plan
  • sickness and accident benefits
  • life insurance
  • paid vacation & holidays
  • tuition assistance programs
  • Fulltime

Senior AI Infrastructure Engineer - Training Platform

As a Software Engineer on the Machine Learning Infrastructure team, you will bui...
Location: United States, San Francisco; Seattle; New York
Salary: 216000.00 - 270000.00 USD / Year
Company: Scale
Expiration Date: Until further notice
Requirements
  • 5+ years of experience in backend or infrastructure engineering, with at least 2 years focused on orchestrating ML workloads at scale (100+ GPU nodes)
  • Strong programming skills in one or more languages (e.g. Python, Go, Rust, C++)
  • Experience with complex compute management systems that cover queueing, quotas, preemption, and gang scheduling
  • Experience with distributed training infrastructure, such as EFA, Infiniband, and topology-aware scheduling
  • Experience with distributed storage systems (e.g. Lustre, S3) as they relate to training throughput
  • Expert-level knowledge of Kubernetes internals (Custom Resources, Operators, Admission Controllers) and how they interact with device plugins for specialized hardware
  • Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform)
  • Proven ability to solve complex problems and work independently in fast-moving environments
Job Responsibility
  • Architect and scale a multi-tenant orchestration layer that abstracts away the complexity of GPU clusters, ensuring high utilization and seamless job recovery
  • Design and implement scheduling primitives to optimize the lifecycle of training jobs
  • Develop deep observability and automated health-checking into the training stack to proactively identify and isolate hardware failures
  • Evaluate and integrate emerging technologies in the CNCF and AI ecosystem (e.g. Ray, Kueue), making data-driven build vs. buy decisions that balance velocity with long-term maintainability
  • Work closely with Finance and Procurement teams to drive our capacity planning process
  • Participate in our team's on call process to ensure the availability of our services
  • Own projects end-to-end, from requirements, scoping, design, to implementation, in a highly collaborative and cross-functional environment
What we offer
  • Comprehensive health, dental and vision coverage
  • retirement benefits
  • a learning and development stipend
  • generous PTO
  • commuter stipend (may be eligible)
  • Fulltime

Senior-Staff Software Engineer, Platform Infrastructure

As a Senior Software Engineer on this team, you will help architect, design and ...
Location: United States, San Mateo
Salary: 130000.00 - 280000.00 USD / Year
Company: Verkada
Expiration Date: Until further notice
Requirements
  • Must have a BS, MS, or PhD in Computer Science, or similar technical field of study
  • Experience and enthusiasm for learning about new infrastructure products, features, and strategies
  • Comfortable with working at the frontier of infrastructure and software development
  • Experience in Python and/or Go
  • Experience with one of the major cloud platforms (preferably AWS)
  • Strong written and verbal communications
Job Responsibility
  • Identify and lead critical efforts related to scalability, reliability and efficiency
  • Influence the features and direction of our platform with your own ideas
  • Provide technical support for engineers on the team
  • Align with product and org objectives, and coordinate with cross-functional teams on delivering key results
What we offer
  • Healthcare programs that can be tailored to meet personal health and financial well-being needs. Premiums are 100% covered for the employee under at least one plan and 80% for family premiums under all plans
  • Nationwide medical, vision and dental coverage
  • Health Savings Account (HSA) with annual employer contributions and Flexible Spending Account (FSA) with tax saving options
  • Expanded mental health support
  • Paid parental leave policy & fertility benefits
  • Time off to relax and recharge through our paid holidays, firmwide extended holidays, flexible PTO and personal sick time
  • Professional development stipend
  • Fertility stipend
  • Wellness/fitness benefits
  • Healthy lunches provided daily
  • Fulltime

Senior Software Engineer II - Platform Infrastructure

We are seeking a Senior Software Engineer II to architect, build, and operate se...
Location: Canada
Salary: 179200.00 - 210600.00 CAD / Year
Company: Confluent
Expiration Date: Until further notice
Requirements
  • 6+ years of experience in software engineering, SRE, or security engineering roles, with significant experience operating security platform services
  • Strong backend software development experience (Go, Java, Rust, Python)
  • Expertise with distributed systems, cloud infrastructure (AWS, GCP, Azure), Kubernetes, service mesh, and container orchestration
  • Strong understanding of security domains: IAM, OAuth2, OIDC, PKI, secrets management, policy engines, audit pipelines, zero trust architecture
  • Experience building highly reliable, observable, and resilient production systems
  • Operational expertise: SLOs, SLIs, error budgets, on-call leadership, incident management
  • Strong collaboration skills to drive alignment across engineering, security, and compliance stakeholders
  • Excellent communication skills with ability to influence technical and business leaders
  • BS, MS, or PhD in computer science or a related field, or equivalent work experience
Job Responsibility
  • Architect, design, and develop platform services with a strong focus on scalability, security, and developer experience
  • Lead operational design for reliability: build comprehensive observability, monitoring, and incident response automation into security-critical services
  • Build automation and tooling to drive self-healing systems, proactive risk detection, failure recovery, and continuous resilience testing
  • Collaborate with compliance, governance, and risk teams to translate regulatory and policy requirements into scalable technical controls
  • Lead technical design reviews, security architecture reviews, and incident postmortems for platform-level incidents
  • Mentor engineers across multiple disciplines on both security and operational best practices
  • Own end-to-end delivery of services: from initial design and development through deployment, production hardening, and lifecycle maintenance
What we offer
  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth
  • Offers Equity
  • Fulltime