Software Engineer - Cloud FinOps & Reliability

Luma AI

Location:
United States, Palo Alto

Contract Type:
Not provided

Salary:

120000.00 - 255000.00 USD / Year

Job Description:

This is a foundational engineering position for a technical, data-driven expert who gets excited about optimization at a massive scale. As a foundational member of our SRE team, you will specialize in FinOps and cloud cost management, owning the financial health of one of the world's largest multi-cloud GPU infrastructures. You will be an SRE who applies a deep understanding of cloud architecture and pricing models to find and eliminate inefficiency. You will use your software engineering skills to build the tools and automation required to govern our cloud spend, providing critical insights that allow us to scale our AI research and products sustainably.

Job Responsibility:

  • Analyze & Optimize: Actively monitor and analyze costs across our entire technical ecosystem—including multi-cloud infrastructure (AWS, GCP, OCI), on-premise clusters, and third-party services—to identify and execute on opportunities for cost optimization. Develop forecasting models to predict future spend and inform our capacity planning
  • Manage & Commit: Develop and actively manage a multi-million dollar portfolio of Reserved Instances (RIs) and Savings Plans to maximize commitment-based discounts across our global GPU and CPU fleets
  • Automate & Build: Apply a software engineering approach to design, build, and maintain custom tools and automation in Python and SQL. Your systems will track, analyze, and report on costs across our entire fleet of providers and services, with a focus on detecting anomalies immediately
  • Partner & Advise: Working closely as an embedded member of the SRE team, you will partner with fellow SREs and research teams to model the cost implications of new models and infrastructure designs, providing expert guidance on cost-performance trade-offs
  • Visualize & Report: Create and manage a centralized observability stack for cloud costs, building dashboards in tools like Grafana to give a real-time, granular view of our financial posture to all stakeholders

Requirements:

  • 5+ years of experience in a technical role such as Site Reliability Engineer, DevOps Engineer, Infrastructure Engineer, or a dedicated Cloud Cost Engineer
  • Deep, hands-on expertise with the cost models and optimization levers of at least one major cloud provider (AWS, GCP), and a willingness to learn others
  • Proficient in Python for the purpose of scripting, data analysis, and building automation tooling
  • Strong, foundational understanding of cloud infrastructure, including containerization (Docker, Kubernetes), networking, and storage
  • Not an accountant; you are a systems thinker who is passionate about applying engineering principles to solve financial challenges at scale
  • A tenacious troubleshooter and a data-driven decision-maker who thrives on finding the 'why' behind the numbers

Nice to have:

  • Experience managing a monthly cloud spend in excess of $1 million
  • Relevant certifications, such as the FinOps Certified Practitioner (FOCP)
  • Experience building custom cost allocation, showback, or chargeback systems from scratch
  • A background working with large-scale GPU clusters for AI/ML workloads

Additional Information:

Job Posted:
January 13, 2026

Employment Type:
Full-time
Work Type:
Hybrid work

Similar Jobs for Software Engineer - Cloud FinOps & Reliability

Cloud Engineering Manager - FinOps

This role combines technical expertise, leadership, and operational excellence t...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Proven expertise in cloud platforms (e.g., AWS, Azure, Google Cloud) and cloud-native technologies
  • Strong knowledge of FinOps principles and cloud financial management, including cost optimization, forecasting, and governance
  • Experience with application development frameworks (e.g., Node.js, Python, Java) and modern software engineering practices
  • Familiarity with cloud monitoring and cost management tools, such as AWS Cost Explorer, Azure Cost Management, or third-party FinOps platforms (e.g., CloudHealth, Apptio)
  • Proficiency in containerization and orchestration technologies such as Docker and Kubernetes
  • Demonstrated success in leading engineering teams, managing priorities, and delivering complex projects on time and within budget
  • Strong collaboration skills, with the ability to work effectively across engineering, finance, and business teams
  • Exceptional ability to communicate technical concepts to non-technical stakeholders and align engineering efforts with business goals
  • Bachelor’s or master’s degree in computer science, engineering, information systems, or related field
  • Typically, 7-10 years’ experience, including 0-2 years of people management experience
Job Responsibility:
  • Lead and inspire a team of cloud engineers focused on FinOps application development, fostering a culture of innovation, collaboration, and continuous improvement
  • Drive the design, development, and implementation of cloud engineering applications that enable visibility, optimization, and governance of cloud costs and usage
  • Architect scalable, secure, and resilient solutions that align with FinOps principles (e.g., cost optimization, forecasting, usage analytics)
  • Collaborate with product managers and business stakeholders to define requirements, prioritize features, and deliver value-driven solutions
  • Ensure seamless integration of FinOps applications with existing HPE cloud platform tools and systems
  • Lead efforts to optimize cloud infrastructure costs and usage patterns across HPE's cloud platforms, leveraging advanced analytics and automation
  • Establish and enforce engineering best practices, including CI/CD pipelines, DevSecOps principles, and automated testing frameworks
  • Monitor and improve application performance, reliability, and scalability through proactive measures and robust incident management
  • Collaborate with finance teams to ensure compliance with cloud spending policies and reporting requirements
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
Employment Type:
Full-time

Staff Platform Software Engineer

EarnIn is seeking a Staff Platform Engineer to lead the strategic design, automa...
Location:
India, Bengaluru
Salary:
Not provided
EarnIn
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s Degree in Computer Science or equivalent industry experience
  • 7+ years of experience in cloud infrastructure, managing large-scale, high-availability, customer-facing distributed systems
  • Proven experience mentoring and guiding senior engineers, driving technical decisions, and leading company-wide cloud initiatives
  • Mastery of public cloud providers, specifically AWS (EKS, DynamoDB, Aurora, Kinesis, etc.)
  • Strong expertise in containerized microservices running on Kubernetes
  • Deep knowledge of automation and configuration management tools (Terraform, Ansible)
  • Expertise in CI/CD pipelines and tools, including Jenkins, GHA, Argo CD, Spinnaker, and FluxCD or similar
  • Experience with advanced observability tools (DataDog, CloudWatch)
  • Track record of leading cost optimization / FinOps initiatives, performance tuning, and operational excellence projects
  • Proven ability to drive cross-functional initiatives with engineering, product, and business teams
Job Responsibility:
  • Serve as a key architect and thought leader in the cloud infrastructure domain, guiding the team on best practices
  • Mentor and coach senior engineers across the company in advanced cloud operations practices
  • Provide oversight of hosted Linux and Windows systems, networks, databases, and applications, identifying and solving critical performance, scalability, and stability challenges
  • Design and develop reusable components and operational strategies to enhance the scalability, performance, and monitoring of cloud systems
  • Collaborate with other senior engineers to create technical solutions that address company-wide cloud challenges
  • Lead the establishment and continuous evolution of infrastructure-as-code best practices, driving automation, self-healing, and security standards
  • Drive operational cost savings through service optimizations, autoscaling strategies, and distributed processing architectures
  • Collaborate closely with cross-functional teams, including security, engineering, and business teams, to ensure that operational strategies align with company-wide objectives
  • Provide thought leadership in company-wide initiatives such as observability, automation, and disaster recovery
  • Continuously evaluate existing tools and processes, lead efforts to socialize, present, and implement enhancements for optimal operational efficiency
What we offer:
  • healthcare
  • internet/cell phone reimbursement
  • a learning and development stipend
  • opportunities to travel to our Mountain View HQ
Employment Type:
Full-time

Senior Infrastructure Software Engineer, Cloud Foundation

As a Senior Infrastructure Software Engineer on the Cloud Platform org, you will...
Location:
Poland
Salary:
314500.00 - 425500.00 PLN / Year
Dropbox
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience as a backend, platform, or infrastructure engineer, with a proven track record of building scalable, reliable systems
  • Proficiency in backend development with Golang and Python (required)
  • Hands-on experience deploying and managing production workloads in public cloud environments (AWS and/or Azure)
  • Expertise with infrastructure-as-code (Terraform, CDK) and automation of cloud infrastructure configuration
  • Strong knowledge of public cloud architecture best practices, including AWS Well-Architected principles, networking, and identity/access management
  • Ability to design and implement technical solutions that translate business and product requirements into efficient cloud-based architectures
  • Effective communication skills for cross-functional collaboration and driving alignment on cloud solutions
Job Responsibility:
  • Design and build highly available, scalable services that provision and seamlessly integrate secure public cloud infrastructure
  • Partner with security and network engineering teams to define clear requirements and set standards for public cloud usage across Dropbox
  • Collaborate with Capacity Engineering to integrate supply-side capabilities with FinOps tooling and processes
  • Document, share, and promote best practices to help product engineering teams succeed in public cloud environments
  • Shape technical direction at an organizational level by translating business and technical constraints into actionable roadmaps, and driving alignment across the platform org
  • Provide technical guidance and mentorship to junior engineers via code review and design docs
  • Contribute to the evolution of Dropbox’s infrastructure stack by improving code quality and system reliability
What we offer:
  • Competitive medical, dental and vision coverage
  • Retirement savings through a defined contribution pension or savings plan
  • Flexible PTO/Paid Time Off, paid holidays, Volunteer Time Off, and more
  • Income Protection Plans: Life and disability insurance
  • Business Travel Protection: Travel medical and accident insurance
  • Perks Allowance to be used on what matters most to you
  • Parental benefits including: Parental Leave, Fertility Benefits, Adoptions and Surrogacy support, and Lactation support
  • Mental health and wellness benefits
Employment Type:
Full-time

Senior Azure DevOps Engineer

We are looking to recruit an SC Cleared Senior Azure DevOps Engineer for a leadi...
Location:
United Kingdom
Salary:
80000.00 - 90000.00 GBP / Year
DataCareers
Expiration Date:
Until further notice
Requirements:
  • Extensive experience in Azure services and architecture (VMs, EntraID, Application Gateway, Sentinel, Defender for Cloud, Azure Fabric, Functions, Logic Apps, Front Door, App Service, Dev Box, Azure Migrate)
  • Strong expertise in Azure DevOps, GitHub CI/CD, and build/release automation
  • Proficiency with Infrastructure as Code (Terraform, Pulumi, CloudFormation, PowerShell)
  • Experience deploying solutions in AWS is desirable
  • Familiarity with containerization and orchestration (Docker, Kubernetes) and automation/configuration tools (Ansible)
  • Strong scripting skills (PowerShell, Bash, Python)
  • Experience with monitoring and observability tools (Grafana, Azure Monitor, DataDog, New Relic)
  • Deep understanding of cloud security, governance, and FinOps principles
  • Solid Windows, Linux, and Microsoft 365 design and implementation experience
  • Proven experience migrating databases (e.g., MS SQL) in cloud environments
Job Responsibility:
  • Lead the design and implementation of cloud infrastructure and DevOps processes across client projects
  • Act as a technical advisor for cloud engineers, providing guidance on CI/CD automation, container orchestration, and platform reliability
  • Design, document, and maintain secure technical and security architectures aligned with best practices
  • Collaborate with Architecture, Security, Software Engineering, and Product teams to align cloud platform strategy
  • Drive improvements in automation, infrastructure as code, and overall DevOps maturity across projects
  • Mentor and coach engineering teams to adopt modern engineering practices and automation strategies
  • Deliver large-scale infrastructure transformation projects with low-level design expertise
  • Stay ahead of emerging technologies, applying them to deliver maximum client value
Employment Type:
Full-time

Principal Azure DevOps Engineer

We are looking to recruit an SC Cleared Principal Azure DevOps Engineer for a le...
Location:
United Kingdom
Salary:
80000.00 - 90000.00 GBP / Year
DataCareers
Expiration Date:
Until further notice
Requirements:
  • Extensive experience in Azure services and architecture (VMs, EntraID, Application Gateway, Sentinel, Defender for Cloud, Azure Fabric, Functions, Logic Apps, Front Door, App Service, Dev Box, Azure Migrate)
  • Strong expertise in Azure DevOps, GitHub CI/CD, and build/release automation
  • Proficiency with Infrastructure as Code (Terraform, Pulumi, CloudFormation, PowerShell)
  • Experience deploying solutions in AWS is desirable
  • Familiarity with containerization and orchestration (Docker, Kubernetes) and automation/configuration tools (Ansible)
  • Strong scripting skills (PowerShell, Bash, Python)
  • Experience with monitoring and observability tools (Grafana, Azure Monitor, DataDog, New Relic)
  • Deep understanding of cloud security, governance, and FinOps principles
  • Solid Windows, Linux, and Microsoft 365 design and implementation experience
  • Proven experience migrating databases (e.g., MS SQL) in cloud environments
Job Responsibility:
  • Lead the design and implementation of cloud infrastructure and DevOps processes across client projects
  • Act as a technical advisor for cloud engineers, providing guidance on CI/CD automation, container orchestration, and platform reliability
  • Design, document, and maintain secure technical and security architectures aligned with best practices
  • Collaborate with Architecture, Security, Software Engineering, and Product teams to align cloud platform strategy
  • Drive improvements in automation, infrastructure as code, and overall DevOps maturity across projects
  • Mentor and coach engineering teams to adopt modern engineering practices and automation strategies
  • Deliver large-scale infrastructure transformation projects with low-level design expertise
  • Stay ahead of emerging technologies, applying them to deliver maximum client value
Employment Type:
Full-time

Machine Learning Engineering Team Lead

Lead a high-performing team focused on building large-scale distributed training...
Location:
Germany, Berlin
Salary:
Not provided
Aignostics
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field
  • 6+ years of software engineering or ML engineering experience, with at least 2 years in a technical leadership or team lead role
  • Proven track record of building and leading high-performing engineering teams
  • Experience guiding projects across the whole Software Development Life Cycle
  • Deep understanding of fundamental Machine Learning concepts and principles, familiarity with advanced model optimization techniques
  • Significant experience with large-scale distributed training systems and frameworks (especially PyTorch and NCCL)
  • Familiarity with GPUs, distributed systems, parallel computing and scaling laws
  • Advanced programming skills in Python, experience in performance-critical languages (C/C++ or CUDA) being a plus
  • Familiarity with MLOps/DevOps best practices including CI/CD, Docker, Kubernetes, and observability, cloud platforms (GCP, AWS or Azure) and infrastructure-as-code
  • Experience with Linux, version control, and container technologies
Job Responsibility:
  • Build and scale a high-performing team capable of tackling complex distributed ML challenges
  • Own the full employee lifecycle: recruiting, onboarding, performance management, career development, and retention
  • Empower your team members and help them grow in autonomy and technical expertise
  • Mentor engineers at all levels, fostering a culture of continuous learning and psychological safety
  • Create an inclusive environment where diverse perspectives drive innovation
  • Define and execute technical roadmaps aligned with company objectives and product needs
  • Lead resource allocation and capacity planning to balance team workload and business priorities
  • Own FinOps responsibilities: optimize cloud costs, track spending, and ensure efficient resource utilization
  • Ensure operational readiness through monitoring, incident response protocols, and system reliability practices
  • Establish and track KPIs for team performance, system efficiency and health
What we offer:
  • Learning & Development yearly budget of 1,000€ (plus 2 L&D days)
  • Language classes, and internal development programs
  • Access to leadership development programs and executive coaching
  • Flexible working hours and teleworking policy
  • 30 paid vacation days per year
  • Family- and pet-friendly, with support for flexible parental leave options
  • Subsidized membership of your choice among public transport, sports, and well-being
  • Social gatherings, lunches, and off-site events for a fun and inclusive work environment
  • Optional company pension scheme

Engineering Manager - Machine Learning

As an ML Engineering Team Lead at Aignostics, you will lead a high-performing tea...
Location:
Germany, Berlin
Salary:
Not provided
Aignostics
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field
  • 6+ years of software engineering or ML engineering experience, with at least 2 years in a technical leadership or team lead role
  • Proven track record of building and leading high-performing engineering teams
  • Experience guiding projects across the whole Software Development Life Cycle
  • Deep understanding of fundamental Machine Learning concepts and principles
  • Familiarity with advanced model optimization techniques
  • Significant experience with large-scale distributed training systems and frameworks (especially PyTorch and NCCL)
  • Familiarity with GPUs, distributed systems, parallel computing and scaling laws
  • Advanced programming skills in Python
  • Familiarity with MLOps/DevOps best practices including CI/CD, Docker, Kubernetes, and observability
Job Responsibility:
  • Build and scale a high-performing team capable of tackling complex distributed ML challenges
  • Own the full employee lifecycle: recruiting, onboarding, performance management, career development, and retention
  • Empower your team members and help them grow in autonomy and technical expertise
  • Mentor engineers at all levels
  • Create an inclusive environment where diverse perspectives drive innovation
  • Define and execute technical roadmaps aligned with company objectives and product needs
  • Lead resource allocation and capacity planning
  • Own FinOps responsibilities: optimize cloud costs, track spending, and ensure efficient resource utilization
  • Ensure operational readiness through monitoring, incident response protocols, and system reliability practices
  • Establish and track KPIs for team performance, system efficiency and health
What we offer:
  • Learning & Development yearly budget of 1,000€ (plus 2 L&D days)
  • Language classes
  • Internal development programs
  • Access to leadership development programs and executive coaching
  • Flexible working hours and teleworking policy
  • 30 paid vacation days per year
  • Family & pet friendly
  • Support flexible parental leave options
  • Subsidized membership of your choice among public transport, sports, and well-being
  • Social gatherings, lunches, and off-site events

Director, Cloud and DevOps Platforms

We are seeking a visionary and technically strong Director of Cloud and DevOps P...
Location:
United States, Palm Beach Gardens; Remote City
Salary:
174250.00 - 243750.00 USD / Year
Beretta Clima Italia
Expiration Date:
January 31, 2026
Requirements:
  • Bachelor’s Degree
  • 10+ years of experience in cloud infrastructure, DevOps, or platform engineering
  • 5+ years of experience working with one or more of the hyper-scalers (AWS, Azure, and/or GCP)
  • 5+ years of people leadership experience
Job Responsibility:
  • Define and execute the product vision and roadmap for cloud and DevOps platforms
  • Drive adoption of a platform-as-a-product mindset across infrastructure and engineering teams
  • Build a catalog of automated infrastructure foundations, self-service provisioning, CI/CD pipelines, and container patterns for microservices applications
  • Develop deep partnerships with digital product and enterprise software engineering teams
  • Understand dependencies, priorities, and technical diversity across teams
  • Tailor platform solutions that balance common services with alignment to varying technology stacks and patterns
  • Lead the development and delivery of scalable, secure, and resilient cloud-native platforms
  • Ensure platforms support rapid development, deployment, and operations across hybrid and multi-cloud environments
  • Integrate observability, security, and compliance into platform capabilities
  • Leverage AI to simplify the DevOps experience and enhance developer productivity
What we offer:
  • Health Care benefits: Medical, Dental, Vision
  • wellness incentives
  • Retirement benefits
  • Paid vacation days, up to 15 days
  • paid sick days, up to 5 days
  • paid personal leave, up to 5 days
  • paid holidays, up to 13 days
  • birth and adoption leave
  • parental leave
  • family and medical leave
Employment Type:
Full-time