Senior Machine Learning Site Reliability Engineer

Prima

Location:
Italy, Milan

Contract Type:
Not provided

Salary:

Not provided

Job Description:

Are you looking for a new challenge? Fancy helping us shape the future of motor insurance? Prima could be the place for you. Since 2015, we’ve been using our love of data and tech to rethink motor insurance and bring drivers a great experience at a great price. Our story began in Italy, where we’ve quickly become the number one online motor insurance provider. In fact, we’re trusted by over 4 million drivers. And now we’re expanding to help millions more drivers in the UK and Spain. To help fuel that growth, we need a Senior Machine Learning Site Reliability Engineer to join our Infrastructure team. This team is the beating heart of Prima. You’ll be joining over 300 engineers across software development, infrastructure, operations and security. Fueled by curiosity, experimentation and collaboration, you’ll help deliver scalable, impactful solutions that shape the future of insurance.

Job Responsibility:

  • Hands-on Reliability & System Engineering: Design, build, and operate reliable and scalable systems by defining and monitoring SLOs/SLIs, working directly on production infrastructure, and collaborating closely with software engineers on system design and reliability improvements
  • Automation, Operations & Incident Response: Actively develop automation for infrastructure and operational workflows to eliminate toil and reduce MTTR, participate in and lead incident response, and drive blameless post-incident reviews with concrete follow-ups implemented in code and tooling
  • Performance, Capacity & Security: Continuously analyze and optimize system performance and cost, provide data, insights, and recommendations to inform capacity planning, and support security best practices through hands-on vulnerability remediation and threat mitigation

Requirements:

  • SRE & Cloud Engineering: Hands-on experience with SRE practices in production, strong AWS expertise, Kubernetes, networking, DNS, and Infrastructure as Code (Pulumi preferred, Terraform a plus)
  • Automation, Software Engineering and MLOps: Demonstrate strong software engineering fundamentals with an emphasis on code quality and maintainability. This includes solid Python proficiency and deep knowledge of the Python ecosystem (testing, debugging, packaging), hands-on experience with PySpark, and a consistent focus on writing clean, well-structured, and maintainable code. Familiarity with MLOps practices such as model registries, model versioning, retraining workflows, and end-to-end deployment lifecycles is also expected
  • Reliability, Data & Operations: Lead incident response and RCAs, improve system reliability, and engage stakeholders to propose solutions, share learnings, and mentor others

Nice to have:

  • Regulated Environments & Security: Experience operating in highly regulated industries (e.g. Insurance, Banking, Healthcare), managing sensitive data, and supporting secure networking setups, including exposure to security technologies such as Cloudflare
  • Distributed Systems & Microservices: Strong understanding of microservices architectures, their principles and trade-offs, with the ability to troubleshoot and maintain distributed systems and supporting technologies (RabbitMQ, Kafka, PostgreSQL, Redis)
  • Observability & Platform Operations: Hands-on experience with Datadog for platform and application monitoring, performance optimisation, and solid fundamentals in database structures and operational troubleshooting, with exposure to systems built in languages such as Rust and Elixir

What we offer:
  • Grow with us: access to learning resources, mentorship and a growth plan tailored to you
  • Thrive and perform: private healthcare, gym discounts, wellbeing programs and mental health support

Additional Information:

Job Posted:
January 20, 2026

Employment Type:
Full-time
Work Type:
Remote work

Similar Jobs for Senior Machine Learning Site Reliability Engineer

Senior Software Engineer, Backend

As a Senior Software Engineer, Backend specializing in database architecture and...
Location:
United States, San Francisco
Salary:
150000.00 - 240000.00 USD / Year
Chef Robotics
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 7+ years of professional experience in backend development roles with demonstrated leadership experience
  • Expert knowledge of relational databases (MySQL, PostgreSQL) including schema design, optimization, and administration
  • Strong proficiency with Python and JavaScript/TypeScript with advanced software engineering skills
  • Extensive experience leading projects with at least two web frameworks: Flask, FastAPI, Django, Node.js, or Next.js
  • Proven experience designing and implementing RESTful and GraphQL APIs at scale
  • Advanced understanding of containerization (Docker) and orchestration (Kubernetes) technologies
  • Experience with cloud infrastructure and deployment (AWS, GCP, or Azure) in production environments
  • Proven experience leading complex backend projects and mentoring junior engineers
  • Understanding of data requirements for robotics or automation systems
Job Responsibility:
  • Lead the design, implementation, and optimization of database schemas to support robot operations, telemetry, recipe management, and system analytics
  • Develop robust data migration strategies and version control for database schema evolution
  • Implement efficient query optimization and indexing strategies to support high-throughput robot operations
  • Establish data integrity protocols and backup systems to ensure operational continuity across customer deployments
  • Create scalable data access layers that balance security, performance, and maintainability
  • Mentor team members on database design patterns and optimization techniques
  • Lead the development and maintenance of scalable APIs to serve robot control systems, dashboards, and monitoring tools
  • Design and implement secure authentication and authorization mechanisms across backend services
  • Develop robust middleware for processing and validating data between robotics subsystems
  • Create service interfaces that enable efficient communication between robotics components and cloud services
What we offer:
  • medical, dental, and vision insurance
  • commuter benefits
  • flexible paid time off (PTO)
  • catered lunch
  • 401(k) matching
  • early-stage equity
Employment Type:
Full-time

Senior Site Reliability Engineer

The Windows and Devices mission is to create innovative, trusted, and open produ...
Location:
India, Hyderabad
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Master's Degree in Computer Science, Information Technology, or related field AND 6+ years technical experience in software engineering, network engineering, or systems administration
  • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 8+ years technical experience in software engineering, network engineering, or systems administration
  • OR equivalent experience
  • 3+ years technical experience working with large-scale cloud or distributed systems
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility:
  • Independently designs, creates, tests, and deploys changes through a safe deployment process (SDP) to enhance code quality and improve the observability, security, reliability and operability of platforms, systems, and products at scale
  • Leverages technical expertise in the infrastructure of cloud technologies and specific products to advocate for, or directly contribute to, automation that improves the availability, security, quality, observability, reliability, efficiency, and performance of related sets of products
  • Leverages end-to-end technical expertise and telemetry analysis alongside advanced artificial intelligence (AI) and machine learning (ML) algorithms to identify patterns and opportunities to implement configuration and data changes
  • Shares insights and best practices via documented artifacts that can be applied to improve development and operations across related sets of systems, platforms, and/or products
  • Writes code, scripts, systems, and/or artificial intelligence (AI)/machine learning (ML) platforms to automate operations tasks at scale
  • Develops, maintains, and implements capacity planning models and monitoring tools to forecast product capacity, related security risk, and resource demands
  • Handles incidents during on-call shifts assessing impact, troubleshooting complex problems, taking appropriate action to mitigate impact, and heading investigations to address root cause(s)
  • Leverages existing tools and automation, including the safe deployment process (SDP), to enable product engineering teams within their organization to increase the velocity at which they can reliably and safely implement changes in production
  • Draws insights from performance and resource monitoring across products and services within their organization to identify whether there is a need to optimize algorithms, security, infrastructure, or architecture
  • Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics of systems, platforms, or products operating at scale
Employment Type:
Full-time

Senior Machine Learning Engineer

Security represents the most critical priorities for our customers in a world aw...
Location:
India, Hyderabad
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Data Science, Engineering, or a related technical field
  • 7+ years of overall experience
  • 5+ years of hands‑on software engineering experience writing production‑quality code
  • 3+ years designing and implementing end‑to‑end software systems
  • Solid understanding of machine learning fundamentals, model evaluation, experimentation, and performance trade‑offs
  • Experience building or operationalizing LLM / Generative AI systems, including RAG, prompt engineering, or agent‑based architectures
  • Proven ability to collaborate across disciplines and operate with autonomy at senior IC scope
Job Responsibility:
  • Design, develop, and deploy AI / ML systems across the full lifecycle, including data ingestion, feature engineering, model training, evaluation, and production integration
  • Build and optimize Generative AI and LLM‑based systems, including agentic workflows, prompt engineering, retrieval‑augmented generation (RAG), and fine‑tuning approaches
  • Write production‑grade code (Python, C#, and/or Java) with a strong focus on scalability, performance, security, testability, and maintainability
  • Partner closely with engineering, product management, and applied science teams to translate business and customer requirements into robust technical solutions
  • Ship and operate large‑scale AI services in cloud environments, with ownership of reliability, latency, throughput, accuracy, and cost efficiency
  • Define and execute model evaluation strategies, including offline experiments, online monitoring, drift detection, bias analysis, and feedback loops
  • Implement MLOps best practices, including CI/CD for models, versioning, rollout strategies, observability, and live‑site monitoring
  • Apply Responsible AI principles—privacy, security, explainability, fairness, and compliance—throughout system design and deployment
  • Stay current with advancements in GenAI, LLM frameworks, and ML infrastructure, assessing feasibility and impact for enterprise security scenarios
  • Contribute technical leadership by reviewing designs, mentoring peers, and raising the overall engineering and scientific bar of the team
Employment Type:
Full-time

Senior Manager - DevSecOps & Site Reliability Engineering

We’re building a world of health around every individual — shaping a more connec...
Location:
United States
Salary:
118450.00 - 236900.00 USD / Year
CVS Health
Expiration Date:
March 30, 2026
Requirements:
  • 7+ years of DevSecOps & SRE leadership in hybrid enterprise environments
  • Proven release management for multi-team agile trains (7+ teams)
  • Hands-on experience with CI/CD (GitHub/ADO), artifact management, code scanning, and observability stacks
  • Deep knowledge of security frameworks and compliance in healthcare-grade systems
  • Strong coaching, stakeholder management, and executive communication skills
  • 3+ years in change/release management, incident/problem management, and ITIL frameworks
  • Experience with cloud platforms (AWS, Azure, GCP), containerization (Docker, Kubernetes), and observability tools
  • Experience managing vendor/contractor teams
  • Bachelor’s Degree or equivalent work experience in Computer Science, Information Systems, Data Engineering, Data Analytics, Machine Learning, or related field required
Job Responsibility:
  • Architect and implement DevSecOps and SRE practices for hybrid environments (on-prem and cloud)
  • Define and execute DevSecOps strategy, including policy, standards, and guardrails
  • Oversee CI/CD pipelines, security automation, and incident management
  • Build/operate pipelines with integrated quality gates, code scanning (Sonar), secrets management, SBOM, and IaC
  • Lead SRE functions: SLIs/SLOs, error budgets, resilience engineering, performance, and capacity planning
  • Deploy, manage, and optimize observability and event management platforms
  • Stand up metrics, tracing, logging, and immutable logging for governance and audits
  • Coordinate releases across 7+ scrum teams, aligning regression/UAT calendars and compliance gates
  • Lead and govern change/release management processes, including CAB participation and risk mitigation
  • Champion security-by-design: threat modeling, shift-left testing, dependency hygiene, data segmentation, and zero-trust
What we offer:
  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
Employment Type:
Full-time

Senior Software Engineer

The AI Platform organization at Microsoft builds the end-to-end Azure AI stack/P...
Location:
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience writing production code for internet-scale services and distributed systems
  • Ability to debug, read code, and work on a large and growing codebase
  • Engineering knowledge of machine learning systems and data pipelines
  • Experience mentoring other developers, working with partners, and being a team player
  • Excellent communication and presentation skills
Job Responsibility:
  • Design, implement, and support scalable, reliable, high-performance services
  • Write clean and concise code with unit tests
  • Design, implement, and support new features as well as extend existing systems
  • Investigate live site issues and implement and deploy fixes
  • Participate in an on-call rotation
  • Drive quality engineering via code reviews and design discussions
Employment Type:
Full-time

Senior Mechanical Engineer

As a Senior Mechanical Engineer, you will play a key role in taking SunDrive’s p...
Location:
Australia, Sydney (Kurnell)
Salary:
120000.00 - 135000.00 AUD / Year
SunDrive
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Mechanical or Mechatronics Engineering
  • 5+ years of relevant industry experience in equipment, machinery, or product development environments
  • Strong experience across the full mechanical design lifecycle, from concept through to release and support
  • Proficiency in 3D CAD and 2D drawings, including GD&T
  • Solid understanding of fluid systems, piping, pumps, and flow-related mechanical design
  • Experience applying first-principles engineering, tolerance analysis, DFMEA, and structured validation approaches
  • Hands-on experience with prototyping tools such as CNC machining, 3D printing, or rapid fabrication techniques
  • Strong subsystem integration experience across mechanical, electrical, and software interfaces
  • Sound material selection knowledge, particularly for corrosive or chemically aggressive environments
  • A practical, analytical problem-solving mindset with the ability to work independently and collaboratively
Job Responsibility:
  • Leading mechanical design activities across the full lifecycle, including concept development, detailed design, prototyping, testing, validation, and release to production
  • Designing mechanical systems, components, and assemblies for SunDrive’s solar development and manufacturing tools, balancing performance, safety, reliability, and cost
  • Preparing and presenting design concepts, design reviews, DFMEAs, safety risk assessments, and technical recommendations at defined project gates
  • Supporting system integration, installation, commissioning, and validation at SunDrive or partner sites, including limited domestic or international travel
  • Collaborating closely with software, electrical, process, and integration teams to ensure mechanical designs interface cleanly at the system level
  • Working with suppliers and partners to support fabrication, assembly, quality control, and design-for-manufacture improvements
  • Continuously improving designs based on test data, field learning, customer feedback, and reliability insights
  • Creating clear, robust documentation, including drawings, specifications, BOMs, test plans, and operating procedures
  • Mentoring junior engineers and technicians through design reviews, technical guidance, and hands-on problem solving
  • Escalating major design trade-offs, risks, or non-standard solutions to the Engineering Director as required
What we offer:
  • Access to SunDrive’s Employee Stock Option Plan (ESOP)
  • Company-wide L&D Fund for courses, coaching, and technical development
  • Dedicated weekly learning time
  • Up to 2 days of paid L&D leave per year
  • 4 weeks annual leave + unlimited rollover
  • 10 days paid Personal/Carer’s Leave + unlimited rollover
  • Parental Leave: 13 weeks primary / 6 weeks secondary caregiver leave
  • Paid antenatal leave and support for early arrivals, surrogacy, and return-to-work planning
  • 2 days paid Compassionate Leave per occasion
  • Free, confidential EAP support for employees and immediate family
Employment Type:
Full-time

Senior Software Engineer

Online advertising powers many of the free online experiences we rely on—search,...
Location:
India, Bangalore
Salary:
Not provided
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's or master's degree in computer science or related technical field AND 6+ years of engineering experience
  • Strong coding proficiency in C/C++, C#, or Java
  • Experience building backend services, distributed systems, or high-scale APIs
  • Hands-on experience with big data technologies such as Kafka, Hadoop, Spark, ADLS, or similar
  • Strong debugging, problem-solving, and systems thinking capability
  • Solid analytical skills with experience using SQL, KQL, Hive, or similar for production investigations
  • Knowledge in some of the following areas: Machine Learning, Quantitative Analysis, Big Data Analytics, Business Analytics, Regression Modelling, Predictive Modelling, and Analytical Decision-Making
Job Responsibility:
  • Analyse auction logs, marketplace signals, and system telemetry to diagnose KPI shifts and revenue trends
  • Investigate anomalies through large-scale data analysis (Hive/Presto/KQL/ADX) and propose data-driven fixes
  • Identify revenue opportunities through statistical analysis, hypothesis testing, and metric deep dives
  • Partner with data scientists to validate models, tune pricing logic, or refine optimization algorithms
  • Build APIs, microservices, and high-performance data pipelines that integrate with partners, marketplaces, and real-time decision workflows
  • Improve system observability using telemetry instrumentation, dashboards, tracing, and logs
  • Lead deep-dive debugging efforts during production issues—correlating system states, data patterns, and infrastructure behaviour
  • Own live-site operations including on-call rotations, root-cause analysis, and long-term reliability improvements
  • Drive performance tuning to reduce latency, eliminate bottlenecks, and enhance service health
  • Collaborate closely with PMs, data scientists, analysts, and engineering leaders to align technical execution with business metrics
Employment Type:
Full-time

Senior Applied Scientist - Security Research

Security represents the most critical priorities for our customers in a world aw...
Location:
United States, Redmond
Salary:
119800.00 - 234700.00 USD / Year
Microsoft Corporation
Expiration Date:
Until further notice
Requirements:
  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check
Job Responsibility:
  • Model development & optimization: Design, develop, fine‑tune, and evaluate models, summarization, and reasoning
  • Data & evaluation at scale: Build/extend data pipelines for curation/labeling/feature stores; author offline eval harnesses; run A/Bs; define guardrails and success metrics
  • Production ML engineering: Contribute to service code and configs; add monitoring, tracing, dashboards, and auto‑scaling; participate in on‑call and postmortems to improve live‑site reliability
  • Collaboration & mentoring: Partner across PM/ENG/Research teams and beyond; identify AI technologies to create an adaptive and scalable solution to provide protection for our customers, share methods and code, review PRs, improve reproducibility and documentation
Employment Type:
Full-time