
Senior Manager, AI Infrastructure and Operations

Japan, Tokyo · Job Posted February 20, 2026

Job Description

The Sr. Manager/Staff Engineer, AI Infrastructure & MLOps Engineering is a senior technical leader responsible for architecting, building, and scaling Pfizer’s AI infrastructure and developer platforms. This role leverages extensive experience in cloud engineering, DevOps, and MLOps to deliver robust, high-performance solutions supporting advanced AI/ML workloads in biotechnology, healthcare, and enterprise technology. The successful candidate will drive innovation in automation, reliability, and scalability, enabling scientists and engineers to rapidly develop, deploy, and monitor machine learning models in production environments.

Job Responsibility

  • Design, implement, and own large-scale cloud-based HPC and MLOps platforms supporting AI model training, genomic sequencing, and precision medicine
  • Architect multi-environment clusters (AWS, GCP, Azure), enabling GPU/FPGA workloads and advanced observability
  • Lead the development of developer and cloud platforms, including internal engineering accelerators and reusable toolsets
  • Design, implement, and manage unified platform catalogs using Backstage, enhancing developer experience and application metadata management
  • Develop custom plugins and APIs for Backstage to support internal engineering workflows and documentation
  • Build and maintain Python-based automation frameworks, CI/CD pipelines, and Infrastructure-as-Code (Terraform, Helm, Pulumi, AWS CDK)
  • Operationalize containerized solutions using Docker and Kubernetes, integrating MLflow, Kubeflow, and other orchestration platforms
  • Implement robust automation for provisioning, configuring, and managing cloud resources across multiple environments
  • Lead the implementation of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and advanced observability (Prometheus, Grafana, PagerDuty)
  • Develop and maintain APIs and services for model management, feature stores, and inference pipelines
  • Operationalize ML model serving at scale using frameworks such as TensorFlow Serving, TorchServe, KServe, and Seldon Core
  • Ensure compliance with industry standards (e.g., HIPAA, FDA) for data protection and reliability
  • Mentor engineers and lead cross-functional teams to deliver integrated solutions
  • Champion engineering excellence through design documentation, code reviews, and testing automation
  • Present at industry summits, author technical proposals, and contribute to open-source projects (Kubernetes, Helm, Go, Envoy)
  • Drive agile delivery, sprint planning, and performance optimization
  • Lead incident response and disaster recovery initiatives for mission-critical platforms
  • Foster a culture of shared ownership, transparency, and innovation

Requirements

  • 8+ years of hands-on software engineering experience in cloud infrastructure, DevOps, and MLOps
  • Deep expertise in Python, Kubernetes, Terraform, Helm, and CI/CD pipeline development
  • Proven experience architecting and operating containerized solutions on AWS, GCP, and Azure
  • Strong knowledge of Infrastructure-as-Code, distributed systems, and production system reliability
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field

Nice to have

  • Expertise in AWS cloud services (EC2, S3, Lambda, EKS, SageMaker, API Gateway, CloudFormation, IAM, etc.)
  • Experience deploying and customizing Backstage as a unified catalog for teams, services, and technical documentation
  • Experience building and deploying microservices and REST/gRPC APIs for AI model delivery
  • Familiarity with MLflow, Kubeflow, and other MLOps orchestration platforms
  • Proficiency with model serving frameworks (TensorFlow Serving, TorchServe, KServe, Seldon Core, BentoML, etc.)


Similar Jobs for Senior Manager, AI Infrastructure and Operations

8 matching positions

Senior Manager, Operations Knowledge Systems & Process Design

This isn't traditional knowledge management. You're building the operating syste...
Location: United States, Nashville
Salary: Not provided
Company: Checkr (https://checkr.com)
Expiration Date: Until further notice
Requirements
  • 7+ years of experience in operations, process improvement, knowledge management, or related fields
  • 3+ years leading teams or complex cross-functional initiatives
  • Demonstrated expertise in business process design, mapping, and optimization (Lean, Six Sigma, or similar methodologies)
  • Strong systems thinking—ability to see how knowledge, process, technology, and people interconnect
  • Proven ability to write clear, effective operational content that scales across audiences and channels
  • Data fluency: comfortable using metrics and analytics to drive decisions and measure impact
  • Experience building scalable solutions that work across multiple teams or functions
  • Excellent stakeholder management skills with ability to influence without authority
  • Clear, compelling communication—can translate complex systems into understandable frameworks
Job Responsibility
  • Design and evolve the knowledge infrastructure that powers compliance operations, customer support, and external help center content
  • Write and oversee the creation of content that works—clear, actionable knowledge that scales across channels and use cases
  • Develop and maintain structured taxonomies and leverage AI-powered approaches for organizing and surfacing unstructured content
  • Create systems that enable both human agents and AI systems to leverage knowledge effectively
  • Establish frameworks for knowledge quality, governance, and lifecycle management that scale with business growth
  • Map, document, and optimize cross-functional processes across compliance, support, and supply chain operations
  • Design processes that balance efficiency, quality, and customer experience outcomes
  • Build process frameworks that support continuous improvement and rapid iteration
  • Harness conversation analytics and AI to surface patterns, gaps, and opportunities in knowledge and process performance
  • Use operational data and performance metrics to identify knowledge and process gaps and translate insights into action
What we offer
  • Lunch four times a week
  • Commuter stipend
  • Snacks and beverages
  • Full-time

Revenue Operations Manager

This is one of the most critical roles driving the scalability and financial per...
Location: Sweden, Stockholm
Salary: Not provided
Company: Mentimeter (mentimeter.com)
Expiration Date: Until further notice
Requirements
  • 3+ years of experience in Operations (Revenue, Sales or Marketing Ops), SaaS Sales or Consultancy
  • Highly driven, proactive, and action-oriented with a strong bias toward execution
  • Genuine curiosity about leveraging AI and automation to drive smarter decisions and improve operational effectiveness
  • Excellent communicator with the ability to align and collaborate effectively with senior leadership and cross-functional teams
  • Ability to work cross-functionally and align operational initiatives with business goals
  • Attention to detail and a structured, problem-solving mindset
  • Familiarity with SaaS sales processes and CRM data models
Job Responsibility
  • Revenue Process Design and Implementation: Responsible for process design and driving scalability within our Enterprise Bow Tie funnel
  • Partnering with Revenue leaders to align Sales Ops initiatives with Mentimeter’s G2M strategy
  • Leading and contributing to cross-functional projects focused on revenue enablement and operational excellence
  • Implement process changes through tooling and data infrastructure, automating workflows where possible to ensure scalability
  • Drive cross-functional alignment and change management to ensure consistent process adoption and scalability
  • Tech Stack & System Enablement: Ownership of tools and systems that are the closest to your specialisation
  • Workflows and automation: Identify and implement workflow improvements that increase productivity and visibility throughout the funnel
  • Ensure data activation within the system
  • Ensure CRM data integrity: Responsible for legal compliance for the data in the tools and maintaining data hygiene
  • Hold commercial ownership of the renewal process and negotiations, optimising costs and tool ROI
What we offer
  • Diverse and inclusive work environment
  • Continuous professional development
  • Access to a leadership program (including external personal coach)
  • Relevant education
  • Competitive compensation and benefits package, including pension contributions

Engineering Manager, Infrastructure

As an Engineering Manager for the Infrastructure team, you’ll lead the engineers...
Location: Canada; United States
Salary: 195,000 - 285,000 USD / Year
Company: Apollo.io
Expiration Date: Until further notice
Requirements
  • 5+ years of hands-on software or infrastructure engineering experience
  • 2+ years of experience leading teams of senior and staff-level engineers in platform, SRE, or infrastructure domains
  • Proven ability to design and operate large-scale distributed systems in cloud environments (preferably GCP or AWS)
  • Expertise with Kubernetes, Docker, Terraform, Ubuntu, and CI/CD pipelines
  • Familiarity with observability tools (Grafana, Prometheus, ELK, Datadog, New Relic) and performance tuning
  • Strong grounding in networking, security, and reliability principles
  • Experience managing infrastructure costs, availability SLAs, and high-throughput systems at scale
Job Responsibility
  • Lead, coach, and grow a distributed team of high-impact Infrastructure Engineers
  • Partner with senior engineering leadership on strategic initiatives such as cloud migration, infrastructure scaling, platform reliability, and cost efficiency
  • Define and implement modern operational excellence practices, including SLOs, error budgets, incident reviews, and performance monitoring
  • Guide technical decision-making across key areas like Kubernetes, GCP, observability, networking, CI/CD, and IaC (Terraform, Ansible)
  • Collaborate with AI, Data, and Product Engineering teams to ensure infrastructure scalability for ML and AI-native workloads
  • Run effective 1:1s, career development conversations, and quarterly performance reviews
  • Support recruiting efforts to attract top engineering talent across time zones
What we offer
  • Equity
  • Company bonus or sales commissions/bonuses
  • 401(k) plan
  • At least 10 paid holidays per year
  • Flex PTO
  • Parental leave
  • Employee assistance program and wellbeing benefits
  • Global travel coverage
  • Life/AD&D/STD/LTD insurance
  • FSA/HSA and medical, dental, and vision benefits
  • Full-time

Senior Systems Architect, DAM & Infrastructure

The Senior Systems Architect is a cross-functional role on the Design Operations...
Location: United States, Bay Area
Salary: 153,000 - 270,000 USD / Year
Company: Block (block.xyz)
Expiration Date: Until further notice
Requirements
  • AI fluency
  • 8+ years of experience in related roles (library science, mass communications, computer sciences, advertising, graphics design, marketing, or equivalent experience)
  • Track record of successful people management across multi-discipline teams
  • Ability to explain complex ideas to a variety of audiences, building collaborative relationships across teams
  • Excellent communication, problem solving, and analytical skills
  • Expertise with enterprise digital asset management systems, software, related tooling, and scalable workflows
  • Ability to design/extend cataloging taxonomy and define content policies
  • In-depth knowledge of file formats (print, digital, video) and media usage rights terminology
  • Familiarity with licensing agreements, talent contracts, and rights-management in the advertising, film, photography, and music industries
Job Responsibility
  • Create and manage consistent asset management processes for Square's global DAM and MAM, with a focus on ecosystem definition that leverages AI for processing, ingesting, metadata tagging, cataloging, versioning, and distribution of assets
  • Develop systems to scale assets across channels and markets, owning user-friendly optimizations for localization and QA. Obtain final asset approvals from necessary stakeholders and communicate to all partners when delivering
  • Review and fulfill new asset requests, consult and recommend alternative solutions, when appropriate, for end-users
  • Manage rights information and governance for all licensed content: auditing, decommissioning, and gathering insights into usage and content gaps
  • Partner with IT, Design Operations, and Integrated Production teams: defining project request processes, workflow optimization, and coordination of project pipeline
  • Align our assets pipeline process with other business units, ensuring our MAM and other internal creative file systems are linked and automated with Square's DAM
  • Work with our tooling vendors and guide the Asset Management team to plan, build, and roll out system updates, fixes, and new feature implementations
  • Help educate and mature the proper security and compliant usage of licensed assets
What we offer
  • Remote work
  • Medical insurance
  • Flexible time off
  • Retirement savings plans
  • Modern family planning

Senior Product Manager

As a Senior Product Manager for Private Cloud AI, you will lead the strategy, de...
Location: United States, Spring, Texas
Salary: 117,500 - 270,000 USD / Year
Company: Hewlett Packard Enterprise (https://www.hpe.com/)
Expiration Date: Until further notice
Requirements
  • Bachelor's degree or equivalent in computer science, engineering or related field of study
  • MBA or advanced degree in computer science or engineering preferred
  • 8+ years of work experience in related field
  • Technical understanding and knowledge of the AI infrastructure industry
Job Responsibility
  • Define and execute a product strategy to unlock AI opportunities across the world’s largest organizations
  • Independently leads and drives the end to end strategy and operational product roadmap for one or more complex products
  • Defines the value proposition, target customer segments, and business case to bring one or more innovative and disruptive products to market
  • Synthesizes market requirements into marketing/customer details
  • Advises key stakeholders on the portfolio strategy across all phases of the lifecycle
  • Creates and drives goal alignment and collaborates across value chain partners to optimize margins and enable product success
What we offer
  • Comprehensive suite of benefits for physical, financial and emotional wellbeing
  • Programs catered to helping you reach career goals
  • Unconditional inclusion celebrating individual uniqueness
  • Flexibility to manage work and personal needs
  • Full-time

AI Product Manager

We’re scaling AI and machine learning across our products, devices, and operatio...
Location: United States, Boston
Salary: 121,300 - 177,900 USD / Year
Company: SimpliSafe (simplisafe.com)
Expiration Date: Until further notice
Requirements
  • 8+ years of product management experience, including significant ownership of AI/ML or data-intensive products
  • Clear track record of shipping production ML systems (not just integrating third-party AI APIs), in close partnership with data science, ML engineering, and MLOps
  • Principal-level impact: leading cross-team initiatives, shaping strategy, and influencing senior stakeholders
  • Strong understanding of core ML concepts and lifecycle: data, labeling, training/validation, evaluation metrics, deployment, monitoring, and retraining
  • ML experience with at least one of the following: computer vision or sensor data, LLM-powered applications (prompting, RAG, fine-tuning, evaluation), and/or hardware or edge products (e.g., on-device models, connectivity/latency trade-offs)
  • Familiarity with modern ML infrastructure (cloud platforms, model serving, CI/CD for ML, monitoring/alerting)
  • Comfortable going deep into data, metrics, and model behavior—not just the UX layer
  • Excellent communicator who can make complex AI topics clear to diverse audiences
  • Strong alignment with our values: customer-obsessed, low ego, highly collaborative, comfortable with ambiguity, and biased toward learning and iteration.
Job Responsibility
  • Define and communicate the multi-year roadmap for key AI/ML capabilities across SimpliSafe
  • Identify and prioritize AI opportunities where models and data can materially improve safety, customer experience, or efficiency—on both devices and cloud services
  • Make build-vs-buy decisions for AI capabilities in partnership with data science and engineering
  • Partner with data scientists, ML engineers, and MLOps to design and deliver end-to-end ML solutions—from problem framing through data, training, evaluation, deployment, and monitoring
  • Work with hardware and embedded teams to shape edge AI/ML experiences (e.g., on-device detection, low-latency decisions, bandwidth-aware designs)
  • Define model-level requirements (metrics, latency, cost, guardrails) and connect them to business outcomes (e.g., false alarm reduction, detection accuracy, handle time, CSAT)
  • Translate product needs into requirements for ML platform capabilities (model serving, observability, experiment tracking, human-in-the-loop tools)
  • Lead product direction for LLM and multimodal use cases (e.g., text, vision, sensor data)
  • Decide when to use prompt engineering, RAG, fine-tuning, or traditional ML—and how to evaluate quality, safety, and hallucinations
  • Design workflows that incorporate human review and escalation where needed
What we offer
  • A mission- and values-driven culture and a safe, inclusive environment where you can build, grow, and thrive
  • A comprehensive total rewards package that supports your wellness and provides security for SimpliSafers and their families
  • Free SimpliSafe system and professional monitoring for your home
  • Employee Resource Groups (ERGs) that bring people together, give opportunities to network, mentor and develop, and advocate for change
  • Participation in our annual bonus program, equity, and other forms of compensation, in addition to a full range of medical, retirement, and lifestyle benefits.
  • Full-time

Senior Manager, Performance AI/ML Network Deployment Engineering

The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering i...
Location: United States, Santa Clara
Salary: 210,400 - 315,600 USD / Year
Company: AMD (amd.com)
Expiration Date: Until further notice
Requirements
  • Expertise in networking and performance optimization for large-scale AI/ML networks, including network, compute, storage cluster design, modelling, analytics, performance tuning, convergence, scalability improvements
  • Preference for candidates with solid, hands-on expertise in at least one of three domains: compute, network, or storage
  • Experience in working with large customers such as Cloud Service Providers and global enterprise customers
  • Proven leadership in engaging customers with diverse technical disciplines in avenues such as Proof of Concept, Competitive evaluations, Early Field Trials etc
  • Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them
  • Demonstrated leadership in network architecture, hands on experience in RoCEv2 Design, VXLAN-EVPN, BGP, and Lossless Fabrics
  • Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends
  • Extensive hands-on Network deployment expertise and proven track record of delivering large projects on time. Cisco, Juniper or Arista experience is preferred
  • Direct, co-development/deployment experience in working with strategic customers/partners in bringing solutions to market
  • Excellent communication with audiences ranging from engineers to mid-management to C-level
Job Responsibility
  • Collaborate with strategic customers on scalable designs involving compute, networking, and storage environments; work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models
  • Engage system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability
  • Drive the ramp of Instinct-based large scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads
  • Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations
  • Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins
  • Provide domain specific knowledge to other groups at AMD, share the lessons learnt to drive continuous improvement
  • Engage with AMD product groups to drive resolution of application and customer issues
  • Develop and present training materials to internal audiences, at customer venues, and at industry conferences

Risk and Compliance Senior Manager

From day one at Unobravo, we’ve been on a mission to make mental health support ...
Location: Italy, Milan
Salary: Not provided
Company: Unobravo (unobravo.com)
Expiration Date: Until further notice
Requirements
  • 5+ years in senior compliance roles, with mandatory experience in a regulated market; healthcare sector (digital and/or physical) experience is a plus
  • Strong knowledge of European regulations, including data protection, healthcare, digital marketing, and consumer protection
  • Ability to anticipate and address evolving AI regulations, ensuring training, compliance, and organisational readiness
  • Global or pan-European experience, with ability to balance local compliance needs with a worldwide strategy
  • Excellent communication skills to translate complex compliance topics into practical solutions for diverse stakeholders
  • Proactive and hands-on, able to balance strategic initiatives with operational needs
  • Fluency in Italian and English, with international experience; presence in Italy is a strong advantage
Job Responsibility
  • Strategic Compliance Leadership: Define and implement a practical compliance framework across products, marketing, and infrastructure, balancing scale-up needs with risk management
  • Clinical Collaboration: Ensure compliance with healthcare regulations relevant to our role as a medical center
  • Compliance Management: Partner with product, marketing, and security to ensure GDPR, healthcare advertising, and NIS2 compliance. Provide strategic advice on privacy and health regulation, enabling Privacy by Design and Compliance by Design
  • Cross-functional Collaboration: Work closely with legal, IT, finance, HR, clinical, operations, and leadership to integrate compliance into all business decisions
  • Risk Management: Identify and mitigate risks across privacy, data, marketing, and communications. Lead DPIAs, LIAs, and other assessments
  • Global & Local Balance: Develop a compliance strategy that ensures our global product meets local regulatory requirements
  • Policies & Training: Create internal policies, deliver training, and build a culture of compliance and privacy awareness
  • Audit & Incident Response: Lead audits, monitor compliance, manage incidents, and oversee whistleblowing and reporting processes
  • Stakeholder Communication: Represent compliance priorities to leadership and advocate for key initiatives
  • Regulatory Monitoring: Track regulatory changes and best practices, updating company policies as needed
What we offer
  • Flexibility to work from anywhere within your country of hire
  • Home workstation budget
  • Up to two coworking sessions a month
  • Exclusive discounts on psychotherapy sessions
  • Company retreats, team-building experiences, aperitivo parties
  • Free online language training
  • Birthday day off
  • Additional day off on World Mental Health Day
  • Inclusive parental leave
  • Full-time