Principal Engineer, AI Model Lifecycle

Crusoe

Location:
United States, San Francisco

Contract Type:
Not provided

Salary:

260000.00 - 326000.00 USD / Year

Job Description:

The Principal Software Engineer for the Model Lifecycle team will play a crucial role in building a comprehensive managed platform for the entire application development lifecycle, with a specific focus on leveraging machine learning models, including Large Language Models (LLMs). This role offers significant 0 → 1 ownership: you'll be designing and building core systems from first principles.

Job Responsibility:

  • Manage fine-tuning systems for large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling
  • Implement and maintain end-to-end training pipelines for Large Language Models
  • Build distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling)
  • Build agent execution infrastructure
  • Dataset, model, and experiment management: versioning, lineage, evaluation, and reproducible fine-tuning at scale
  • Work closely with product, business, and platform teams to shape the core abstractions and APIs of the system
  • Influence long-term architectural decisions around training runtimes, scheduling, storage, and model lifecycle management
  • Contribute to and engage with the open-source LLM ecosystem

Requirements:

  • Advanced degree in Computer Science, Engineering, or a related field
  • 10-15+ years of industry experience driving impactful projects in the AI space
  • Proven track record of delivering early-stage projects under tight deadlines
  • Expertise in using cloud-based services such as elastic compute, object storage, virtual private networks, and managed databases
  • Experience in Generative AI (Large Language Models, Multimodal)
  • Deep experience with AI infrastructure, including training and inference

Nice to have:

  • Proficiency in Golang or Python for large-scale, production-level services
  • Contributions to open-source AI projects such as vLLM or similar frameworks
  • Performance optimizations on GPU systems and inference frameworks
  • Experience working with PyTorch
  • Experience with training and fine-tuning LLMs

What we offer:

  • Restricted Stock Units
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit: $300/month

Additional Information:

Job Posted:
February 21, 2026

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for Principal Engineer, AI Model Lifecycle

Principal Engineer, AI Strategy and Innovation

Shape the architecture and execution of CLEAR’s AI platform strategy, from infra...
Location:
United States, New York
Salary:
250000.00 - 290000.00 USD / Year
Clear
Expiration Date:
Until further notice
Requirements:
  • 12+ years in software engineering and/or technical experience with deep expertise in AI systems, ML platforms, and data infrastructure
  • At least 5 years of experience with various AI technologies including GenAI, ML, Deep Learning, RPA or others
  • Proven ability to scale AI capabilities into high-throughput, low-latency environments
  • Strong technical background in cloud-native architectures (AWS or similar) and modern AI/ML stacks (TensorFlow/PyTorch, MLflow, RAG, MCP, etc.)
  • Experience leading AI strategy and platform adoption in enterprise-scale environments
  • Skilled at translating regulatory and compliance requirements into responsible AI practices
  • Track record of partnering closely with Product, Engineering, Analytics, and Security teams as well as business executives
  • Excellent communicator who can set a vision for AI, explain technical trade-offs, and influence executives, peers, and partners
  • Passionate about embedding AI into core products to deliver measurable impact for members and enterprise partners
Job Responsibility:
  • Define and scale CLEAR’s AI strategy, spanning data pipelines, ML lifecycle management, and intelligent applications
  • Lead engineering execution for AI models (development, deployment, monitoring, retraining) with a focus on reliability, observability, and ethical AI practices
  • Modernize analytics and intelligence systems to deliver predictive insights and partner-facing transparency in real time
  • Operationalize trust in AI by embedding privacy, compliance, and security into all platforms and workflows
  • Influence cross-functional stakeholders across the business, fostering a culture of technical rigor, collaboration, and innovation, and advising C-suite executives, leaders, and individual contributors
  • Lead the AI Governance group and drive best practices across business functions
  • Track and optimize KPIs on AI adoption, model performance, scalability, and business impact
What we offer:
  • Comprehensive healthcare plans
  • Family-building benefits (fertility and adoption/surrogacy support)
  • Flexible time off
  • Annual wellness stipend
  • Free OneMedical memberships for you and your dependents
  • A CLEAR Plus membership
  • A 401(k) retirement plan with employer match
  • Catered lunches every day
  • Fully stocked kitchens
  • Stipends and reimbursement programs for well-being and learning & development
Employment Type:
Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • Short-term and long-term incentive programs
  • Robust offering of benefit and wellness plans
Employment Type:
Fulltime

Principal Machine Learning Engineer

As a Principal Engineer on the ITSM team, you will get the opportunity to work o...
Location:
Australia, Sydney
Salary:
Not provided
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 10+ years of total experience
  • Fluency in at least 1 scripting, OOP language
  • Solid understanding of machine learning concepts and algorithms, including supervised and unsupervised learning, deep learning, and NLP
  • Familiarity with popular ML libraries like scikit-learn, Keras/TensorFlow/PyTorch, NumPy, pandas
  • Good understanding of the machine learning project lifecycle
  • Familiarity with MLOps and experience with scaling and deploying Machine Learning models
Job Responsibility:
  • Work on cutting-edge AI and ML algorithms that help modernize IT Operations by reducing MTTR (mean time to resolve) and MTTI (mean time to identify)
  • Use software development expertise to solve difficult problems, tackling complex infrastructure and architecture challenges
  • Lead engineers to drive involved projects from technical design to launch
  • Collaborate with other teams and internal customers to set expectations, gather input, and communicate results
  • Work with a distributed, world-class team shaping the future of AIOps
  • Master Generative AI
  • Become a machine learning maestro
  • Collaborate with diverse minds
  • Make a tangible impact
  • Routinely tackle complex architectural challenges
What we offer:
  • Health coverage
  • Paid volunteer days
  • Wellness resources
Employment Type:
Fulltime

Principal Data Engineer

We are on the lookout for a Principal Data Engineer to help define and lead the ...
Location:
United Kingdom
Salary:
Not provided
Dotdigital
Expiration Date:
Until further notice
Requirements:
  • Extensive experience delivering Python-based projects in the data engineering space
  • Extensive experience working with SQL and NoSQL database technologies (e.g. SQL Server, MongoDB & Cassandra)
  • Proven experience with modern data warehousing and large-scale data processing tools (e.g. Snowflake, dbt, BigQuery, ClickHouse)
  • Hands-on experience with data orchestration tools like Airflow, Dagster or Prefect
  • Experience using cloud environments (e.g. Azure, AWS, GCP) to process, store and surface large scale data
  • Experience using Kafka or similar event-based architectures (e.g. Pub/Sub via AWS SQS, Azure Event Hubs, AWS Kinesis)
  • Strong grasp of data architecture and data modelling principles for both OLAP and OLTP workloads
  • Capable in the wider software development lifecycle in terms of agile ways of working and continuous integration/deployment of data solutions
  • Experience as a lead or Principal Engineer on large-scale data initiatives or product builds
  • Demonstrated ability to architect data systems and data structures for high volume, high throughput systems
Job Responsibility:
  • Lead the design and implementation of scalable, secure and resilient data systems across streaming, batch and real-time use cases
  • Architect data pipelines, model and storage solutions that power analytical and product use cases, using primarily Python and SQL via orchestration tooling that runs workloads in the cloud
  • Leverage AI to automate both data processing and engineering processes
  • Assure and drive best practices relating to data infrastructure, governance, security and observability
  • Work with technologists across multiple teams to deliver coherent features and data outcomes
  • Support the data team to help adopt data engineering principles
  • Identify, validate and promote new tools and technologies that improve the performance and stability of data services
What we offer:
  • Parental leave
  • Medical benefits
  • Paid sick leave
  • Dotdigital day
  • Share reward
  • Wellbeing reward
  • Wellbeing Days
  • Loyalty reward
Employment Type:
Fulltime

Principal Machine Learning System Engineer

As a Principal Machine Learning System Engineer on the AI & ML Platform team, yo...
Location:
Salary:
Not provided
Atlassian
Expiration Date:
Until further notice
Requirements:
  • Extensive experience in building Machine Learning and AI infra/platform/system (generally 5+ years)
  • Comprehensive ML lifecycle expertise: proven experience developing, deploying, and maintaining end-to-end ML systems, from data engineering to model serving and monitoring
  • Large-scale system design: Extensive experience designing and building scalable, fault-tolerant, and high-performance distributed systems for machine learning
  • Proficiency with frameworks and languages: Expert-level proficiency in Python and ML frameworks like PyTorch, TensorFlow, or JAX. Familiarity with other languages like Go, Java, or Scala is also beneficial
  • MLOps and automation: Deep experience implementing MLOps, CI/CD pipelines, and automation for continuous training, deployment, and monitoring of ML models
Job Responsibility:
  • Collaborate with your teammates to solve complex problems, from technical design to launch
  • Deliver cutting-edge solutions that are used by other Atlassian teams and products to build AI features that reach millions of customers
  • Deliver code reviews, documentation & bug fixes within a strong engineering culture
  • Partner across engineering teams to take on company-wide initiatives spanning multiple projects
  • Mentor junior members of the team
What we offer:
  • Health and wellbeing resources
  • Paid volunteer days

Principal Automation Engineer

We are seeking a Principal Automation Engineer to lead and drive innovation in a...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or master’s degree in computer science, cybersecurity, data science, or related engineering field
  • Proven experience (8+ years) in cybersecurity, with 3+ years in automation-focused roles
  • Deep understanding of cybersecurity frameworks and concepts, including attack vectors, threat landscapes, and defence mechanisms
  • Strong experience with SIEM/SOAR and EDR/XDR platforms and tools
  • Experience in Machine Learning (ML) and Agentic AI applied to security use cases
  • Experience with anomaly detection, behavioural modeling, and predictive analytics in cybersecurity contexts
  • Experience integrating machine learning models into security operations workflows in enterprise environments
  • Proficiency in languages such as Python, Go, SPL, and YARA-L, and in building automation frameworks
  • Hands-on experience with big data technologies and cloud environments (AWS, Azure, GCP)
  • Familiarity with regulatory requirements and compliance frameworks (e.g., GDPR, NIST, ISO 27001)
Job Responsibility:
  • Drive the SOAR development lifecycle, in support of security operations and engineering teams
  • Develop SOAR playbooks and logic
  • Build integrations across SIEM, SOAR, EDR, identity platforms, and cloud-native services
  • Write, test, and maintain automation scripts and workflows
  • Deliver API solutions for SOC and enterprise Business Units
  • Design and implement reusable automation services, APIs, and playbooks
  • Maintain documentation for scripts, integrations, and workflows
  • Debug and resolve technical issues in the automation lifecycle
  • Apply advanced analytics, Machine Learning, and AI for security automation
  • Partner with SOC/IR leadership and IT stakeholders to gather SOAR requirements and develop solutions
What we offer:
  • Health and wellbeing benefits
  • Career development programs
  • Unconditional inclusion
  • Flexibility to manage work and personal needs
Employment Type:
Fulltime

Principal Machine Learning Engineer

As a Principal Engineer on the ITSM team, you will get the opportunity to work o...
Location:
India, Bengaluru
Salary:
Not provided
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 10+ years of total experience
  • Fluency in Python
  • Solid understanding of machine learning concepts and algorithms, including supervised and unsupervised learning, deep learning, and NLP
  • Familiarity with popular ML libraries like scikit-learn, Keras/TensorFlow/PyTorch, NumPy, pandas
  • Good understanding of the machine learning project lifecycle
  • Experience in architecting and implementing high-performance RESTful microservices (API development for ML Models)
  • Familiarity with MLOps and experience with scaling and deploying Machine Learning models
Job Responsibility:
  • Shape the future of AIOps
  • Master Generative AI
  • Become a machine learning maestro
  • Collaborate with diverse minds
  • Make a tangible impact
  • Routinely tackle complex architectural challenges, spar with other principal engineers to build ML pipelines and models that scale for thousands of customers
  • Lead code reviews & documentation as well as take on complex bug fixes, especially on high-risk problems.
  • Develop leadership skills
Employment Type:
Fulltime

Principal AI Technology & Innovation Specialist

The Principal AI Technology & Innovation Specialist at NTT DATA is a key role fo...
Location:
South Africa, Johannesburg
Salary:
Not provided
NTT DATA
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree or equivalent in Computer Science, Artificial Intelligence, Data Science, or a related field
  • Advanced degrees (MSc/PhD) in AI/ML fields preferred
  • TOGAF, COBIT, or related enterprise architecture certifications are beneficial
  • Certifications in machine learning or cloud-based AI platforms (e.g., AWS Certified Machine Learning Specialty, Google Cloud AI Engineer) are advantageous
  • Extensive experience in leading enterprise AI innovation and architecture initiatives
  • Proven track record of evaluating, piloting, and operationalizing AI solutions in enterprise environments
  • Experience working across multiple industries and large-scale IT organizations
  • Hands-on experience in AI/ML development, integration, and lifecycle management
  • Familiarity with regulations governing AI use, such as the EU AI Act, and experience in operationalizing compliance measures
  • Deep knowledge of modern AI paradigms including generative AI (e.g., LLMs), machine learning infrastructure, AI model lifecycle, and MLOps
Job Responsibility:
  • Lead the evaluation and strategic assessment of emerging AI technologies, platforms, and vendor solutions, advising on technical and ethical feasibility
  • Design and guide the development of AI capabilities and innovation pilots, translating business goals into AI-enabled solutions
  • Define architectural blueprints for integrating AI technologies into IT systems and product platforms, ensuring security, scalability, and alignment with enterprise standards
  • Develop frameworks for responsible AI adoption including model evaluation, explainability, privacy, compliance (e.g., EU AI Act), and ethical use
  • Partner with product and platform teams to align AI innovations with enterprise technology strategy and business outcomes
  • Drive initiatives for AI prototyping, proof-of-concepts (PoCs), and production readiness assessments
  • Monitor vendor roadmaps and contribute to the strategy for selecting and onboarding external AI capabilities
  • Act as a center of excellence for AI within the IT organization, driving awareness, knowledge sharing, and standardization
  • Collaborate with enterprise architects and platform leads to integrate AI tools into data infrastructure, software architecture, and cloud environments
  • Perform technical due diligence on third-party AI services and models, ensuring fit-for-purpose and cost-effective solutions
What we offer:
  • Workplace embraces diversity and inclusion – it’s a place where you can grow, belong and thrive
Employment Type:
Fulltime