AI Engineer - Instrumentation

Arize

Location:
United States

Contract Type:
Not provided

Salary:

125000.00 - 225000.00 USD / Year

Job Description:

Join Arize AI's Engineering team working on OpenInference – the industry-leading open-source standard for AI observability, LLM and agent instrumentation. You'll be at the forefront of defining how organizations instrument, trace, and observe their AI applications across the entire ecosystem, shaping the future of AI development worldwide.

Job Responsibility:

  • Build new LLM and instrumentation libraries for emerging LLM providers and agent frameworks
  • Maintain and enhance existing instrumentation across the Python, TypeScript, and other ecosystems (OpenAI, Anthropic, LlamaIndex, CrewAI, and many more)
  • Drive improvements to semantic conventions and OpenTelemetry standards that define AI observability
  • Collaborate with the global developer community through GitHub, Slack, and conferences, as well as Arize PMs and solution architects
  • Take complex problems from ideation to completion with full ownership and accountability

Requirements:

  • 3-5+ years of software development experience shipping production code
  • Expert-level proficiency in both Python and TypeScript
  • Community-oriented mindset with genuine passion for collaborative open-source development
  • Deep interest in AI/LLM ecosystem with desire to stay current on emerging technologies
  • Strong analytical skills to distill requirements from diverse sources and stakeholders

Nice to have:

  • Experience building SDK clients, instrumentation libraries, or platform APIs
  • Hands-on experience with AI/ML observability or evaluation systems

What we offer:

  • competitive equity package
  • comprehensive benefits package, including medical, dental, vision
  • a 401(k) plan
  • unlimited paid time off
  • a generous parental leave plan
  • additional support for mental health and wellness
  • WFH monthly stipend to pay for co-working spaces

Additional Information:

Job Posted:
December 06, 2025

Employment Type:
Fulltime
Work Type:
Remote work

Similar Jobs for AI Engineer - Instrumentation

AI Product Engineer, New Grad

We are looking for exceptionally motivated new graduates to join our high-perfor...
Location
United States
Salary:
100000.00 - 135000.00 USD / Year
Arize
Expiration Date
Until further notice
Requirements
  • Taking ownership of complex problems from day one
  • Putting in the extra effort to learn new technologies rapidly
  • Pushing yourself outside your comfort zone regularly
  • Maintaining high standards of code quality despite being new to the industry
  • Thinking critically about system and product design rather than just following instructions
  • Experience working with GraphQL or a comparable API technology
  • Experience working with ML, analytics, data science or data visualization products
  • Working knowledge of Machine Learning and/or Data Science
  • First-hand experience working with large language models (LLMs) or developing AI products
Job Responsibility
  • Part of the core team that drives AI product innovation
  • Understanding how some of the most impactful engineering teams are developing AI and LLM-powered applications
  • Building the right tools to enable them to do their best work
  • Working on product solutions ranging from clean APIs that magically instrument applications and interactive playgrounds for prompt engineering and agent development to real-time evaluation infrastructure scaled to handle millions of annotations per second
  • Diving into complex technical challenges like: Computing model evaluation metrics across billions of data points
  • Creating AI agents that help customers troubleshoot their own applications
  • Building on top of a cutting-edge custom OLAP database
  • Implementing advanced dimensionality reduction techniques
  • Building systems that handle millions of annotations per second
What we offer
  • Competitive equity package
  • Comprehensive benefits package, including medical, dental, vision, a 401(k) plan, unlimited paid time off, a generous parental leave plan, and additional support for mental health and wellness
  • WFH monthly stipend to pay for co-working spaces
Employment Type:
Fulltime

AI Automation Engineer

AI Automation Engineer role at Brighte, a company building a platform to enable ...
Location
Australia, Sydney
Salary:
Not provided
Brighte Capital
Expiration Date
Until further notice
Requirements
  • 5+ years of software engineering experience
  • At least 2 years building and operating LLM or AI-powered systems in production
  • Experience working in a fintech, lending or payment business is highly advantageous
  • Solid understanding of LLM integration patterns: RAG architecture, prompt engineering, embedding models, vector databases, and output validation
  • Hands-on experience with workflow automation platforms (n8n, Temporal, or similar) in a production context
  • Working knowledge of AWS Bedrock, or experience integrating multiple LLM providers
  • Awareness of AI compliance considerations in regulated environments, including data handling, audit logging, and explainability basics
  • An AI-first mindset, proactively looking for opportunities to apply AI thoughtfully and evaluating model outputs critically
Job Responsibility
  • Design and build LLM-powered features and automation workflows including RAG pipelines, agent workflows, and tool-calling integrations
  • Contribute to data-flow design, context management, and model selection decisions
  • Participate in technical design reviews and help establish good patterns for prompt engineering, retrieval, and output validation
  • Lead the development of AI tools and applications, from inception to implementation
  • Identify and address quality issues in production AI systems
  • Help migrate ad-hoc LLM integrations toward more maintainable, testable service boundaries
  • Flag and help address technical debt in AI workflows
  • Implement and contribute to evaluation frameworks for AI systems in production
  • Instrument AI features with meaningful observability: token consumption, latency, retrieval quality, and error rates
  • Help define quality baselines and monitor AI system performance over time
What we offer
  • Flexible working arrangements
  • Hybrid work model (3 days in office, 2 WFH)
  • Free lunch on Mondays
  • Weekly Thursday social event
  • Employee Share Option Plans (ESOP)
  • Stocked pantry with snacks
  • Fresh bread
  • Protein bars
  • Popcorn
  • Fresh fruit
Employment Type:
Fulltime

Software Engineer, AI

Meta is seeking a Software Engineer with deep AI specialization to help build an...
Location
Singapore
Salary:
Not provided
Meta
Expiration Date
Until further notice
Requirements
  • 6+ years of software engineering experience, with a focus on building and deploying machine learning or AI systems in production environments
  • Experience designing and implementing end-to-end machine learning pipelines, including data preprocessing, model training, evaluation, and serving at scale
  • Experience with deep learning frameworks such as PyTorch or TensorFlow, and proficiency in Python for AI and data engineering workflows
  • Experience applying experimentation methodologies — including A/B testing and metric design — to evaluate AI model performance and drive product decisions
  • Experience building maintainable, well-tested codebases for AI systems, including unit testing, integration testing, and monitoring for model quality and reliability
Job Responsibility
  • Design and implement scalable AI and machine learning systems, including model training pipelines, inference infrastructure, and feature engineering frameworks, to power Meta's core products
  • Develop and optimize large-scale AI models — including large language models, generative AI systems, and ranking and recommendation models — from prototype through production deployment
  • Leverage AI tools and workflows as a force multiplier to expand technical scope across modeling, data analysis, and operational readiness within a single project lifecycle
  • Establish and maintain robust evaluation frameworks, automated testing, and monitoring pipelines to ensure reliability and quality of AI systems in production
  • Own the technical design of AI components and systems, evaluating architectural trade-offs to meet well-defined product and business requirements
  • Instrument AI systems with telemetry, design experiments to validate model hypotheses, and make data-informed decisions that balance short-term goals with long-term model quality
  • Proactively identify performance bottlenecks in model serving and training infrastructure, using profiling and benchmarking to drive latency and throughput improvements
  • Collaborate with product managers, data scientists, and research scientists to translate AI research advances into production-ready features with measurable user impact
  • Contribute to AI safety, privacy, and integrity practices by incorporating responsible AI principles into system design and partnering with cross-functional teams on safeguards
  • Mentor other engineers on AI engineering best practices, advocate for coding and testing standards, and help drive adoption of AI-augmented development workflows across the team

Senior Software Engineer

As a Senior Research Engineer at Microsoft, you will advance Microsoft’s mission...
Location
India, Hyderabad
Salary:
Not provided
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Engineering, Mathematics, Statistics, Physics, or a related field and 4 or more years in applied ML or AI research and product engineering
  • Master’s degree and 3 or more years in applied ML or AI research and product engineering
  • PhD in a relevant field and 2 or more years with generative AI, LLMs, or related ML algorithms
  • Proficiency in Python and at least one deep learning framework such as PyTorch, JAX, or TensorFlow
  • Experience deploying fine-tuned LLMs or multimodal models in live production environments
  • Experience shipping and maintaining production AI systems
  • Ability to meet Microsoft, customer, and government security screening requirements
  • Microsoft Cloud Background Check upon hire or transfer and every two years thereafter
Job Responsibility
  • Bringing State-of-the-Art Research to Products
  • Design and implement AI systems using foundation models, prompt engineering, retrieval-augmented generation, multi-agent architectures, and classic ML
  • Fine-tune large language models on domain-specific data and evaluate via offline and online methods such as A/B testing, telemetry, and shadow deployments
  • Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
  • Drive original research and thought leadership (whitepapers, internal notes, patents); convert insights into shipped capabilities
  • Research Translation: Continuously review emerging work; identify high-potential methods and adapt them to Microsoft problem spaces
  • End-to-End System Development
  • ML Design & Architecture: Own the end-to-end pipeline, from data prep through training, evaluation, deployment, and feedback loops
Employment Type:
Fulltime

Staff Software Engineer, AI & Automation

Mozilla is seeking a Staff Software Engineer to lead the next evolution of Servi...
Location
United States; Canada
Salary:
128000.00 - 171000.00 CAD / Year
Mozilla
Expiration Date
Until further notice
Requirements
  • 7+ years in IT engineering, enterprise applications, or service delivery
  • ITIL 4 Foundations certification (required)
  • Comfortable leading incident, problem, and change management (ITIL), measuring SLAs/SLOs, and using data to drive continuous service improvements
  • Demonstrated success owning platforms end-to-end, including lifecycle management, integrations, and adoption
  • Advanced proficiency in Python scripting — with proven experience developing, optimizing, and maintaining automation frameworks, APIs/webhooks, and integration logic across enterprise tools
  • Strong automation & scripting skills in one or more of: Python; with experience building APIs/webhooks, workflow engines, and chatbots
  • Advanced experience in AI & automation platforms (GenAI copilots, RPA, intelligent workflows), including security and governance guardrails for safe adoption
  • A builder’s mindset: bias for automation and documentation, empathy for end users, and a willingness to mentor Service Desk colleagues
  • Strong program/project management skills with proven ability to independently deliver initiatives
Job Responsibility
  • Roll out new collaboration and Enterprise GenAI capabilities: run pilots, manage change/comms, create guides, and deliver training that drives adoption and safety
  • Design and ship automations (scripts, workflows, chatbots, AI agents) to streamline provisioning/deprovisioning, entitlements, group lifecycle, and tier-1 deflection
  • Build monitoring and alerting for SaaS health and key workflows; lead incident, problem, and change practices to reduce MTTR and prevent recurrences
  • Define SLAs/SLOs, instrument CSAT and operational metrics, and run regular service reviews that turn data into improvements
  • Serve as technical escalation for complex tickets: drive resolution, perform root-cause analysis, and capture knowledge to uplevel the Service Desk
  • Maintain clean, current documentation: runbooks, KB articles, architecture diagrams, and admin playbooks
  • Champion a 'shift-left' model: expand self-service portals, guided flows, and AI virtual agents to improve velocity and user experience
  • Enterprise AI Security & Governance: implement guardrails for AI/automation usage, including access controls, data handling policies, compliance alignment, and ongoing risk monitoring
  • Cross-Functional Partnership: collaborate with People, Security, Finance, Legal, and Procurement to ensure platforms align with organizational goals
What we offer
  • Generous performance-based bonus plans to all eligible employees
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting (regardless of whether you contribute)
  • Quarterly all-company wellness days where everyone takes a pause together
  • Country specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
Employment Type:
Fulltime

Staff Software Engineer, AI & Automation

Mozilla is seeking a Staff Software Engineer to lead the next evolution of Servi...
Location
United States
Salary:
138000.00 - 217000.00 USD / Year
Mozilla
Expiration Date
Until further notice
Requirements
  • 7+ years in IT engineering, enterprise applications, or service delivery
  • ITIL 4 Foundations certification (required)
  • Comfortable leading incident, problem, and change management (ITIL), measuring SLAs/SLOs, and using data to drive continuous service improvements
  • Demonstrated success owning platforms end-to-end, including lifecycle management, integrations, and adoption
  • Advanced proficiency in Python scripting — with proven experience developing, optimizing, and maintaining automation frameworks, APIs/webhooks, and integration logic across enterprise tools
  • Strong automation & scripting skills in one or more of: Python; with experience building APIs/webhooks, workflow engines, and chatbots
  • Advanced experience in AI & automation platforms (GenAI copilots, RPA, intelligent workflows), including security and governance guardrails for safe adoption
  • A builder’s mindset: bias for automation and documentation, empathy for end users, and a willingness to mentor Service Desk colleagues
  • Strong program/project management skills with proven ability to independently deliver initiatives
Job Responsibility
  • Roll out new collaboration and Enterprise GenAI capabilities: run pilots, manage change/comms, create guides, and deliver training that drives adoption and safety
  • Design and ship automations (scripts, workflows, chatbots, AI agents) to streamline provisioning/deprovisioning, entitlements, group lifecycle, and tier-1 deflection
  • Build monitoring and alerting for SaaS health and key workflows; lead incident, problem, and change practices to reduce MTTR and prevent recurrences
  • Define SLAs/SLOs, instrument CSAT and operational metrics, and run regular service reviews that turn data into improvements
  • Serve as technical escalation for complex tickets: drive resolution, perform root-cause analysis, and capture knowledge to uplevel the Service Desk
  • Maintain clean, current documentation: runbooks, KB articles, architecture diagrams, and admin playbooks
  • Champion a 'shift-left' model: expand self-service portals, guided flows, and AI virtual agents to improve velocity and user experience
  • Enterprise AI Security & Governance: implement guardrails for AI/automation usage, including access controls, data handling policies, compliance alignment, and ongoing risk monitoring
  • Cross-Functional Partnership: collaborate with People, Security, Finance, Legal, and Procurement to ensure platforms align with organizational goals
What we offer
  • Generous performance-based bonus plans
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting
  • Quarterly all-company wellness days
  • Country specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
Employment Type:
Fulltime

Senior Software Engineer, AI Developer Tools

At Docker, we make app development easier so developers can focus on what matter...
Location
United States, Seattle
Salary:
184600.00 - 260700.00 USD / Year
Docker
Expiration Date
Until further notice
Requirements
  • 6+ years building production-grade backend systems or developer-facing tools
  • Hands-on experience with AI/ML technologies, such as practical production experience with LLM APIs (OpenAI, Anthropic, etc.), prompt engineering, or AI agent development
  • Proficiency in Go (preferred), Rust, Java, or Python with strong software engineering fundamentals
  • Experience designing and building distributed systems, microservices, or platform infrastructure
  • Strong understanding of cloud-native systems (AWS, GCP, or Azure), APIs, and data stores
  • Solid grasp of CI/CD, automated testing, code review practices, and modern development workflows
  • Product-minded approach to building developer tools with focus on user experience and measurable outcomes
  • Excellent communication skills in remote, asynchronous environments with ability to document technical decisions clearly
  • Ownership mentality with bias for action and iterative delivery
  • Comfortable working autonomously across distributed teams and navigating ambiguity
Job Responsibility
  • Build AI-Powered Developer Tools: Design, implement, and ship production-ready AI agents and tools that accelerate developer productivity such as code review and refactoring assistants, automated test generators, local environment setup tools, deployment pipeline diagnostic agents, and on-call assistance tools
  • Implement LLM Integrations: Build robust, production-grade integrations with LLM APIs (OpenAI, Anthropic, etc.) such as prompt engineering, response parsing, error handling, rate limiting, cost management, and performance optimization
  • Develop Agent Orchestration Systems: Create agent frameworks and orchestration systems that enable complex multi-step workflows, tool calling, context management, and agent-to-agent communication
  • Contribute to Platform Infrastructure: Build self-service platform capabilities that enable teams across Docker to rapidly deploy and operate their own AI developer tools such as deployment pipelines, observability integration, security controls, and operational tooling
  • Drive Adoption of AI-Native Development: Build tools and programs that accelerate adoption of AI developer tools such as Claude Code, Cursor, and Warp across Docker's engineering organization
  • Ensure Production Quality: Write well-tested code with strong test coverage (unit, integration, end-to-end); establish monitoring, alerting, and operational excellence for AI systems
  • Collaborate Cross-Functionally: Partner with Principal Engineer on architecture, work with product and design teams on features and UX, and collaborate with platform teams (Infrastructure, Security, Data) on integrations
  • Participate in Operations: Take part in on-call rotation for AI developer tools; respond to incidents, debug production issues, and drive continuous improvement of system reliability
What we offer
  • Freedom & flexibility; fit your work around your life
  • Designated quarterly Whaleness Days plus end of year Whaleness break
  • Home office setup; we want you comfortable while you work
  • 16 weeks of paid Parental leave
  • Technology stipend equivalent to $100 net/month
  • PTO plan that encourages you to take time to do the things you enjoy
  • Training stipend for conferences, courses and classes
  • Equity
Employment Type:
Fulltime

Senior Software Engineer

The Fabric Data Engineering Experience & Infrastructure team is hiring a Full St...
Location
Canada, Vancouver
Salary:
114400.00 - 203900.00 CAD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements
  • Bachelor's Degree in Computer Science, or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 4+ years of experience in frontend and UX engineering: React + TypeScript, accessibility, performance, and building user-centered flows
  • 4+ years of experience with backend/full-stack fundamentals: service/API design, debugging distributed systems, reliability/operability, and production ownership
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter
Job Responsibility
  • Own end-to-end delivery of one or more critical scenarios across Fabric Data Engineering experiences (e.g., Lakehouse, Notebooks, Spark job experiences, pro‑dev tooling), from requirements → architecture → implementation → rollout → live-site operations
  • Build and ship polished, accessible, and performant frontend UX in React/TypeScript, partnering with Design/PM to translate scenarios into clear flows and incremental deliverables (including beyond-chat, structured UI where appropriate)
  • Build and evolve full-stack capabilities that power the UX: service endpoints, orchestration, and integrations that connect the UI to Fabric items/artifacts and execution systems (Spark / notebooks), with strong attention to reliability, latency, and cost
  • Implement AI-assisted experiences that help data engineers “author outcomes,” including workflows that gather context, propose plans, execute steps, and surface progress/results in a way that builds user trust (clarity, reviewability, reversibility)
  • Contribute to AI-powered developer productivity inside notebooks and data engineering experiences (e.g., copilots, quick actions, AI enrichments), including instrumentation for quality, usage, and performance
  • Drive engineering excellence: write maintainable code, build automated tests (unit + E2E), participate in code/design reviews, and mentor other engineers through best practices
  • Operate what you build: contribute to on-call, incident response, telemetry/monitoring, and post-incident improvements; continuously harden system behavior in production
Employment Type:
Fulltime