Senior DevOps Engineer (AI & Cloud Infrastructure) Job at Inflection AI (Palo Alto)

Senior DevOps AI Engineer

We are seeking a highly experienced and technically proficient Senior DevOps Eng...

Location

United States, Columbia

Salary:

150000.00 - 250000.00 USD / Year

Synergy ECP

Expiration Date

Until further notice

Requirements

B.S. in a relevant technical field with 12 years of experience, or M.S. in a relevant technical field with 10 years of experience
Advanced proficiency in DevOps principles and practices
Demonstrated expertise in containerization using Docker and Kubernetes
Proven experience in architecting and managing CI/CD pipelines
Extensive experience with AI model lifecycle management and maintenance
Familiarity with cloud platforms (AWS, Microsoft Azure) for infrastructure deployment and management
Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
Excellent communication and interpersonal skills, with the ability to effectively collaborate with cross-functional teams
Ability to translate complex technical concepts into actionable engineering solutions
TS/SCI with CI Poly

Job Responsibility

Design, implement, and maintain robust infrastructure for enterprise AI applications in cloud environments (AWS, Microsoft Azure)
Develop and optimize engineering workflows and processes to support AI model development, deployment, and maintenance
Architect and manage CI/CD pipelines for continuous integration and continuous delivery of AI models and applications
Implement and manage containerization solutions using technologies like Docker and Kubernetes
Ensure efficient AI model lifecycle management, including versioning, monitoring, and scaling
Collaborate with AI/ML engineers and data scientists to streamline deployment processes and optimize resource utilization
Oversee system performance, security, and scalability of AI infrastructure
Continuously research and implement new DevOps tools and practices to enhance efficiency

What we offer

Highly competitive compensation
Comprehensive Health Benefits package
401K Retirement plan
People Partners to help navigate personal and professional worlds
Wellness resources
Company-sponsored continuing education program
Generous Paid Time Off
11 paid holidays a year
Flexible work options
Philanthropy program participation

Full-time

Senior DevOps Engineer, AI

LogicMonitor® is the AI-first hybrid observability platform powering the next ge...

Location

India, Pune

Salary:

Not provided

LogicMonitor

Expiration Date

Until further notice

Requirements

4+ years of experience in DevOps or similar roles
Proven experience with AWS (preferred) and GCP in production environments
Strong expertise in Infrastructure as Code practices
Solid knowledge of Kubernetes (EKS), container orchestration, and cluster security
Hands-on experience with Grafana, Prometheus, and alerting/monitoring systems
Understanding of network connectivity, including private link endpoints, VPCs, cross-account VPC connectivity, and how to make services accessible internally and externally
Experience deploying automated canary and integration testing pipelines and CI/CD pipelines
Experience exposing internal self-hosted services such as LangFuse through a web UI for internal users, using Traefik, an Ingress controller, or a similar tool
Experience deploying LLM-related solutions that require MCP, LangFuse, Airflow, GraphDB, VectorDB, Redis, etc.
Experience working with developers to provide on-demand just-in-time (JIT) access to production clusters for troubleshooting and debugging, using Teleport or a similar tool

Job Responsibility

Multi-Cloud Enablement: Expand and manage application hosting across AWS and Google Cloud, ensuring performance, flexibility, and resilience
Infrastructure as Code (IaC): Develop and maintain Terraform or similar installers for Azure and GCP to fully automate infrastructure deployments
Cost Optimization: Design and implement AWS cost optimization strategies, including reserved instances, right-sizing, and resource efficiency initiatives
Cloud Security: Strengthen infrastructure security with robust access controls, encryption, monitoring, and alerting frameworks
Observability: Build and enhance monitoring platforms with Grafana dashboards and Prometheus alerts for real-time performance insights and proactive issue resolution
Kubernetes Management: Implement Role-Based Access Control (RBAC) and optimize Ingress controllers (Traefik or similar) for enhanced security and delivery resilience
Automation & Scripting: Create Python and Bash scripts to automate repetitive tasks, streamline workflows, and improve operational efficiency

Senior DevOps & AI Engineer

This role presents a unique opportunity to contribute to the future of impactful...

Location

India, Hyderabad

Salary:

Not provided

Fission Labs

Expiration Date

Until further notice

Requirements

Bachelor's degree in Computer Science, Engineering, or related field
6+ years of experience in infrastructure management roles, with a focus on cloud platforms (Azure and AWS preferred)
Hands-on experience with operations (DevSecOps) principles and best practices
Proficiency in scripting languages such as Python, PowerShell, or Bash
Excellent communication and collaboration skills
In-depth knowledge of Linux operating systems, including CentOS, Ubuntu, and Red Hat, with expertise in shell scripting, package management, and system administration
Hands-on experience with a wide range of AWS and Azure services
Experience developing and maintaining Infrastructure as Code (IaC) templates using tools such as Terraform or AWS CloudFormation
Experience setting up cloud infrastructure stacks, databases, and service endpoints, and scaling and optimizing both GPU and CPU resources
Prior work experience in AIOps/MLOps

Job Responsibility

Configure and optimize Linux-based servers for performance, security, and resource utilization, including kernel tuning, file system management, and network configuration
Architect cloud solutions leveraging best practices and services offered by AWS and Azure, optimizing for scalability, reliability, and cost-effectiveness
Implement and manage hybrid cloud environments, facilitating seamless integration and interoperability between AWS and Azure services
Establish version control practices for IaC templates, ensuring traceability, auditability, and reproducibility of infrastructure changes

What we offer

Opportunity to work on impactful technical challenges with global reach
Vast opportunities for self-development, including online university access and knowledge sharing opportunities
Sponsored Tech Talks & Hackathons to foster innovation and learning
Generous benefits packages including health insurance, retirement benefits, flexible work hours, and more
Supportive work environment with forums to explore passions beyond work

Full-time

Senior DevOps and Cloud Engineer

We are seeking a highly motivated and self-sufficient Senior DevOps / Cloud Engi...

Location

United States, Scottsdale

Salary:

Not provided

gate6

Expiration Date

Until further notice

Requirements

5+ years of hands-on experience with AWS
Experience with GCP and/or Azure is advantageous
Proven ability to work as a self-driven individual contributor with minimal dependency on others
Proficiency with cloud infrastructure setup, CI/CD processes, and DevOps toolchains
Strong scripting skills (Shell, YAML, Python, etc.)
Solid understanding of both Linux and Windows system environments
Experience with containerization, orchestration, and cloud automation tools
AWS certification (e.g., Solutions Architect, DevOps Engineer) is strongly preferred
Familiarity with AI/GenAI technologies, MLOps concepts, or AI-powered cloud solutions is highly desirable
Excellent problem-solving, analytical, and communication skills

Job Responsibility

Independently design, implement, and manage cloud infrastructure across AWS and GCP (Azure knowledge is a plus)
Build, configure, and maintain secure, scalable cloud resources with minimal external support
Set up and manage VPCs, IAM, security groups, and access control
Lead and execute migration of applications and workloads to the cloud
Establish disaster recovery processes, automate cloud management tasks, and maintain best practices
Monitor usage and apply tagging strategies for cost control and visibility
Use AWS tools (CloudFormation, CloudWatch, Migration Hub, DMS, AWS Transfer for SFTP) for deployments and monitoring
Design and manage CI/CD pipelines with Jenkins, GitLab CI, or AWS DevOps
Ensure compliance with security policies and industry standards
Implement log aggregation and performance monitoring for cloud environments

Full-time

Senior DevOps / Voice Infrastructure Engineer

As we grow and take on exciting new challenges, we’re on the lookout for excepti...

Location

Salary:

Not provided

Mad Devs

Expiration Date

Until further notice

Requirements

3+ years of hands-on experience with Asterisk or FreeSWITCH
Deep knowledge of SIP, RTP, SRTP protocols
Experience with SIP proxies — Kamailio or OpenSIPS
WebRTC integrations
Trunk configuration, dialplan design, codec negotiation
GCP and/or AWS hands-on experience (2+ years)
Kubernetes (GKE or EKS) in production environments
Terraform — custom modules, multi-environment setups
Docker, Docker Compose
CI/CD: GitHub Actions, ArgoCD / Flux

Job Responsibility

Design, deploy, and maintain SIP/VoIP infrastructure (Asterisk, FreeSWITCH, Kamailio) for AI Agents
Integrate voice platforms with cloud services (GCP, AWS) and internal AI pipelines
Ensure high availability and low latency of voice services (HA, load balancing, failover)
Manage cloud infrastructure via IaC (Terraform) and container orchestration in Kubernetes
Set up call quality monitoring (MOS, jitter, packet loss) and alerting with Grafana / VictoriaMetrics
Build and optimize CI/CD pipelines (GitHub Actions, ArgoCD) for voice services
Harden voice infrastructure security: encryption (SRTP, TLS), toll fraud prevention, DoS protection
Integrate with PSTN/SIP trunk providers, manage DID numbers and call routing

What we offer

Flexible working hours
Remote-first culture
Long-term projects
Salary in dollars
Professional communities
Onsite business trips
Training budget
Paid conferences

Full-time

Senior Cloud Platform Engineer with AI Enablement

We are looking for a Senior Cloud Platform Engineer with AI Enablement experienc...

Location

Poland, Warszawa

Salary:

Not provided

Algoteque

Expiration Date

Until further notice

Requirements

Solid experience as a Cloud, DevOps, Platform, SRE or Infrastructure Engineer
Hands-on knowledge of at least one major cloud platform: AWS, Azure or GCP
Experience with Kubernetes in production or near-production environments
Experience with Infrastructure as Code, especially Terraform
Familiarity with CI/CD tools (GitHub Actions, GitLab CI/CD, Jenkins, Azure DevOps or Tekton)
Knowledge of observability and monitoring tools (Prometheus, Grafana, ELK, Loki, Datadog, New Relic, CloudWatch or OpenSearch)
Experience with production support, incident management, RCA and deployment stability
Good scripting or programming skills (Python, Bash, PowerShell or Go)
Understanding of security basics, IAM, secrets management and secure cloud delivery
Experience working with development teams and improving Developer Experience

Job Responsibility

Design and develop cloud-native platforms and internal developer platforms (IDP)
Deliver scalable, reliable and secure platform solutions for engineering teams
Work with Kubernetes and cloud services on AWS, Azure or GCP
Build and maintain Infrastructure as Code (Terraform, Pulumi, CloudFormation)
Develop CI/CD pipelines and automate deployments
Introduce platform standards, reusable templates, golden paths and best practices
Improve observability, monitoring, alerting and incident response
Support reliability, high availability, disaster recovery and operational stability
Use AI tools and AI-assisted workflows to boost engineering productivity and platform operations
Help define safe, controlled usage of AI tools, coding agents and LLM-based workflows

What we offer

B2B contract
100% remote work
A unique and engaging project in the EdTech space

Full-time

Senior ML Infrastructure / ML DevOps Engineer

We are looking for a Senior ML Infrastructure / DevOps Engineer who loves Linux,...

Location

Salary:

Not provided

Pathway

Expiration Date

Until further notice

Requirements

Former or current Linux / systems / network administrator comfortable living in the shell and debugging at OS and network layers (systemd, filesystems, iptables/security groups, DNS, TLS, routing)
5+ years of experience in DevOps/SRE/Platform/Infrastructure roles running production systems, ideally with high‑performance or ML workloads
Deep familiarity with Linux as a daily driver, including shell scripting and configuration of clusters and services
Strong experience with workload management, containerization, and orchestration (Slurm, Docker, Kubernetes) in production environments
Solid understanding of CI/CD tools and workflows (GitHub Actions, GitLab CI, Jenkins, etc.), including building pipelines from scratch
Hands-on cloud infrastructure experience (AWS, GCP, Azure), especially around GPU instances, VPC/networking, storage, and managed ML services (e.g., SageMaker HyperPod, Vertex AI)
Proficiency with infrastructure as code (Terraform, CloudFormation, or similar) and a bias toward automation over manual operations
Experience with monitoring and logging stacks (Grafana, Prometheus, Loki, CloudWatch, or equivalents)
Familiarity with ML pipeline and experiment orchestration tools (MLflow, Kubeflow, Airflow, Metaflow, etc.) and with model/version management
Solid programming skills in Python, plus the ability to read and debug code that uses common ML libraries (PyTorch, TensorFlow) even if you are not a full‑time model developer

Job Responsibility

Design, operate, and scale GPU and CPU clusters for ML training and inference (Slurm, Kubernetes, autoscaling, queueing, quota management)
Automate infrastructure provisioning and configuration using infrastructure‑as‑code (Terraform, CloudFormation, cluster‑tooling) and configuration management
Build and maintain robust ML pipelines (data ingestion, training, evaluation, deployment) with strong guarantees around reproducibility, traceability, and rollback
Implement and evolve ML‑centric CI/CD: testing, packaging, deployment of models and services
Own monitoring, logging, and alerting across training and serving: GPU/CPU utilization, latency, throughput, failures, and data/model drift (Grafana, Prometheus, Loki, CloudWatch)
Work with terabyte‑scale datasets and the associated storage, networking, and performance challenges
Partner closely with ML engineers and researchers to productionize their work, translating experimental setups into robust, scalable systems
Participate in on‑call rotation for critical ML infrastructure and lead incident response and post‑mortems when things break

What we offer

Intellectually stimulating work environment
Be a pioneer: you get to work with real-time data processing & AI
Work in one of the hottest AI startups, with exciting career prospects
Team members are distributed across the world
Responsibility and the ability to make a significant contribution to the company’s success
Inclusive workplace culture

Full-time

Senior AI Engineer – Microsoft Fabric & Azure AI Foundry

We are looking for an experienced AI Engineer to lead the implementation of Azur...

Location

United States, New York City

Salary:

160000.00 - 220000.00 USD / Year

Valtech

Expiration Date

Until further notice

Requirements

5+ years of experience in cloud engineering, AI engineering, or data platform architecture
Strong hands-on experience with: Microsoft Fabric, Azure AI Foundry, Azure OpenAI, Azure Machine Learning, Azure Data Services
Experience integrating AI workloads into enterprise analytics platforms
Proficiency in Python and/or C#
Experience with REST APIs, SDKs, and AI orchestration frameworks
Knowledge of: Vector databases, Retrieval-Augmented Generation (RAG), Prompt engineering, Model evaluation and monitoring
Familiarity with DevOps practices including GitHub Actions or Azure DevOps
Strong understanding of enterprise security and governance

Job Responsibility

Design and implement AI solutions using Microsoft Azure AI Foundry within an existing Microsoft Fabric architecture
Integrate AI services with Fabric components including: Data Factory, OneLake, Power BI, Lakehouse and Warehouse environments, Real-Time Analytics
Build and operationalize generative AI and machine learning workflows
Configure and manage: Azure AI Services, Azure OpenAI, Model deployment pipelines, Prompt orchestration and evaluation
Establish secure connectivity between Azure AI Foundry and enterprise data sources
Implement governance, RBAC, security, compliance, and cost management controls
Develop reusable AI pipelines, APIs, and automation frameworks
Collaborate with platform teams to ensure scalability, observability, and production readiness
Support CI/CD and Infrastructure-as-Code deployment patterns
Provide technical leadership and documentation for AI platform adoption

What we offer

Flexibility, with remote and hybrid work options (country-dependent)
Career advancement, with international mobility and professional development programs
Learning and development, with access to cutting-edge tools, training and industry experts
Medical, dental, and vision insurance for you and your family, plus employer contributions to Health Savings Accounts

Full-time
