Platform Engineer – AIOps & Infrastructure Job at Solvedex

Platform Engineering Manager

As Platform Engineering Manager at Power Design, you'll lead the buildout of our...

Location

United States, St Petersburg

Salary:

Not provided

Power Design

Expiration Date

Until further notice

Requirements

Education: Bachelor's degree in Computer Science, Computer Engineering, Information Systems, or a related field; equivalent professional experience considered
Experience: 7–10 years of progressive experience in infrastructure engineering, platform engineering, DevOps, or SRE — with meaningful time in both hands-on implementation and technical leadership
Preferred certifications: HashiCorp Terraform Associate, AWS/Azure Solutions Architect, CKA/CKAD, or equivalent cloud or platform engineering certifications
Hands-on production experience with at least one major cloud platform (Azure, AWS, GCP, or OCI); breadth across multiple platforms strongly preferred
Demonstrated history of evaluating infrastructure decisions through a cloud-first lens, identifying when to leverage cloud services rather than defaulting to on-premises solutions
Hands-on expertise with Terraform or a comparable IaC framework
GitOps pipeline experience (GitHub Actions, Azure DevOps, GitLab CI, or similar)
Production experience implementing enterprise observability and AIOps tooling (Datadog, Dynatrace, New Relic, Prometheus/Grafana, or equivalent), including anomaly detection, event correlation, and automated remediation workflows

Job Responsibility

Design, build, and maintain automation for infrastructure provisioning, configuration, and lifecycle management — with security controls built in from the start
Lead the evaluation, selection, and implementation of Power Design's first enterprise observability and AIOps platform, owning the decision end-to-end from vendor assessment through production rollout
Develop and maintain observability tooling, dashboards, and automated remediation workflows covering metrics, logging, tracing, and alerting across cloud and on-premises environments
Build and enforce CI/CD pipelines for infrastructure and platform services using GitOps best practices
Continuously evaluate the infrastructure footprint and identify workloads where cloud migration would improve resilience, reduce complexity, or lower cost — and build the business case to act on it
Apply a security-first lens to every platform decision, including IAM/RBAC design, secrets management, Zero Trust implementation (Zscaler), and policy-as-code
Create self-service infrastructure workflows — provisioning automation, access workflows, and internal developer tooling — to reduce ticket volume and enable engineering teams to move faster
Leverage AI-assisted tooling for anomaly detection, event correlation, and operational insights to drive a proactive operations model
Establish and own design standards, architectural consistency, and IaC strategy across the Platform Engineering function
Provide technical leadership and mentorship to platform engineers

Fulltime

Senior AIOps Engineer (Platform & Infrastructure)

Groupon is moving beyond "experimenting" with AI to running it at massive scale....

Location

Prague; Warsaw; Valencia; Madrid

Salary:

Not provided

Groupon

Expiration Date

Until further notice

Requirements

5+ years in Platform Engineering, SRE, or DevOps within a cloud-native environment
Deep experience managing stateful and stateless workloads (Helm, Istio, Docker)
Hands-on experience deploying and operating AI/ML tools or data-intensive systems in production
Strong skills in Python or Go to build custom API wrappers and automate operational tasks
Expertise in Prometheus, Grafana, and ELK stack to ensure end-to-end observability of complex AI requests

Job Responsibility

Architect the AI Stack: Design and operate core infrastructure on Kubernetes, including Vector Databases, LLM Gateways (LiteLLM), and workflow automation tools (n8n)
Enable at Scale: Drive AI adoption by creating self-service "Golden Paths" using Terraform and Helm, allowing engineering teams to deploy RAG pipelines with one click
Operational Excellence: Implement centralized observability, tracing (Langfuse), and governance to ensure our AI systems are reliable, auditable, and secure
Fiscal Discipline: Own the "AI Bill"—monitoring token usage and latency to optimize spend while maintaining high performance

What we offer

End-to-end Ownership: Real authority to standardize how a global company builds with AI
Career Growth: This is a high-visibility role within a new, strategic team with potential for leadership progression

Lead Engineer – Platform Engineering

We are looking for a Lead DevOps Engineer to join the Platform Engineering team ...

Location

United States, St Petersburg, Florida

Salary:

Not provided

Raymond James

Expiration Date

Until further notice

Requirements

Deep experience with virtualization platforms (e.g., VMware vSphere/ESXi, Hyper‑V, KVM/Nutanix)
Hands‑on experience with configuration management tools such as Ansible
Experience implementing and supporting enterprise load balancer solutions (e.g., F5 BIG-IP, NGINX, Azure/AWS load balancers), including configuration, automation, and traffic‑routing policies
Familiarity with AI‑assisted operations tools (AIOps), or how they can fit into the workflow
Solid understanding of CI/CD systems (GitHub Actions, Azure DevOps, Jenkins, GitLab CI)
Advanced scripting skills in Python, PowerShell, and/or Bash
Experience with provisioned workflow development in ServiceNow
Strong knowledge of monitoring and logging platforms (Prometheus/Grafana, Splunk, Elastic, Datadog, etc.)
Understanding of security best practices, IAM/RBAC, secrets management, and compliance frameworks
Strong networking and systems fundamentals (TCP/IP, DNS, load balancing, storage)

Job Responsibility

Design, build, and maintain automation for VM provisioning, configuration, and lifecycle management
Enhance and support CI/CD pipelines for infrastructure and platform services
Provide technical leadership and mentorship to engineers across the platform engineering team
Use AI‑assisted tooling when beneficial for anomaly detection, event correlation, and operational insights
Work on standardized VM images, templates, and OS baselines to ensure consistency and security
Improve platform reliability through monitoring, alerting, and SRE‑aligned practices
Develop and maintain observability tooling, dashboards, and automated remediation workflows
Ensure security best practices across VM platforms, including RBAC, secrets management, and patching
Optimize VM capacity, performance, and resource utilization across environments
Collaborate with development, cloud, and security teams to deliver stable, self‑service platform capabilities

Fulltime

Account Manager, Global System Integrator

Account Manager for OpsRamp business focusing on Global System Integrators. This...

Location

United States, New Jersey or Texas

Salary:

194500.00 - 456500.00 USD / Year

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Strong passion for learning new technologies in the market
Excellent communication skills (oral, written, and presentation) with the ability to articulate and sell on value propositions to GSIs
Deep understanding of GSI business ecosystems and a knack for positioning the product/platform as an integrated solution with partners
A track record of being detail-oriented, with a demonstrated ability to self-motivate and follow through on projects
Strong problem-solving skills with an ability to analyze problems and develop actionable and appropriate tactical plans quickly
Strong sales acumen with an understanding across IT Operations Management and AI Ops Platform solutions (ITOM & AIOps)
Strong understanding of Strategic Consulting, Systems Integration, Global Delivery Models, Managed / IT Outsourcing Services, Infrastructure Management Services, IT / Data Center Transformation, ITOM & AIOps, Cloud Computing, Platform Service, etc.
Exceptional interpersonal and relationship management skills
Proven ability to build and maintain executive-level relationships
Bachelor of Engineering, MBA (Preferred)

Job Responsibility

Develop and maintain executive relations within Global System Integrators (GSIs) to broaden awareness and acceptance of OpsRamp AIOps Solutions to power their Managed Services Platforms
Recruit new GSIs in line with the company's direction to drive growth for the OpsRamp business
Develop and execute a strategic business plan that meets and exceeds revenue targets
Align with cross-functional stakeholders including Product Management, Engineering, Marketing, Sales, and Operations
Create incremental revenue opportunities with GSIs via new joint solution offerings, new markets, and joint customer pursuits
Develop and maintain a robust deal pipeline with targeted solutions to continuously grow the business and generate incremental revenue
Provide timely, concise, and accurate information on account and opportunity status, plans, and events
Manage and report business through accurate forecasting, stakeholder updates, and quarterly business reviews
Exceed revenue growth expectations
Achieve quarterly and annual bookings targets by growing joint partner business across the globe

What we offer

Health & Wellbeing benefits
Personal & Professional Development programs
Unconditional Inclusion environment
Comprehensive benefits suite supporting physical, financial and emotional wellbeing
Career development programs

Fulltime

GSI Sales

As part of the OpsRamp presales team, revolutionize cloud computing by deliverin...

Location

India, Bangalore

Salary:

Not provided

Hewlett Packard Enterprise

Expiration Date

Until further notice

Requirements

Strong passion for learning new technologies in the market
Excellent communication skills (oral, written, and presentation) with the ability to articulate and sell on value propositions to GSIs
Deep understanding of GSI business ecosystems and a knack for positioning the product/platform as an integrated solution with partners
A track record of being detail-oriented, with a demonstrated ability to self-motivate and follow through on projects
Strong problem-solving skills with an ability to analyze problems and develop actionable and appropriate tactical plans quickly
Adaptability and flexibility to work in a startup environment
Strong sales acumen with an understanding across ITOM & AIOps solutions
Strong understanding of Strategic Consulting, Systems Integration, Global Delivery Models, Managed/IT Outsourcing Services, Infrastructure Management Services, IT/Data Center Transformation, ITOM & AIOps, Cloud Computing, Platform Service, etc.
Exceptional interpersonal and relationship management skills
Proven ability to build and maintain executive-level relationships

Job Responsibility

Develop and maintain executive relations within Global System Integrators (GSIs) to broaden awareness and acceptance of OpsRamp AIOps Solutions to power their Managed Services Platforms
Recruit new GSIs in line with the company’s direction to drive growth for the OpsRamp business
Develop and execute a strategic business plan that meets and exceeds revenue targets
Align with cross-functional stakeholders including Product Management, Engineering, Marketing, Sales, and Operations
Create incremental revenue opportunities with GSIs via new joint solution offerings, new markets, and joint customer pursuits
Develop and maintain a robust deal pipeline with targeted solutions to continuously grow the business and generate incremental revenue
Provide timely, concise, and accurate information on account and opportunity status, plans, and events
Manage and report business through accurate forecasting, stakeholder updates, and quarterly business reviews
Exceed revenue growth expectations
Achieve quarterly and annual bookings targets by growing joint partner business across the globe

What we offer

Comprehensive suite of benefits that supports physical, financial and emotional wellbeing
Specific programs catered to helping reach career goals
Inclusive environment celebrating individual uniqueness

Fulltime

Senior SRE – Data & Middleware Observability & Incident Reduction Vice President

The Senior Incident Operations & Optimization Specialist for Data & Middleware i...

Location

United States, Irving

Salary:

125760.00 - 188640.00 USD / Year

Citi

Expiration Date

Until further notice

Requirements

A minimum of 8 years of hands-on experience in database administration, middleware engineering, or enterprise data platform operations
Proven experience in event management, alert tuning, and incident reduction for data and middleware services, with measurable results
Direct, hands-on experience with modern AIOps and event management platforms
Deep knowledge of both relational (e.g., Oracle, SQL Server) and NoSQL (e.g., MongoDB) database technologies, including clustering, replication, and performance tuning
Expertise in middleware platforms, including messaging technologies (e.g., MQ, Kafka) and application servers (e.g., WebSphere, Tomcat)
Hands-on experience developing robust automation solutions using relevant scripting languages (e.g., Python, Shell) and modern automation frameworks
Proficiency in log analysis, pattern recognition, and using query languages for data analysis on log aggregation platforms
Excellent analytical abilities with a systematic approach to troubleshooting complex data platform architectures and correlating infrastructure issues with application impact
Exceptional communication skills with the ability to collaborate effectively with DBAs, middleware engineers, and application teams, and to present technical concepts to diverse audiences
Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related technical field

Job Responsibility

Analyze and optimize monitoring across all database and middleware platforms to address high-volume, low-value alerts, identify patterns in incident generation, and determine root causes
Develop and implement domain-specific correlation, de-duplication, and suppression rules on AIOps and event management platforms
Create logic that understands database cluster relationships, messaging dependencies, and application-to-database connections
Architect and develop automation playbooks for incident data enrichment and automated remediation of common database and middleware issues, such as connection pool resets or service restarts
Identify monitoring gaps across the data and middleware landscape, proposing enhancements to ensure comprehensive health monitoring and address blind spots in transactional flows
Partner closely with Database Administration (DBA), middleware engineering, and application teams to validate correlation logic, build consensus on threshold changes, and provide expert guidance on event management best practices
Continuously validate the effectiveness of implemented rules and automation, ensuring critical health indicators remain highly visible
Lead post-implementation reviews and drive iterative improvements

What we offer

Medical, dental & vision coverage
401(k)
Life, accident, and disability insurance
Wellness programs
Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays

Fulltime

Senior Product Marketing Manager

Are you passionate about cloud computing and the future of intelligent cloud ope...

Location

United States, Redmond

Salary:

106400.00 - 203600.00 USD / Year

Microsoft Corporation

Expiration Date

Until further notice

Requirements

Master's Degree in Marketing, Computer Science, Business or related field AND 3+ years experience in business OR Bachelor's Degree in Marketing, Computer Science, Business or related field AND 5+ years experience in business OR equivalent experience
Strong background in B2B audience marketing, cloud infrastructure, AIOps platform, or adjacent technical domains
Proven experience launching complex, technical products and shaping new or emerging categories
Deep comfort with technical concepts (cloud architecture, AI systems, automation, APIs)
Exceptional positioning, messaging, and storytelling skills
Strategic thinker who can also execute with speed and precision
Customer-obsessed and insight-driven

Job Responsibility

Develop and lead the outbound marketing strategy for agentic cloud operations, from early-category definition to scale
Develop differentiated positioning, messaging frameworks, and value propositions for technical and business audiences
Define customer personas and use cases across platform, infrastructure, and AI-driven operations teams
Partner closely with Integrated Marketing and Audience Marketing to execute outbound marketing campaigns, track results and optimize campaigns or programs
Lead go-to-market planning and execution for major product launches and feature releases for the agentic cloud ops portfolio
Craft the core narrative around agentic systems across key cloud operations domains and the lifecycle, i.e., deployment/configuration, observability, resiliency, optimization, and security
Translate complex technical concepts into clear, compelling stories without oversimplifying
Partner with Go-To-Market managers to build enablement assets (pitch decks, demos, battlecards, case studies)
Equip Go-To-Market and field teams to sell a new category with confidence and consistency
Support enterprise, mid-market, and developer-led motions as needed

Fulltime

Principal Site Reliability Engineer

Groupon is modernizing its global platform — and reliability is at the center of...

Location

Colombia

Salary:

Not provided

Groupon

Expiration Date

Until further notice

Requirements

10+ years in software/systems engineering
5+ years in SRE or platform reliability
Strong experience with GCP (preferred) or AWS, Kubernetes, and Terraform
Proficiency in Python or Go for automation and tooling
Deep understanding of observability stacks (Prometheus, Grafana, OpenTelemetry) and service meshes (Istio, Envoy)
Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations
Strong communication and influencing skills — data over hierarchy

Job Responsibility

Architect and maintain self-healing systems with 99.9%+ availability targets
Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns
Implement adaptive SLIs/SLOs that evolve automatically from real-time data
Build AIOps-based observability and auto-remediation pipelines
Apply predictive modeling to forecast failures before they impact users
Lead chaos, performance, and resilience testing programs
Map platform and service behavior to revenue impact and drive improved revenue resilience through better infrastructure performance
Mentor engineers and drive reliability standards across teams
Partner with platform, data, and product teams to ensure stability aligns with business goals
Support major incident response and incident review, and participate in on-call rotations

What we offer

The opportunity to work with cutting-edge technologies in a transformative environment
Professional growth and leadership development pathways tailored to your aspirations
A chance to leave a lasting impact by shaping the future of reliable and scalable systems
