AI Platform Site Reliability Engineering Specialist

NTT DATA

Location:
India, Bengaluru

Contract Type:
Not provided

Salary:
Not provided

Job Description:

The AI Platform Site Reliability Engineering Specialist will operate and maintain the infrastructure for GenAI applications, focusing on automation and reliability.

Job Responsibility:

  • Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
  • Design and build automation for core platform capabilities, reducing manual toil
  • Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
  • Establish, monitor, and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards
  • Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
  • Perform capacity planning, scaling strategies, workload scheduling and resource forecasting
  • Optimize cost vs. performance trade-offs in large-scale compute environments
  • Harden systems for security, compliance, auditability, and data governance
  • Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems
  • Define disaster recovery (DR) strategies, backup/restore practices, and fault tolerance mechanisms
  • Maintain runbooks, operation playbooks, documentation, and training materials
  • Participate in on-call rotations and respond to production incidents 24/7 as needed
  • Continuously evaluate and integrate new tools, frameworks, or technologies to enhance platform reliability

Requirements:

  • Bachelor's or Master's degree in Computer Science or related field, or equivalent job experience
  • 5 years of production experience in SRE / Infrastructure / ops for large-scale systems
  • Strong programming/scripting skills (Python, Go, Java, or equivalent)
  • Deep experience with containerization (Docker), orchestration (Kubernetes, etc.)
  • Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
  • Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
  • Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
  • Networking and systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
  • Solid experience in capacity planning, performance tuning, scaling, and incident response
  • Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements
  • Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
  • Excellent communication, documentation, and cross-team collaboration skills
  • Proven track record of reducing operational toil via automation

Nice to have:

  • Understanding of SRE techniques
  • Proficiency with OpenTelemetry and related observability tools including Grafana, Loki, Prometheus, and Cortex
  • Good knowledge of microservice-based architecture and industry standards for both public and private cloud
  • Knowledge of data pipeline technologies (Kafka, Spark, Flink, etc.)
  • Good knowledge of various DB engines (SQL, Redis, Kafka, Snowflake, etc.) for cloud app storage
  • Experience working with Generative AI development, embeddings, and fine-tuning of Generative AI models
  • Experience in high-performance computing (HPC), distributed GPU cluster scheduling (e.g. Slurm, Kubernetes GPU scheduling)
  • Understanding of ModelOps / MLOps / LLMOps
  • Experience with chaos engineering, canary deployments, blue/green rollouts

Additional Information:

Job Posted:
February 14, 2026

Work Type:
On-site work