Senior AIOps Engineer, Incident Response

Jobgether • United States

Remote

AI Summary

Lead production reliability, incident response, and AI-driven operational transformation for critical cloud-based systems at Quanata.

Key Highlights

Own end-to-end production health monitoring, incident management, and root cause analysis across distributed cloud environments

Design and deploy AI-driven agents and automation workflows to reduce operational toil and improve incident response

Requires 6-8 years of experience in SRE or production operations with strong incident management expertise

Technical Skills Required

AWS, Jira, Confluence, observability platforms, incident management, root cause analysis, DevOps, SDLC, change management

Job Description

This position is posted by Jobgether on behalf of Quanata. We are currently looking for a Senior AIOps Engineer, Incident Response in the United States.

This role sits at the intersection of production reliability, incident response, and AI-driven operational transformation within a modern, cloud-first environment. The position is responsible for ensuring the stability and resilience of critical production systems while continuously improving how incidents are detected, managed, and resolved. It offers the opportunity to shape next-generation operational practices by integrating automation and intelligent agent workflows into daily support processes. Working closely with engineering, product, and AI teams, the role drives improvements in system observability, scalability, and response efficiency. It is ideal for someone who thrives in high-impact environments where reliability and innovation must evolve together. The position also involves participating in on-call rotations and supporting fast-moving production systems at scale.

Accountabilities

The Senior AIOps Engineer, Incident Response is responsible for ensuring end-to-end production reliability and continuously improving operational maturity across cloud-based systems. This includes leading incident management activities and driving systemic improvements through data-driven insights and automation initiatives.

Own production health monitoring, reliability processes, and operational support across critical services
Lead incident response, stakeholder communication, root cause analysis, and post-incident reviews
Identify recurring production issues and implement long-term fixes to reduce operational toil
Design and deploy AI-driven agents and automation workflows to streamline operational tasks
Collaborate with engineering, product, and AI orchestration teams to improve system resilience
Develop and maintain runbooks, operational documentation, and knowledge bases for human and AI use
Support observability, monitoring, and troubleshooting across distributed cloud environments
Participate in on-call rotations and continuously improve incident response readiness

Requirements

This role requires strong experience in production operations or site reliability engineering, with a proven ability to manage incidents in complex distributed systems. Candidates should be comfortable operating in fast-paced, cross-functional environments with a strong focus on automation and continuous improvement.

6-8 years of experience in production operations, SRE, or technical support engineering roles
Strong expertise in incident management, root cause analysis, and production troubleshooting
Experience working in DevOps, SDLC, and change management environments
Familiarity with tools such as Jira, Confluence, and modern observability platforms
Strong analytical mindset with ability to identify trends and operational inefficiencies
Excellent communication skills for cross-functional collaboration with engineering and leadership teams
Bachelor’s degree in Computer Science, Engineering, or equivalent experience
Experience with cloud platforms such as AWS and distributed system architectures (bonus)
Exposure to AI/LLM systems, automation frameworks, or intelligent agents (strong advantage)

Benefits

Competitive salary range: $215,000 - $280,000 (based on experience and internal equity)
Fully remote-first work environment within the United States
Comprehensive health coverage including medical, dental, and vision insurance
Life insurance and supplemental income protection plans
401(k) retirement plan with company match
$2,000 home office equipment stipend
Annual $5,000 learning and professional development budget
LinkedIn Learning subscription and access to coaching platforms
Headspace subscription and monthly wellness allowance
Four weeks of paid time off in the first year
Twelve weeks of fully paid parental leave
Flexible remote work culture with core collaboration hours (9 AM to 2 PM PT)

How Jobgether Works

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Job Overview

Posted Date: Jun 02, 2026

Employment Type: Full-time

Experience Level: Not Applicable

Location: United States

Annual Salary: 215,000 - 280,000 USD

Category: Programming

Company: Jobgether
