Site Reliability Engineer

DigitalXC AI • India

Remote

AI Summary

Design, build, and maintain highly available, scalable, and secure infrastructure for DigitalXC AI's GenAI and automation platform. Monitor system performance, manage incident response, perform root-cause analysis, and implement reliability improvements. Collaborate with engineering teams on CI/CD pipelines, observability, capacity planning, and disaster recovery.

Key Highlights

Design and maintain infrastructure for GenAI-powered hyper-automation platform

Monitor system performance and manage incident response with root-cause analysis

Implement CI/CD pipelines, observability, and capacity planning best practices

Collaborate with software engineering teams on reliability and automation

Key Responsibilities

Design, build, and maintain highly available, scalable, and secure infrastructure

Monitor system performance and manage incident response

Perform root-cause analysis for production issues

Implement reliability and performance improvements

Collaborate with software engineering teams on resilient service design

Automate deployments and improve observability

Implement capacity planning and disaster recovery strategies

Define and refine SLOs/SLIs

Manage CI/CD pipelines

Contribute to tooling that reduces operational toil

Technical Skills Required

Linux system administration

Cloud platforms (AWS, Azure, GCP)

Python or Go programming

Kubernetes and container orchestration

Benefits & Perks

Remote work

Job Description

Company Description

DigitalXC AI is a GenAI-powered hyper-automation and employee experience platform focused on transforming enterprise IT operations and support. The platform enables self-service, self-heal, self-help, and operations automation across major IT domains, backed by an app store of 650+ prebuilt automated services that can drive 50–60% automation within 12–18 months. DigitalXC AI delivers a consumer-grade, omnichannel experience through web and mobile apps, chat and voice bots, and integrations with tools like ServiceNow. Its intelligent virtual assistants and AI agents enhance productivity by supporting user queries, content creation, enterprise search, technical support, and more. The platform integrates with a wide range of enterprise technologies, including cloud, digital workplace, service desk, DevOps, networks, security, and leading business applications.

Role Description

This is a full-time, remote role for a Site Reliability Engineer at DigitalXC AI. The Site Reliability Engineer will design, build, and maintain highly available, scalable, and secure infrastructure that powers the company’s GenAI and automation platform. Day-to-day responsibilities include monitoring system performance, managing incident response, performing root-cause analysis, and implementing reliability and performance improvements. The role involves collaborating with software engineering teams to design resilient services, automate deployments, improve observability, and implement best practices for capacity planning and disaster recovery. The Site Reliability Engineer will also help define and refine SLOs/SLIs, manage CI/CD pipelines, and contribute to tooling that reduces operational toil.

Qualifications

Candidates should possess strong Site Reliability Engineering skills, including observability, incident management, capacity planning, and reliability best practices.
Candidates should possess deep System Administration and Infrastructure skills, such as managing Linux-based systems, cloud platforms (e.g., AWS, Azure, GCP), networking basics, and infrastructure-as-code tooling.

Candidates should possess solid Software Development skills, including proficiency in at least one programming or scripting language (e.g., Python, Go, Java, or Bash) and experience building automation and internal tools.
Candidates should possess advanced Troubleshooting skills for diagnosing complex production issues across applications, infrastructure, and third-party integrations.
Experience with CI/CD pipelines, containers and orchestration (e.g., Docker, Kubernetes), and monitoring/logging stacks (e.g., Prometheus, Grafana, ELK, or similar) is highly beneficial.

Understanding of security best practices for cloud-native environments, including access control, secrets management, and patching, is preferred.
Effective communication skills, a collaborative mindset, and the ability to work independently in a remote, distributed team are essential.
Bachelor’s degree in Computer

Job Overview

Posted Date: Jun 23, 2026

Employment Type: Full-time

Experience Level: Entry level

Location: India

Category: DevOps

Company: DigitalXC AI

