Senior SRE Tech Lead

Huxley • India
Relocation
AI Summary

We are seeking an experienced Senior SRE Tech Lead to guide our team in building and maintaining robust infrastructure across public and private cloud environments. This role will drive the adoption of SRE best practices, improve system performance, and ensure service continuity.

Key Highlights
  • Define and execute strategies to enhance reliability, performance, and scalability.
  • Oversee monitoring, alerting, logging, and tracing systems.
  • Establish and maintain SLOs/SLAs, manage error budgets, and lead improvement initiatives.
Technical Skills Required
Kubernetes, Prometheus, Grafana, ELK, Datadog, UNIX-like systems, TCP/IP, HTTP, Shell, Python, Jenkins, GitLab CI/CD, CircleCI
Benefits & Perks
Up to 70 lakhs INR
Relocation required

Job Description


Salary: Up to 70 lakhs INR


Overview

We are a rapidly growing technology organization focused on delivering innovative, high-quality platforms that enable seamless integration and scalability. Our mission is to build reliable, efficient, and secure systems that support millions of users and transactions daily.

Why This Role Matters

As our services expand, ensuring reliability, scalability, and operational excellence becomes critical. We are seeking an experienced SRE Tech Lead to guide our team in building and maintaining robust infrastructure across public and private cloud environments. This role will drive the adoption of SRE best practices, improve system performance, and ensure service continuity.

Key Responsibilities

  • SRE Strategy & Roadmap: Define and execute strategies to enhance reliability, performance, and scalability.
  • Observability Leadership: Oversee monitoring, alerting, logging, and tracing systems to ensure optimal observability.
  • Service Quality: Establish and maintain SLOs/SLAs, manage error budgets, and lead improvement initiatives.
  • Performance Optimization: Identify and resolve bottlenecks in latency and throughput.
  • Incident Management: Act as incident commander during outages, lead RCA efforts, and implement preventive measures.
  • Automation & Efficiency: Drive automation of operational tasks to reduce toil and improve scalability.
  • Team Leadership: Mentor and guide SRE team members, fostering technical growth and collaboration.
  • Cross-functional Collaboration: Partner with development, infrastructure, and security teams to promote a DevOps culture.

Mandatory Qualifications

  • 5+ years of experience in SRE or infrastructure engineering, with at least 2 years in a leadership role.
  • Proven experience managing production systems in public or private cloud environments (AWS, GCP, Azure, etc.).
  • Expertise in designing and operating Kubernetes clusters at scale.
  • Strong knowledge of monitoring and logging tools (Prometheus, Grafana, ELK, Datadog).
  • Deep understanding of UNIX-like systems and networking fundamentals (TCP/IP, HTTP).
  • Hands-on experience with CI/CD pipelines (Jenkins, GitLab CI/CD, CircleCI).
  • Proficiency in scripting languages (Shell, Python) for automation.
  • Excellent communication and collaboration skills.

Preferred Qualifications

  • Background in web application development.
  • Experience with test automation or as a Software Engineer in Test (SET).
  • Practical experience with observability metrics and error budget management.
  • Track record of reducing operational toil through automation.
  • Experience working with globally distributed teams.

Location

Relocation required.

