Senior Site Reliability Engineer - Observability

itD • United Kingdom
Remote
AI Summary

Lead the design, development, and operation of large-scale observability systems ensuring service reliability and performance. Manage Prometheus architecture, Elasticsearch clusters, Kafka pipelines, and alerting systems for global customer infrastructure. Requires 5+ years distributed systems experience and 2+ years in Python, Go, or similar languages.

Key Highlights
Lead design and operation of observability systems for Meraki's global cloud infrastructure
Work with Prometheus, Elasticsearch, Kafka, and alerting workflows handling billions of requests daily
12-month remote position in the UK with 11am-7pm working hours
Direct W2 candidates only - no visa sponsorship available
Key Responsibilities
Lead design, development, and operation of large-scale secure observability systems
Deploy and scale Prometheus architecture to handle 100+ million active series
Deploy and operate high-performance Elasticsearch clusters holding 2000+TB of data
Build and grow high-throughput data pipelines using Kafka
Design alerting systems for engineering teams across multiple data sources
Develop libraries and APIs for self-service monitoring and logging access
Use Terraform to deploy public and private cloud infrastructure
Participate in internal practice community meetings and thought leadership
Complete client case studies and learning materials
Attend internal networking events and work with leadership on career fast-track opportunities
Technical Skills Required
Python, Go, Prometheus, Elasticsearch
Benefits & Perks
Comprehensive medical benefits
401k plan with matching
Paid holidays

Job Description


itD is seeking a Sr. Software Engineer/SRE (Site Reliability Engineer). We are looking for a highly motivated Lead Site Reliability Engineer on the Observability team. You will lead the design, development and operation of large-scale, secure observability systems that make sure our services stay online and performant. We're a team of passionate software engineers that value quality and customer experience. Our team is based in the US and EMEA, and we embrace hybrid and remote work.

The opportunity is remote in the UK.

We provide comprehensive medical benefits, a 401k plan, paid holidays, and more. Please note that we are only considering direct W2 candidates at this time, as we are unable to offer sponsorship.

Internal Responsibilities

  • Attend regular internal practice community meetings.
  • Collaborate with your itD practice team on industry thought leadership.
  • Complete client case studies and learning material (blogs, media material).
  • Build out material to contribute to the Digital Transformation practice.
  • Attend internal itD networking events (in person and virtual).
  • Work with leadership on career fast-track opportunities.

Job Title: Senior Site Reliability Engineer - Observability

Work Location: UK (Remote)

Duration: 12 months

Working hours: 11am to 7pm (mostly)

Interviews: 2 Webex Video Rounds

The Meraki cloud supports millions of customer devices from 8 data centers around the world. Meraki’s customer base has grown by a factor of 2-3 every year, and the Meraki cloud serves billions of HTTP requests per day globally. Our customers depend on our products to run their critical infrastructure of network switches, security appliances, wireless APs and security cameras.

As SREs at Meraki, we are responsible for building and growing the cloud that supports these customers and their networks. As a Lead Site Reliability Engineer on the Observability team, you will lead the design, development and operation of large-scale, secure observability systems that keep our services online and performant. We're a team of passionate software engineers who value quality and customer experience. Our team is based in the US and EMEA, and we embrace hybrid and remote work.

Examples Of Projects Our Team Works On

  • Design, deploy and scale our Prometheus architecture to handle 100+ million active series and beyond.
  • Deploy and operate large, high-performance Elasticsearch clusters holding 2000+TB of data.
  • Deploy and grow high-throughput data pipelines built on Kafka, handling hundreds of thousands of events per second.
  • Design and build an alerting system that allows engineering teams to construct alerts from multiple data sources and alerting workflows.
  • Write libraries and APIs that give engineers self-service access to our monitoring, logging, and other observability systems.
  • Use Terraform to deploy public and private cloud infrastructure.

You Are An Ideal Candidate If You

  • Have 5+ years of experience designing, deploying and operating mid- to large-sized distributed systems on VMs or bare metal machines running Linux (we run Debian and Ubuntu).
  • Have 2+ years of experience developing with languages like Ruby, Python, Go, Scala, or Bash.
  • Are excited by the challenge of solving difficult problems in large distributed systems that deal with huge amounts of data.
  • Want to work on a highly autonomous team that cares deeply about quality and customer experience.
  • Are curious, learn fast and feel comfortable diving into unfamiliar code and systems to solve problems.
  • Understand the value of observability and can work with other teams to help them better monitor their services.
  • Are willing to be part of a production on-call rotation.
  • Have direct experience with the following technologies (or similar): Elasticsearch, Logstash, Kibana (ELK) stack, Kafka, Prometheus/Thanos/Cortex, Graphite, Ansible, Terraform, Consul.
  • Have strong experience in building out solutions based on software engineering best practices.

Preferred Skills

Enterprise and Tech Industry experience: Cisco, Meraki, Google, ServiceNow, Meta, etc.

Education

Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a closely related field.

Company Description

About itD: We are part of a new generation of consulting and software development companies that blend diversity, innovation, and integrity with real business results. Our structure rejects any strong hierarchy, empowering us to deliver excellent results. We are a woman- and minority-led firm. Every day, we challenge ourselves to be considerate and fair, and to rethink what great outcomes mean for our customers. This permeates how we approach every interaction, on every project, for every client. You’ll thrive here if you are a dynamic self-starter, a difference-maker or someone who wants to deliver great results, without constraints.

The itD Digital Experience: Joining us means you’ll be part of our global community, you have a say about your own career journey, and you’ll get a chance to give back to causes that matter. You will experience working with Fortune 500 companies and high-performance teams across numerous industries. itD offers our employees excellent benefits such as medical, dental, vision, life insurance, paid holidays, 401K + matching, networking & career learning and development programs. We are growing and we want to see you grow! Visit https://itdtech.com/careers to learn more about what working at itD can mean for you.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or protected veteran status, or any other legally protected basis, in accordance with applicable law. itD is committed to working with and providing reasonable accommodation to individuals with disabilities. If, because of a medical condition or disability, you need a reasonable accommodation for any part of the application process, or to perform the essential functions of a position, please contact us at recruiting@itdtech.com and let us know the nature of your request and your contact information.

Additional Info

Dynamic environment in a culture of respect, empowerment and recognition for a job well done. Apply today!
