Senior Site Reliability Engineer (Remote - US/Canada)

Motion Recruitment • United States
Remote
AI Summary

Cutting-edge tech startup seeks a Senior SRE to own observability, reliability, and scalability for a next-gen content delivery platform. This high-impact role involves designing systems for rapid growth, enterprise-grade security, and global uptime. You will define metrics, implement SRE best practices, and establish the company's first dedicated reliability function. This is a fully remote position open to candidates in the US or Canada.

Key Highlights
Own observability, reliability, and scalability strategy for a next-generation content delivery platform.
Design systems for rapid growth, enterprise-grade security, and global uptime.
Establish the company's first dedicated reliability function from the ground up.
Technical Skills Required
Terraform Kubernetes AWS Prometheus Grafana Datadog Honeycomb Python Go Bash CI/CD
Benefits & Perks
Medical Insurance
Dental Insurance
Vision Insurance
Vacation Time
Stock Options

Job Description


A cutting-edge technology start-up is currently looking for a Senior Site Reliability Engineer to join its distributed web services team. This individual will own the observability, reliability, and scalability strategy for a next-generation content delivery platform that powers real-time 3D and AR/VR experiences around the globe.

In this high-impact role, you'll work closely with infrastructure, software, and platform teams to design systems that can handle rapid growth while maintaining enterprise-grade security and uptime. The ideal candidate will bring deep expertise in cloud infrastructure automation, monitoring and alerting frameworks, and multi-tenant architecture, along with a passion for building systems that run seamlessly at global scale. You'll define metrics, implement SRE best practices, and take a hands-on role in establishing the company's very first dedicated reliability function. It's an opportunity to build something from the ground up and make a measurable impact on the reliability of next-generation 3D content delivery.

This is a fully remote position open to candidates based in the US or Canada.

Required Skills & Experience

  • 7+ years of experience in site reliability, DevOps, cloud, or software engineering
  • Hands-on expertise with Terraform, Kubernetes, and AWS
  • Strong knowledge of observability principles and tools such as Prometheus, Grafana, Datadog or Honeycomb
  • Proven experience designing and maintaining multi-tenant, globally distributed systems
  • Ability to define and implement SLIs, SLOs, and error budgets to measure and improve reliability
  • Demonstrated success automating infrastructure and improving operational efficiency across CI/CD environments
  • Experience with incident response, postmortems, and escalation frameworks
  • Solid understanding of networking, CDN optimization, and distributed content delivery
  • Familiarity with SOC 2, ISO 27001, and GDPR compliance monitoring

Desired Skills & Experience

  • Experience mentoring DevOps or infrastructure engineers in SRE practices
  • Proficiency with Python, Go or Bash for automation
  • Exposure to 3D, media, or streaming platforms at global scale
  • Interest in AR/VR, spatial computing or emerging graphics technologies

What You Will Be Doing

Tech Breakdown

  • 100% AWS

Daily Responsibilities

  • 100% hands-on

The Offer

  • Salary + Benefits

You Will Receive The Following Benefits

  • Medical, Dental, and Vision Insurance
  • Vacation Time
  • Stock Options

Applicants must be currently authorized to work in the US on a full-time basis now and in the future.

Posted By: Jordan Carbonell
