Senior Site Reliability Engineer (SRE) for Blockchain Observability

RemoteHunter • United States
Remote

Job Description


About the Opportunity:

The organization is an industry-standard oracle platform enabling capital markets to operate onchain and powering most decentralized finance (DeFi) applications. It provides essential data, interoperability, compliance, and privacy standards for advanced blockchain use cases such as institutional tokenized assets, lending, payments, and stablecoins. Since inventing decentralized oracle networks, the organization has enabled tens of trillions in transaction value and secures the majority of DeFi.

The Observability Team supports development and engineering efforts by building and maintaining reliable observability infrastructure. The Senior Site Reliability Engineer (SRE) role focuses on increasing self-service and reducing cognitive load across teams, with a strong emphasis on DevOps, GitOps, and observability.


Responsibilities:

• Build and orchestrate a modern OpenTelemetry (OTEL)-based observability platform

• Support multiple telemetry types including metrics, logs, and traces

• Define and support governance in observability and large-scale problem management

• Ensure reliability, security, and performance exceed defined SLAs

• Collaborate with engineers to troubleshoot issues, deploy products, and improve velocity

• Lead design and deployment of monitoring and observability services with alerting capabilities

• Ingest, aggregate, transform, and utilize data from various sources in a real-time pipeline

• Oversee availability, performance, and supportability of observability infrastructure

• Create and manage alert response processes to ensure reliable data delivery

• Recommend metrics collection for alert creation during new feature releases

• Champion reliability and security by prioritizing quality in all work


Requirements:

• 7 or more years of relevant experience in DevOps, infrastructure, SRE, or platform roles

• Ability to develop software beyond typical infrastructure configurations

• Experience programming in one or more of the following: C, C++, Java, Python, Go, Perl, Ruby

• Expert knowledge in designing, developing, and managing large real-time systems

• Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, Splunk, or Grafana Stack

• Experience with distributed systems and container orchestration including Kubernetes

• Strong communication skills with ability to provide and receive constructive feedback


Benefits & Perks:

• Fully remote, global role, with some overlap with Eastern Standard Time (EST) working hours expected


Note:

RemoteHunter is not the Employer of Record (EOR) for this role. Our purpose in this opportunity is to connect exceptional candidates with leading employers. We help job seekers worldwide discover roles that match their goals and guide them to complete their full application directly through the hiring company’s career page or ATS.

