Observability Engineer

Jobgether • United States

Remote

AI Summary

Jobgether is seeking an Observability Engineer to design and operate large-scale observability platforms, define observability standards, and manage high-volume time-series and log storage systems. The ideal candidate will have 5+ years of experience in SRE, platform engineering, or observability-focused roles, with hands-on ownership of production monitoring systems at scale. Strong expertise with Prometheus, Grafana, and commercial observability platforms is required.

Key Responsibilities

Design and operate large-scale observability platforms

Define observability standards

Manage high-volume time-series and log storage systems

Develop self-service tooling, dashboards, and reusable templates

Improve incident response workflows

Technical Skills Required

Prometheus, Grafana, Datadog, OpenTelemetry, Go, Python, Java, Kubernetes, Linux, Networking, Distributed Systems

Benefits & Perks

Competitive salary

Fully remote work

Comprehensive healthcare coverage

Paid time off and standard leave benefits

Job Description

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an Observability Engineer (Prometheus / Grafana / Datadog) in the United States.

This role is focused on building and operating the observability backbone that enables engineering teams to understand and trust complex distributed systems. You will design and maintain end-to-end telemetry pipelines across metrics, logs, and traces, ensuring high-quality signals at scale. Working closely with SRE, platform, and product engineering teams, you will turn noisy system data into actionable insights that improve reliability and performance. The environment is highly technical and cloud-native, requiring strong experience across modern observability stacks and SRE practices. You will help define standards for instrumentation, alerting, and SLO-driven operations across the organization. This is a high-impact role where your work directly shapes how systems are monitored, debugged, and improved in production.

Accountabilities

Design and operate large-scale observability platforms covering metrics, logs, traces, and synthetic monitoring using tools such as Prometheus, Grafana, Datadog, and OpenTelemetry, ensuring reliability, scalability, and usability across engineering teams.
Define and enforce observability standards including instrumentation practices, metric naming conventions, structured logging, and distributed tracing approaches to ensure consistent telemetry quality.
Build and maintain SLO/SLI frameworks, error budgets, and alerting systems that reduce noise while improving incident detection and operational response effectiveness.
Manage high-volume time-series and log storage systems, optimizing for retention, performance, cost efficiency, and query reliability across distributed environments.
Develop self-service tooling, dashboards, and reusable templates that enable product and platform teams to adopt observability best practices with minimal friction.
Improve incident response workflows through better alerting, dashboards, runbooks, and post-incident analysis, while partnering closely with SRE and platform engineering teams.

Requirements

5+ years of experience in SRE, platform engineering, or observability-focused roles, with hands-on ownership of production monitoring systems at scale.
Strong expertise with Prometheus, Grafana, and at least one commercial observability platform such as Datadog, New Relic, or Splunk in production environments.
Deep understanding of OpenTelemetry, distributed tracing, structured logging, and modern telemetry pipelines across cloud-native architectures.
Strong programming skills in at least one language such as Go, Python, or Java, with the ability to build automation and observability tooling.
Solid knowledge of SRE principles including SLOs, error budgets, incident management, and reliability engineering practices.
Experience operating Kubernetes or container-based environments, with strong Linux, networking, and distributed systems fundamentals.
Strong communication skills with the ability to influence engineering teams and drive adoption of observability standards.

Benefits

Competitive salary aligned with experience and market benchmarks
Fully remote work across the United States
Long-term, stable engagement with multi-year project scope
Comprehensive healthcare coverage (medical, dental, and vision)
Paid time off and standard leave benefits
Opportunities to work with modern cloud-native and open-source observability technologies
Career growth in a high-impact, platform-focused engineering environment

How Jobgether Works

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Job Overview

Posted Date: May 20, 2026

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: United States

Annual Salary: 138,550 USD

Category: DevOps

Company: Jobgether

Similar Jobs

Explore other opportunities that match your interests

AI Cloud Infrastructure Engineer

DevOps • 11h ago

Visa Sponsorship, Relocation, Remote

Job Type: Full-time

Experience Level: Mid-Senior level

omni studio

United States

Cloud Application Architect

DevOps • 1d ago

Premium Job

NTT DATA North America

United States

System Engineer - Infrastructure

DevOps • 1d ago

Visa Sponsorship, Relocation, Remote

Job Type: Full-time

Experience Level: Associate

remotehunter

United States
