Design and operate observability platforms for bare-metal servers, GPUs, and high-performance networks. Build metrics, logs, traces, and alerting pipelines for internal teams and external customers. Requires 8+ years of infrastructure engineering experience with strong telemetry pipeline expertise.
Job Description
Voltage Park is seeking an Infrastructure Engineer with a focus on Observability to join our Infrastructure Engineering team. Our engineers design and operate the systems that manage thousands of bare-metal servers, GPUs, and high-performance networks across multiple data centers.
This role combines the breadth of a core infrastructure engineer with a specialty in observability and telemetry. You’ll design and operate metrics, logs, traces, and alerting pipelines that provide actionable insights for both internal teams and external customers, helping to ensure reliability and transparency at scale.
This is a fully remote position, although candidates must be based in the continental United States. Unfortunately, we are unable to provide sponsorship for this role.
Responsibilities
- Design, build, and maintain observability platforms spanning metrics, logs, traces, and events.
- Create dashboards and alerts for internal stakeholders (InfraOps, Engineering, Customer Success) and provide scoped visibility for external customers.
- Ingest and correlate telemetry from GPUs, CPUs, networking (Ethernet & InfiniBand), containers, APIs, and BMC/Redfish.
- Implement noise-resistant alerting pipelines that improve detection and reduce operational load.
- Collaborate with infrastructure, platform, and customer-facing teams to embed observability into workflows.
- Contribute to broader infrastructure engineering projects beyond observability.
Requirements
- 8+ years in infrastructure engineering, SRE, or observability roles.
- Strong experience with monitoring systems (Prometheus, Grafana, ELK, VictoriaMetrics, or similar).
- Proficiency in Python, Go, or Bash for automation and data integration.
- Familiarity with container/Kubernetes observability.
- Understanding of streaming telemetry pipelines (Kafka, OpenTelemetry, Promtail, or equivalent).
- Strong written and verbal communication skills.
Nice to Have
- Experience with GPU observability, particularly NVIDIA DCGM.
- Designing multi-tenant observability solutions with RBAC and scoped queries.
- Prior work with correlation engines for RCA, forecasting, or predictive alerting.
- Broader exposure to infrastructure domains (networking, storage, provisioning).
- You enjoy working with a small, highly motivated team.
- You’re comfortable balancing autonomy with company-wide priorities.
- You value clarity, documentation, and actionable insights in observability systems.
- You’re excited to specialize in observability while contributing as a core infrastructure engineer.