Senior DevOps/Infrastructure Engineer

djagora university • United States
Remote
AI Summary

Design, scale, and operate distributed systems for a modern observability and multi-cloud intelligence platform. Lead multi-region cloud architecture, build resilient telemetry pipelines, and define long-term infrastructure strategy. Collaborate with senior engineers to shape product direction and customer impact.

Key Highlights
Multi-region cloud architecture
Resilient telemetry pipelines
Long-term infrastructure strategy
Technical Skills Required
Kubernetes
Docker
Container orchestration
Prometheus
Grafana
Tracing/alerting frameworks
Terraform
CI/CD pipelines
GitOps workflows
Deployment automation
Python
Go
AWS
GCP
Azure
Nice to Have
Cloudflare (DNS, CDN, WAF, SSL)
Helm
Kustomize
Kubernetes deployment tooling
Time-series databases
Vector databases
High-throughput storage systems
SRE
Platform engineering
Observability tooling
AI/ML workloads
GPU-based systems
OpenTelemetry
Jaeger
Distributed tracing frameworks

Job Description


Role Overview

We are seeking a Senior DevOps / Infrastructure Engineer to design, scale, and operate the distributed systems powering a modern observability and multi-cloud intelligence platform built for AI and data-intensive environments.

This role sits at the core of product reliability and performance. You will lead multi-region cloud architecture, build resilient high-ingest telemetry pipelines, and define the long-term infrastructure strategy in a high-ownership engineering environment.

You will work within a small, senior team where architectural decisions directly influence product direction and customer impact.

Key Responsibilities

  • Architect and operate multi-region, multi-cloud deployments across AWS, GCP, or Azure
  • Design and maintain high-throughput telemetry ingestion pipelines
  • Build event-driven architectures supporting real-time observability
  • Implement autoscaling, failover strategies, and fault-tolerant system design
  • Own production observability using Prometheus, Grafana, distributed tracing, and alerting frameworks
  • Define and manage production SLOs, incident response, and reliability engineering practices
  • Develop and maintain CI/CD pipelines, GitOps workflows, and deployment automation
  • Collaborate with backend engineering on API performance and infrastructure reliability
  • Harden infrastructure for security, compliance, and tenant isolation
  • Drive the long-term infrastructure roadmap and architectural direction
  • Manage Infrastructure-as-Code (Terraform or similar) and full environment lifecycle

Requirements

Required Qualifications

  • Deep expertise in Kubernetes, Docker, and container orchestration
  • Strong background in distributed systems and multi-region architectures
  • Experience with high-ingest, streaming, or event-driven systems
  • Hands-on experience with Prometheus, Grafana, and tracing/alerting frameworks
  • Proficiency with Terraform or similar Infrastructure-as-Code tools
  • Experience building and maintaining CI/CD pipelines
  • Strong working knowledge of AWS, GCP, or Azure
  • Proficiency in Python or Go for automation and tooling
  • Experience operating high-availability, production-critical systems

Preferred Experience

  • Experience with Cloudflare (DNS, CDN, WAF, SSL)
  • Familiarity with Helm, Kustomize, or Kubernetes deployment tooling
  • Experience with time-series databases, vector databases, or high-throughput storage systems
  • Background in SRE, platform engineering, or observability tooling
  • Experience supporting AI/ML workloads or GPU-based systems
  • Familiarity with OpenTelemetry, Jaeger, or distributed tracing frameworks

Benefits

  • Expense reimbursement
  • Professional training and certification support
  • Advancement and leadership growth opportunities
  • Meaningful equity participation
  • Significant ownership over core infrastructure decisions

Work Environment

This is a fully remote, United States-based role within a senior engineering team operating in a high-ownership, low-overhead environment. You will work closely with experienced engineers, influence architectural direction, and build infrastructure that directly shapes a category-defining observability platform.

If this background aligns with your experience, submit your information for review.
