Design, scale, and operate distributed systems for a modern observability and multi-cloud intelligence platform. Lead multi-region cloud architecture, build resilient telemetry pipelines, and define long-term infrastructure strategy. Collaborate with senior engineers to shape product direction and customer impact.
Job Description
Role Overview
We are seeking a Senior DevOps / Infrastructure Engineer to design, scale, and operate the distributed systems powering a modern observability and multi-cloud intelligence platform built for AI and data-intensive environments.
This role sits at the core of product reliability and performance. You will lead multi-region cloud architecture, build resilient high-ingest telemetry pipelines, and define the long-term infrastructure strategy in a high-ownership engineering environment.
You will work within a small, senior team where architectural decisions directly influence product direction and customer impact.
Key Responsibilities
- Architect and operate multi-region, multi-cloud deployments across AWS, GCP, or Azure
- Design and maintain high-throughput telemetry ingestion pipelines
- Build event-driven architectures supporting real-time observability
- Implement autoscaling, failover strategies, and fault-tolerant system design
- Own production observability using Prometheus, Grafana, distributed tracing, and alerting frameworks
- Define and manage production SLOs, incident response, and reliability engineering practices
- Develop and maintain CI/CD pipelines, GitOps workflows, and deployment automation
- Collaborate with backend engineering on API performance and infrastructure reliability
- Harden infrastructure for security, compliance, and tenant isolation
- Drive the long-term infrastructure roadmap and architectural direction
- Manage Infrastructure-as-Code (Terraform or similar) and the full environment lifecycle
Required Qualifications
- Deep expertise in Kubernetes, Docker, and container orchestration
- Strong background in distributed systems and multi-region architectures
- Experience with high-ingest, streaming, or event-driven systems
- Hands-on experience with Prometheus, Grafana, and tracing/alerting frameworks
- Proficiency with Terraform or similar Infrastructure-as-Code tools
- Experience building and maintaining CI/CD pipelines
- Strong working knowledge of AWS, GCP, or Azure
- Proficiency in Python or Go for automation and tooling
- Experience operating high-availability, production-critical systems
Nice to Have
- Experience with Cloudflare (DNS, CDN, WAF, SSL)
- Familiarity with Helm, Kustomize, or Kubernetes deployment tooling
- Experience with time-series databases, vector databases, or high-throughput storage systems
- Background in SRE, platform engineering, or observability tooling
- Experience supporting AI/ML workloads or GPU-based systems
- Familiarity with OpenTelemetry, Jaeger, or distributed tracing frameworks
Benefits & Perks
- Expense reimbursement
- Professional training and certification support
- Advancement and leadership growth opportunities
- Meaningful equity participation
- Significant ownership over core infrastructure decisions
This is a fully remote, United States-based role within a senior engineering team operating in a high-ownership, low-overhead environment. You will work closely with experienced engineers, influence architectural direction, and build infrastructure that directly shapes a category-defining observability platform.
If this background aligns with your experience, submit your information for review.