Senior Observability Engineer

Remote
AI Summary

Lead the design and implementation of the migration from AppDynamics and Splunk to Dynatrace. Deploy, configure, and manage Dynatrace for full-stack observability. Partner with Development, Operations, and SRE teams to improve system reliability and service health.

Technical Skills Required
Dynatrace, Splunk, AppDynamics, Linux, Unix, Windows, Kubernetes, AWS, Azure, GCP, Terraform, Ansible

Job Description


Key Responsibilities
  • Lead the design and implementation of the migration from AppDynamics and Splunk to Dynatrace.
  • Deploy, configure, and manage Dynatrace for full-stack observability across applications, infrastructure, and cloud environments.
  • Build and maintain dashboards, alerts, DQL queries, SLOs, health rules, and anomaly detection models.
  • Instrument applications, microservices, operating systems, and cloud platforms with deep hands-on involvement.
  • Analyze metrics, logs, traces, and events to support incident detection, root cause analysis (RCA), and performance optimization.
  • Partner with Development, Operations, and SRE teams to improve system reliability and service health.
  • Automate observability processes and enforce platform best practices and governance.
  • Support production issues and provide expert guidance during incidents and post-mortems.
Required Skills & Experience
  • 8+ years of experience in IT operations, SRE, or observability engineering roles.
  • Strong hands-on expertise in Dynatrace administration, configuration, automation, and platform design.
  • Proven experience designing and implementing observability using:
      • Dynatrace
      • Splunk
      • AppDynamics
  • Strong experience with telemetry (metrics, logs, traces, events) and observability best practices.
  • Solid operating system knowledge (Linux/Unix, Windows) with strong troubleshooting skills.
  • Experience instrumenting:
      • Applications and microservices
      • Containers and Kubernetes (preferred)
      • Cloud platforms (AWS, Azure, or GCP preferred)
  • Strong analytical skills for performance tuning and problem resolution.
  • Excellent communication skills and ability to collaborate across engineering teams.
Nice to Have
  • Experience with SRE practices (SLIs, SLOs, error budgets).
  • Experience with Infrastructure as Code (Terraform, Ansible, etc.).
  • Exposure to CI/CD and DevOps pipelines.
  • Cloud-native and microservices architecture experience.
Why Join
  • Opportunity to lead a large-scale enterprise observability transformation.
  • Work with modern reliability and monitoring platforms.
  • 100% remote contract with global exposure.
  • High-impact role influencing platform stability and performance.


