Design, build, and manage large-scale observability solutions across hybrid cloud environments using Splunk Observability Cloud, OpenTelemetry, and automation tools. Manage the Splunk Observability platform, including ingestion pipelines, detectors, dashboards, and alerting. Drive observability adoption across systems and applications.
Job Description
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Avacend Inc, is seeking the following. Apply via Dice today!
Title: Splunk Observability Engineer
Location: 100% remote & EST Time Zone preferred
Duration (Start/End Dates): 12/1/2025 - 10/23/2026
Skills: Splunk (Certified), Linux (RHEL), Automation (Python and Ansible)
Job Description:
Role Summary
We are seeking a highly skilled Splunk Observability Engineer with a strong background in system administration and infrastructure automation.
The ideal candidate will design, build, and manage large-scale observability solutions across hybrid cloud environments, leveraging Splunk Observability Cloud, OpenTelemetry, and automation tools to enable end-to-end visibility, incident response, and performance insights.
Technical Expertise
- Education & Experience:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field (preferred)
- 8–10 years of relevant experience in Splunk observability, with Splunk certification.
- Proven experience in managing environments with 10,000+ compute systems across RHEL, OpenStack, and VMware.
- Core Infrastructure Expertise:
- Strong hands-on experience with Red Hat Enterprise Linux — build, configuration, hardening, patching, and lifecycle operations.
- Working knowledge of VMware vSphere, Cisco UCS compute infrastructure, and OpenStack environments.
- Deep understanding of networking fundamentals, firewalls, iptables, and system security best practices.
- Automation & Configuration Management:
- Expertise with Ansible, Terraform, and Puppet for OS provisioning, configuration, and lifecycle automation.
- Experience with CI/CD pipelines using Git, Jenkins, or similar tools for automated delivery and testing.
- Strong scripting and automation background in Python or Ruby.
- Security & Compliance:
- Understanding of OS hardening, CIS benchmarks, and security patch management.
- Experience working with InfoSec and compliance teams to remediate vulnerabilities and maintain secure build standards.
- Experience designing and managing Splunk Observability Cloud, Splunk Enterprise, or equivalent monitoring platforms.
- Ability to instrument applications and systems using OpenTelemetry, Telegraf, or custom agents.
- Knowledge of correlating metrics, logs, traces, and events to build actionable insights and alerts.
- Participate actively in Agile scrum ceremonies and sprint planning.
- Design and implement automated provisioning pipelines for RHEL and observability agents using Ansible, Terraform, and CI/CD workflows.
- Manage the Splunk Observability platform — ingestion pipelines, detectors, dashboards, and alerting.
- Monitor, diagnose, and resolve complex infrastructure performance issues.
- Drive observability adoption across systems and applications, improving MTTR and SLO compliance.
- Maintain documentation, runbooks, and configuration repositories for infrastructure and observability systems.
- Proven experience working in Agile environments with globally distributed teams.
- Strong communication and collaboration skills, with the ability to influence cross-functional stakeholders.
- Demonstrated problem-solving and troubleshooting ability in complex distributed environments.
- Self-motivated and proactive with a strong sense of ownership and accountability.
- Experience with open-source ecosystems and community-driven development practices.
- Splunk certifications (e.g., Splunk Certified Admin / Observability Engineer / Core Certified Power User).
- Exposure to Kubernetes, AWS CloudWatch, or Grafana/Prometheus integration.
- Experience implementing AIOps, anomaly detection, or predictive alerting solutions.