Datadog Monitoring Systems Engineer (Lead)
Job Description
Job Title: Lead Monitoring Systems Engineer
Location: Washington, DC (100% Remote)
Reports To: Manager, Systems Monitoring Team
Job Category/Level: Systems/Monitoring/Lead
Background: We are seeking a Lead Monitoring Systems Engineer to support Systems Monitoring initiatives for several upcoming projects in 2024 and beyond. This role focuses on administering systems and application monitoring tools, with a strong emphasis on Datadog.
Key Responsibilities:
- Administer and maintain Datadog on Linux, including application performance monitoring (APM), log management, and network monitoring.
- Instrument Java applications running on Apache Tomcat with Datadog APM.
- Configure centralized log collection from multiple sources and integrate it with Datadog.
- Build Datadog dashboards and visualizations to track key performance metrics.
- Develop and implement end-user monitoring and synthetic monitoring solutions using CloudBeat and Selenium scripts.
- Analyze monitoring data, prepare weekly status reports, and escalate potential issues to management.
- Collaborate with the Systems Architecture and Application Architecture teams to ensure monitoring requirements are addressed during development.
- Provide training and documentation for monitoring tools and procedures.
Qualifications:
- Education: Bachelor of Science in Computer Science or related field, or equivalent experience.
- Experience: 5 to 8 years in IT with a focus on monitoring tools, including at least 3 years of hands-on experience with Datadog.
- Strong experience with Linux (preferably Red Hat Enterprise Linux) and with instrumenting Java applications.
- Familiarity with the ELK Stack (Elasticsearch, Logstash, Kibana) is a plus.
- Proficiency in scripting (Python, shell) and automation tools such as Ansible.
- Understanding of SSL/TLS configuration, encryption methods, and network components (e.g., routers, switches, load balancers).
- Experience designing monitoring strategies for large-scale environments, and with service level management.
Competencies:
- Excellent organizational, interpersonal, and analytical skills.
- Self-motivated with the ability to adapt to changing priorities and tight deadlines.
- Strong problem-solving skills, initiative, and technical proficiency.
- Effective communication skills, both verbal and written.
- Familiarity with Agile methodologies and the software development life cycle (SDLC).
Preferred Qualifications:
- ITIL v3 Foundation certification (to be obtained within 180 days of hire).
- SAFe (Scaled Agile Framework) certification.
- Experience integrating cloud monitoring services (e.g., Amazon CloudWatch) with Datadog.