Site Reliability Engineer

groupa United States
Visa Sponsorship Relocation Remote
AI Summary

Join our team to help make communities more adaptive and sustainable by pairing external data with artificial intelligence to identify areas of high risk and prevent catastrophic loss for utilities and critical infrastructure owners across the country.

Key Highlights
Design High-Availability Systems
Maintain System and Network Security
Logging, Metrics and Alerting
Diagnosis and Troubleshooting
Customer Support
Guiding Development Team with Best Practices
Build Engineering
Continuous Learning
Mentorship
Technical Skills Required
GCP, Kubernetes, CI/CD pipelines, Azure, AWS, Operating Systems, Computer Architecture, Programming
Benefits & Perks
Remote work
Visa sponsorship
Relocation package

Job Description


We are recruiting for a full-time, direct, and fully remote Site Reliability Engineer to join our client company’s Resiliency Solutions team to help make communities more adaptive and sustainable. The team does this by pairing external data with artificial intelligence to identify areas of high risk and prevent catastrophic loss for utilities and critical infrastructure owners across the country. Join a close-knit team of engineers, subject matter experts, and business leaders who obsess over problem-solving, new technologies, and making a positive impact in our communities.


Duties & Responsibilities:

  • Design High-Availability Systems - ensure that all of the systems that we deploy and depend on are configured to maintain full uptime. Plan out deployment strategies to ensure that uptime is maintained during upgrades and maintenance. Design and build out infrastructure-as-code projects. Perform resiliency, load, and disaster recovery tests.
  • Maintain System and Network Security - patch management, ensure that dependencies are kept up to date. Stay informed about zero-day vulnerabilities and any risks that cannot be immediately patched and come up with alternative methods to mitigate their risk.
  • Logging, Metrics and Alerting - set up and monitor logs, metrics, and alerts for the systems.
  • Diagnosis and Troubleshooting - diagnose and resolve production issues. Contribute to retrospectives and post-mortems. Participate in the on-call rotation.
  • Customer Support - regularly interface directly with customers to gather feedback and provide top-tier customer support in resolving issues.
  • Guiding Development Team with Best Practices - working with the development team to ensure that the software being built will be practical to deploy and maintain.
  • Build Engineering - managing build/deployment pipelines and ensuring they follow best practices.
  • Continuous Learning - Stay up-to-date with industry best practices, tools, and technologies related to infrastructure.
  • Mentorship - Work with a team of SREs, providing guidance, coaching, and technical expertise in infrastructure management.


Required Skills & Experience:

  • 5+ years of experience designing and maintaining application systems in the cloud - GCP (preferred), Azure, or AWS
  • Extensive experience in Kubernetes and CI/CD pipelines
  • Strong experience working directly with customers to take feedback and resolve issues
  • Ability to provide top-tier customer service
  • Bachelor's degree in a related field or equivalent experience.
  • People first, technology second.
  • A deep understanding of operating systems and computer architecture
  • Good programming abilities - for application diagnosis, infrastructure-as-code, and scripting and glue components.
  • Excellent communication and organizational skills are a must

