Site Reliability Engineer

Jobgether • United States
Remote
AI Summary

Jobgether is seeking a Site Reliability Engineer to ensure the reliability, scalability, and performance of complex systems across cloud and on-premises environments. The ideal candidate will have experience in infrastructure engineering and operational best practices. This role involves hands-on management of large-scale data centers, automation of deployment workflows, and integration of observability tools.

Key Responsibilities
Design, implement, and maintain scalable infrastructure using containers, microservices, and Kubernetes
Monitor system performance and troubleshoot reliability issues
Manage CI/CD pipelines and GitOps workflows
Implement configuration management processes using tools like Ansible
Operate and optimize high-throughput Kafka clusters
Collaborate with development teams to influence system design and operational policies
Technical Skills Required
Linux systems expertise, Containers, Kubernetes, CI/CD pipeline management, GitOps workflows, Prometheus, Grafana, ELK Stack, Ansible, Kafka, ArgoCD, Helm charts, Kustomize
Benefits & Perks
Competitive salary: $118,000–$158,000 USD
Comprehensive medical, dental, and vision coverage
Employer-paid income protection benefits
Flexible spending accounts
Retirement plan with 401(k) and employer match
Employee Stock Purchase Plan (ESPP) and potential bonuses
Paid time off, sick leave, and company-observed holidays
Employee Assistance Program and additional perks

Job Description


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in the United States.

This role is responsible for ensuring the reliability, scalability, and performance of complex systems across cloud and on-premises environments. The Site Reliability Engineer will work closely with development, operations, and product teams to design and maintain resilient infrastructure, implement CI/CD pipelines, and manage containerized applications and Kubernetes clusters. You will proactively monitor system performance, troubleshoot critical issues, and optimize operational processes to maintain high service availability. This position involves hands-on management of large-scale data centers, automation of deployment workflows, and integration of observability tools. The ideal candidate is highly analytical, detail-oriented, and experienced in both infrastructure engineering and operational best practices. Success in this role directly impacts system uptime, operational efficiency, and overall customer satisfaction.

Accountabilities

  • Design, implement, and maintain scalable, highly available infrastructure using containers, microservices, and Kubernetes.
  • Monitor system performance, troubleshoot reliability issues, and ensure optimal operation of both cloud-based and on-premises systems.
  • Manage CI/CD pipelines and GitOps workflows, including ArgoCD, Helm charts, and Kustomize configurations for efficient software deployment.
  • Implement configuration management processes using tools like Ansible to ensure consistent environments across data centers.
  • Operate and optimize high-throughput Kafka clusters for event streaming, including replication, partitioning, and disaster recovery strategies.
  • Collaborate with development teams to influence system design, operational policies, and best practices.
  • Maintain comprehensive technical documentation, runbooks, architectural diagrams, and incident response procedures.
  • Participate in on-call rotations and conduct blameless post-mortems for critical incidents.
  • Continuously evaluate emerging technologies to enhance operational efficiency and reliability.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field; advanced degree preferred.
  • 5+ years of experience in site reliability engineering or a related field focused on production systems and service delivery.
  • Strong Linux systems expertise, including configuration, tuning, and troubleshooting.
  • Hands-on experience with containers, Kubernetes, and microservices architecture.
  • Proficiency in CI/CD pipeline management and GitOps workflows, including ArgoCD, Helm charts, and automation tools.
  • Experience with observability tools such as Prometheus, Grafana, and ELK Stack.
  • Proven ability to manage large on-premises data centers with hundreds of bare metal servers and VMs.
  • Familiarity with networking concepts, protocols, and configuration management tools.
  • Strong analytical and troubleshooting skills with the ability to resolve complex system issues.
  • Excellent communication skills and experience collaborating across cross-functional teams.

Benefits

  • Competitive salary: $118,000–$158,000 USD, depending on experience and location.
  • Comprehensive medical, dental, and vision coverage for employees and dependents.
  • Employer-paid income protection benefits including life, AD&D, short- and long-term disability.
  • Flexible spending accounts for healthcare and dependent care.
  • Retirement plan with 401(k) and employer match, plus Roth options.
  • Employee Stock Purchase Plan (ESPP) and potential bonuses.
  • Paid time off, sick leave, and company-observed holidays.
  • Employee Assistance Program and additional perks such as commuter benefits, discount programs, and identity theft protection.
  • Fully remote work opportunity within the U.S.

Why Apply Through Jobgether?

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

