Senior DevOps/Platform Reliability Engineer

Jobgether • United States
Remote
AI Summary

Jobgether seeks a Senior DevOps/Platform Reliability Engineer to build and evolve infrastructure for a next-generation intelligent automation platform. The role requires strong ownership of CI/CD, cloud infrastructure, observability, and security. The ideal candidate will have experience with modern AI tools and DevOps workflows.

Key Highlights
Building and evolving infrastructure for a next-generation intelligent automation platform
Strong ownership of CI/CD, cloud infrastructure, observability, and security
Experience with modern AI tools and DevOps workflows
Key Responsibilities
Own and evolve CI/CD pipelines using modern tools such as GitHub Actions
Design and manage Infrastructure as Code solutions using Terraform and CloudFormation
Operate and scale Kubernetes-based infrastructure (EKS + Argo CD)
Manage cloud networking and edge infrastructure including Cloudflare, AWS networking services, API gateways, load balancers, and DNS configurations
Oversee data and event infrastructure such as Aurora MySQL, Redis, S3, and Kafka (MSK)
Build and maintain serverless and event-driven systems using AWS Lambda
Develop observability platforms using Prometheus, Grafana, and OpenTelemetry
Strengthen security and compliance posture (SOC 2, HIPAA) through IAM design, secrets management, scanning, and policy-as-code enforcement
Technical Skills Required
GitHub Actions, Terraform, CloudFormation, Kubernetes, AWS networking services, API gateways, Load balancers, DNS configurations, Aurora MySQL, Redis, S3, Kafka, Prometheus, Grafana, OpenTelemetry, AWS Lambda, Python, Bash, Linux

Job Description


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior DevOps / Platform Reliability Engineer in the United States.

This role sits at the intersection of platform engineering, SRE, and AI-driven operations, supporting a next-generation intelligent automation platform used by enterprise-scale customers. You will be responsible for building and evolving the infrastructure backbone that powers production AI and multi-agent systems at scale. The environment is highly technical and fast-moving, requiring strong ownership of CI/CD, cloud infrastructure, observability, and security. You will work closely with engineering teams to ensure safe, reliable, and scalable deployments across complex distributed systems. A key aspect of the role involves integrating modern AI tools into DevOps workflows to reduce operational toil and improve system intelligence. This is a high-impact position where your work directly shapes platform reliability, developer velocity, and production safety.

Accountabilities

  • Own and evolve CI/CD pipelines using modern tools such as GitHub Actions, ensuring safe, scalable, and reversible deployments for microservices and AI workloads
  • Design and manage Infrastructure as Code solutions using Terraform and CloudFormation to automate provisioning and environment consistency
  • Operate and scale Kubernetes-based infrastructure (EKS + Argo CD), including autoscaling, ingress, security controls, and multi-tenant isolation
  • Manage cloud networking and edge infrastructure including Cloudflare, AWS networking services, API gateways, load balancers, and DNS configurations
  • Oversee data and event infrastructure such as Aurora MySQL, Redis, S3, and Kafka (MSK), ensuring reliability, backups, and disaster recovery readiness
  • Build and maintain serverless and event-driven systems using AWS Lambda where appropriate
  • Develop observability platforms using Prometheus, Grafana, and OpenTelemetry, including telemetry for AI/LLM systems and agentic workflows
  • Strengthen security and compliance posture (SOC 2, HIPAA) through IAM design, secrets management, scanning, and policy-as-code enforcement
  • Drive FinOps initiatives including cost optimization, workload attribution, and LLM usage cost control
  • Partner with engineering teams to define deployment standards, operational SLOs, and platform best practices
  • Improve system reliability through monitoring, incident response, automation, and continuous infrastructure improvements
  • Document infrastructure, processes, and operational standards to enable scalability and knowledge sharing

Requirements

  • 5+ years of experience in DevOps, SRE, or Platform Engineering supporting production systems on AWS
  • Strong hands-on experience with CI/CD systems such as GitHub Actions, GitLab CI, Jenkins, or CircleCI
  • Deep experience operating Kubernetes environments (EKS preferred), including scaling, upgrades, and production operations
  • Strong AWS networking knowledge including VPC design, routing, security groups, load balancing, and DNS management
  • Proficiency with Terraform and Infrastructure as Code practices, ideally using OIDC-based authentication
  • Experience with production databases and storage systems including Aurora/RDS MySQL, Redis, and S3
  • Strong observability expertise using Prometheus, Grafana, and OpenTelemetry
  • Experience with Argo CD for GitOps-based deployments
  • Strong understanding of Cloudflare and AWS edge/networking services
  • Experience with Kafka/MSK and event-driven architectures
  • Strong scripting skills in Python and Bash, and proficiency in Linux environments
  • Solid understanding of security practices including IAM, KMS, secrets management, and supply chain security
  • Experience with compliance and vulnerability scanning tools
  • Ability to work independently while collaborating effectively in high-ownership engineering teams

Benefits

  • Competitive compensation package
  • 100% employer-covered employee health premiums
  • 75%-80% coverage for dependent health, dental, and vision plans
  • 401(k) retirement plan
  • Paid parental leave
  • Unlimited PTO policy
  • Fully remote work flexibility across the United States
  • Up to $200/month co-working space reimbursement
  • Home office stipend up to $500 for setup
  • Monthly $100 stipend for internet, phone, and related expenses
  • Opportunity to work on cutting-edge AI-native infrastructure and agentic systems
  • High-autonomy engineering culture focused on ownership and innovation

How Jobgether Works

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
