Jobgether seeks a Senior DevOps/Platform Reliability Engineer to build and evolve infrastructure for a next-generation intelligent automation platform. The role requires strong ownership of CI/CD, cloud infrastructure, observability, and security. The ideal candidate will have experience with modern AI tools and DevOps workflows.
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior DevOps / Platform Reliability Engineer in the United States.
This role sits at the intersection of platform engineering, SRE, and AI-driven operations, supporting a next-generation intelligent automation platform used by enterprise-scale customers. You will be responsible for building and evolving the infrastructure backbone that powers production AI and multi-agent systems at scale. The environment is highly technical and fast-moving, requiring strong ownership of CI/CD, cloud infrastructure, observability, and security. You will work closely with engineering teams to ensure safe, reliable, and scalable deployments across complex distributed systems. A key aspect of the role involves integrating modern AI tools into DevOps workflows to reduce operational toil and improve system intelligence. This is a high-impact position where your work directly shapes platform reliability, developer velocity, and production safety.
Accountabilities
- Own and evolve CI/CD pipelines using modern tools such as GitHub Actions, ensuring safe, scalable, and reversible deployments for microservices and AI workloads
- Design and manage Infrastructure as Code solutions using Terraform and CloudFormation to automate provisioning and environment consistency
- Operate and scale Kubernetes-based infrastructure (EKS + Argo CD), including autoscaling, ingress, security controls, and multi-tenant isolation
- Manage cloud networking and edge infrastructure including Cloudflare, AWS networking services, API gateways, load balancers, and DNS configurations
- Oversee data and event infrastructure such as Aurora MySQL, Redis, S3, and Kafka (MSK), ensuring reliability, backups, and disaster recovery readiness
- Build and maintain serverless and event-driven systems using AWS Lambda where appropriate
- Develop observability platforms using Prometheus, Grafana, and OpenTelemetry, including telemetry for AI/LLM systems and agentic workflows
- Strengthen security and compliance posture (SOC 2, HIPAA) through IAM design, secrets management, scanning, and policy-as-code enforcement
- Drive FinOps initiatives including cost optimization, workload attribution, and LLM usage cost control
- Partner with engineering teams to define deployment standards, operational SLOs, and platform best practices
- Improve system reliability through monitoring, incident response, automation, and continuous infrastructure improvements
- Document infrastructure, processes, and operational standards to enable scalability and knowledge sharing
Requirements
- 5+ years of experience in DevOps, SRE, or Platform Engineering supporting production systems on AWS
- Strong hands-on experience with CI/CD systems such as GitHub Actions, GitLab CI, Jenkins, or CircleCI
- Deep experience operating Kubernetes environments (EKS preferred), including scaling, upgrades, and production operations
- Strong AWS networking knowledge including VPC design, routing, security groups, load balancing, and DNS management
- Proficiency with Terraform and Infrastructure as Code practices, ideally using OIDC-based authentication
- Experience with production databases and storage systems including Aurora/RDS MySQL, Redis, and S3
- Strong observability expertise using Prometheus, Grafana, and OpenTelemetry
- Experience with Argo CD for GitOps-based deployments
- Strong understanding of Cloudflare and AWS edge/networking services
- Experience with Kafka/MSK and event-driven architectures
- Strong scripting skills in Python and Bash, with solid experience working in Linux environments
- Solid understanding of security practices including IAM, KMS, secrets management, and supply chain security
- Experience with compliance and vulnerability scanning tools
- Ability to work independently while collaborating effectively in high-ownership engineering teams
Benefits
- Competitive compensation package
- 100% employer-covered employee health premiums
- 75%-80% coverage for dependent health, dental, and vision plans
- 401(k) retirement plan
- Paid parental leave
- Unlimited PTO policy
- Fully remote work flexibility across the United States
- Up to $200/month co-working space reimbursement
- Home office stipend up to $500 for setup
- Monthly $100 stipend for internet, phone, and related expenses
- Opportunity to work on cutting-edge AI-native infrastructure and agentic systems
- High-autonomy engineering culture focused on ownership and innovation
Why Apply Through Jobgether?
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.