Senior Platform & Site Reliability Engineer - Cloud Platform Leadership

genius match • Portugal

Remote

AI Summary

Lead platform architecture and engineering standards for a rapidly growing enterprise software organization. Own and operate the shared cloud platform, including CI/CD, observability, and event streaming infrastructure. Collaborate daily with a U.S.-based team, with availability required until at least 3:00 PM EST.

Key Highlights

Establish platform engineering standards across multiple products

Design and maintain event streaming and batch processing infrastructure

Own and improve observability, CI/CD, and deployment automation

Key Responsibilities

Own the architecture and operation of the shared platform

Define, implement, and enforce platform engineering standards

Build and maintain Infrastructure as Code using Terraform or OpenTofu

Design and maintain event streaming infrastructure supporting real-time processing workloads

Ensure reliability, scalability, performance, and cost efficiency of platform services

Design, build, and maintain CI/CD pipelines using GitHub Actions

Own and maintain the observability platform using Grafana, Prometheus, Loki, CloudWatch, and related monitoring tools

Plan and execute platform integration and modernization initiatives

Technical Skills Required

AWS

Terraform/OpenTofu

GitHub Actions

Benefits & Perks

Competitive market salary

Fully remote work

Nice to Have

Experience building shared platform engineering capabilities supporting multiple products or business units

Familiarity with AI-assisted engineering workflows and infrastructure automation

Job Description

Our client is a rapidly growing enterprise software organization that acquires and scales B2B SaaS products. They are building a shared cloud platform that serves as the engineering foundation for a growing portfolio of enterprise applications. This platform provides standardized infrastructure, deployment, observability, automation, and reliability capabilities across multiple products while enabling future growth without proportionally increasing operational complexity.

The organization is investing in modern platform engineering practices, cloud-native technologies, Infrastructure as Code, AI-assisted engineering, and operational automation to build a scalable, highly reliable engineering ecosystem.

They are looking for an experienced Senior Platform & Site Reliability Engineer to take ownership of the shared platform, establish engineering standards, and design the infrastructure that supports multiple enterprise SaaS products. This is a hands-on technical leadership role where you will influence platform architecture, developer experience, operational reliability, and engineering best practices.

Working Hours: This role requires daily collaboration with a U.S.-based engineering team. Candidates must be available to work until at least 3:00 PM EST (U.S. Eastern Time), with flexibility to work beyond these hours when business needs require.

Responsibilities

Platform Engineering

Own the architecture and operation of the shared platform, including CI/CD, observability, deployment automation, secrets management, and developer tooling.
Define, implement, and enforce platform engineering standards across multiple products.
Build and maintain Infrastructure as Code using Terraform or OpenTofu, ensuring all infrastructure is version-controlled, reviewed, and provisioned through automation.
Develop self-service platform capabilities that enable engineering teams to deploy independently.

Event Streaming & Data Processing

Design and maintain event streaming infrastructure supporting real-time processing workloads.
Build and support batch processing infrastructure alongside live transactional systems.
Ensure reliability, scalability, performance, and cost efficiency of platform services.

CI/CD & Deployment

Design, build, and maintain CI/CD pipelines using GitHub Actions.
Automate recovery for common pipeline failures and improve deployment reliability.
Implement release management strategies, rollback mechanisms, and deployment patterns such as canary or blue-green deployments where appropriate.

Observability & Site Reliability

Own and maintain the observability platform using Grafana, Prometheus, Loki, CloudWatch, and related monitoring tools.
Define Service Level Objectives (SLOs), error budgets, and reliability metrics across multiple products.
Build intelligent alerting and monitoring solutions that provide actionable diagnostic information.
Design incident response processes, escalation procedures, and post-incident review practices.
Implement safe automated remediation for well-understood operational scenarios while ensuring human oversight for complex incidents.

Platform Expansion & Integration

Assess newly onboarded products for infrastructure maturity, Infrastructure as Code coverage, observability, and security.
Plan and execute platform integration and modernization initiatives while minimizing operational disruption.
Support the adoption of standardized platform capabilities across multiple engineering teams.

Engineering Automation

Leverage AI-assisted engineering tools and automation where appropriate to reduce operational overhead.
Automate infrastructure provisioning, CI/CD workflows, monitoring, secrets management, and operational tasks while maintaining engineering oversight for high-impact decisions.

Preferred Technology Stack

AWS
Terraform / OpenTofu
GitHub Actions
Grafana
Prometheus
Loki
AWS CloudWatch
AWS Secrets Manager or HashiCorp Vault
Amazon ECS and EKS
Event streaming technologies
Cost monitoring and cloud optimization tools

Requirements

8–12 years of experience in Platform Engineering, Site Reliability Engineering (SRE), DevOps, or Cloud Infrastructure Engineering.
Proven experience designing and operating production platform infrastructure across multiple environments or products.
Strong hands-on experience with Terraform (or OpenTofu) and Infrastructure as Code.
Extensive experience designing and maintaining CI/CD pipelines using GitHub Actions.
Experience operating event streaming infrastructure in production environments.
Strong AWS expertise, including ECS, EKS, IAM, VPC, RDS, CloudWatch, networking, and cloud infrastructure.
Hands-on experience with Grafana, Prometheus, Loki, and enterprise observability platforms.
Strong understanding of SRE principles, including SLOs, error budgets, incident response, and operational excellence.
Experience designing scalable, secure, highly available cloud infrastructure.
Strong troubleshooting, automation, and problem-solving skills.
Excellent communication skills with the ability to establish engineering standards across multiple teams.

Nice to Have

Experience building shared platform engineering capabilities supporting multiple products or business units.
Experience integrating newly acquired products or modernizing legacy platforms.
Experience designing developer self-service platforms.
Familiarity with AI-assisted engineering workflows and infrastructure automation.
Experience supporting high-volume enterprise SaaS products and distributed systems.
Strong focus on cloud cost optimization and operational efficiency.

What We Offer

Competitive market salary.
Fully remote work.
Opportunity to build and shape the engineering platform supporting a growing portfolio of enterprise SaaS products.
Work alongside experienced international engineering teams.
Exposure to modern cloud technologies, AI-assisted engineering, automation, and large-scale platform initiatives.
Professional growth through ownership of platform architecture, operational reliability, and engineering standards.
Daily collaboration with a U.S.-based engineering team, with availability required until at least 3:00 PM EST and flexibility to work longer when needed.

Job Overview

Posted Date: Jun 30, 2026

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: Portugal

Category: DevOps

Company: genius match
