Senior Cloud Infrastructure Engineer - Kubernetes & Networking Focus

Pragmatike Armenia
Remote

Lead Kubernetes and Linux-based infrastructure operations, design network architecture, and drive reliability improvements in a fully remote EU timezone role.


Job Description


Location: Fully remote EU timezone (CET ±2h)

Start date: ASAP

Languages: Fluent English is mandatory

Industry: Cloud Computing

We are hiring at Pragmatike to expand our team and drive the growth of our internal projects.

Our focus is on developing cutting-edge solutions in Cloud Computing, while fostering a culture of collaboration and innovation. Joining us means being part of a passionate team where your ideas and skills directly contribute to shaping tomorrow's technologies.

If you're excited about working on ambitious projects in a dynamic and flexible environment, we'd love to hear from you!

Responsibilities

  • Operate and maintain Linux-based infrastructure (Debian/Ubuntu).
  • Deploy, manage, and scale Kubernetes clusters across bare-metal, virtualized, and on-prem environments.
  • Oversee full cluster lifecycle: upgrades, node pools, networking, storage, and security hardening.
  • Implement automation for provisioning and operations using Ansible, Bash/Python, and GitOps workflows.
  • Design and maintain networking architecture including VLANs, L2/L3 routing, VPNs, and multi-site connectivity.
  • Build automated deployment workflows (PXE boot, Preseed, cloud-init).
  • Deploy and maintain observability stacks (Prometheus/Grafana, Loki, ELK, Graylog).
  • Lead incident response and escalation activities across the platform.
  • Improve system availability and reduce latency at all levels.
  • Define and implement SLOs/SLIs at multiple infrastructure levels (physical network/hardware, platform virtualization, software services).
  • Optimize alerting and monitoring pipelines to provide actionable insights.
  • Establish and maintain on-call schedules to ensure coverage across timezones.
  • Develop Standard Operating Procedures (SOPs) for repeatable operations and maintenance tasks.
  • Coordinate physical maintenance for Policlouds (periodic maintenance, hardware issues, DC-Ops).
  • Manage virtualization and orchestration layers (OpenStack, Proxmox, VMware).
  • Help develop and maintain overall architecture across all products.
  • Plan resources for future initiatives, accounting for demand and growth projections.
  • Work with development teams to improve overall quality and optimize resource utilization.
  • Collaborate with cross-functional stakeholders (Hivenet, Policloud, Customer Success teams).

Requirements

  • Expert-level, hands-on experience operating Kubernetes in production environments.
  • Strong network engineering skills (VLANs, L2/L3 routing, VPNs, multi-site connectivity); this is essential for the role.
  • Strong proficiency with Linux systems administration (Debian/Ubuntu).
  • Solid understanding of networking fundamentals and ability to design complex network architectures.
  • Experience building and maintaining automation workflows (Ansible, Bash/Python, Git-based).
  • Experience with observability stacks such as Prometheus, Grafana, ELK, Loki, or Graylog.
  • Background with virtualization technologies (OpenStack, Proxmox, VMware).
  • Experience with bare-metal provisioning and MAAS (Metal as a Service).
  • Strong understanding of distributed systems and container orchestration.
  • Process-oriented mindset with ability to develop SOPs and operational procedures from scratch.
  • Experience with incident response, escalation procedures, and on-call rotations.
  • Ability to work autonomously in a fast-paced, engineering-driven environment.
  • Strong technical skills combined with alignment to team values.

Nice To Have

  • Experience with service mesh (Istio, Linkerd) or advanced CNI implementations.
  • Knowledge of Cloudflare APIs, DNS automation, or tunnel configurations.
  • Experience with GPU infrastructure, node preparation, or resource scheduling.
  • Familiarity with security best practices (RBAC, firewalls, network policies).
  • Exposure to IT asset management or license tracking workflows.
  • Experience working in multi-timezone environments and coordinating across distributed teams.
  • Background establishing reliability practices and SRE frameworks in growing organizations.

Why Join Us

  • 100% remote work with flexible hours.
  • High-impact role with autonomy and ownership.
  • Collaborative and international engineering team.
  • Cutting-edge tech stack with a strong focus on reliability and automation.
