Job Description
Key Responsibilities:
- Design, build, and maintain reliable infrastructure across AWS and GCP.
- Develop and manage Terraform modules for infrastructure provisioning.
- Build Docker images and manage container orchestration pipelines.
- Write Go-based tooling to automate deployment and monitoring tasks.
- Implement CI/CD pipelines that build and push images for new machines and services.
- Monitor system performance and troubleshoot issues across environments.
- Collaborate with development and operations teams to ensure system reliability and scalability.
- Participate in on-call rotations and incident response.
Required Qualifications:
- 3+ years of experience in Site Reliability Engineering or DevOps.
- Strong hands-on experience with the AWS and GCP cloud platforms.
- Proficiency in Terraform for infrastructure as code.
- Experience with Docker and container lifecycle management.
- Solid programming skills in Go (Golang).
- Familiarity with CI/CD tools (e.g., GitHub Actions, Jenkins, CircleCI).
- Strong understanding of Linux systems and networking fundamentals.
- Excellent communication and collaboration skills.
Preferred Qualifications:
- Experience with Kubernetes or other container orchestration platforms.
- Knowledge of observability tools (e.g., Prometheus, Grafana, Datadog).
- Prior experience in a remote-first or distributed team environment.