Senior Site Reliability Engineer | San Jose, CA OR Seattle, WA
Job Description
The e-commerce industry has seen tremendous growth in recent years and has become a hotly contested space among leading Internet companies, and its future growth should not be underestimated. With millions of loyal users globally, this firm is an ideal platform to deliver a brand-new and better e-commerce experience to its users. Their product engineering team is responsible for building an e-commerce ecosystem that is innovative, secure, and intuitive for those users. They are looking for passionate and talented people to join them as they drive the future of e-commerce.
- This is an ON-SITE position and offers the opportunity to work within their San Jose, CA or Seattle, WA offices.
- This firm may offer relocation assistance and may sponsor visas as well.
Responsibilities
- Be part of the global SRE on-call rotation and be responsible for Tier-1 online incident response and DevOps support.
- Be responsible for the service levels of a mission-critical, revenue-generating e-commerce platform as well as all supporting infrastructure and services. This role will focus on service reliability, highly scalable design, and release management in a cloud-native environment.
- Define service level indicators and data-driven objectives, and develop DevOps/SRE standards, processes, and methodologies to uphold and improve the uptime, latency, and system health of a core global e-commerce production platform.
- Collaborate across teams with engineering and product to ensure that key stability and maintainability activities, such as capacity planning and launch reviews, are performed to enable transparent service delivery to customers.
- Design strategies for risk detection and mitigation, disaster recovery and simulation, release management, cost optimization, engineering quality, etc.
- Build automation geared towards infrastructure-as-code, scalability, and service resiliency.
- Implement best practices around incident management and post-mortems while being part of on-call rotations.
Qualifications
- Bachelor's or higher degree in Computer Science, similar technical field of study, or equivalent practical experience.
- 5+ years of experience developing, provisioning, or maintaining production-grade, large-scale distributed systems.
- High level of proficiency in Linux OS internals, networking, microservices, databases, and caches in cloud-native environments.
- Demonstrable familiarity with programming or scripting languages (Go, Python, Bash, C++, etc.).
- Demonstrable experience in the development and implementation of DevOps and SRE methodologies.
- Experience in designing, analyzing, and troubleshooting large-scale distributed systems.
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
Please do not hesitate to apply!