Provide daily support, issue resolution, and system maintenance for high-performance computing (HPC) storage systems. Manage system health, coordinate with teams and HPE, and ensure the functionality of large-scale distributed storage. Collaborate with team members and customers on system issues and follow up on assigned tasks.
Job Description
One of Myticas Consulting's direct clients is seeking a DAOS System Administrator for a 100% Remote contract position.
NOTE: W2 Only and Must be a US Citizen.
Seeking an HPE DAOS (Distributed Asynchronous Object Storage) System Administrator for a confidential client, to provide daily support, issue resolution, and system maintenance for high-performance computing (HPC) storage systems. The role involves managing system health, coordinating with teams and HPE, performing diagnostics, and ensuring the functionality of large-scale distributed storage.
The Core Tasks May Include
Service delivery
- Log in to HPC system storage components and run commands and scripts to manage and maintain the system health and performance
- Review log files for errors and warnings indicating possible issues with system health and performance, and collaborate with customers and HPE on resolution
- Plan ahead and exercise appropriate caution and care when executing privileged commands
- Collaborate with team members using face-to-face, phone, and online communication including email, Teams, and Slack
- Support planning and scheduling of system software upgrades, including estimating the time and effort required to complete them
- The Subcontractor will provide three (3) months of DAOS SME oversight in addition to the Subcontractor designated support TSE
- The Subcontractor designated TSE will have direct access to the DAOS SME as well as the development and testing team for ongoing elevated technical issues or concerns including product training
- Participate in regular team discussions of system issues and follow up on assigned tasks
- Perform triage and diagnosis of issues affecting DAOS storage
- Prescribe HW actions to be taken for repairs
- Provide status to Site Team Lead and/or directly to customer
- Create new Cases (if necessary) and elevate Cases to Level 2/Level 3 support as needed for difficult problem diagnostics or issue resolution
- Specific touchpoints and roles & responsibilities between ANL system administrators and HPE system administrators may vary according to ALCF needs and service level specific to DAOS.
Professional development
- Complete required training when assigned on specialized storage operating systems in a timely manner.
- Maintain training and certification prescribed by the Subcontractor and/or ALCF.
- Monitor HPE technical communications (e.g. field notices) for awareness of potential issues that could be encountered on the system at their site.
- In addition to the Services defined throughout this Proposal, the Subcontractor shall coordinate activities of all HPE resources.
- Must have experience in operating high-performance storage systems.
- Must have experience with computer hardware.
- Must have experience with Linux administration, including command-line operations.
- Must be proficient in scripting languages for automation.
- Experience with HPC clusters is preferred.
- Experience in distributed storage systems, especially in large-scale environments, is required.