HPC Cluster Administrator (Scientist 2/3)
- Req. Number: IRC132879
- Organization: HPC-OPS/High Performance Computing Systems Group
- City, State: Los Alamos, New Mexico
- Recruiter Name: Wroblewski, Alex Christopher
- Recruiter Email: alexwrob@lanl.gov
Join the High Performance Computing Systems Group (HPC-OPS) in operating and maintaining some of the fastest supercomputers in the world for the betterment of our nation and the world. Designing, operating, and maintaining these systems requires highly skilled personnel who specialize in both the hardware and software aspects of High Performance Computing. Innovators at heart, HPC-OPS cluster administrators work both independently and collaboratively across teams to maintain capability and implement continuous capability improvements across a complex and heterogeneous computing environment.
The selected HPC Cluster Administrator (Scientist 2/3) will provide strategic design, testing, analysis, administration, configuration management, verification, and validation of both existing HPC systems and systems in development, including modifications and additions to systems, code, and methods, in support of LANL's HPC capability. HPC Cluster Administrators apply existing scientific principles, techniques, methods, and tools to maintain production computing systems and diagnose the root cause of system failures in collaboration with administrators of other HPC subsystems; bring up new hardware and test functionality; and document, design, and implement new ideas, technical innovations, and best practices. In addition, the selected candidate will have the opportunity to develop technical products such as documentation, presentations, technical papers, and reports, and to communicate findings internally or at conferences. Mentoring of students, junior staff, and peers in technical and professional growth activities is highly valued, as is maintaining state-of-the-art technical expertise and knowledge within HPC system administration and developing new skills in related disciplines. This is your chance to directly support our national security mission and continue to make LANL the best place to work as a member of a dynamic, team-oriented, and leading-edge technical capability team.
Position requires a skilled professional who has specialized experience with and broadly applies cluster computing system administration knowledge and best industry practices, including a full knowledge of a range of related disciplines, across a complex and heterogeneous computing environment in a professional setting to resolve diverse issues in creative, practical, and robust ways. The selected candidate will have the capacity to produce technical products, reports, documentation, presentations, and concept papers, as well as the ability to present findings at national technical meetings.
What You Need
Minimum Job Requirements:
Computer Scientist (Scientist 2: $96,100 - $159,000)
- Advanced Linux Administration Expertise: Demonstrated knowledge of administering production Linux computer systems, including strong command line Linux operating system skills, working knowledge of or experience with hardware and software security practices, and experience scripting in Bash, Perl, Python, or similar languages.
- Configuration Management Expertise: Demonstrated experience with configuration and automation tools and practices, such as Chef, Puppet, Ansible, Salt, or similar tools.
- Troubleshooting and Technical Analysis Acumen: Significant knowledge and demonstrated experience in formulating and testing hypotheses, investigating alternative solutions, and recommending solutions to technical problems.
- Computer Networking Expertise: Working knowledge of networking concepts and practices.
- Communication and Teaming Skills: Demonstrated effective communication skills, both verbal and written, including the ability to communicate technical information to both technical and non-technical personnel, to provide assistance and knowledge to peers, to collaborate with Group members, other HPC Group personnel and vendor representatives, as required, and to formulate and communicate technical results and findings to technical audiences and readerships (examples can include publications, team projects, and presentations).
- Troubleshooting Skills: Demonstrated ability to troubleshoot hardware and software errors, including prioritizing problems, assessing impact to stakeholders, and documenting problems and solutions.
- Clearance: Ability to obtain a DOE Q-clearance. (To obtain a clearance, an individual must be at least 18 years of age; U.S. citizenship is required except in very limited circumstances. See DOE Order 472.2 for additional information.)
Computer Scientist (Scientist 3: $115,500 - $194,900)
- Container experience: Demonstrated experience and knowledge with containerization such as Kubernetes, Charliecloud, Docker, etc.
- Virtualization: Demonstrated experience and knowledge with virtualization and hypervisors.
- Computer Networking Expertise: High performance interconnects, preferably Mellanox InfiniBand or Omni-Path.
- Leadership: Demonstrated experience with project planning and management. Ability to develop and lead complex projects, generate formal project plans, delegate tasks, and provide routine updates to management.
- HPC Experience: Demonstrated experience building, installing, and administering HPC systems. Experience with modern image building and provisioning tools.
Education/Experience at the lower level: Position requires a Bachelor's degree in a STEM field from an accredited college or university and 4 years of related experience.
Education/Experience at the higher level: Position requires a Master's degree in a STEM field from an accredited college or university and 6 years of relevant experience or an equivalent combination of education and experience directly related to the occupation.
Desired Qualifications:
- Experience with Git, including creating issues, branches, and merge requests, and using CI/CD pipelines.
- Experience modifying Unix/Linux operating systems (e.g., enabling/disabling kernel modules).
- Practical experience with Splunk or other monitoring tools.
- Knowledge of or demonstrated experience with parallel and distributed storage systems; knowledge of file systems such as ZFS, EXT, XFS; working knowledge of file system structures and algorithms; and/or experience with Object storage and RESTful storage interfaces.
- Demonstrated ability to develop new methods, techniques, or approaches to address critical technical problems and/or develop new technical capabilities.
- Knowledge of virtualization and containerization.
- Ability to mentor and lead individual junior team members and students.
- Active DOE Q Clearance.
Work Location: The work location for this position is hybrid and is located in Los Alamos, NM. Hybrid is defined as working partially onsite/partially offsite but within 2 hours ground commute of this location. All work locations are at the discretion of management and can change at any time with appropriate notice.
Position commitment: Regular appointment employees are required to serve a period of continuous service in their current position in order to be eligible to apply for posted jobs throughout the Laboratory. If an employee has not served the time required, they may only apply for Laboratory jobs with the documented approval of their Division Leader. The position commitment for this position is 1 year.
Note to Applicants:
For full consideration, applicants should submit a resume, contact information, and a cover letter describing how they meet the required and desired qualifications for the position.
Where You Will Work
Located in beautiful northern New Mexico, Los Alamos National Laboratory (LANL) is a multidisciplinary research institution engaged in strategic science on behalf of national security. Our generous benefits package includes:
- PPO or High Deductible medical insurance with the same large nationwide network
- Dental and vision insurance
- Free basic life and disability insurance
- Paid childbirth and parental leave
- Award-winning 401(k) (6% matching plus 3.5% annually)
- Learning opportunities and tuition assistance
- Flexible schedules and time off (PTO and holidays)
- Onsite gyms and wellness programs
- Extensive relocation packages (outside a 50-mile radius)
Additional Details
Directive 206.2 - Employment with Triad requires a favorable decision by NNSA indicating employee is suitable under NNSA Supplemental Directive 206.2. Please note that this requirement applies only to citizens of the United States. Foreign nationals are subject to a similar requirement under DOE Order 142.3A.
Clearance: Q (Position will be cleared to this level). Selected applicants will be subject to a background investigation conducted by or on behalf of the Federal Government, and must meet eligibility requirements* for access to classified matter. This position requires a Q clearance, and obtaining such clearance requires U.S. citizenship except in extremely rare circumstances. Dependent upon the position, additional authorization to access classified information may be required, which may or may not be available to dual citizens. Receipt of a Q clearance and additional access authorization ultimately is a decision of the Federal Government and not of Triad.
*Eligibility requirements: To obtain a clearance, an individual must be at least 18 years of age; U.S. citizenship is required except in very limited circumstances. See DOE Order 472.2 for additional information.
New-Employment Drug Test: The Laboratory requires successful applicants to complete a new-employment drug test and maintains a substance abuse policy that includes random drug testing. Although New Mexico and other states have legalized the use of marijuana, use and possession of marijuana remain illegal under federal law. A positive drug test for marijuana will result in termination of employment, even if the use was pre-offer.
Regular position: Term status Laboratory employees applying for regular-status positions are converted to regular status.
Internal Applicants: Regular appointment employees who have served the required period of continuous service in their current position are eligible to apply for posted jobs throughout the Laboratory. If an employee has not served the required period of continuous service, they may only apply for Laboratory jobs with the documented approval of their Division Leader. Please refer to Policy P701 for applicant eligibility requirements.
Equal Opportunity: Los Alamos National Laboratory is an equal opportunity employer and supports a diverse and inclusive workforce. All employment practices are based on qualification and merit, without regard to race, color, national origin, ancestry, religion, age, sex, gender identity, sexual orientation, marital status or spousal affiliation, physical or mental disability, medical conditions, pregnancy, status as a protected veteran, genetic information, or citizenship within the limits imposed by federal laws and regulations. The Laboratory is also committed to making our workplace accessible to individuals with disabilities and will provide reasonable accommodations, upon request, for individuals to participate in the application and hiring process. To request such an accommodation, please send an email to applyhelp@lanl.gov or call 1-505-664-6947 option 2 and then option 3.