HPC Consultant (Scientist 2/3) | Los Alamos, NM | Los Alamos National Laboratory

What You Will Do

This position will be filled at either the Scientist 2 or Scientist 3 level, depending on the skills of the selected candidate. Additional job responsibilities (outlined below) will be assigned if the candidate is hired at the higher level.

The High Performance Computing (HPC) Division provides production high performance computing systems services to the Laboratory. Our work starts with the early phases of acquisition, development, and production readiness of HPC platforms, and continues through the maintenance and operation of these systems and the facilities in which they are housed. HPC Division also manages the network, parallel file systems, storage, and visualization infrastructure associated with the HPC platforms. The Division directly supports the Laboratory's HPC user base and aids, at multiple levels, in the effective use of HPC resources to generate science. Additionally, we support selected research activities that we deem important to our mission.

The HPC Environments group (HPC-ENV) has the main responsibility of managing how users interact with the HPC systems at LANL. Several teams within the group take responsibility for the broad range of HPC platforms, monitoring, data analytics and cybersecurity, programming and runtime environments, software, software engineering, procurements, application support and readiness, and user support and services for a large and diverse customer base. We provide support and services to many production platforms at a world-class computing facility to ensure customers can accomplish their research and mission at extreme scale.

The HPC Consulting team provides direct support to HPC customers to enhance the productivity of the LANL HPC user community by providing quality technical support in a timely manner. We achieve this by providing a customer-focused, single point of contact for HPC systems, services, tools, and technical support. The team provides support for all production HPC platforms in various networks, including help with programming languages, debugging, parallel computing, HPC operating systems, utilities, libraries, file systems and interconnects, scripting, archival storage, desktop backup (TSM), file transfer, HPC network, and Tri-Lab support. The Division's goal is to create an effective HPC environment in which scientists can be as productive as possible.

To learn more visit https://www.lanl.gov/org/ddste/aldsc/hpc/index.php

Responsibilities include the following:

Scientist 2 ($96,100 - $159,000)

The successful candidate will perform the full spectrum of tasks, including but not limited to:

• Analyze existing configurations and scientific workloads, recommending and implementing changes to increase system efficiency

• Perform internal tool development related to scheduling and resource management

• Manage Slurm software, including configuration, setup, and maintenance, and communicate with Slurm developers about issues and bugs

• Propose and implement solutions when presented with projects in our HPC environment

• Provide in-depth customer support as part of the Consulting and Workload Management Team within HPC-ENV to scientific users in the areas of job scheduling, programming languages, operating systems, storage, libraries, utilities, code performance and other facets of the HPC environment

• Participate in the weekly on-call rotations and support schedule by answering tickets and solving technical problems by telephone, by email, and in person

• Interact with customers and HPC support teams

• Work independently and also interactively with other support team members

• Contribute to technical documentation and presentations, and/or give tutorials in classroom, user group, or team situations

• Communicate and collaborate frequently with customers, other cross-Group and cross-Division teams, and other HPC sites

Scientist 3 ($115,500 - $194,900)

In addition to the responsibilities outlined above, the Scientist 3 level requires the candidate to:
  • Lead technical efforts and projects in the area of user support and/or workload management
  • Develop technical documentation and presentations, and/or give tutorials in classroom, user group, or team situations
  • Work with LANL staff and system vendors to optimize the performance of DOE applications on future HPC systems
  • Work hand in hand with production system administration in determining the most difficult problems involving applications running on HPC systems
  • Work with scheduling and resource management vendors and Tri-Lab counterparts

What You Need

Minimum Job Requirements:

Linux Expertise

Linux knowledge and experience, including usage and commands.

Programming Skills

Skilled in a high-level programming language such as C/C++ or Fortran.

Scripting Skills

Demonstrated scripting experience in at least one of Bash, Perl, Python, or a similar scripting language.

Strong Interpersonal and Communication Skills

Including demonstrated ability to work within a team environment and with customers. Outstanding written and oral technical communication. Experience with technical writing and/or publishing papers. Strong interpersonal communication skills with the ability to work with groups of people of various levels of technical knowledge or understanding. Demonstrated experience working effectively under the pressure of frequent interruptions and conflicting priorities.

Additional Job Requirements for Scientist 3:

Leadership
  • Experience as the technical lead on small or large technical projects

Advanced HPC Experience
  • Demonstrated knowledge and experience with HPC environments, software, operating systems, parallel file systems, archives, job schedulers, and resource managers

Technical Training & Communication Skills
  • Demonstrated effective communication skills in classroom or team situations, such as making technical presentations, system documentation, user manuals, delivering HPC courses, or speaking as a representative for HPC teams/groups

Message Passing Experience
  • Familiarity with parallel processing, and parallel programming libraries, including message passing interface (MPI) and shared memory methodologies

Education/Experience at Scientist 2 level

Position requires a Bachelor's degree in a STEM field from an accredited college or university and 4 years of related experience, typically with experience at a university or National Lab or equivalent experience directly related to the occupation.

Education/Experience at Scientist 3 level

Position requires a Master's degree in a STEM field from an accredited college or university and 6 years of relevant experience or an equivalent combination of education and experience directly related to the occupation.

Desired Qualifications:

Customer Service
Recent experience in a customer service role and with ticketing systems.

HPC Debugging
Experience with tools and methods for optimization and debugging in a highly parallel environment.

DevOps & Software Development

Experience with DevOps and continuous integration (CI) tools.

Linux Provisioning and Configuration Management

Experience with automating Linux provisioning and configuration management such as Ansible, CFEngine, Puppet, etc.

DOE/NNSA Applications

Experience with DOE/NNSA Weapons codes.

Linux Virtual Machines and Linux Containers
Experience with virtual machines, Linux containers or related concepts.

HPC Computing Experience
Knowledge of high performance computing systems, their environments, and supporting infrastructure. Knowledge of distributed systems, including system architectures, computer networks, software, and multi-tenancy. Experience with networking and file systems in an HPC environment, parallel file systems (Lustre, GPFS, etc.), archive solutions (HPSS, TSM, etc.), and data movement tools.

Machine Learning and/or Artificial Intelligence
Experience with ML/AI programming or toolkits such as PyTorch, TensorFlow, SciPy, or NLTK.

Clearance

Active DOE "Q" clearance and/or SCI and experience in a classified computing environment.

Location: This position will be located in Los Alamos, NM.

COVID Vaccine: The COVID vaccine is mandatory for all Laboratory employees, on-site contractors, and on-site subcontractors unless granted an accommodation under applicable state or federal law. This requirement will apply to those working on-site, those teleworking, and all new hires.

Position commitment: Regular appointment employees are required to serve a period of continuous service in their current position in order to be eligible to apply for posted jobs throughout the Laboratory. If an employee has not served the time required, they may only apply for Laboratory jobs with the documented approval of their Division Leader. The position commitment for this position is 1 year.

Note to Applicants: For consideration, applicants should submit a resume and a cover letter addressing how their knowledge, skills, and abilities meet the minimum requirements.

Where You Will Work

Located in beautiful northern New Mexico, Los Alamos National Laboratory (LANL) is a multidisciplinary research institution engaged in strategic science on behalf of national security. Our generous benefits package includes:
  • PPO or High Deductible medical insurance with the same large nationwide network
  • Dental and vision insurance
  • Free basic life and disability insurance
  • Paid childbirth and parental leave
  • Award-winning 401(k) (6% matching plus 3.5% annually)
  • Learning opportunities and tuition assistance
  • Flexible schedules and time off (paid sick, vacation, and holidays)
  • Onsite gyms and wellness programs
  • Extensive relocation packages (outside a 50-mile radius)

Additional Details

Directive 206.2 - Employment with Triad requires a favorable decision by NNSA indicating employee is suitable under NNSA Supplemental Directive 206.2. Please note that this requirement applies only to citizens of the United States. Foreign nationals are subject to a similar requirement under DOE Order 142.3A.

Clearance: Q (position will be cleared to this level). Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements* for access to classified matter. This position requires a Q clearance, which requires U.S. citizenship except in extremely rare circumstances. Depending on the position, additional authorization to access nuclear weapons information may be required, which may or may not be available to dual citizens depending upon the circumstances.

*Eligibility requirements: To obtain a clearance, an individual must be at least 18 years of age; U.S. citizenship is required except in very limited circumstances. See DOE Order 472.2 for additional information.

New-Employment Drug Test: The Laboratory requires successful applicants to complete a new-employment drug test and maintains a substance abuse policy that includes random drug testing.

Regular position: Term status Laboratory employees applying for regular-status positions are converted to regular status.

Internal Applicants: Regular appointment employees who have served the required period of continuous service in their current position are eligible to apply for posted jobs throughout the Laboratory. If an employee has not served the required period of continuous service, they may only apply for Laboratory jobs with the documented approval of their Division Leader. Please refer to Policy P701 for applicant eligibility requirements.

Equal Opportunity: Los Alamos National Laboratory is an equal opportunity employer and supports a diverse and inclusive workforce. All employment practices are based on qualification and merit, without regard to race, color, national origin, ancestry, religion, age, sex, gender identity, sexual orientation or preference, marital status or spousal affiliation, physical or mental disability, medical conditions, pregnancy, status as a protected veteran, genetic information, or citizenship within the limits imposed by federal laws and regulations. The Laboratory is also committed to making our workplace accessible to individuals with disabilities and will provide reasonable accommodations, upon request, for individuals to participate in the application and hiring process. To request such an accommodation, please send an email to applyhelp@lanl.gov or call 1-505-665-4444 option 1.

Employment Status: Full Time