HPC Consultant & Workload Management Support (Scientist 2/3) | Los Alamos, NM | Los Alamos National Laboratory

What You Will Do

The High Performance Computing (HPC) Division provides production high performance computing systems services to the Laboratory. Our work starts with the early phases of acquisition, development, and production readiness of HPC platforms, and continues through the maintenance and operation of these systems and the facilities in which they are housed. HPC Division also manages the network, parallel file systems, storage, and visualization infrastructure associated with the HPC platforms. The Division directly supports the Laboratory's HPC user base and aids, at multiple levels, in the effective use of HPC resources to generate science. Additionally, we support selected research activities that we deem important to our mission.

This position focuses on two main areas: LANL HPC user consulting and workload management support. The workload management component includes assisting in the resolution of issues encountered with HPC resource management and job scheduling on HPC production platforms, as well as contributing to team projects. The HPC consulting component involves direct user support, documentation, user training, and support for critical computational simulations.

The HPC Environments group (HPC-ENV) within the Division has the main responsibility of managing how users interact with the HPC systems at LANL. Several teams within the group are responsible for a broad range of areas: HPC platforms, monitoring, data analytics and cybersecurity, programming and runtime environments, software, software engineering, procurements, application support and readiness, and user support and services for a large and diverse customer base. We provide support and services to many production platforms at a world-class computing facility to ensure customers can accomplish their research and mission at extreme scale.


This position will be filled at either the Scientist 2 or Scientist 3 level, depending on the skills of the selected candidate. Additional job responsibilities (outlined below) will be assigned if the candidate is hired at the higher level.

Scientist 2 ($96,100 - $159,000)

The successful candidate will perform the full spectrum of tasks, including but not limited to:

• Analyze existing configurations and scientific workloads, recommending and implementing changes to increase system efficiency

• Perform internal tool development related to scheduling and resource management

• Manage Slurm software, including configuration, setup, and maintenance, and communicate with Slurm developers about issues and bugs

• Propose and implement solutions when presented with projects in our HPC environment

• Provide in-depth customer support as part of the Consulting and Workload Management Team within HPC-ENV to scientific users in the areas of job scheduling, programming languages, operating systems, storage, libraries, utilities, code performance and other facets of the HPC environment

• Participate in the weekly on-call rotations and support schedule, answering tickets and solving technical problems by telephone, by email, and in person

• Interact with customers and HPC support teams

• Work both independently and collaboratively with other support team members

• Contribute to technical documentation and presentations, and/or give tutorials in classroom, user group, or team situations

• Communicate and collaborate frequently with customers, other cross-Group and cross-Division teams, and other HPC sites

Scientist 3 ($115,500 - $194,900)

In addition to the job responsibilities outlined above, the successful candidate at the Scientist 3 level will:

• Lead technical efforts and projects in the area of user support and/or workload management

• Develop technical documentation and presentations, and/or give tutorials in classroom, user group, or team situations

• Work with LANL staff and system vendors to optimize the performance of DOE applications on future HPC systems.

• Work hand in hand with production system administration in determining the most difficult problems involving applications running on HPC systems.

• Work with scheduling and resource management vendors and Trilab counterparts

What You Need
Minimum Job Requirements:

Linux Expertise

Strong Linux knowledge and expertise as an administrator. Broad knowledge of administration of production Linux computer systems, utilities, and tools, including experience building, configuring, and administering production Linux computer systems. Experience with multiple Linux distributions.

Programming Skills

Expertise in a high-level programming language such as C/C++ or Fortran. Experience using the Message Passing Interface (MPI) or similar parallel programming models.

Scripting Skills

Demonstrated scripting experience in Bash, Perl, Python, or similar scripting languages as well as experience with more advanced programming languages.

Strong Interpersonal and Communication Skills

Including demonstrated ability to work within a team environment and with customers. Outstanding written and oral technical communication. Experience with technical writing and/or publishing papers. Strong interpersonal communication skills with the ability to work with groups of people of various levels of technical knowledge or understanding. Demonstrated experience working effectively under the pressure of frequent interruptions and conflicting priorities.

Technical Presentations and Communications

Advanced knowledge and proven ability in formulating and presenting results to technical audiences

Additional Job Requirements for Scientist 3:

In addition to the requirements outlined above, qualification at the higher level requires:

• Experience as the technical lead on small or large technical projects

• Demonstrated knowledge and experience with HPC environments, operating systems, parallel file systems, archives, job schedulers, and resource managers

• Demonstrated effective oral communication skills in classroom or team situations, such as making technical presentations, delivering HPC courses, or speaking as a representative for HPC teams/groups

• Demonstrated effective written communications skills, such as software or system documentation, user manuals, and/or software utility descriptions

• Familiarity with parallel processing and parallel programming libraries, including Message Passing Interface (MPI) and shared memory methodologies

• Knowledge of debuggers in a Linux environment

Production Computing Experience

Experience working in a production computing environment, preferably with HPC systems or at large scale. Working knowledge of networking concepts and practices. Knowledge of or experience with hardware and software security practices.

Education/Experience at lower level: Position requires a Bachelor's degree in a STEM field from an accredited college or university and 4 years of related experience, typically with post-doctoral research experience at a university or National Lab or equivalent experience directly related to the occupation.

Education/Experience at higher level: Position requires a Master's degree in a STEM field from an accredited college or university and 6 years of relevant experience or an equivalent combination of education and experience directly related to the occupation.

Desired Qualifications:

HPC Debugging
Experience with tools and methods for optimization and debugging in a highly parallel environment

Continuous Integration & Software Development

Experience with continuous integration tools such as GitLab CI

MPI Experience

Experience with the internals of an MPI implementation or the internals of a similar parallel program model runtime

Linux Provisioning and Configuration Management

Experience automating Linux provisioning and configuration management with tools such as Ansible, CFEngine, or Puppet

Linux Containers and Tools

Demonstrated experience with Linux containers, registries, and orchestration tools (e.g., Docker, Podman, Quay, GitLab, Kubernetes)

DOE/NNSA Applications

Experience with DOE/NNSA Weapons codes

Linux Virtual Machines and Linux Containers
Experience with virtual machines, Linux containers or related concepts

Knowledge of high performance computing systems, their environments, and supporting infrastructure. Knowledge of distributed systems, including system architectures, computer networks, software, and multi-tenancy. Experience with networking and file systems in an HPC environment, with parallel file systems (Lustre, GPFS, etc.), with archive solutions (HPSS, TSM, etc.), and with data movement tools.


Active DOE "Q" clearance and/or SCI and experience in a classified computing environment.

Location: This position will be located in Los Alamos with the potential for a hybrid work arrangement (partially onsite/partially offsite) from a location within 2 hours ground commute of this location. Reporting onsite will be periodically required. Hybrid is at the discretion of management and can change at any time with appropriate notice.

COVID Vaccine

The COVID vaccine is mandatory for all Laboratory employees, on-site contractors, and on-site subcontractors unless granted an accommodation under applicable state or federal law. This requirement will apply to those working on-site, those teleworking, and all new hires.

Position commitment: Regular appointment employees are required to serve a period of continuous service in their current position in order to be eligible to apply for posted jobs throughout the Laboratory. If an employee has not served the time required, they may only apply for Laboratory jobs with the documented approval of their Division Leader. The position commitment for this position is 1 year.

Note to Applicants:

For consideration, applicants should submit a resume along with a cover letter addressing how their knowledge, skills, and abilities meet the minimum requirements.

Where You Will Work

Located in beautiful northern New Mexico, Los Alamos National Laboratory (LANL) is a multidisciplinary research institution engaged in strategic science on behalf of national security. Our generous benefits package includes:

• PPO or High Deductible medical insurance with the same large nationwide network

• Dental and vision insurance

• Free basic life and disability insurance

• Paid childbirth and parental leave

• Award-winning 401(k) (6% matching plus 3.5% annually)

• Learning opportunities and tuition assistance

• Flexible schedules and time off (paid sick, vacation, and holidays)

• Onsite gyms and wellness programs

• Extensive relocation packages (outside a 50 mile radius)

Additional Details

Directive 206.2 - Employment with Triad requires a favorable decision by NNSA indicating employee is suitable under NNSA Supplemental Directive 206.2. Please note that this requirement applies only to citizens of the United States. Foreign nationals are subject to a similar requirement under DOE Order 142.3A.

Clearance: Q (Position will be cleared to this level). Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements* for access to classified matter. This position requires a Q clearance which requires US Citizenship except in extremely rare circumstances. Dependent upon position, additional authorization to access nuclear weapons information may be required that may or may not be available to dual citizens depending upon the circumstances.

*Eligibility requirements: To obtain a clearance, an individual must be at least 18 years of age; U.S. citizenship is required except in very limited circumstances. See DOE Order 472.2 for additional information.

New-Employment Drug Test: The Laboratory requires successful applicants to complete a new-employment drug test and maintains a substance abuse policy that includes random drug testing.

Regular position: Term status Laboratory employees applying for regular-status positions are converted to regular status.

Internal Applicants: Regular appointment employees who have served the required period of continuous service in their current position are eligible to apply for posted jobs throughout the Laboratory. If an employee has not served the required period of continuous service, they may only apply for Laboratory jobs with the documented approval of their Division Leader. Please refer to Policy P701 for applicant eligibility requirements.
Equal Opportunity: Los Alamos National Laboratory is an equal opportunity employer and supports a diverse and inclusive workforce. All employment practices are based on qualification and merit, without regard to race, color, national origin, ancestry, religion, age, sex, gender identity, sexual orientation or preference, marital status or spousal affiliation, physical or mental disability, medical conditions, pregnancy, status as a protected veteran, genetic information, or citizenship within the limits imposed by federal laws and regulations. The Laboratory is also committed to making our workplace accessible to individuals with disabilities and will provide reasonable accommodations, upon request, for individuals to participate in the application and hiring process. To request such an accommodation, please send an email to applyhelp@lanl.gov or call 1-505-665-4444 option 1.

Employment Status: Full Time