Los Alamos National Laboratory Computing Systems Professional 2/3/4 in Los Alamos, New Mexico

What You Will Do

The High Performance Computing (HPC) Division at Los Alamos National Laboratory provides scientific computing resources consisting of some of the largest HPC systems in the world, including a large (19K+ node) Cray system called Trinity, as well as numerous large commodity cluster systems. The Computing Systems Professional (CSP) Team within the HPC Systems Group (HPC-SYS) provides vanguard production monitoring, support, testing, and maintenance for existing systems and deployment support for future systems. The selected CSP will work closely with the HPC-SYS Infrastructure, Platforms, and Technical Operations Teams to support and maintain LANL’s world-class supercomputing capability.

This role requires strong communication skills, comprehensive troubleshooting and analytical skills, and the ability to collaborate daily with other groups, teams, and projects.

The selected candidate will participate in a regularly scheduled on-call rotation supporting production systems, including some systems under 24x7 support. In addition, some non-standard working hours may occasionally be required. This position is full-time and is located at Los Alamos National Laboratory in Los Alamos, New Mexico.

This position will be filled at the CSP-2, CSP-3, or CSP-4 level, depending on the skills of the selected candidate. Additional job responsibilities (outlined below) will be assigned if the candidate is hired at a higher level.

Computing Systems Professional 2 (CSP-2) ($72,500 - $118,200)

The successful candidate will perform the full spectrum of Linux computing environment administration, including but not limited to:

  • Work under the supervision of HPC System administrators to provide technical assistance in problem solving, configuration management, and day‐to‐day operation of various computing systems.

  • Participate in daytime trouble ticket triage and resolution activities, as well as periodic on-call responsibilities.

  • Work with team members to make modifications and additions to existing systems, software, and methods.

  • Work with the team to bring up new hardware and test its functionality.

  • Actively grow HPC skill base and expertise across networking, data storage, and system administration as part of the HPC-SYS Triage Team.

  • Steadily increase responsibilities as knowledge of our environment and HPC systems increases.

  • Actively participate in knowledge sharing and transfer efforts and activities within and across teams.

  • Work independently and interactively with other HPC administrators.

  • Participate in process improvement and resolution in coordination with administrators of other HPC subsystems.

  • Develop and publish updates on resolutions and communicate findings internally.

  • Propose and implement solutions when presented with projects in our HPC environment.

Computing Systems Professional 3 (CSP-3) ($87,800 - $144,800)

In addition to the duties outlined above, the CSP-3 will be required to:

  • Participate in process improvement, including deep multi-system problem isolation and resolution in collaboration with administrators of other HPC subsystems.

  • Work with team members to document, design, and implement new ideas and approaches for newer architectures and improve those for existing ones.

  • Present best practices and experience reports to technical leaders, managers, and peers locally, and communicate the strategies and successes of HPC Division to other Laboratory organizations.

  • Work on complex issues where analysis of situations or data requires an in-depth evaluation of variable factors. Exercise sound judgement in selecting methods, techniques, and evaluation criteria to troubleshoot, diagnose root cause of system failures, and isolate the components/failure scenarios while working with internal stakeholders.

Computing Systems Professional 4 (CSP-4) ($96,600 - $161,300)

In addition to the duties outlined above, the CSP-4 will be required to:

  • Work on significant and unique issues where analysis of situations or data requires an evaluation of intangibles. Exercise independent judgement in methods, techniques, and evaluation criteria to achieve results.

  • Work closely with fellow HPC administrators as a leader and mentor to define and implement solutions on both tactical and strategic levels.

  • Work as a technical leader/subject matter expert to propose and implement solutions to current problems and future deficiencies in our HPC archive storage environment in conjunction with junior and senior administrators and technical staff within and across teams.

  • Examine our HPC infrastructure through testing and application of experiments and tooling to validate solutions and to detect and diagnose hardware health issues.

  • Interact and/or collaborate with people from other teams, groups, divisions, directorates, and programs to develop, implement, and/or communicate technical solutions.

  • Enhance technical and professional expertise of other staff and students through active mentoring and training activities.

  • Present best practices and technical results to peers internally and at conferences, workshops, and meetings, as well as participate in strategic partnerships.

What You Need

Minimum Job Requirements:

  • Strong interpersonal and communication skills

  • Demonstrated experience working in a team environment

  • Ability to work on multiple projects at a time

  • Intermediate knowledge of the Linux operating system

  • Demonstrated experience installing, configuring, maintaining and troubleshooting servers.

  • Demonstrated wide-ranging knowledge and experience of Linux system administration

  • Basic scripting experience in Bash or another Linux shell

  • Demonstrated experience in automating tasks using programming and scripting

  • Demonstrated experience working with authentication services such as LDAP

  • Ability to program in a compiled or interpreted language

  • Demonstrated experience maintaining various system services (Kerberos, NFS, SSH, Samba, etc.)

  • Experience communicating technical information to both technical and non-technical personnel

  • For consideration, along with a resume, applicants should submit a cover letter addressing how their knowledge, skills and abilities meet the minimum requirements.

Additional Job Requirements for CSP-3: In addition to the Job Requirements outlined above, qualification at the CSP-3 level requires:

  • Extensive knowledge of the Linux operating system

  • Demonstrated experience with centralized configuration management in a heterogeneous computing environment

  • Demonstrated experience leading and mentoring teams, students or junior team members

  • Demonstrated wide-ranging knowledge and experience of Linux system administration

  • Advanced scripting and programming experience, preferably with compiled languages

  • Demonstrated ability to partner with customers.

  • Demonstrated ability to work closely and productively with both customers and suppliers to define expectations and mutual responsibilities.

  • Demonstrated ability to communicate technical strategy, accomplishments, and challenges to management team, as well as cross-organizationally.

Additional Job Requirements for CSP-4: In addition to the Job Requirements outlined above, qualification at the CSP-4 level requires:

  • Ability to leverage broad expertise or unique knowledge to contribute to development of technical objectives and principles as well as to achieve goals in creative and effective ways.

  • Broad demonstrated knowledge of production HPC system management topics, including Linux system administration, networking, programming, operating systems, and configuration management, with depth in one or more areas.

  • Demonstrated programming experience including compiled languages and advanced scripting.

  • Demonstrated ability to initiate, design, and lead technical efforts.

  • Demonstrated ability to evaluate competing computing subsystem technologies.

  • Experience interacting with vendors and colleagues within the industry, including presenting technical results and practices to peers locally and at conferences.

Desired Skills:

  • Experience working with ticket tracking systems.

  • Experience working in a production HPC environment.

  • Experience with multiple Linux distributions.

  • Experience with configuring and managing storage and backup solutions.

  • Operational experience with Lustre, GPFS or other parallel file systems.

  • Experience integrating operational metrics into a monitoring system, such as Splunk or Zenoss.

  • Experience administering a NetApp-based storage appliance.

  • Experience with Tivoli Storage Manager (TSM) backups.

  • Experience with Automated Cartridge System Library Software (ACSLS).

  • Experience managing diskless clients.

  • Experience building and maintaining large RAID disk arrays.

  • Basic understanding of Relational Databases and Database Design Methodologies.

  • Familiarity with configuration management software such as Cfengine, Chef, Puppet, Ansible, Salt, or similar configuration and automation tools and practices.

  • Experience configuring networks, network switches and systems.

  • Experience with configuring network firewalls.

  • Experience with revision control systems such as RCS, Subversion, or Git.

  • Experience with low-level system administration tools such as perf, strace, tcpdump, and vmstat.

  • Experience managing computers in a DOE or DOD classified environment.

  • Contribution to open source or non-work-related projects.

  • Knowledge of or experience with hardware and software security practices.

  • An active DOE Q clearance.

  • An active SCI clearance.

Education:

CSP-2: Position typically requires a bachelor’s degree and a minimum of four years of related experience, or an equivalent combination of education and experience. At this level, applicable advanced vendor and/or professional certification is desirable.

CSP-3: Position typically requires a bachelor’s degree and a minimum of eight years of related experience, or an equivalent combination of education and experience. At this level, applicable advanced vendor and/or professional certification is desirable.

CSP-4: Position typically requires a bachelor’s degree and a minimum of twelve years of related experience, or an equivalent combination of education and experience. At this level, advanced vendor and/or professional certifications are highly desirable and postgraduate course work may be expected.

Notes to Applicants: For consideration, applicants should submit a resume along with a cover letter addressing how their knowledge, skills, and abilities meet the minimum requirements.

Additional Details:

Clearance: Q (Position will be cleared to this level). Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements* for access to classified matter.

*Eligibility requirements: To obtain a clearance, an individual must be at least 18 years of age; U.S. citizenship is required except in very limited circumstances. See DOE Order 472.2 for additional information.

New-Employment Drug Test: The Laboratory requires successful applicants to complete a new-employment drug test and maintains a substance abuse policy that includes random drug testing.

Regular position: Term-status Laboratory employees applying for regular-status positions are converted to regular status.

Internal Applicants: Please refer to Laboratory policy P701 for applicant eligibility.

Equal Opportunity: Los Alamos National Laboratory is an equal opportunity employer and supports a diverse and inclusive workforce. All employment practices are based on qualification and merit, without regard to race, color, national origin, ancestry, religion, age, sex, gender identity, sexual orientation or preference, marital status or spousal affiliation, physical or mental disability, medical conditions, pregnancy, status as a protected veteran, genetic information, or citizenship within the limits imposed by federal laws and regulations. The Laboratory is also committed to making our workplace accessible to individuals with disabilities and will provide reasonable accommodations, upon request, for individuals to participate in the application and hiring process. To request such an accommodation, please send an email to applyhelp@lanl.gov or call 1-505-665-4444 option 1.

Where You Will Work

Located in northern New Mexico, Los Alamos National Laboratory (LANL) is a multidisciplinary research institution engaged in strategic science on behalf of national security. LANL enhances national security by ensuring the safety and reliability of the U.S. nuclear stockpile, developing technologies to reduce threats from weapons of mass destruction, and solving problems related to energy, environment, infrastructure, health, and global security concerns.

The High Performance Computing (HPC) Division provides production high performance computing systems services to the Laboratory. HPC Division serves all Laboratory programs requiring a world-class high performance computing capability to enable solutions to complex problems of strategic national interest. Our work starts with the early phases of acquisition, development, and production readiness of HPC platforms, and continues through the maintenance and operation of these systems and the facilities in which they are housed. HPC Division also manages the network, parallel file systems, storage, and visualization infrastructure associated with the HPC platforms. The Division directly supports the Laboratory’s HPC user base and aids, at multiple levels, in the effective use of HPC resources to generate science. Additionally, we engage in research activities that we deem important to our mission.

Work/Life Balance

Our diverse workforce enjoys a collegial work environment focused on creative problem solving, where everyone’s opinions and ideas are valued. We are committed to work-life balance, as well as both personal and professional growth. We consider our creative and dedicated scientific professionals to be our greatest assets, and we take pride in cultivating their talents, supporting their efforts, and enabling their successes. We provide mentoring to help new staff build a solid technical and professional foundation, and to smoothly integrate into the culture of LANL.

Compensation and Benefits include:

  • Multiple options for work schedules

  • Exercise facility free for staff use

  • Choice of comprehensive medical plans

  • Paid sick time and disability insurance

  • 401k (100% match up to 6% + kicker)

  • Fully vested in 401k on day one

  • Relocation Assistance (if needed)

Los Alamos, New Mexico enjoys excellent weather, clean air, and outstanding public schools. This is a safe, low-crime, family-oriented community with frequent concerts and events as well as quick travel to many top ski resorts, scenic hiking & biking trails, and mountain climbing. The short drive to work includes stunning views of rugged canyons and mesas as well as the Sangre de Cristo mountains. Many employees choose to live in the nearby state capital, Santa Fe, which is known for world-class restaurants, art galleries, and opera.

Location: Los Alamos, NM, US

Organization Name: HPC-SYS/ HPC Systems

Job Title: Computing Systems Professional 2/3/4

Appointment Type: Regular

Req ID: IRC65779