Los Alamos National Laboratory HPC Scientific Cloud Architect (Scientist 2/3/4) in Los Alamos, New Mexico

What You Will Do

The High Performance Computing (HPC) Division at Los Alamos National Laboratory provides scientific computing resources consisting of some of the largest HPC systems in the world. The Futures team within the HPC Division is responsible for developing, deploying, and maintaining non-standard and advanced scientific computing resources. These resources include an Ethernet-connected hybrid cloud/HPC platform and a special-purpose HPC cluster used to develop advanced technologies such as high performance computing Linux containers.

This position will be filled at the Scientist 2, 3, or 4 level as dictated by current Programmatic needs and the skills of the selected candidate. Job responsibilities will be assigned in accordance with the level at which the selected candidate is hired.

You will work closely with other Futures team members as well as team members in other groups in the HPC Division and other divisions around the Laboratory. The use of cloud technologies to perform scientific work at the Laboratory is an emerging area of interest, and you will help develop this capability within the HPC Division. You will also work closely with scientists across the Laboratory to help them adopt cloud technologies where appropriate and to develop new cloud technologies to meet new scientific workload requirements.

We seek candidates who want to make significant contributions to the LANL scientific mission and capabilities, and ultimately to the DOE and the nation.

Scientist 2 ($87,800 - $144,800)

The successful candidate will be required to:

  • Participate in the day-to-day system administration of resources managed by the Futures team.

  • Participate in a rotating on-call schedule (5 days/week, 8:00 – 5:00) along with other members of the Futures team.

  • Work both independently and collaboratively with other members of the team or group, after receiving initial direction and requirements from technical project leads.

  • Apply and interpret, on a broad basis, existing scientific principles, techniques, methods, and tools to troubleshoot, diagnose the root cause of failures, and isolate components and failure scenarios while working with stakeholders.

  • Contribute to the design, testing, analysis, verification, and validation of scientific cloud systems.

  • Work with the team to bring up new hardware and test functionality.

  • Mentor students, junior staff, and peers in technical and professional growth activities.

  • Maintain state-of-the-art technical expertise and knowledge within scientific cloud and related disciplines.

Scientist 3 ($96,600 - $161,300)

In addition to the duties mentioned above, the Scientist 3 will be required to:

  • Architect, procure, provision, deploy, and maintain a production-level on-premise cloud system and train others in its operation and maintenance. Areas of interest include automated provisioning, configuration management, integration with production HPC clusters, filesystems, and services.

  • Work with scientists at the Laboratory to identify existing workloads and develop future workloads that are well suited for running in cloud environments. This includes not only an on-premise cloud but also external commercial and research clouds.

  • Set direction, goals, milestones, and deliverables for project tasks and establish associated scope, schedule and budgets. Assist in the preparation of progress reports to sponsors.

  • Serve as the Principal Investigator for a targeted area of research.

  • Present results of work locally and at conferences and workshops.

  • Provide support to system admin staff and help desk staff on various cloud systems, when required by user requests, bugs, or security vulnerabilities.

  • Enhance technical and professional expertise of other staff through active mentoring and training.

  • Contribute to multi-lab and cross-organizational funding proposals, both internal and external to the Laboratory.

  • Contribute to peer review of the work of others across organizations or disciplines within the Laboratory.

Scientist 4 ($116,900 - $197,000)

In addition to the duties mentioned above, the Scientist 4 will be required to:

  • Contribute to peer review of the work of others across organizations and disciplines nationally, including participation on HPC and cloud-related conference and workshop committees.

  • Participate in national review boards for DOE in subject area of expertise.

  • Acquire internal/external funding for self and others via responses to competitive requests for proposals and developed collaborations.

  • Work closely with high-level project leads and program managers to ensure their projects are successful.

  • Assist in defining specifications for new cloud technologies and the writing of RFPs.

What You Need

Minimum Job Requirements:

  • Strong interpersonal, written, and oral communication skills.

  • Demonstrated ability to work within a team environment.

  • Strong command line Linux operating skills.

  • Demonstrated experience with and broad knowledge of administration of production Linux computer systems, utilities, and tools, including experience building, configuring, and administering production Linux computing systems.

  • Demonstrated scripting (e.g., Bash, Perl, Python, or similar scripting languages) and programming experience.

  • Demonstrated experience with the administration of cloud computing systems.

  • Knowledge of or experience with hardware and software security practices.

  • Ability to mentor and lead individual junior team members and students.

  • For consideration, applicants must submit a cover letter addressing how their knowledge, skills, and abilities meet the minimum requirements along with a resume.

In addition to the Job Requirements outlined above, qualification at the Scientist 3 level requires:

  • Demonstrated record of accomplishment and expertise in scientific cloud architectures.

  • A record of technical leadership in hardware and software activities within a cloud environment. This includes software-defined networking, cloud storage, virtual machine development and deployment, or experience with container technologies such as Kubernetes and Docker.

  • Knowledge and experience with HPC system production support and use.

  • Knowledge and experience with configuration management technologies such as Ansible, CFEngine, or Puppet.

  • Practical programming experience in languages such as shell, Perl, Python, or C/C++.

  • Record of maintaining state-of-the-art technical experience and knowledge within discipline and development of new skills in related disciplines.

  • Technical accomplishments within a team environment under time constraints.

  • Working knowledge of HPC concepts and practices in the areas of high performance system interconnects, parallel filesystems, resource management, and job scheduling.

In addition to the Job Requirements outlined above, qualification at the Scientist 4 level requires:

  • Extensive experience and advanced knowledge of cloud technologies such as OpenStack, VMware, Ceph, Kubernetes, and Docker.

  • Practical experience and advanced knowledge of high performance system interconnects, parallel filesystems, and compute accelerators (GPU, FPGA, etc.).

  • Demonstrated senior technical leadership that brings organizations, teams, and individuals together around a common goal to create an efficient, cost-effective, performance-based solution to a particular problem or need.

  • Demonstrated ability to understand the complete picture of an end-to-end solution for large, complex systems. This includes facilities, archive, storage, networks, clusters, and clouds.

  • Exhibited knowledge and experience in working with equipment vendors on specifications and requirements of large-scale scientific system procurements.

  • Demonstrated industry leadership and expertise in the area of scientific cloud computing.

  • Demonstrated ability to initiate large-scale projects to solve technology challenges.

Desired Skills:

  • Demonstrated in-depth experience with OpenStack.

  • Experience with cloud-based scientific computing, including HPC-in-the-cloud and science gateways.

  • Experience with high throughput computing and data-intensive workloads.

  • Practical experience in programming using Python, C, or C++.

  • Practical experience with high performance interconnects such as InfiniBand and/or Intel OmniPath.

  • Practical experience with accelerators such as GPUs.

  • Practical experience deploying software defined networks.

  • Practical experience with object storage such as Ceph.

  • Experience in anticipating needs for hardware and software environments.

  • Extensive experience in Linux with complete understanding of configuration files, modifying kernel parameters and building a new kernel, and automated installations.

  • Ability to create reliable/repeatable procedures for production use.

  • Practical experience and advanced knowledge of Ethernet switches, routing, TCP/IP, configuration of NICs and routers, VXLAN, and overlay networks.

  • Record of setting direction and goals for yourself and other staff.

  • Demonstrated experience in leading multi-person projects to meet scope, schedule, and budget.

  • Demonstrated experience in formulating and presenting results to technical audiences and readerships.

  • Experience managing computers in a DOE or DOD classified environment.

  • Active DOE Q clearance.

Education: Typical educational requirement is a Bachelor’s, Master’s, or Doctorate degree in a science or engineering field from an accredited college or university and a minimum of five years of experience in the HPC and/or cloud computing fields, or an equivalent combination of education and experience.

Additional Details:

Clearance: Q (Position will be cleared to this level). Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements* for access to classified matter.

*Eligibility requirements: To obtain a clearance, an individual must be at least 18 years of age; U.S. citizenship is required except in very limited circumstances. See DOE Order 472.2 for additional information.

New-Employment Drug Test: The Laboratory requires successful applicants to complete a new-employment drug test and maintains a substance abuse policy that includes random drug testing.

Regular position: Term status Laboratory employees applying for regular-status positions are converted to regular status.

Internal Applicants: Please refer to Laboratory policy P701 for applicant eligibility.

Equal Opportunity: Los Alamos National Laboratory is an equal opportunity employer and supports a diverse and inclusive workforce. All employment practices are based on qualification and merit, without regard to race, color, national origin, ancestry, religion, age, sex, gender identity, sexual orientation or preference, marital status or spousal affiliation, physical or mental disability, medical conditions, pregnancy, status as a protected veteran, genetic information, or citizenship within the limits imposed by federal laws and regulations. The Laboratory is also committed to making our workplace accessible to individuals with disabilities and will provide reasonable accommodations, upon request, for individuals to participate in the application and hiring process. To request such an accommodation, please send an email to applyhelp@lanl.gov or call 1-505-665-4444 option 1.

Where You Will Work

Located in northern New Mexico, Los Alamos National Laboratory (LANL) is a multidisciplinary research institution engaged in strategic science on behalf of national security. LANL enhances national security by ensuring the safety and reliability of the U.S. nuclear stockpile, developing technologies to reduce threats from weapons of mass destruction, and solving problems related to energy, environment, infrastructure, health, and global security concerns.

Location: Los Alamos, NM, US

Contact Name: Doyle, Christine Louise

Organization Name: HPC-DES/HPC Design

Email: cdoyle@lanl.gov

Job Title: HPC Scientific Cloud Architect (Scientist 2/3/4)

Appointment Type: Regular

Req ID: IRC68995