Los Alamos National Laboratory HPC Monitoring Team, Web App Developer (Scientist 2/3) in Los Alamos, New Mexico

What You Will Do

The High-Performance Computing Division (HPC) provides production high-performance computing systems and services to the Laboratory. The High Performance Computing Environments group (HPC-ENV) invites applicants for a Scientist 2 or 3 position to join the Monitoring, Security and Data Analytics team and strengthen our HPC monitoring and user interface capabilities. We seek candidates who want to make significant contributions to our long-term efforts to support our HPC systems and users. Team member duties include: System administration of RHEL servers; Building and maintaining end user web interfaces; Designing and developing web-based applications; Managing ticket tracking systems; Database design and management; Diagnosing, solving and implementing solutions for various system operational problems; Communicating and collaborating with other teams, groups and sites. The selected candidate will participate in a regularly scheduled rotation of on-call support of production systems. In addition, some non-standard working hours may occasionally be required.

HPC-ENV has the main responsibility of managing how users interact with the HPC systems at LANL. Some of the teams in this group include (1) Consulting and User Services, responsible for direct interaction and problem resolution with the users; (2) Parallel Runtimes and Environments, responsible for installing and maintaining the software and user environments on the HPC clusters; (3) Application Readiness, working to optimize user code for new HPC platforms and technologies; (4) Monitoring, Security and Data Analytics, responsible for collecting, analyzing and displaying HPC system information to administrators and users. Projects typically involve collaborations inside and outside of the Laboratory, in line with the Laboratory’s history of leadership in HPC.

The Monitoring, Security and Data Analytics team within HPC-ENV is responsible for monitoring everything within the HPC Datacenters, including Facilities, Clusters, File Systems, Networking and Support Servers. Splunk serves as our main analysis, display and alerting tool for administrators. Grafana, backed by Elasticsearch and OpenTSDB, runs on our dedicated Data Analytics System for our larger analysis and machine learning projects. The Monitoring team also maintains the web interfaces to assist users with their HPC needs. This includes hosting HPC documentation under a Web Content Management System for control and uniformity; maintaining a PHP-based account management system to allow users to request access to HPC resources; managing an RT ticketing system for users to submit issue requests; and developing new web-based applications to feed monitoring data to the users.

This position will be filled at either the Scientist 2 or 3 level, depending on the skills of the selected candidate. Additional job responsibilities (outlined below) will be assigned if the candidate is hired at the higher level.

Scientist 2 ($87,800 - $144,800)

The successful candidate will perform the full spectrum of UNIX/Linux computing environment administration as well as Web and Database management, including but not limited to:

  • Assist in the setup, administration and maintenance of several RHEL servers using a configuration management system

  • Participate in periodic on-call responsibilities as assigned

  • Provide technical assistance in problem solving and day-to-day operation of various HPC support systems

  • Proactively examine our HPC environment and propose projects for enhancements

  • Scripting (e.g., in Bash, Perl, Python, or similar scripting languages) or programming

  • Web server management

  • Ability to use one or more Web Content Management Systems

  • Database management

  • Development projects with one or more databases

  • Develop web-based applications to display large datasets to users

Scientist 3 ($96,600 - $161,300)

In addition to the duties outlined above, the Scientist 3 will be required to:

  • Work as a technical leader to implement solutions to current problems and future deficiencies in our HPC environment in conjunction with junior and senior administrators and technical members of other HPC teams

  • Communicate the strategies and successes of HPC Division to national peers and participate in national strategic partnerships

  • Propose and implement solutions when presented with problems in our HPC environment

  • Design database solutions for our in-house needs

  • Integrate system and application monitoring data into end user displays

  • Develop transport methods for large quantities of monitoring data

What You Need

Minimum Job Requirements:

  • Strong interpersonal and communication skills

  • Demonstrated scripting (e.g., in Bash, Perl, Python, PHP or similar scripting languages) experience

  • Programming experience (e.g., C, C++, Java or similar languages)

  • Experience working in a production computing environment

  • Experience with using and managing one or more database systems

  • Experience in developing web-based applications

  • Experience with one or more Web Content Management Systems

  • Ability to write and present reports to peers and management

  • Knowledge of administration of production Linux computer systems, utilities, and tools, including experience building, configuring, and administering production Linux computer systems

  • Ability to mentor and lead individual junior team members and students

Additional Job Requirements for Scientist 3: In addition to the Job Requirements outlined above, qualification at the Scientist 3 level requires:

  • Broad knowledge of production system management topics, including networking, programming, file systems, operating systems, and configuration management, with depth in one or more areas

  • Experience leading and mentoring teams, students, or junior team members

  • Experience initiating, designing, and leading projects

  • Experience interacting with vendors and colleagues within the industry, including presenting technical results and practices to peers locally and at conferences

  • Expertise in one or more programming languages (e.g., C, C++, Java or other)

  • Knowledge of or experience with hardware and software security practices

Desired Skills:

  • Experience working in a production HPC environment

  • Experience diagnosing system software problems

  • Knowledge of one or more monitoring tools (Splunk, Ganglia, Grafana, etc.)

  • Experience configuring syslog

  • Experience in scripting (e.g., in Bash, Perl, Python, or similar scripting languages) and programming

  • Experience with data collection and transport (syslog, IPMI, AMQP)

  • Experience with data storage and databases

  • Knowledge of several database systems

  • Experience hardening servers for security

  • Experience building web-based user interfaces

  • Knowledge of one or more Web Content Management Systems (Drupal, etc.)

  • Experience working with and administering ticket tracking systems

  • Experience with multiple Linux distributions

  • Experience modifying Unix/Linux operating systems

  • Experience managing computers in a DOE or DOD classified environment

  • Active DOE Q Clearance

Education:

Scientist 2: Position typically requires a bachelor’s degree and a minimum of eight years’ related experience, or an equivalent combination of education and experience. At this level, applicable advanced vendor and/or professional certification is desirable.

Scientist 3: Position typically requires a bachelor’s degree and a minimum of twelve years’ related experience, or an equivalent combination of education and experience. At this level, advanced vendor and/or professional certifications are highly desirable and postgraduate coursework may be expected.

Additional Details:

Clearance: Q (Position will be cleared to this level). Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements* for access to classified matter.

*Eligibility requirements: To obtain a clearance, an individual must be at least 18 years of age; U.S. citizenship is required except in very limited circumstances. See DOE Order 472.2 for additional information.

New-Employment Drug Test: The Laboratory requires successful applicants to complete a new-employment drug test and maintains a substance abuse policy that includes random drug testing.

Regular Position: Term status Laboratory employees applying for regular-status positions are converted to regular status.

Equal Opportunity: Los Alamos National Laboratory is an equal opportunity employer and supports a diverse and inclusive workforce. All employment practices are based on qualification and merit, without regard to race, color, national origin, ancestry, religion, age, sex, gender identity, sexual orientation or preference, marital status or spousal affiliation, physical or mental disability, medical conditions, pregnancy, status as a protected veteran, genetic information, or citizenship within the limits imposed by federal laws and regulations. The Laboratory is also committed to making our workplace accessible to individuals with disabilities and will provide reasonable accommodations, upon request, for individuals to participate in the application and hiring process. To request such an accommodation, please send an email to applyhelp@lanl.gov or call 1-505-665-4444 option 1.

Where You Will Work

Located in northern New Mexico, Los Alamos National Laboratory (LANL) is a multidisciplinary research institution engaged in strategic science on behalf of national security. LANL enhances national security by ensuring the safety and reliability of the U.S. nuclear stockpile, developing technologies to reduce threats from weapons of mass destruction, and solving problems related to energy, environment, infrastructure, health, and global security concerns.

Location: Los Alamos, NM, US

Contact Name: Doyle, Christine Louise

Organization Name: HPC-ENV/High Performance Computing Environments

Email: cdoyle@lanl.gov

Job Title: HPC Monitoring Team, Web App Developer (Scientist 2/3)

Appointment Type: Regular

Req ID: IRC64138