Los Alamos National Laboratory HPC Monitoring Team, Web App Developer (Scientist 2/3) in Los Alamos, New Mexico
What You Will Do
The High-Performance Computing Division (HPC) provides production high performance computing systems services to the Laboratory. The High Performance Computing Environments group (HPC-ENV) invites applicants for a position of Scientist 2 or 3 to join the Monitoring, Security and Data Analytics team and strengthen our HPC monitoring and user interface capabilities. We seek candidates who want to make significant contributions to our long-term efforts of supporting our HPC systems and users. Team member duties include: System administration of RHEL servers; Building and maintaining end user web interfaces; Designing and developing web-based applications; Managing ticket tracking systems; Database design and management; Diagnosing, solving and implementing solutions for various system operational problems; Communicating and collaborating with other teams, groups and sites. The selected candidate will participate in a regularly scheduled rotation of on-call support of production systems. In addition, some non-standard working hours may occasionally be required.
HPC-ENV has the main responsibility of managing how users interact with the HPC systems at LANL. Some of the teams in this group include (1) Consulting and User Services, responsible for direct interaction and problem resolution with the users; (2) Parallel Runtimes and Environments, responsible for installing and maintaining the software and user environments on the HPC clusters; (3) Application Readiness, working to optimize user code for new HPC platforms and technologies; (4) Monitoring, Security and Data Analytics, responsible for collecting, analyzing and displaying HPC system information to administrators and users. Projects typically involve collaborations inside and outside of the Laboratory, in line with the Laboratory’s history of leadership in HPC.
The Monitoring, Security and Data Analytics team within HPC-ENV is responsible for monitoring everything within the HPC Datacenters, including Facilities, Clusters, File Systems, Networking and Support Servers. Splunk serves as our main analysis, display and alerting tool for administrators. Grafana, backed by Elasticsearch and OpenTSDB, runs on our dedicated Data Analytics System for our larger analysis and machine learning projects. The Monitoring team also maintains the web interfaces to assist users with their HPC needs. This includes hosting HPC documentation under a Web Content Management System for control and uniformity; maintaining a PHP-based account management system that allows users to request access to HPC resources; managing an RT ticketing system for users to submit issue requests; and developing new web-based applications to feed monitoring data to the users.
This position will be filled at either the Scientist 2 or 3 level, depending on the skills of the selected candidate. Additional job responsibilities (outlined below) will be assigned if the candidate is hired at the higher level.
Scientist 2 ($87,800 - $144,800)
The successful candidate will perform the full spectrum of UNIX/Linux computing environment administration as well as Web and Database management, including but not limited to:
Assist in the setup, administration and maintenance of several RHEL servers using a configuration management system
Participate in periodic on-call responsibilities as assigned
Provide technical assistance in problem solving and day-to-day operation of various HPC support systems
Proactively examine our HPC environment and propose projects for enhancements
Scripting (e.g., in Bash, Perl, Python, or similar scripting languages) or programming
Web server management
Ability to use one or more Web Content Management Systems
Database management
Development projects with one or more databases
Develop web-based applications to display large datasets to users
Scientist 3 ($96,600 - $161,300)
In addition to the duties outlined above, the Scientist 3 will be required to:
Work as a technical leader to implement solutions to current problems and future deficiencies in our HPC environment in conjunction with junior and senior administrators and technical members of other HPC teams
Communicate the strategies and successes of HPC Division to national peers and participate in national strategic partnerships
Propose and implement solutions when presented with problems in our HPC environment
Design database solutions for our in-house needs
Integrate system and application monitoring data into end user displays
Develop transport methods for large quantities of monitoring data
What You Need
Minimum Job Requirements:
Strong interpersonal and communication skills
Demonstrated scripting (e.g., in Bash, Perl, Python, PHP or similar scripting languages) experience
Programming experience (e.g. C, C++, Java or similar languages)
Experience working in a production computing environment
Experience with using and managing one or more database systems
Experience in developing web-based applications
Experience with one or more Web Content Management Systems
Ability to write and present reports to peers and management
Knowledge of administration of production Linux computer systems, utilities, and tools, including experience building, configuring, and administering production Linux computer systems
Ability to mentor and lead individual junior team members and students
Additional Job Requirements for Scientist 3: In addition to the Job Requirements outlined above, qualification at the Scientist 3 level requires:
Broad knowledge of production system management topics, including networking, programming, file systems, operating systems, and configuration management, with depth in one or more areas
Experience leading and mentoring teams, students, or junior team members
Experience initiating, designing, and leading projects
Experience interacting with vendors and colleagues within the industry, including presenting technical results and practices to peers locally and at conferences
Expertise in one or more programming languages (e.g. C, C++, Java or other)
Knowledge of or experience with hardware and software security practices
Desired Skills:
Experience working in a production HPC environment
Experience diagnosing system software problems
Knowledge of one or more monitoring tools (Splunk, Ganglia, Grafana, etc.)
Experience configuring syslog
Experience in scripting (e.g., in Bash, Perl, Python, or similar scripting languages) and programming
Experience with data collection and transport (syslog, IPMI, AMQP)
Experience with data storage and databases
Knowledge of several database systems
Experience hardening servers for security
Experience building web-based user interfaces
Knowledge of one or more Web Content Management Systems (Drupal, etc.)
Experience working with and administering a ticket tracking system
Experience with multiple Linux distributions
Experience modifying Unix/Linux operating systems
Experience managing computers in a DOE or DOD classified environment
Active DOE Q Clearance
Education:
Scientist 2: Position typically requires a bachelor’s degree and a minimum of eight years’ related experience, or an equivalent combination of education and experience. At this level, applicable advanced vendor and/or professional certification is desirable.
Scientist 3: Position typically requires a bachelor’s degree and a minimum of twelve years’ related experience, or an equivalent combination of education and experience. At this level, advanced vendor and/or professional certifications are highly desirable and postgraduate coursework may be expected.
Additional Details:
Clearance: Q (Position will be cleared to this level). Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements* for access to classified matter.
*Eligibility requirements: To obtain a clearance, an individual must be at least 18 years of age; U.S. citizenship is required except in very limited circumstances. See DOE Order 472.2 for additional information.
New-Employment Drug Test: The Laboratory requires successful applicants to complete a new-employment drug test and maintains a substance abuse policy that includes random drug testing.
Regular Position: Term status Laboratory employees applying for regular-status positions are converted to regular status.
Equal Opportunity: Los Alamos National Laboratory is an equal opportunity employer and supports a diverse and inclusive workforce. All employment practices are based on qualification and merit, without regard to race, color, national origin, ancestry, religion, age, sex, gender identity, sexual orientation or preference, marital status or spousal affiliation, physical or mental disability, medical conditions, pregnancy, status as a protected veteran, genetic information, or citizenship within the limits imposed by federal laws and regulations. The Laboratory is also committed to making our workplace accessible to individuals with disabilities and will provide reasonable accommodations, upon request, for individuals to participate in the application and hiring process. To request such an accommodation, please send an email to applyhelp@lanl.gov or call 1-505-665-4444 option 1.
Where You Will Work
Located in northern New Mexico, Los Alamos National Laboratory (LANL) is a multidisciplinary research institution engaged in strategic science on behalf of national security. LANL enhances national security by ensuring the safety and reliability of the U.S. nuclear stockpile, developing technologies to reduce threats from weapons of mass destruction, and solving problems related to energy, environment, infrastructure, health, and global security concerns.
Location: Los Alamos, NM, US
Contact Name: Doyle, Christine Louise
Organization Name: HPC-ENV/High Performance Computing Environments
Email: cdoyle@lanl.gov
Job Title: HPC Monitoring Team, Web App Developer (Scientist 2/3)
Appointment Type: Regular
Req ID: IRC64138