Los Alamos National Laboratory HPC Network Administrator - Scientist 3 in Los Alamos, New Mexico
Vacancy Name: IRC78092
Job Title HPC Network Administrator - Scientist 3
Location Los Alamos, NM, US
Organization Name HPC-SYS/Infrastructure Team/ High Speed Networking
Minimum Salary 98900
Maximum Salary 165100
What You Will Do
The High Performance Computing Division at Los Alamos National Laboratory is responsible for operating and maintaining some of the fastest supercomputers in the world. These supercomputers are tasked with solving complex problems in various scientific disciplines for the betterment of our nation and the world. Designing, operating, and maintaining these systems requires highly skilled personnel who specialize in both the hardware and software aspects of High Performance Computing. High-speed networks and interconnects are one of the key areas of specialization needed to create these systems. Networking specialists at Los Alamos National Laboratory design, build, and maintain some of the largest, fastest, and most secure networks in the world.
Innovators and builders at heart, the HPC Networking Team is seeking its next dynamic team member to help define, deploy, maintain, evaluate, and develop our existing and future high speed networking production environments. LANL seeks highly motivated, productive, and inquisitive candidates who are comfortable working independently as well as part of a team in analyzing, specifying, and integrating network technologies that perform at scale and managing them in a large-scale production environment. We are interested in applicants who can help span the gap between current production requirements and new technology deployments to deliver next generation production networking capability.
This position is full-time and is located at Los Alamos National Laboratory in Los Alamos, New Mexico.
Scientist 3 ($98,900 - $165,100)
Participate in debugging and system maintenance, management, and improvement activities, including deep multi‐system problem isolation and resolution, often in collaboration with administrators of other HPC subsystems.
Work with researchers, production team members, and vendors to research, document, design, and implement new ideas, operating procedures, and approaches for both existing and future network infrastructures.
Present best practices, experience reports, and/or research results to managers and to peers locally or at conferences.
Work as a technical leader/subject matter expert to propose and implement solutions to current problems and deficiencies in our HPC networking environment. This will be done in collaboration with junior and senior administrators and technical staff within and across other organizations.
Proactively create experiments and software tests to validate solutions and to detect and diagnose network and hardware health issues.
Analyze published research papers in the area of networking and high-speed interconnects, summarize the results, and share implications and connections to ongoing work with team members.
What You Need
Minimum Job Requirements:
Strong interpersonal and communication skills and demonstrated ability to work effectively within a team environment, as well as demonstrated ability to initiate, design, and lead projects, including the ability to mentor and lead individual junior team members and students.
Demonstrated advanced experience with RDMA networks or other high-speed network technologies (such as InfiniBand, Omni-Path, etc.), including administration, management and monitoring of fabrics and various fabric topologies. Demonstrated experience evaluating, building, configuring and managing high-speed networks and interconnects.
Demonstrated experience with Ethernet TCP/IP layer 2 and layer 3 networking, including VLAN configuration, administration, and management, as well as demonstrated experience configuring and managing Ethernet switch and routing hardware.
Significant knowledge of building, configuring, and administering production Linux computer/support systems and network devices, including strong command line Linux operating system skills, working knowledge of or experience with hardware and software security best practices, and experience scripting in Bash, Perl, Python, or similar languages.
Significant knowledge and demonstrated experience in formulating and testing hypotheses, investigating alternative solutions, and recommending solutions to technical problems, including demonstrated ability to formulate and present ideas and results to technical audiences and readerships (examples can include publications, team projects, presentations).
Demonstrated experience administering large-scale, multi-hop networks across various datacenters and infrastructures, as well as optimizing networks for best bandwidth and latency.
Demonstrated knowledge of production HPC system management topics, including cluster administration, programming, file systems, operating systems, and configuration management, with depth in one or more areas.
Demonstrated leadership and/or subject matter expert roles in the industry.
Practical experience with network security firewalls (such as Juniper or other firewall systems) and network security tools.
Familiarity with Cfengine, Chef, Puppet, Ansible, Salt, or similar configuration and automation tools and practices.
Experience with revision control systems such as Git, Subversion, or RCS.
Experience with parallel file systems (such as Lustre, GPFS, etc.) and/or NFS.
Experience with advanced monitoring tools such as Splunk/Grafana, as well as network monitoring tools for flow analysis and statistics such as sFlow, Bro, etc.
Experience with test bed and virtual environments to deploy and test infrastructure before production (including validation tests and procedures to ensure consistency into production).
Experience with SDN and/or white box networking solutions, including P4 and other programmable network devices (such as FPGAs, etc.).
Ability to analyze published research papers in the area of high-speed networks and interconnects, summarize research results, and share implications and connections to ongoing work with team members. Ability to present technical papers and/or technical work to peers locally and nationally at conferences and meetings.
Experience managing computers in a DOE or DOD classified environment.
Note to Applicants: For consideration, applicants should submit a cover letter addressing how their knowledge, skills and abilities meet the minimum requirements along with a resume.
Education: Minimum of a B.S. degree in Computer Science or a related field from an accredited college or university, or equivalent combination of relevant education and/or experience.
Clearance: Q (Position will be cleared to this level). Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements for access to classified matter.
Eligibility requirements: To obtain a clearance, an individual must be at least 18 years of age; U.S. citizenship is required except in very limited circumstances. See DOE Order 472.2 for additional information.
New-Employment Drug Test: The Laboratory requires successful applicants to complete a new-employment drug test and maintains a substance abuse policy that includes random drug testing.
Regular position: Term status Laboratory employees applying for regular-status positions are converted to regular status.
Internal Applicants: Please refer to Laboratory policy P701 for applicant eligibility.
Equal Opportunity: Los Alamos National Laboratory is an equal opportunity employer and supports a diverse and inclusive workforce. All employment practices are based on qualification and merit, without regard to race, color, national origin, ancestry, religion, age, sex, gender identity, sexual orientation or preference, marital status or spousal affiliation, physical or mental disability, medical conditions, pregnancy, status as a protected veteran, genetic information, or citizenship within the limits imposed by federal laws and regulations. The Laboratory is also committed to making our workplace accessible to individuals with disabilities and will provide reasonable accommodations, upon request, for individuals to participate in the application and hiring process. To request such an accommodation, please send an email to firstname.lastname@example.org or call 1-505-665-4444 option 1.
Where You Will Work
Located in northern New Mexico, Los Alamos National Laboratory (LANL) is a multidisciplinary research institution engaged in strategic science on behalf of national security.
The High Performance Computing (HPC) Division provides production high performance computing systems services to the Laboratory. HPC Division serves all Laboratory programs requiring a world-class high performance computing capability to enable solutions to complex problems of strategic national interest.
Los Alamos, New Mexico enjoys excellent weather, clean air, and outstanding public schools. This is a safe, low-crime, family-oriented community with frequent concerts and events as well as quick travel to many top ski resorts, scenic hiking & biking trails, and mountain climbing. Many employees choose to live in the nearby state capital, Santa Fe, which is known for world-class restaurants, art galleries, and opera.
Appointment Type Regular
Contact Name Hughes, Jeremy Matthew
Req ID: IRC78092