Milestone-Proposal:Linux-based Supercomputing
Docket #:2024-24
This proposal has been submitted for review.
To the proposer’s knowledge, is this achievement subject to litigation? No
Is the achievement you are proposing more than 25 years old? Yes
Is the achievement you are proposing within IEEE’s designated fields as defined by IEEE Bylaw I-104.11, namely: Engineering, Computer Sciences and Information Technology, Physical Sciences, Biological and Medical Sciences, Mathematics, Technical Communications, Education, Management, and Law and Policy. Yes
Did the achievement provide a meaningful benefit for humanity? Yes
Was it of at least regional importance? Yes
Has an IEEE Organizational Unit agreed to pay for the milestone plaque(s)? Yes
Has the IEEE Section(s) in which the plaque(s) will be located agreed to arrange the dedication ceremony? Yes
Has the IEEE Section in which the milestone is located agreed to take responsibility for the plaque after it is dedicated? Yes
Has the owner of the site agreed to have it designated as an IEEE Milestone? Yes
Year or range of years in which the achievement occurred:
1998
Title of the proposed milestone:
The Linux-based Roadrunner Supercomputer, 1998-1999
Plaque citation summarizing the achievement and its significance; if personal name(s) are included, such name(s) must follow the achievement itself in the citation wording: Text absolutely limited by plaque dimensions to 70 words; 60 is preferable for aesthetic reasons.
Roadrunner, one of the first supercomputers based upon the Linux operating system and commercial off-the-shelf parts, demonstrated the value of incorporating high-performance system-area networks into Linux-based high-performance computing clusters. Developed by David A. Bader at the University of New Mexico in 1998-1999, its cost-effective design, computational performance, and use of open-source code enabled important new scientific and industrial applications and influenced later work.
200-250 word abstract describing the significance of the technical achievement being proposed, the person(s) involved, historical context, humanitarian and social impact, as well as any possible controversies the advocate might need to review.
Since the 1960s, high performance computing (also called supercomputing or HPC) has contributed to scientific discoveries, engineering advances, artificial intelligence, and business innovation. Supercomputers have helped make cars and planes safer and more fuel efficient, enabled more accurate prediction of severe storms, contributed to oil and gas discoveries and to advancing renewable energy, and become a mainstream tool in sectors ranging from financial services to medicine and healthcare to entertainment. But it wasn’t until the 1990s that supercomputing became available beyond the confines of government labs and top-tier research universities.
Research scientist David Bader, then a faculty member at the University of New Mexico, had been experimenting with commercial off-the-shelf (COTS) supercomputers since he was a doctoral student at the University of Maryland in the mid-1990s, as a solution that could be easier and cheaper to implement than traditional supercomputers. Bader continued that work when he came to UNM in 1998 and received funding from the National Science Foundation’s National Computational Science Alliance to support the development of one of the first Linux-based supercomputers. Called “Roadrunner,” the system was the first Linux supercomputer provided to the general research community, offering the high-speed interconnections and low latency needed for high performance on a broad set of applications.
This approach quickly changed HPC; as of 2024, Linux-based systems are the foundation of 98% of HPC systems sold. Roadrunner opened up HPC to new communities of users and energized an international open-source community dedicated to software and hardware development. In 2022, Hyperion Research found that over the previous 25 years, Linux-based HPC had contributed to the development of products worth more than $100 trillion and to countless research discoveries. Most recently, Linux-based HPC helped scientists understand and address COVID-19.
IEEE technical societies and technical councils within whose fields of interest the Milestone proposal resides.
IEEE Computer Society, IEEE Computer Society Technical Committee on Parallel Processing, IEEE Computer Society Technical Community on High Performance Computing
In what IEEE section(s) does it reside?
IEEE Albuquerque Section (New Mexico)
IEEE Organizational Unit(s) which have agreed to sponsor the Milestone:
IEEE Organizational Unit(s) paying for milestone plaque(s):
Unit: IEEE Albuquerque Section
Senior Officer Name: Lee Rashkin
IEEE Organizational Unit(s) arranging the dedication ceremony:
Unit: IEEE Albuquerque Section
Senior Officer Name: Lee Rashkin
IEEE section(s) monitoring the plaque(s):
IEEE Section: IEEE Albuquerque Section
IEEE Section Chair name: Lee Rashkin
Milestone proposer(s):
Proposer name: David A. Bader
Proposer email: Proposer's email masked to public
Please note: your email address and contact information will be masked on the website for privacy reasons. Only IEEE History Center Staff will be able to view the email address.
Street address(es) and GPS coordinates in decimal form of the intended milestone plaque site(s):
Electrical and Computer Engineering Building, University of New Mexico, 498 Terrace St NE, Albuquerque, NM 87106
GPS: 35.08398839726963, -106.62219865632657
Describe briefly the intended site(s) of the milestone plaque(s). The intended site(s) must have a direct connection with the achievement (e.g. where developed, invented, tested, demonstrated, installed, or operated, etc.). A museum where a device or example of the technology is displayed, or the university where the inventor studied, are not, in themselves, sufficient connection for a milestone plaque.
Please give the address(es) of the plaque site(s) (GPS coordinates if you have them). Also please give the details of the mounting, i.e. on the outside of the building, in the ground floor entrance hall, on a plinth on the grounds, etc. If visitors to the plaque site will need to go through security, or make an appointment, please give the contact information visitors will need.
The IEEE Milestone plaque will be placed at the University of New Mexico, in its Electrical and Computer Engineering Building (UNM Building 46), outside of Room 211, one of the department's computer labs. This placement will help to ensure a good public viewing opportunity for any interested audience.
Are the original buildings extant?
Yes
Details of the plaque mounting:
The University of New Mexico, School of Engineering, Department of Electrical and Computer Engineering (ECE) agrees to host the proposed IEEE Milestone plaque commemorating Linux-based Supercomputing and to permit the plaque to be installed on an existing wall on the second floor of the ECE Building (UNM Building 46) near Room 211, one of the department's computer labs. This placement will help to ensure a good public viewing opportunity for an interested audience.
How is the site protected/secured, and in what ways is it accessible to the public?
The University of New Mexico is a public university, and its Electrical and Computer Engineering Building is open to the public during normal business hours (8 a.m. to 5 p.m., Monday through Friday, closed on major holidays). No appointment is needed for visitors to see the plaque. The front door of the Electrical and Computer Engineering Building is open to visitors, who can proceed directly to the planned location of the IEEE Milestone plaque. CCTV cameras provide security for the building and plaque.
Who is the present owner of the site(s)?
The University of New Mexico
What is the historical significance of the work (its technological, scientific, or social importance)? If personal names are included in citation, include detailed support at the end of this section preceded by "Justification for Inclusion of Name(s)". (see section 6 of Milestone Guidelines)
Please note that the acronym COTS appears in the Milestone citation and the supporting information, and that the "C" is used to indicate either "commercial," "commodity," or "consumer." As there is no consequential difference in their usage herein, all three interpretations are correct and consistent with each other.
Justification of Name(s) in the Citation
The inclusion of David A. Bader's name in the citation is justified by several key factors:
- Pioneering Technical Achievement: Multiple sources confirm that Bader was the first to successfully develop a Linux-based supercomputer using commercial off-the-shelf parts and high-speed, low-latency interconnection networks. This is verified by the IEEE Computer Society International Workshop on Cluster Computing peer-reviewed conference paper (Reference 1), the IEEE Annals of History of Computing article (Reference 2), the Computer History Timeline (Reference 3), and the induction into the Innovation Hall of Fame of University of Maryland's A. James Clark School of Engineering (Reference 6).
- Independent Historical Recognition: Larry Smarr, a prominent figure in computing, specifically credits Bader for this "historic event," noting that it was "David's creative energies and innovation" that made it possible to build the first supercomputer with commercial off-the-shelf parts for the National Technology Grid (Reference 4).
- Lasting Impact: The significance of Bader's work is validated by leading experts in the field:
- Satoshi Matsuoka, director of RIKEN Center for Computational Science, credits Bader with expanding "the realm of supercomputing from narrow sets of technical computing to be the leading edge of mainstream computing" (Reference 5)
- Steve Wallach, a Seymour Cray Award recipient, attributes the Linux foundation of all Top500 List supercomputers to "Bader's technical contributions and leadership" (Reference 5)
- Contemporary Documentation: The Albuquerque Journal (Reference 8) documented Bader's work at the time it occurred in 1999, providing contemporary verification of his role in developing Roadrunner.
- Economic Impact: Hyperion Research specifically identifies Bader's pioneering efforts in the mid-1990s as key to transforming the HPC market, leading to Linux becoming "the foundation for 98% of all HPC systems sold" (Reference 7).
This extensive documentation from multiple independent sources clearly establishes Bader's central role in this achievement, making his inclusion in the citation both appropriate and necessary for historical accuracy. The following distinguished honors recognize Bader's contribution to high-performance computing:
- 2021 IEEE CS Sidney Fernbach Award: Recipient of one of computing's highest honors "for the development of Linux-based massively parallel production computers and for pioneering contributions to scalable discrete parallel algorithms for real-world applications." (PDF)
- 2022 Innovation Hall of Fame inductee, University of Maryland's A. James Clark School of Engineering "for his leadership in computer engineering, including the first Linux supercomputer, using consumer off-the-shelf parts." (PDF)
- 2025 Mimms Museum of Technology and Art's Hall of Fame (formerly the Computer Museum of America) "Dr. David Bader revolutionized High-Performance Computing (HPC) supercomputing technology. Through his pioneering work in designing the first commodity-based supercomputer, Bader transformed the industry by drastically reducing costs while maintaining performance. His innovations paved the way for Linux-based supercomputers, now the global standard, generating over $100 trillion in economic impact. His contributions to hardware, software and algorithms have reshaped modern computing infrastructure, earning him the IEEE Sidney Fernbach Award and a place in the University of Maryland’s Innovation Hall of Fame." (PDF)
- Computer History Museum: Timeline of Computer History (1998) Linux-based Supercomputing: "The first supercomputer using the Linux operating system, consumer, off-the shelf parts, and a high-speed, low-latency interconnection network, was developed by David A. Bader while at the University of New Mexico. From this successful prototype design, Bader led the development of “RoadRunner”, the first Linux supercomputer for open use by the national science and engineering community via the National Science Foundation’s National Technology Grid. RoadRunner was put into production use in April 1999. Within a decade this design became the predominant architecture for all major supercomputers in the world." (PDF)
Annotated Citation
A line-by-line elaboration of the citation provides detail that cannot be conveyed within the 70-word plaque limit.
Roadrunner, one of the first supercomputers based upon the Linux operating system and commercial off-the-shelf parts,
- In 1998-1999, Roadrunner pioneered a transformative approach to supercomputing by combining Linux with commodity hardware, demonstrating that this combination could deliver genuine supercomputing capabilities. This represented a fundamental departure from traditional proprietary supercomputers that dominated the industry at that time. Also, while personal computer (PC) clustering projects such as University of California, Berkeley's Network-of-Workstations (NOW) and NASA's Beowulf had emerged using commodity Ethernet, Roadrunner's strategic integration of commercial off-the-shelf components with high-speed, low-latency interconnection networks established a new architectural paradigm that would ultimately define modern supercomputing systems. For a more comprehensive account of this groundbreaking work and its lasting impact on the field of high-performance computing, readers can refer to D. A. Bader, "Linux and Supercomputing: How My Passion for Building COTS Systems Led to an HPC Revolution," in IEEE Annals of the History of Computing, vol. 43, no. 3, pp. 73-80, 1 July-Sept. 2021, doi: 10.1109/MAHC.2021.3101415. (PDF)
demonstrated the value of incorporating high-performance system-area networks into Linux-based high-performance computing clusters.
- Roadrunner's incorporation of a high-performance COTS network into a Linux-based supercomputer represented a pivotal architectural innovation that distinguished it from other Linux clusters of the era. Unlike Beowulf systems that were limited to commodity Ethernet due to strict vendor-neutral requirements, Roadrunner recognized the critical importance of interconnect performance for communication-intensive scientific applications. This strategic design choice enabled Roadrunner to efficiently handle tightly coupled parallel workloads that would struggle on standard Ethernet networks. The three-network architecture (control, data, and diagnostics) provided the foundation for both performance and reliability advantages that would later become standard in high-performance computing. This approach established a blueprint that bridged the gap between academic cluster computing and production-grade supercomputing environments.
- During this same period, the Beowulf project at NASA had followed a different approach to Linux-based cluster computing. Developed by Thomas Sterling and Donald Becker in 1994, Beowulf clusters employed a strict mass-market commodity off-the-shelf (M²COTS) philosophy, relying exclusively on commodity Ethernet and rejecting specialized networks. While this approach gained significant publicity as an accessible, single-user workstation model for parallel computing, it faced inherent limitations in communication-intensive applications that required the high-performance interconnects Roadrunner embraced.
- Los Alamos National Laboratory's Avalon system, deployed from June to November 1998, represented another early Linux effort. Built with 140 DEC Alpha processors, Avalon achieved recognition for its computationally intensive capabilities while still adhering to commodity networking principles. Unlike Roadrunner, Avalon was designed primarily for specific computational tasks rather than as a general multi-user supercomputing environment, highlighting the contrasting approaches to early Linux supercomputing. It should be noted that Bader had built a 40-processor DEC Alpha cluster with a low-latency, high-bandwidth interconnection network using a Digital GIGAswitch/ATM years earlier, in January 1995, at the University of Maryland, College Park. This DEC Alpha cluster was acquired through grants from the National Science Foundation, Digital Equipment Corp., and the Keck Foundation.
Developed by David A. Bader at the University of New Mexico in 1998-1999,
- Bader's work on Roadrunner began when he moved to the University of New Mexico in January 1998. By spring 1998, he had built a working Intel/Linux supercomputer prototype using eight dual 333 MHz Intel Pentium II nodes. The full Roadrunner system entered production as part of the National Science Foundation's National Computational Science Alliance (NCSA) National Technology Grid in April 1999 with 64 dual-processor nodes (128 processors total).
its cost-effective design, computational performance, and use of open-source code enabled important new scientific and industrial applications and influenced later work.
- While traditional supercomputers from vendors like Cray, SGI, and IBM typically cost $5-30 million, Roadrunner's cost-effective approach (approximately $400,000) democratized access to high-performance computing. Bader engineered Roadrunner to overcome the limitations of other Linux clusters, integrating the first high-speed, low-latency COTS interconnection network for Intel/Linux systems. This breakthrough enabled the system to handle the complete range of scientific computing workloads that were previously exclusive to expensive proprietary machines. Its performance on benchmarks demonstrated near-perfect scalability, outperforming many contemporary systems. The open-source foundation allowed for flexibility and customization that proprietary systems couldn't match. This achievement had lasting implications, as Linux-based systems came to dominate the supercomputing landscape, powering scientific discovery across numerous fields and applications.
- Following Roadrunner's full-scale deployment in April 1999, Bader quickly expanded on the architectural approach with several companies including VA Linux Systems and IBM. Black Bear, developed in September 1999, was a collaboration between the University of New Mexico and VA Linux Systems. This second-generation Linux supercomputer featured 16 dual-processor nodes with Intel Pentium III 550MHz processors (32 processors total). Black Bear maintained Roadrunner's hybrid networking approach, combining commodity Ethernet with Myricom's Myrinet for high-performance communications.
- Building on this success, Bader led the development of another Alliance Linux-based supercomputer, LosLobos, a more ambitious collaboration with IBM that became IBM's first production Linux system. Completed in March 2000, LosLobos featured 256 dual-processor, Intel-based IBM Netfinity servers with Myrinet connections (512 processors total) capable of 375 Gigaflops, premiering on the Top500 list at number 80 in November 2000. These systems enabled a wide range of scientific applications, including computational physics, climate modeling, molecular dynamics, bioinformatics, and fluid dynamics simulations.
- The direct knowledge transfer from these pioneering systems prompted IBM to announce its first commercial Linux clusters (the IBM eServer Cluster 1300) in February 2001, bringing Roadrunner's architectural principles to enterprise computing. This watershed moment sparked widespread industry adoption of Linux-based supercomputing with high-performance interconnects. The transformation was comprehensive: by 2018, all of the world's Top500 supercomputers incorporated the architectural paradigm established by Bader's early work, with Linux serving as the foundation for modern high-performance computing. This remarkable evolution demonstrates how Roadrunner's innovative integration of commodity hardware with specialized networking technology fundamentally reshaped the supercomputing landscape within two decades.
Historical Significance
Since the 1960s, high performance computing (also called supercomputing or HPC) has contributed to scientific discoveries, engineering advances, and business innovation. Supercomputers have helped make cars and planes safer and more fuel efficient, enabled more accurate prediction of severe storms, contributed to oil and gas discoveries and to advancing renewable energy, and become a mainstream tool in sectors ranging from financial services to medicine and healthcare to entertainment. But it wasn’t until the 1990s that supercomputing became available beyond the confines of government labs and top-tier research universities.
Research scientist David Bader, then a faculty member at the University of New Mexico, had been experimenting with commercial off-the-shelf (COTS) supercomputers since he was a doctoral student at the University of Maryland in the mid-1990s, as a solution that could be easier and cheaper to implement than traditional supercomputers. Bader continued that work when he came to UNM in 1998 and received funding from the National Science Foundation’s National Computational Science Alliance to support his development of a Linux-based supercomputer. Called “Roadrunner,” the system was the first Linux supercomputer provided to the general research community, offering the high-speed interconnections and low latency needed for high performance. This approach quickly changed HPC; as of 2024, Linux-based systems are the foundation of 98% of HPC systems sold.
Roadrunner fundamentally democratized high-performance computing by dramatically reducing costs and eliminating proprietary barriers, creating unprecedented access for smaller institutions and diverse research communities previously excluded from supercomputing resources. This revolutionary approach energized a global open-source ecosystem of developers, scientists, and engineers collaborating across institutional and national boundaries. According to Hyperion Research's 2022 comprehensive analysis (see: PDF), the economic impact has been staggering — over a 25-year period, Linux-based HPC directly contributed to products and innovations worth more than $100 trillion across industries ranging from automotive and aerospace to healthcare and energy. Beyond pure economic value, Linux supercomputing has accelerated scientific progress in virtually every field, producing breakthrough discoveries in climate modeling, drug development, materials science, and astrophysics. During the COVID-19 pandemic, this computing infrastructure proved crucial, enabling rapid virus modeling, vaccine development, and epidemiological simulations that helped guide public health responses worldwide.
Linux-based supercomputing reshaped the supercomputing landscape by making it more open, accessible, and collaborative. It has driven technological innovation, supported scientific advancement across numerous fields, and delivered broad societal benefits. Below are some examples of this game-changing impact.
- Technological Advances: The development of Linux-based HPC meant that a popular open-source operating system became the underlying “language” of supercomputing. Linux replaced a fragmented landscape of proprietary software and hardware, dominated by individual vendors and designed to run only on specific HPC frameworks. In contrast, open-source Linux could run in different HPC environments, allowing easier collaboration and compatibility across different hardware and with different software packages. Simple economics also fueled the Linux-based HPC revolution, since using commercial off-the-shelf (COTS) components drastically reduced the cost of building and maintaining supercomputing systems. Expensive, proprietary supercomputers were no longer the only option, and HPC became accessible to a broader range of organizations, including smaller universities and business users.
- Linux-based systems are not only cheaper to build and deploy, but more scalable and flexible than traditional supercomputers. Linux’s modular, scalable architecture means that Linux-based supercomputers can grow in size and complexity to meet the changing needs of users. New software packages can be implemented to serve new communities of users, and nodes can be added to meet increasing computational demands. Linux systems have played an important role in the effort to develop exascale supercomputers capable of one quintillion operations per second. Linux-based supercomputing has also led to innovations that addressed performance bottlenecks (e.g., the first use of a COTS high-performance interconnection network and multiple network layers in Roadrunner), and has fostered a shift towards more open, modular, and customizable computing environments.
- Accelerating Science, Engineering and Business: Linux supercomputers are a tool used in drug discovery, precision medicine, the design of crashworthy vehicles, structural design and analysis, climate modeling, risk assessment, civil engineering design, anti-terrorism, and most recently, to power artificial intelligence (AI) applications. The examples of its impact are many, including seismic simulations used to develop hazard maps that protect property and save lives. Linux HPC helped researchers at the Centers for Disease Control and Prevention (CDC) create a detailed model of the hepatitis C virus, a major cause of liver disease with an annual healthcare cost of about $9 billion in the U.S. alone. Linux supercomputers enabled the development of a computer model that comprehensively simulates the human heart down to the cellular level – potentially helping to reduce coronary heart disease, which costs the U.S. more than $100 billion each year.
- In business, GE has used Linux supercomputing to understand turbine behavior and gain a competitive advantage in fuel efficiency. Automotive and engine manufacturers use Linux HPC to develop next generation engines that use less fuel and could save more than $1 billion per year in fuel costs.
- Improving Diversity and Access: The benefits of the Linux supercomputing revolution extend far beyond the economic value of the work done on Linux-based machines. Linux-based supercomputers built from commercially available servers greatly reduced the cost of high-end computing and meant that smaller universities, research centers, and businesses could utilize these systems. This democratization of supercomputing has brought new and diverse perspectives into the communities that have traditionally used supercomputing and has given new communities the chance to use supercomputing.
Summary: Linux supercomputers also contribute to creating a better world through applications in public safety, healthcare, environmental sustainability, cybersecurity, and much more. As the global economy changes and worldwide challenges threaten our wellbeing, Linux supercomputers continue to be the powerhouse systems that drive economic growth, solve problems, and ensure our safety.
What obstacles (technical, political, geographic) needed to be overcome?
Obstacles
Widespread adoption of Linux-based supercomputing faced obstacles that were technical, political, and geographical. The technical challenges often stemmed from the fact that these off-the-shelf systems, while cheaper, were not initially designed for the demands of high-performance computing. For example, the Linux kernel and related software required modification and optimization to support large-scale parallel processing, memory management, and the unique needs of scientific and technical computing. Specialized HPC software and tools – such as compilers, parallel programming libraries, and job schedulers – were not yet available for Linux in the 1990s and had to be ported or customized for Linux systems. The earliest Linux clusters used Ethernet for networking, which meant low bandwidth, high latency, and a limited ability to handle complex computations and inter-node communication. Hardware also needed to be made more robust so that it could provide consistent performance and handle failures.
Human factors also posed a challenge to the adoption of Linux supercomputing. Some longtime users of proprietary systems were skeptical of Linux supercomputers and their ability to handle HPC workloads. Vendors of proprietary systems, who dominated the supercomputing market, amplified this skepticism and resisted the move to Linux and open source. Additionally, many organizations, particularly in government and industry, were locked into proprietary ecosystems, making it challenging to justify switching to Linux-based solutions, even if they could save money in the long run. As Bader set out to build Linux clusters with supercomputing speed and capabilities, he first needed to convince funding agencies that his work was a worthwhile investment, since the perception was that proprietary systems were more reliable and capable. Government agencies and research labs often had procurement policies that favored established vendors, another barrier to the adoption of Linux-based systems.
For Linux-based supercomputing to become worldwide and allow for international collaboration, other barriers needed to be overcome. Some regions and countries lacked the resources, the technical infrastructure, and the expertise to build and deploy Linux systems. Some simply had different research and business priorities, leading to uneven rates of adoption. Different regulations, export controls, and data sharing policies complicated international collaboration on Linux-based supercomputing and the lack of global standards for networking, software, and hardware made it difficult to integrate and collaborate on international Linux-based supercomputing efforts.
What features set this work apart from similar achievements?
Quotes from Supercomputing Experts
C. Gordon Bell, Microsoft Emeritus Researcher; Vice President of Engineering, Digital Equipment Corporation (DEC); founding Assistant Director of NSF's Computing and Information Science and Engineering Directorate:
From: Gordon Bell <gbell@outlook.com>
Date: Fri, Feb 26, 2021 at 2:20 PM
Subject: ...
To: bader@njit.edu <bader@njit.edu>
David, ... Steve [Wallach] and I crafted the following ... Bader was first to design a Linux supercomputer with the speed, performance, and services of a large, centralized and general purpose supercomputer. The Bader Roadrunner design could efficiently run the national science community's most demanding supercomputing applications at a fraction of the cost of traditional supercomputers -- unlike Beowulf clusters, which were used by individuals and were not competitive in performance. This design, not [commodity] clusters, displaced traditional supercomputers. Bader was the first to integrate a high-performance and scalable interconnection network (unlike Beowulf's [use of] Ethernet), and system services (scalable booting methodology, both free and commercial compiler suites, high-utilization job schedulers, and diagnostic monitoring) necessary for the first production Linux [supercomputer]. In addition, Roadrunner was the first Linux supercomputer integrated into the NSF-sponsored "National Technology Grid". ... Give me a call if you want to discuss anything. ... g
Gordon Bell
Microsoft Emeritus Researcher
611 Washington Street, #2502, San Francisco, CA 94111
Mobile 415 640 8255
http://gordonbell.azurewebsites.net/
Steve Wallach, a guest scientist for Los Alamos National Laboratory and 2008 IEEE CS Seymour Cray Computer Engineering Award recipient:
- “Today, 100% of the Top 500 supercomputers in the world are Linux HPC systems, based on Bader’s technical contributions and leadership. This is one of the most significant technical foundations of HPC.” (David Bader to Receive 2021 IEEE CS Sidney Fernbach Award, IEEE Computer Society, 22 September 2021.)
Satoshi Matsuoka, director of RIKEN Center for Computational Science:
- “David has expanded the realm of supercomputing from narrow sets of technical computing to be the leading edge of mainstream computing we see today in massive cluster-based supercomputers such as Fugaku, as well as hyperscale clouds. As supercomputing progresses onwards, we should further continue to observe other elements in which David has contributed to their genesis.” (David Bader to Receive 2021 IEEE CS Sidney Fernbach Award, IEEE Computer Society, 22 September 2021.)
Larry Smarr, Distinguished Professor Emeritus, UC San Diego; Founding Director of NCSA; Founding Director of Calit2:
- “One of the most significant events that occurred in this period was when David [Bader] at University of New Mexico as a member of the Alliance created the first commercial off-the-shelf supercomputer, in other words a supercomputer built of PC server technologies and he put it on the National Technology Grid. So here was a commodity-built, PC-based endpoint going into the technology grid. ... This is an historic event. It took resources from the Alliance, but it took David’s creative energies and innovation to do that…I want to just say to you David it was your vision that you could build a commodity off-the-shelf component, put it as an endpoint on the National Technology Grid that really was the original idea from straight back in ‘97 up until now.” Recorded 27 October 2021. Original video available on YouTube.
- Note: Larry Smarr, while offering important historical perspective as the Founding Director of NCSA, was connected to this achievement through his leadership of the National Computational Science Alliance that provided resources for the Roadrunner project. His testimonial provides context from a participant in the broader National Technology Grid initiative that Roadrunner was developed to support.
Roadrunner: A Linux-based Supercomputer (1998-1999)
In January 1998, Bader moved to the University of New Mexico and the Albuquerque High Performance Computing Center (AHPCC), where he had the opportunity to build what would become one of the first Linux supercomputers. By spring 1998, as the sole principal investigator for the AHPCC's SMP Cluster Computing Project, Bader had built a working Intel/Linux supercomputer prototype using eight dual 333 MHz Intel Pentium II nodes. An article in IEEE Annals of the History of Computing documents Bader's achievement (D. A. Bader, "Linux and Supercomputing: How My Passion for Building COTS Systems Led to an HPC Revolution," in IEEE Annals of the History of Computing, vol. 43, no. 3, pp. 73-80, 1 July-Sept. 2021, doi: 10.1109/MAHC.2021.3101415. (PDF)).
This prototype required significant engineering, including porting software to Linux, modifying the Linux kernel and shell to increase space for very large command lines, and porting software from members of the National Computational Science Alliance (NCSA) to Linux—none of which had previously run on Linux. Bader also partnered with Myricom's president and CEO Chuck Seitz to incorporate the first high-performance COTS interconnection network for Intel/Linux systems.
Based on demonstrations of this 16-processor prototype, the NSF and NCSA, led by Larry Smarr, allocated $400,000 to Bader's vision. The resulting system, Roadrunner, entered production in April 1999 with hardware comprising 64 dual 450 MHz Intel Pentium II processors (128 processors total), 512 KB cache, 512 MB SDRAM with ECC, 6.4 GB IDE hard drives, and Myrinet interface cards.
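For context, a back-of-the-envelope estimate of the machine's theoretical peak performance can be derived from these specifications, assuming roughly one floating-point operation per processor clock cycle (an assumption made here purely for illustration, not a figure taken from the project documentation):
R_peak ≈ 128 processors × 450 × 10^6 cycles/s × 1 flop/cycle ≈ 57.6 Gflop/s
Sustained performance on real applications is, of course, substantially lower than such a theoretical peak.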
Myrinet was a COTS networking technology developed by Myricom for high-performance cluster computing. Popular in the 1990s and early 2000s before InfiniBand and high-speed Ethernet became dominant, it provided the low-latency, high-bandwidth communication essential for parallel computing. Organizations could purchase standard Myrinet network interface cards, switches, and cables to build their cluster networks.
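For readers unfamiliar with how cluster interconnects are characterized, the sketch below shows a minimal MPI "ping-pong" microbenchmark of the kind routinely used in this era to measure point-to-point latency between two nodes. It is purely illustrative and is not code from the Roadrunner project; the message size, repetition count, and output format are arbitrary choices.

/*
 * Illustrative sketch only: a minimal MPI "ping-pong" latency microbenchmark.
 * Run with at least two ranks; only ranks 0 and 1 exchange messages.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 1000;      /* round trips to average over          */
    const int nbytes = 8;       /* small message, so latency dominates  */
    char *buf = malloc(nbytes);
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* one-way latency is half the averaged round-trip time */
        printf("one-way latency: %.1f microseconds\n", 0.5 * (t1 - t0) / reps * 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Small-message round-trip times measured this way are dominated by network latency rather than bandwidth, which is why interconnects such as Myrinet, with latencies roughly an order of magnitude lower than contemporary Ethernet, mattered so much for tightly coupled parallel applications.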
Roadrunner's innovation was significant as the first to incorporate a high-performance COTS network into a Linux-based supercomputer. This approach marked a fundamental shift from previous supercomputers that relied on proprietary, non-COTS networking solutions—such as the IBM SP-2 with its integrated IBM SP-2 Switch or the Thinking Machines CM-5 with its custom fat-tree network.
While Myrinet itself was COTS, Roadrunner's key innovation wasn't specific to Myrinet but rather the architectural approach of using commercially available networking technology. Even more significant was Roadrunner's innovative three-network architecture (control, data, and diagnostics), which formed the core of its design philosophy and provided the foundation for its performance and reliability advantages.
Roadrunner provided comprehensive supercomputing services that were lacking in earlier Linux clusters, such as node-based resource allocation, job monitoring and auditing, and resource reservations. Roadrunner ranked among the 100 fastest supercomputers in the world when it went online in April 1999. Its performance on the Cactus application benchmark showed near-perfect scalability, outperforming systems such as NASA's Beowulf cluster, NCSA's Microsoft Windows NT cluster, and Silicon Graphics' Origin 2000. It became a node on the National Technology Grid, providing researchers across disciplines with access to supercomputing capabilities from their desktops.
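As background, the "near-perfect scalability" observed on the Cactus benchmark is conventionally described using the standard definitions of parallel speedup and efficiency:
S(p) = T(1) / T(p),    E(p) = S(p) / p
where T(p) is the time to solution on p processors. Near-perfect scaling means E(p) stays close to 1 as p grows. As a purely hypothetical illustration (not a measured Roadrunner result), a job requiring 100 hours on one processor that completed in about 0.85 hours on 128 processors would have speedup S ≈ 118 and efficiency E ≈ 0.92.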
Following Roadrunner's success, Bader embarked on another Alliance project with IBM, developing LosLobos, IBM's first Linux production system. Assembled and operated at the University of New Mexico, LosLobos premiered on the Top500 list at number 24 in summer 2000, featuring 256 dual-processor, Intel-based IBM servers with Myrinet connections (512 processors total) capable of 375 Gflops. This collaboration directly influenced IBM's subsequent creation of pre-assembled Linux server clusters for business, further cementing the legacy of Bader's pioneering efforts in transforming supercomputing.
Scientific and Technological Impact of Roadrunner
Roadrunner's democratizing influence on high-performance computing created far-reaching effects across multiple scientific disciplines. Unlike the more restricted supercomputing resources at national weapons laboratories, Roadrunner was broadly accessible through NCSA's National Technology Grid, enabling researchers from diverse fields to tackle complex computational challenges previously inaccessible to them. Several prominent scientists shared specific breakthrough outcomes enabled by Roadrunner:
- Numerical Relativity and Astrophysics: Edward Seidel's team at the Albert Einstein Institute used Roadrunner in a pioneering collaboration that advanced the Cactus framework, which later became the underlying framework of the Einstein Toolkit. This toolkit now powers global efforts in multi-messenger astrophysics and gravitational wave research.
- Weather and Climate Modeling: Meteorologists Dan Weber and Kelvin Droegemeier (later Director of the White House Office of Science and Technology Policy) used Roadrunner to produce detailed simulations of thunderstorms and turbulence at commercial airline flight levels. They reported that "Roadrunner network results were superior to those from previous clusters' Ethernets in moving data... thus enhancing the forecast turnaround time or forecast quality by allowing for more grid points to be used and a correspondingly more resolved weather feature prediction."
- Computational Research Infrastructure: Jeremy Kepner, who later established MIT Lincoln Laboratory Supercomputing Center, tested and scaled key parallel software technologies on Roadrunner that formed the foundation for MIT's supercomputing center, impacting thousands of MIT researchers across disciplines.
Roadrunner hosted a diverse array of scientific applications, including:
- Computational astrophysics (CACTUS)
- Quantum chromodynamics (MILC)
- Molecular dynamics for biomolecular modeling (NAMD, GROMACS)
- Materials research (LAMMPS, VASP)
- Severe weather prediction (ARPI 3D, WRF)
- Computational fluid dynamics (Fluent/ANSYS, Abaqus)
- Molecular structure systems (GAMESS)
- Bioinformatics and genomic research (BLAST)
- Partial differential equation solvers (AZTEC)
- Multiphase flows analysis (BEAVIS)
Roadrunner's transformative impact extended beyond its immediate scientific applications to fundamentally reshape high-performance computing. By demonstrating that Linux-based systems could efficiently handle communication-intensive scientific workloads across disciplines—from quantum chromodynamics and astrophysical simulations to molecular dynamics and severe weather prediction—Roadrunner established a new paradigm for supercomputing. This breakthrough democratized access to computational resources previously limited to well-funded government labs, enabling researchers at smaller institutions to contribute to scientific discovery. IBM quickly recognized this shift, incorporating Roadrunner's architectural principles into its commercial Linux clusters in 2001. The technology transfer from academic research to industry accelerated Linux's dominance in supercomputing—evolving from Roadrunner being the pioneering Linux supercomputer in 1998 to Linux powering 100% of all Top500 supercomputers by 2018. This architectural revolution has supported scientific breakthroughs across diverse fields, from improving weather forecasting accuracy and advancing renewable energy technologies to enabling crucial COVID-19 research, collectively generating an estimated $100 trillion in economic impact while democratizing computational science worldwide.
Roadrunner and the Top500 List
While Roadrunner was recognized as among the 100 fastest supercomputers in the world when it went online in 1999 (Reference 8), it was not submitted for ranking on the Top500 list. This decision was deliberate and consistent with the system's scientific mission. Unlike many supercomputing projects of the era, Roadrunner was designed primarily to run real scientific applications rather than to achieve high rankings on benchmarks that didn't necessarily correlate with actual performance on scientific workloads.
David Bader focused his efforts on porting and optimizing genuine scientific applications for Roadrunner, including AZTEC (algorithms for solving sparse systems of linear equations), BEAVIS (Boundary Element Analysis of Viscous Suspensions), Cactus (numerical relativity toolkit), HEAT (diffusion partial differential equation solver), HYDRO (Lagrangian hydrodynamics code), and MILC (quantum chromodynamics). The system's architecture, particularly its advanced Myrinet network providing 1.28 Gb/s full-duplex bandwidth and low latency in the tens of microseconds, was chosen specifically to support these communication-intensive scientific applications.
This application-centric approach represented a philosophical difference from systems that were configured to maximize performance on the Linpack benchmark (the sole metric used for Top500 rankings) at the potential expense of balanced performance on real scientific workloads. As William Kramer of NCSA later articulated in his critique of the Top500, the benchmark "does not provide comprehensive insight for the achievable sustained performance of real applications on any system," and can sometimes lead organizations to "skew the configurations of the computers they acquire to maximize Linpack, to the detriment of the real work to be performed on the system." (William T.C. Kramer. 2012. Top500 versus sustained performance: the top problems with the Top500 list - and what to do about them. In Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12). Association for Computing Machinery, New York, NY, USA, 223–230.)
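For context on what the Linpack metric measures: the benchmark factors and solves a dense n × n linear system and reports a rate computed from a fixed operation count, following the standard convention
flops(n) ≈ (2/3) n^3 + 2 n^2,    R_max = flops(n) / t_solve
where t_solve is the measured time to solution. Because n is typically chosen to nearly fill available memory, the reported R_max rewards raw floating-point throughput and memory capacity far more than the communication performance that dominates many real scientific workloads, which is the essence of the critique quoted above.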
Roadrunner and the Linux kernel
The Linux-based supercomputing work at the University of New Mexico in 1998-1999 focused on developing a complete supercomputing system rather than contributing code to the Linux kernel itself. The innovations were primarily in the system architecture, using commercial off-the-shelf hardware components integrated with specialized high-performance networking, and developing the three-network architecture (control, data, and diagnostics) that became fundamental to Linux-based supercomputing.
While the project required modifying the Linux kernel for high-performance computing needs, these specialized modifications were targeted at the specific requirements of supercomputing rather than general-purpose computing. The achievement's significance lay in demonstrating that Linux could serve as the foundation for high-performance computing systems, which transformed the supercomputing landscape, rather than in contributing specific code to the mainline Linux kernel development.
What Features Set This Work Apart
David Bader’s work to develop Linux-based supercomputing stands out from similar achievements in several key ways. These features highlight the unique contributions that this work has made to high-performance computing (HPC):
Use of Open-Source Linux Software: Unlike proprietary supercomputing solutions that dominated the market before the 1990s, Linux-based supercomputers use an open-source operating system. This paradigm shift leveraged the power of collaborative development, enabling rapid improvements, widespread access, and customization by a global community of developers.
- BEFORE: Proprietary operating systems (Cray UNICOS, SGI IRIX, IBM AIX) dominated supercomputing, requiring expensive licenses, limiting user modifications, and creating vendor lock-in. Code improvements were controlled by vendors and released on their schedules.
- AFTER: Open-source Linux allowed global collaborative development, eliminated licensing costs, enabled community-driven improvements, and permitted users to modify source code for specific HPC needs.
Flexibility and Customization: Linux allows customization at both the kernel and user level, letting researchers and engineers tailor the OS specifically for HPC tasks. Proprietary systems were locked down and less adaptable.
- BEFORE: Supercomputer operating systems were largely fixed black boxes with limited customization options. Users were restricted to vendor-approved modifications and often had to wait for vendor updates to fix issues or add features.
- AFTER: Linux could be modified at both kernel and user levels, enabling real-time optimization for specific workloads, hardware configurations, and performance requirements. Users could implement their own solutions rather than waiting for vendor updates.
Cost-Effective Supercomputing: Using standard, commercial off-the-shelf (COTS) components, such as standard Intel processors and network hardware, dramatically reduced the cost of building and maintaining supercomputers. This approach made supercomputing more affordable and accessible to a broader range of institutions, including smaller universities, research labs, and even some businesses.
- BEFORE: Purpose-built supercomputers from vendors like Cray, SGI, and IBM typically cost $5-30 million, restricting access to only well-funded government labs, large research universities, and major corporations.
- AFTER: Roadrunner and subsequent Linux-based systems using COTS hardware reduced costs by an order of magnitude (Roadrunner cost approximately $400,000), making supercomputing accessible to smaller universities, research centers, and businesses.
Scalability Through Modularity: Unlike traditional supercomputers that were monolithic and fixed in capacity, Linux-based supercomputers were modular and scalable. They could easily be expanded by adding more nodes, allowing institutions to grow their computational capacity as their needs evolved.
- BEFORE: Traditional supercomputers were monolithic systems with fixed configurations. Expansion often required purchasing entirely new systems or expensive, proprietary upgrades from the original vendor.
- AFTER: Linux-based systems could be expanded incrementally by adding commodity nodes, allowing organizations to start small and scale up as needs and budgets allowed, without requiring forklift upgrades.
Innovative Use of High-Performance Networks: Bader used advanced COTS networking solutions like Myrinet, which provided significantly higher bandwidth and lower latency compared to the Ethernet networking used in early clusters like Beowulf. The resulting improvement in inter-node communication allowed Linux supercomputers to efficiently handle a broad range of parallel processing tasks.
- BEFORE: Early cluster systems like Beowulf relied on standard Ethernet (10-100 Mbps) with high latency (100+ microseconds), limiting their use to embarrassingly parallel applications with minimal inter-process communication.
- AFTER: Roadrunner incorporated Myrinet with 1.28 Gbps bandwidth and sub-10 microsecond latency, enabling tightly-coupled applications previously only possible on traditional supercomputers.
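A simple latency-plus-bandwidth transfer-time model, t(m) ≈ α + m/β (where α is the per-message latency and β the link bandwidth), makes this difference concrete when applied to the figures quoted above; the 8 KB message size is an arbitrary illustrative choice:
Fast Ethernet (α ≈ 100 µs, β ≈ 100 Mb/s): t ≈ 100 µs + 65,536 bits / (100 Mb/s) ≈ 755 µs
Myrinet (α ≈ 10 µs, β ≈ 1.28 Gb/s): t ≈ 10 µs + 65,536 bits / (1.28 Gb/s) ≈ 61 µs
Under these assumptions, each 8 KB exchange completes roughly twelve times faster over Myrinet, and the advantage grows for the small, frequent messages typical of tightly coupled codes, where the latency term dominates.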
Multi-Network Architecture: The deployment of multiple networks for control, data movement, diagnostics, etc. within Linux supercomputers was a novel approach that improved reliability, scalability, and performance. The new approach also allowed for better resource management, system monitoring, and error handling, which were typically unavailable in other cluster-based systems.
- BEFORE: Cluster systems typically used a single network for all communications, creating bottlenecks when handling control messages, data transfers, and monitoring simultaneously.
- AFTER: Roadrunner implemented multiple specialized networks for different functions (control, data, monitoring), improving overall system efficiency, reliability, and performance isolation.
Support for Diverse HPC Workloads: While earlier cluster systems, including Beowulf, were designed primarily for specific, loosely coupled applications, Linux-based supercomputers could handle a wide variety of HPC tasks, including those requiring tightly coupled parallel processing. This made them suitable for complex scientific simulations, data analysis, and engineering tasks.
- BEFORE: Cluster systems were primarily suitable for loosely-coupled applications where processors worked independently with minimal coordination. Traditional workloads requiring tight synchronization needed expensive proprietary systems.
- AFTER: Linux-based supercomputers could efficiently run both loosely-coupled and tightly-coupled applications, supporting the full spectrum of scientific and engineering applications on a single cost-effective platform.
Adaptability: The flexibility of Linux allowed it to quickly adapt to new scientific and industrial applications. As new fields like artificial intelligence, machine learning, and bioinformatics grew, Linux supercomputers could be easily configured to meet their specific computational requirements.
- BEFORE: Proprietary systems were designed for specific computing paradigms and often required significant retooling to adapt to new application domains.
- AFTER: The flexibility of Linux and commodity hardware allowed systems to quickly adapt to emerging fields like bioinformatics, AI/ML, and data analytics without requiring architectural overhauls.
Demonstrated Viability: Bader's "Roadrunner" supercomputer at the University of New Mexico was the first Linux-based system to demonstrate that it could match or exceed the performance of traditional supercomputers while being more cost-effective and flexible. Roadrunner integrated advanced features like job scheduling, resource management, and low-latency networking, proving Linux’s viability as a platform for serious scientific computation.
- BEFORE: The industry perception was that only purpose-built, proprietary supercomputers from established vendors could deliver the performance and reliability needed for mission-critical scientific computing.
- AFTER: Roadrunner proved that Linux-based systems using commodity hardware could match or exceed the performance of traditional supercomputers at a fraction of the cost, permanently changing industry perceptions.
Proof of Concept: By achieving high rankings on the Top500 list and delivering substantial computational power for real-world scientific projects, Linux-based systems demonstrated that open-source, COTS-based supercomputers could effectively compete with, and even surpass, traditional supercomputing solutions.
- BEFORE: Linux clusters were viewed as experimental platforms suitable primarily for academic and developmental work, not production scientific computing.
- AFTER: By achieving high rankings on the Top500 list and successfully running real-world scientific applications, Linux-based systems demonstrated they were viable alternatives to traditional supercomputers for production environments.
Broader Access and Inclusivity: Linux-based supercomputing opened up HPC resources to a broader range of users. This democratization enabled more institutions to participate in cutting-edge research and innovation, fostering a more inclusive scientific and technological ecosystem.
- BEFORE: Supercomputing access was limited to a small elite group of researchers at major institutions with the funding and facilities to support traditional supercomputers.
- AFTER: The dramatically lower cost and increased accessibility of Linux-based systems democratized HPC, enabling researchers at smaller institutions worldwide to participate in cutting-edge computational science.
Lower Entry Barriers: The significantly lower costs of building and maintaining Linux-based supercomputers broke down financial barriers, enabling smaller organizations with fewer resources to access powerful computational tools previously reserved for elite institutions.
- BEFORE: The multi-million-dollar cost of traditional supercomputers created prohibitive financial barriers, excluding many potential users from accessing high-performance computing resources.
- AFTER: Entry-level Linux-based supercomputers could be built for under $500,000, bringing supercomputing capabilities within reach of departmental budgets rather than requiring national or institutional funding.
Foundation for Future Growth: The architecture and principles used for Linux-based supercomputing laid the groundwork for the next generation of HPC, including exascale computing. Linux, now the foundational OS for supercomputers, has been integral in developing scalable, flexible systems capable of reaching exascale performance (at least one quintillion floating-point operations per second).
- BEFORE: Scaling proprietary supercomputer architectures was limited by vendor roadmaps and often required revolutionary rather than evolutionary approaches to reach new performance levels.
- AFTER: Linux-based systems established an evolutionary path to exascale computing through incremental improvements in commodity hardware and software, providing a clear roadmap for future development.
Evolution and Community Support: The vibrant open-source community that has grown around Linux ensures ongoing development, optimization, and support for HPC needs, and keeps Linux-based supercomputers at the forefront of technological advancements in the field.
- BEFORE: Supercomputer software development relied on limited vendor teams with competing priorities and resource constraints.
- AFTER: Linux created a global, self-organizing community of developers continuously improving the operating system, middleware, and applications for HPC, accelerating innovation and problem-solving.
Comparison with Beowulf Parallel Workstations
Beowulf vs. Roadrunner: Different Approaches to Commodity-Based Supercomputing
While both Beowulf clusters and Roadrunner architecture were based on high-performance computing using the Linux operating system, they represent fundamentally different design philosophies that yielded systems with very different capabilities:
- Design Philosophy: Beowulf clusters, developed at NASA by Thomas Sterling and Donald Becker in 1994, emphasized a strict M²COTS (mass-market commodity off-the-shelf) approach that rejected any technology without multiple vendors. In contrast, Roadrunner's balanced architecture strategically integrated commercial off-the-shelf components with specialized high-performance interconnection networks, prioritizing computational performance over ideological purity.
- Networking Architecture: Beowulf clusters relied exclusively on commodity Ethernet with high latency (100+ microseconds), limiting their use to embarrassingly parallel applications with minimal inter-process communication. Roadrunner overcame this fundamental limitation by incorporating a low-latency, high-bandwidth interconnection network, Myrinet, with 1.28 Gbps bandwidth and sub-10 microsecond latency, enabling tightly-coupled applications essential for real scientific workloads.
- System Model: Beowulf was conceptually designed as a "parallel workstation" for individual scientists rather than a shared resource. As Sterling himself stated, "The Beowulf parallel workstation is an experimental distributed PC system developed to evaluate this new [Pile-of-PCs] opportunity in single-user environment computing." Roadrunner, by contrast, was engineered as a true multi-user supercomputer with comprehensive resource management capabilities that could support multiple simultaneous users across diverse scientific domains.
- Node Architecture: Beowulf promoters opposed multiprocessor nodes despite their increasing availability, citing concerns about system complexity and operating system stability. Roadrunner embraced dual-processor Intel Pentium II nodes, providing greater computational density and anticipating the industry's later shift toward multi-core architectures.
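To put the latency and bandwidth figures above in perspective, the sketch below estimates a single message-transfer time with the simple model time = latency + message size / bandwidth. The Myrinet numbers come from the comparison above; the Fast Ethernet values and the 8 KB message size are illustrative assumptions rather than measurements from either system.

```c
/* Rough transfer-time comparison: T = latency + size / bandwidth.
   Myrinet figures are from the text above; the Fast Ethernet figures
   and the 8 KB message size are assumed for illustration only. */
#include <stdio.h>

static double transfer_time_us(double latency_us, double bandwidth_gbps,
                               double msg_bytes)
{
    /* Convert Gbps to bytes per microsecond: 1 Gbps = 1e9/8 bytes/s = 125 bytes/us. */
    double bytes_per_us = bandwidth_gbps * 125.0;
    return latency_us + msg_bytes / bytes_per_us;
}

int main(void)
{
    double msg = 8.0 * 1024.0;  /* an 8 KB message, e.g., a small halo exchange */

    /* Commodity Fast Ethernet of the late 1990s (assumed: ~100 us latency, 0.1 Gbps). */
    printf("Fast Ethernet: %6.1f us\n", transfer_time_us(100.0, 0.1, msg));

    /* Myrinet/SAN (from the text: <10 us latency, 1.28 Gbps). */
    printf("Myrinet:       %6.1f us\n", transfer_time_us(10.0, 1.28, msg));
    return 0;
}
```

Under these assumptions, the same 8 KB exchange completes roughly an order of magnitude faster over Myrinet, which is why tightly coupled applications with frequent inter-process communication were impractical on Ethernet-only clusters.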
Los Alamos National Laboratory's Avalon: June–November 1998
In June 1998, Los Alamos National Laboratory introduced Avalon, a more powerful Beowulf-class cluster built from personal computers with DEC Alpha microprocessors running Linux. Initially built with 70 processors, Avalon first appeared on the Top500 List of supercomputers at 314th place (using 68 processors), with a performance of 19.3 Gflops on the parallel Linpack benchmark. (Avalon: An Alpha/Linux Cluster Achieves 10 Gflops for $150k, Michael S. Warren, Timothy C. Germann, Peter S. Lomdahl, David M. Beazley, and John K. Salmon, in Proc. of the 1998 ACM/IEEE Conference on Supercomputing (SC '98), IEEE Computer Society, USA, 1–11.) The system was significantly upgraded in September 1998 to 140 processors with 36 GB of total memory, achieving 47.7 Gflops and improving its Top500 List ranking to 113th by November 1998. (Linux and Supercomputers, Bryan Lunduke, Linux Journal, 29 November 2018.)
Despite its impressive performance, some industry experts did not consider Avalon a true supercomputer. Ben Passarelli of Silicon Graphics noted that "Avalon's architecture works for computationally intensive tasks, but it has its limits," and that the worst problem with such a system is that "You're limited to things that have a small I/O [input/output] requirement." These limitations highlight why Roadrunner's three-network architecture (control, data, and diagnostics) represented such a significant architectural innovation.
Legacy and Influence
Together, these systems democratized high-performance computing by dramatically reducing costs and eliminating proprietary barriers, creating unprecedented access for smaller institutions and diverse research communities previously excluded from supercomputing resources. However, it was Roadrunner's architectural principles—optimizing megaflop efficiency through balanced integration of commodity processors with high-performance interconnects while implementing robust system management capabilities—that established the template for modern supercomputing.
The rapid adoption of Roadrunner's architectural model across the high-performance computing landscape, culminating in its influence on 98% of HPC systems sold by 2024, demonstrates the effectiveness of its pragmatic, performance-oriented approach that balances cost considerations with genuine supercomputing capabilities.
Summary:
Research and development in Linux-based supercomputing stands out for its pioneering use of open-source software, cost-effective COTS hardware, advanced networking strategies, and broad applicability across diverse scientific fields. It fundamentally transformed the supercomputing landscape by making high-performance computing more accessible, flexible, and scalable, paving the way for future innovations, including exascale computing. This revolution in HPC has democratized access to computational resources and enabled breakthroughs across multiple scientific and industrial domains.
Why was the achievement successful and impactful?
A Fundamental Shift
Prior to Linux-based supercomputing, the landscape was dominated by proprietary systems and architectures. The Cray-1, introduced in 1976, established the traditional supercomputing paradigm with its custom vector-processor architecture and proprietary operating system. Throughout the 1980s and early 1990s, major vendors like Cray Research, Silicon Graphics (SGI), and IBM continued this proprietary approach. SGI had its IRIX operating system, while IBM used AIX; both were vendor-controlled, system-specific UNIX variants. Other players included Thinking Machines, with its Connection Machine series and proprietary operating systems, and Fujitsu, with its proprietary UXP/V operating system.
While powerful, these platforms were extremely expensive, required specialized knowledge to maintain, and locked users into specific vendor ecosystems. The transition to Linux represented a fundamental shift from this model. Unlike its predecessors, Linux offered an open-source, vendor-neutral operating system that could run on commodity hardware. These factors together dramatically reduced cost while also increasing flexibility. This shift was particularly significant because it broke the traditional coupling between hardware architecture and operating system that had characterized supercomputing (and other platforms) for decades. As such, this shift launched a new era of accessible, scalable high-performance computing.
Overcoming Obstacles
Transforming Linux-based COTS systems into supercomputers required overcoming a host of obstacles. Advanced networking, such as the Myrinet high-speed system-area network, combined with the optimization of Linux kernels and software for HPC tasks, delivered greater network bandwidth, lower latency, and fewer bottlenecks. Robust open-source tools and libraries, such as the Message Passing Interface (MPI), job schedulers, and resource-allocation systems, made Linux clusters easier to use and lowered the learning curve for new users; a minimal MPI example illustrating this programming model follows below.
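As a concrete illustration of the programming model mentioned above, the minimal MPI program below (in C) distributes a trivial computation across processes and combines the partial results on one rank. It is a generic sketch of how applications used MPI on commodity clusters, not code from Roadrunner or any specific system.

```c
/* Minimal MPI sketch: each process computes a partial result and rank 0
   combines them over the interconnect. Generic illustration only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                    /* start the MPI runtime        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this process's id            */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total number of processes    */

    int local = rank * rank;                   /* each node's share of the work */
    int total = 0;

    /* Sum the partial results onto rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of squares across %d processes: %d\n", size, total);

    MPI_Finalize();
    return 0;
}
```

Because MPI abstracts the underlying network, the same source code could run over commodity Ethernet on a Beowulf-style cluster or over Myrinet on a system like Roadrunner without modification.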
Bader became an ambassador for this new type of supercomputing. He secured funding from government agencies, such as the National Science Foundation (NSF) and the Department of Energy (DOE), to build and deploy Linux supercomputers and computing tools, and he launched partnerships with industry leaders, such as IBM, to deploy Linux supercomputers and showcase the work done on them. Through these efforts, perceptions of Linux supercomputing began to change, and it soon earned a positive reputation among industry leaders. New national and international partnerships were launched, such as the NSF’s Partnerships for Advanced Computational Infrastructure (PACI), which helped create a faster, more robust national technology grid, pushed the development of standards for Linux-based HPC, and offered Linux supercomputing to a larger, more diverse community of users. As success stories were documented, acceptance of Linux-based supercomputing grew and adoption became global.
Summary: Linux became the foundation of modern supercomputing because of the innovative research and continued advocacy of David Bader. As a Linux supercomputing technology creator, researcher, and advocate, he pushed against the established norms and set out to prove that Linux-based systems made from off-the-shelf parts could perform just as well as traditional supercomputers. He overcame the technical and social obstacles to make Linux-based systems the workhorse of modern supercomputing that drives advances in science, technology, medicine, and society worldwide.
Hyperion Research estimates that the total economic value of Linux supercomputing pioneered by Bader has been over $100 trillion over the past 25 years. (Hyperion Research, Special Study: The Economic and Societal Benefits of Linux Supercomputers, Earl Joseph, Melissa Riddle, Tom Sorensen, and Steve Conway, April 2022. URL: https://davidbader.net/publication/2022-hyperionresearch/ )
Supporting texts and citations to establish the dates, location, and importance of the achievement: Minimum of five (5), but as many as needed to support the milestone, such as patents, contemporary newspaper articles, journal articles, or chapters in scholarly books. 'Scholarly' is defined as peer-reviewed, with references, and published. You must supply the texts or excerpts themselves, not just the references. At least one of the references must be from a scholarly book or journal article. All supporting materials must be in English, or accompanied by an English translation.
Reference 1:
D. A. Bader, A. B. Maccabe, J. R. Mastaler, J. K. McIver and P. A. Kovatch, "Design and analysis of the Alliance/University of New Mexico Roadrunner Linux SMP SuperCluster," in Proc. of IEEE Computer Society International Workshop on Cluster Computing (ICWC 99), Melbourne, VIC, Australia, 1999, pp. 9-18, doi: 10.1109/IWCC.1999.810804. URL: https://ieeexplore.ieee.org/document/810804
Excerpts:
"This paper discusses high performance clustering from a series of critical topics: architectural design, system software infrastructure, and programming environment. This is accomplished through an overview of a large scale, high performance SuperCluster (Roadrunner). This SuperCluster is based almost entirely on freely available, vendor-independent software: for example, its operating system (Linux), job scheduler (PBS), compilers (GNU/EGCS), and parallel programming libraries (MPI). The Globus toolkit, also available for this platform allows high performance distributed computing applications to use geographical distributed resources such as this SuperCluster. In addition to describing the design and analysis of the Roadrunner SuperCluster we provide experimental analyses from grand challenge applications and future directions for SuperClusters."
While multiple authors are credited on this paper, it should be noted that the core system architecture, design, and implementation of Roadrunner were conceived and developed solely by David A. Bader at the University of New Mexico. The paper was published after Roadrunner's successful launch and deployment on the National Technology Grid, documenting the system that had already been in operation since April 1999. This publication represents the formal academic documentation of the system's architecture rather than a collaborative design effort.
Reference 2:
D. A. Bader, "Linux and Supercomputing: How My Passion for Building COTS Systems Led to an HPC Revolution," in IEEE Annals of the History of Computing, vol. 43, no. 3, pp. 73-80, 1 July-Sept. 2021, doi: 10.1109/MAHC.2021.3101415. URL: https://ieeexplore.ieee.org/document/9546947
Excerpts:
“But something new was on the horizon – a revolution in supercomputing technology was beginning that would bring scalable, less expensive systems to a much wider audience. That revolution involved using a new, open-source, operating system called Linux, and collections of commodity off-the shelf (COTS) servers to obtain the performance of a traditional supercomputer. I was deeply involved with that revolution from the start.”
“My system design took a revolutionary new direction that differed significantly from Beowulf and the HPC research community's cluster efforts. From my experience with real applications, I knew that Beowulf did not have the capabilities to run the broad set of scientific computing tasks on contemporary supercomputers, and more engineering was necessary to create a Linux-based system that would displace traditional supercomputers.”
“I assembled a team and we built Roadrunner, which entered production mode in April 1999. Its hardware comprised fully configured workstations powered by 128 dual, 450 MHz, Intel Pentium II processors; a 512 KB cache; a 512 MB SDRAM with ECC; 6.4 GB IDE hard drive; and Myrinet interface cards. The Myrinet System Area Network (Myrinet/SAN) interconnection network was one of Roadrunner's main improvements over previous Linux systems, such as Beowulf and Avalon.”
Reference 3:
Computer History Museum, Timeline of Computer History (1998): https://www.computerhistory.org/timeline/1998/
Excerpt:
“The first supercomputer using the Linux operating system, consumer, off-the shelf parts, and a high-speed, low-latency interconnection network, was developed by David A. Bader while at the University of New Mexico. From this successful prototype design, Bader led the development of “Roadrunner”, the first Linux supercomputer for open use by the national science and engineering community via the National Science Foundation's National Technology Grid. Roadrunner was put into production use in April 1999. Within a decade this design became the predominant architecture for all major supercomputers in the world.”
Reference 4:
Larry Smarr, remarks in YouTube video: https://www.youtube.com/live/HO1dhtV-Pbg?si=GIUdOIzjViQhNDrG&t=314
Excerpts:
“One of the most significant events that occurred in this period was when David (Bader) at University of New Mexico as a member of the Alliance created the first commercial off-the-shelf supercomputer, in other words a supercomputer built of PC server technologies and he put it on the National Technology Grid. So here was a commodity-built, PC-based endpoint going into the technology grid”
“This is an historic event. It took resources from the Alliance, but it took David’s creative energies and innovation to do that…I want to just say to you David it was your vision that you could build a commodity off-the-shelf component, put it as an endpoint on the National Technology Grid that really was the original idea from straight back in ‘97 up until now.”
Reference 5:
David Bader to Receive 2021 IEEE CS Sidney Fernbach Award, IEEE Computer Society: https://www.computer.org/press-room/2021-news/david-bader-to-receive-2021-ieee-cs-sidney-fernbach-award
Excerpts:
“David has expanded the realm of supercomputing from narrow sets of technical computing to be the leading edge of mainstream computing we see today in massive cluster-based supercomputers such as Fugaku, as well as hyperscale clouds,” said Satoshi Matsuoka, director of RIKEN Center for Computational Science. “As supercomputing progresses onwards, we should further continue to observe other elements in which David has contributed to their genesis.”
“Today, 100% of the Top 500 supercomputers in the world are Linux HPC systems, based on Bader’s technical contributions and leadership. This is one of the most significant technical foundations of HPC,” noted Steve Wallach, a guest scientist for Los Alamos National Laboratory and 2008 IEEE CS Seymour Cray Computer Engineering Award recipient.
Reference 6:
University of Maryland, A. James Clark School of Engineering, Innovation Hall of Fame 2022: https://eng.umd.edu/ihof/david-bader
Excerpts:
“Bader designed the first high-performance supercomputer based on commodity parts, reducing expenses by an order of magnitude. From a prototype he built in 1998 using commodity off-the-shelf parts and a high-speed low-latency interconnection network, Bader led the design of the first Linux Supercomputer Roadrunner for open use by the national science and engineering community via the National Science Foundation’s (NSF) National Technology Grid. His computer was first used in April 1999. Inducted in 2022 for his leadership in computer engineering, including the first Linux supercomputer, using consumer off-the-shelf parts.”
“Bader then led the technical design team of the NSF Alliance’s LosLobos system, the first-ever Linux production system built by IBM. IBM turned Bader’s design into the industry’s first pre-assembled and configured Linux server clusters for business. By 2018, all of the top 500 supercomputers in the world traced back to Bader’s technical contributions and leadership.”
Reference 7:
Hyperion Research, Special Study: The Economic and Societal Benefits of Linux Supercomputers, Earl Joseph, Melissa Riddle, Tom Sorensen, and Steve Conway, April 2022. URL: https://davidbader.net/publication/2022-hyperionresearch/
Excerpts:
“In the mid 1990's, groups like NCSA with the pioneering efforts of David Bader began supplementing and replacing more traditional expensive HPC systems with cheaper, commodity off-the-shelf machines using open-source Linux operating systems. This approach changed the HPC market very quickly and is now the foundation for 98% of all HPC systems sold”
“The rise of Linux in supercomputing over the last three decades is the cumulative work of countless projects and contributors. That said, few have made such a singular contribution to the conception of this paradigm as David Bader. About Bader's impact on the modern state of supercomputing, National Academy of Engineering member Steve Wallach said, ‘[...] 100% of the Top500 supercomputers in the world are Linux HPC systems, based on Bader’s technical contributions and leadership. This is one of the most significant technical foundations of HPC.’”
“The direct economic returns from selling Linux computers in 2022-2026 are projected to exceed $90 billion in servers and an additional $90 billion for the supporting infrastructure. This results in nearly $200 billion in revenue generated from selling Linux supercomputers over just a five-year period. This represents a sizable amount of economic gain, especially since the use of these Linux systems generates research valued at least ten times over the purchase price.”
“Linux supercomputers have played crucial roles in companies, universities, and government agencies worldwide. The development of advanced aircraft and spacecraft relies on Linux supercomputers. Linux supercomputers enable national and international weather and severe storm predictions that help save lives and billions of dollars in property. Medical researchers use Linux supercomputers to discover lifesaving medicines and model dangerous microbes…But that's just part of the story. Without supercomputers, detecting today's sophisticated cyber security breaches, insider threats and electronic fraud would be impractical. In short, Linux HPC systems have become indispensable for maintaining both national security and economic competitiveness.”
Reference 8:
UNM To Crank Up $400,000 Supercomputer Today, Machine One of the 100 Speediest in World, by John Fleck, Journal Staff Writer, Albuquerque Journal, 8 April 1999. URLs: https://www.newspapers.com/image/319289210/ and https://davidbader.net/post/19990408-abqjournal/
Excerpts:
“The Roadrunner supercluster will be part of what researchers are calling The National Technology Grid, a collection of supercomputers across the country wired together to help handle scientists’ growing demand for computer time.”
“The $400,000 supercomputer bears the mark of a new breed of moderately priced machines that are making inroads in the high-performance scientific market. Instead of using specialized, high-performance computer chips, made especially for supercomputing, it’s built around 128 top-of-the-line Intel Pentiums, the same breed of computer chips used in desktop computers.”
Oral History
David Bader discussed the development of Linux-based Supercomputing in his IEEE Computer Society Sidney Fernbach Award Presentation Innovations for Solving Global Grand Challenges, 17 November 2021, SC21: The IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis, America's Center, St. Louis, MO.
Supporting materials (supported formats: GIF, JPEG, PNG, PDF, DOC): All supporting materials must be in English, or if not in English, accompanied by an English translation. You must supply the texts or excerpts themselves, not just the references. For documents that are copyright-encumbered, or which you do not have rights to post, email the documents themselves to ieee-history@ieee.org. Please see the Milestone Program Guidelines for more information.
Please email a jpeg or PDF a letter in English, or with English translation, from the site owner(s) giving permission to place IEEE milestone plaque on the property, and a letter (or forwarded email) from the appropriate Section Chair supporting the Milestone application to ieee-history@ieee.org with the subject line "Attention: Milestone Administrator." Note that there are multiple texts of the letter depending on whether an IEEE organizational unit other than the section will be paying for the plaque(s).
Please recommend reviewers by emailing their names and email addresses to ieee-history@ieee.org. Please include the docket number and brief title of your proposal in the subject line of all emails.