Milestone-Proposal:The DIALOG Online Search System, 1966-1970

From IEEE Milestones Wiki
Revision as of 19:14, 16 March 2018 by Bberg



Docket #:2018-01

This Proposal has been approved, and is now a Milestone


To the proposer’s knowledge, is this achievement subject to litigation? No

Is the achievement you are proposing more than 25 years old? Yes

Is the achievement you are proposing within IEEE’s designated fields as defined by IEEE Bylaw I-104.11, namely: Engineering, Computer Sciences and Information Technology, Physical Sciences, Biological and Medical Sciences, Mathematics, Technical Communications, Education, Management, and Law and Policy. Yes

Did the achievement provide a meaningful benefit for humanity? Yes

Was it of at least regional importance? Yes

Has an IEEE Organizational Unit agreed to pay for the milestone plaque(s)? Yes

Has an IEEE Organizational Unit agreed to arrange the dedication ceremony? Yes

Has the IEEE Section in which the milestone is located agreed to take responsibility for the plaque after it is dedicated? Yes

Has the owner of the site agreed to have it designated as an IEEE Milestone? Yes


Year or range of years in which the achievement occurred:

1966

Title of the proposed milestone:

The DIALOG Online Search System, 1966

Plaque citation summarizing the achievement and its significance:

DIALOG was the first interactive, online search system addressing large databases and allowing iterative refinement of results. Developed at Lockheed Palo Alto Research Laboratory in 1966, extended through contracts with NASA, and offered commercially in 1972, its functionality and data content attracted a diversity of professional users worldwide including scientists, attorneys, educators and librarians. DIALOG preceded the internet by over two decades.

200-250 word abstract describing the significance of the technical achievement being proposed, the person(s) involved, historical context, humanitarian and social impact, as well as any possible controversies the advocate might need to review.


IEEE technical societies and technical councils within whose fields of interest the Milestone proposal resides.


In what IEEE section(s) does it reside?

Santa Clara Valley Section

IEEE Organizational Unit(s) which have agreed to sponsor the Milestone:

IEEE Organizational Unit(s) paying for milestone plaque(s):

Unit: Santa Clara Valley Section
Senior Officer Name: Joseph Wei

IEEE Organizational Unit(s) arranging the dedication ceremony:

Unit: Santa Clara Valley Section
Senior Officer Name: Joseph Wei

IEEE section(s) monitoring the plaque(s):

IEEE Section: Santa Clara Valley Section
IEEE Section Chair name: Joseph Wei

Milestone proposer(s):

Proposer name: Brian Berg
Proposer email: Proposer's email masked to public

Please note: your email address and contact information will be masked on the website for privacy reasons. Only IEEE History Center Staff will be able to view the email address.

Street address(es) and GPS coordinates in decimal form of the intended milestone plaque site(s):

Site 1: Lockheed Martin Advanced Technology Center (formerly Lockheed Palo Alto Research Laboratory, Bldg. 201), 3251 Hanover St., Bldg. 245, Palo Alto, CA 94304-1215 (secure facility - not publicly accessible)

Site 2: Computer History Museum, 1401 N Shoreline Blvd, Mountain View, CA 94043 (publicly accessible)

Describe briefly the intended site(s) of the milestone plaque(s). The intended site(s) must have a direct connection with the achievement (e.g. where developed, invented, tested, demonstrated, installed, or operated, etc.). A museum where a device or example of the technology is displayed, or the university where the inventor studied, are not, in themselves, sufficient connection for a milestone plaque.

Please give the address(es) of the plaque site(s) (GPS coordinates if you have them). Also please give the details of the mounting, i.e. on the outside of the building, in the ground floor entrance hall, on a plinth on the grounds, etc. If visitors to the plaque site will need to go through security, or make an appointment, please give the contact information visitors will need.

Site 1: DIALOG was developed and maintained at this Palo Alto corporate facility from 1966-1981.

Site 2: The Computer History Museum is a highly visible site in Silicon Valley; two Milestone plaques and one Special Citation plaque are already mounted on its exterior brick wall, which is accessible 24/7.

Are the original buildings extant?

Yes.

Details of the plaque mounting:

Site 1: TBD, but likely indoors and out of view of the public.

Site 2: on an exterior brick wall that is accessible 24/7

How is the site protected/secured, and in what ways is it accessible to the public?

Site 1: within a secure facility requiring a specific US security clearance to access

Site 2: freely accessible to the public 24/7

Who is the present owner of the site(s)?

Site 1: Lockheed Martin Space Systems Company, Palo Alto, CA

Site 2: Computer History Museum, Mountain View, CA

What is the historical significance of the work (its technological, scientific, or social importance)? If personal names are included in citation, include justification here. (see section 6 of Milestone Guidelines)

Until about 1969, most scientists, engineers, attorneys, educators, librarians and others who wanted to research what was known and published in a particular discipline had to physically locate and then visually search books, journals and other printed materials. This was a time-consuming and imperfect process. See the video at slide 10 in [Ref1: Berg IEEE Presentation].

When DIALOG became available in 1966, it automated research work for scientists and engineers at NASA, and later at other government agencies. By 1972, its commercial introduction extended these capabilities to all professions by providing online access to large collections of digitized materials through a command language that let the searcher iteratively refine results.

DIALOG's Ability to Allow Iterative Refinement of Results
DIALOG's major technical innovation is reflected in its name: it enables a conversation between the searcher and the computer. Its interactive user interface lets the user perform iterative searches in what is, in effect, a dialogue with the DIALOG system.

"Search at its best is a conversation ... an iterative, interactive process where we find we learn." (Search Patterns, 2010, Peter Morville, p. 9.)

DIALOG's language allows for this interactivity, as shown in the following excerpt from Roger Summit's 1967 ACM paper [Ref2: ACM Paper]:
"There are five important characteristics of the DIALOG language:
• The search question is constructed at search time (rather than at index time as is the case with a manual system).
• DIALOG is designed for nonspecialists; i.e., the users themselves, and thus avoids one communication barrier.
• The command language is independent of the particular data it searches.
• As an on-line system, it allows continual redefinition of the search question, based on examination of intermediate results.
• Control of the process lies with the user; the computer merely serves as a data-processing extension of the user."

Overview of DIALOG's Language
The following overview is distilled from Roger Summit's 1967 ACM paper [Ref2: ACM Paper]:

The DIALOG system provides a number of commands with which the searcher interacts with the computer. A search consists of (1) identifying and (2) selecting terms and phrases that reflect the user's interest, (3) combining descriptors into search expressions, and (4) examining retrieved citations and modifying search expressions. Each of these functions is accomplished with a particular command. The four principal commands are EXPAND, SELECT, COMBINE and DISPLAY.

EXPAND with a term provides a list of synonyms, related terms and similar but misspelled terms found in the database, allowing the searcher to home in on the exact combination of words that defines his/her interest. The listed terms are numbered so that the searcher can SELECT a list of terms (term a, term b, term c), a range of terms (term a - term d), or a list of ranges and terms. The result of a SELECT command with a list of terms is a numbered Set representing the subset of database documents containing these terms. COMBINE with a Boolean expression of Set numbers produces a numbered Set of database documents corresponding to the Boolean specification. DISPLAY with a Set number calls up the documents in the resulting Set and lets the searcher view them in succession to judge the success of the search so far. Based on this feedback from the database, the searcher may continue to develop additional Sets recursively or simply print out the desired documents.
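The EXPAND, SELECT, COMBINE, DISPLAY cycle described above can be illustrated with a small, hypothetical model. This is a minimal Python sketch under stated assumptions, not DIALOG's implementation: the class name `Session`, its lowercase methods, the four-document toy database and the nested-tuple form of Boolean expressions are all invented for illustration. Here EXPAND only lists index terms alphabetically adjacent to the input term, with their document counts (echoing the "alphabetical list of searchable terms near a desired term" mentioned in the King excerpt); it does not find synonyms or misspellings.

```python
from bisect import bisect_left


class Session:
    """Hypothetical toy model of a DIALOG-style search session (illustration only)."""

    def __init__(self, docs):
        self.docs = docs                      # doc_id -> text
        self.index = {}                       # inverted index: term -> set of doc_ids
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.index.setdefault(term, set()).add(doc_id)
        self.vocab = sorted(self.index)       # alphabetical list of index terms
        self.expanded = []                    # numbered terms from the last EXPAND
        self.sets = []                        # numbered result Sets: 1, 2, 3, ...

    def expand(self, term, width=2):
        """List index terms alphabetically near `term` as (number, term, doc count)."""
        pos = bisect_left(self.vocab, term.lower())
        lo, hi = max(0, pos - width), min(len(self.vocab), pos + width + 1)
        self.expanded = self.vocab[lo:hi]
        return [(n + 1, t, len(self.index[t])) for n, t in enumerate(self.expanded)]

    def _new_set(self, doc_ids):
        self.sets.append(frozenset(doc_ids))
        return len(self.sets)                 # Set numbers start at 1

    def select(self, *refs):
        """SELECT expanded terms by number: ints or (first, last) ranges, OR-ed together."""
        doc_ids = set()
        for ref in refs:
            first, last = ref if isinstance(ref, tuple) else (ref, ref)
            for n in range(first, last + 1):
                doc_ids |= self.index[self.expanded[n - 1]]
        return self._new_set(doc_ids)

    def combine(self, expr):
        """COMBINE a Boolean expression over Set numbers, e.g. ("and", 1, ("not", 2))."""
        return self._new_set(self._eval(expr))

    def _eval(self, expr):
        if isinstance(expr, int):             # a bare Set number
            return set(self.sets[expr - 1])
        op, *args = expr
        if op == "not":                       # complement relative to the whole database
            return set(self.docs) - self._eval(args[0])
        parts = [self._eval(a) for a in args]
        if op == "and":
            return set.intersection(*parts)
        if op == "or":
            return set.union(*parts)
        raise ValueError(f"unknown operator: {op}")

    def display(self, set_no):
        """DISPLAY the documents of a numbered Set, in document-id order."""
        return [(d, self.docs[d]) for d in sorted(self.sets[set_no - 1])]


session = Session({
    1: "rocket propulsion test report",
    2: "rocket engine heat transfer",
    3: "satellite propulsion systems",
    4: "heat shield materials",
})
for number, term, count in session.expand("propulsion"):
    print(number, term, count)                # numbered alphabetical neighbors with counts
s1 = session.select(3)                        # Set 1: documents containing "propulsion"
s2 = session.select(5)                        # Set 2: documents containing "rocket"
s3 = session.combine(("and", s1, s2))         # Set 3: both terms
print(session.display(s3))                    # [(1, 'rocket propulsion test report')]
```

Each SELECT and COMBINE call appends a new numbered Set instead of overwriting an earlier one, so previous Sets stay available for later Boolean combinations; that is the design choice that makes the iterative refinement in this toy model possible.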

The Origins and Early History of DIALOG
The paragraphs in this section are from pp. 72-74 of Elliot King's Free for All: The Internet's Transformation of Journalism, 2010 [Ref3: King Book]:

"The idea that companies could put their computer expertise to work for others had many ramifications. One possibility that presented itself was that efficient, centralized computers could manage access to and retrieval of information from vast storehouses of information. In 1960, Roger Summit, a doctoral student at Stanford University, took a summer job at Lockheed Information Sciences Laboratory, where he was assigned to work on problems of information retrieval under the supervision of E.K. Fisher, the director of information processing. The central issue was how to locate and retrieve stored information in a cost-efficient, timely manner. At the time, according to Summit, the feeling was that it was often easier to redo scientific research than it was to determine if it had been done before.

In the course of his assignment, Summit encountered the work of H. Peter Luhn, a researcher at IBM who had invented two significant schemes for the large-scale management of information — Key Word In Context (KWIC) indexing and Selective Dissemination of Information (SDI). In 1964, at Summit’s urging, Lockheed established a laboratory to study the application of these technologies. A project team of six led by Summit set out to create a technology that could facilitate efficient information retrieval. Among the criteria he established were that the system had to be usable by end users without the intervention of computing staff and it had to be interactive and recursive so that searchers could immediately see their results and modify their queries accordingly. Finally, researchers wanted to include an alphabetical list of searchable terms near a desired term and the number of items in the database containing that term.

By 1965, the team developed the prototype of what became the DIALOG Information Service. To test the system, Summit submitted an unsolicited proposal to apply DIALOG to NASA’s Scientific and Technical Aerospace Reports (STAR) ..., a database with 200,000 citations ... that was in great demand. NASA had been established by the Space Act of 1958 to spearhead America’s drive into space, and part of its mandate was to disseminate information about its activities and findings as widely as possible. From its inception, the agency aggressively indexed books, reports, and research concerning aerospace, and in 1962, NASA's staff, working with a contractor, started entering the bibliographic citations into a computer.

When Summit discovered a contract had already been awarded to a competitor, he proposed a smaller, less expensive parallel project as backup if the competitor failed. For the test, Summit leased a data line from the Lockheed offices in Palo Alto, California, to the NASA Ames Research Center. The test was conducted in January 1967. The turnaround time for a query was cut from fourteen hours when conducted at NASA headquarters to just a few minutes using the DIALOG system.

Based on that success, Lockheed won a [competitively bid] $180,000 contract from NASA to build what was called the Remote Console Information Retrieval system, or NASA RECON. This was followed by contracts to install DIALOG at the Atomic Energy Commission (AEC) and the European Space Research Organization (ESRO) and, in 1969, a contract to provide the U.S. Office of Education (USoE) with a retrieval service on the Educational Resources Information Center (ERIC) database."

Much of the above is also discussed in [Ref1: Berg IEEE Presentation] (including the NASA-RECON video in slide 14), [Ref4: Summit Thesis], [Ref5: World Encyclopedia], [Ref6: AIIP Newsletter] and several of the Roger Summit documents cited by Google Scholar [Ref7: Summit Citations].

In addition, transcontinental use of DIALOG first became possible in June 1970 through a satellite link that connected Paris and Oak Ridge National Laboratory in Tennessee with Lockheed in Palo Alto, CA, for access to the AEC and NASA databases. The plan for a satellite link is noted in the ESRO video at slide 16 in [Ref1: Berg IEEE Presentation].

DIALOG's commercial availability in 1972
In 1972, Lockheed launched the world’s first commercial online service, the DIALOG Information Retrieval Service, named after its search language. Because interactive access to bibliographic databases of scientific and technical information was of great value to many organizations, the initial service offered users in Europe and the U.S. access to the ERIC (Educational Resources Information Center) and NTIS (National Technical Information Service) databases, as well as the PANDEX science citation index. At its launch, DIALOG had six customers. [Ref3: King Book], [Ref5: World Encyclopedia], [Ref8: History and Heritage] and [Ref9: History of Info Science]

DIALOG became the most comprehensive online information service in the world by 1985
• “By 1985 DIALOG had become the most comprehensive online information service in the world, with more than 200 separate databases in business and economics, chemical, patent and trademark information, science and technology, medicine and the biosciences, news and current events, education, directories, energy and the environment, law and government, computer science and microcomputers, books, the social sciences, and the humanities.” [Ref5: World Encyclopedia]
• “By 1985 DIALOG Information Services, Inc., with Summit as president, offered more than 100 million records on many subjects from more than 200 different databases to its many customers in several countries (Camp, 1985).” [Ref9: History of Info Science]

DIALOG's Use in Libraries
DIALOG dramatically expanded the research capabilities of libraries; it changed the outlook and careers of the library and information professionals who used the service; and it enabled those professionals to provide expert searches for their constituents. As its popularity grew, the world's major libraries (including the National Library of Medicine, or NLM) were among the first to integrate DIALOG into their research and reference offerings:

“West Coast research centers such as the Rand Corporation, the System Development Corporation, and Lockheed Missiles and Space Corporation, as well as some universities, entered the mainstream of online retrieval through their research projects and by providing leadership to the national establishments such as NASA and the NLM. Two of our featured specialists come to mind: Roger K. Summit of Lockheed’s DIALOG, who was instrumental in applying Lockheed’s techniques to the NASA-RECON online bibliographic retrieval system; and Carlos A. Cuadra of the SDC, who administered the ORBIT II in the NLM’s AIM/TWX experiment.” [Ref9: History of Info Science]. See also the NASA-RECON video in slide 14 of [Ref1: Berg IEEE Presentation].

DIALOG's Use Throughout the World
• Users in virtually every sector have used, and continue to use, DIALOG for searching, including libraries, investment banks, consumer companies, chemical, pharmaceutical, medical, engineering, biology, social sciences, humanities and aerospace companies, government agencies, patent offices such as the European Patent Office (EPO), the Japanese Patent Office (JPO) and the US Patent and Trademark Office (USPTO), and academia. Researchers, executives and professionals of all types were exposed to the speed, precision, and depth of DIALOG searching.
• The vast amount of data from various sources led to unique database formats that accommodated bibliographic, directory, and specialized intellectual property searching. Because many database producers organized their content with sophisticated controlled vocabularies, DIALOG pioneered a value-added online approach to controlled vocabularies (metadata/taxonomies/ontologies). Some of the most important vocabulary schemes were loaded on the DIALOG system, including the CAS Registry, the Medical Subject Headings (MeSH) of the National Library of Medicine, and the controlled vocabularies of databases ranging from education (Educational Resources Information Center, or ERIC) to technology (INSPEC), to name a few.
• The voluminous repository of data available for searching as of the late 1990s is shown by the many hundreds of database names in a pair of documents that organize these names both alphabetically and by subject area. [Ref10: DIALOG Databases Late 1990s]
• DIALOG, operating as ProQuest Dialog as of 2018, is part of the US Patent Examiner's Toolkit. [Ref11: USPTO Databases]
• The databases supported by ProQuest Dialog as of 2018 remain extensive, and include the patents of nearly 30 countries. [Ref12: ProQuest Dialog Databases]

DIALOG and the Advent of Internet Search Engines
DIALOG retained its usefulness even with the widespread availability of free internet search engines such as Lycos, Infoseek, AltaVista, Yahoo! and Google starting in 1993, as shown in these 1998 observations by Roger Summit as excerpted from [Ref8: History and Heritage]:

“With the rapid growth of the Web, some have been predicting the demise of traditional online services. I don’t agree. Recently, I was doing some research in preparation for a speech I presented in Stockholm. I determined that DIALOG contains more than twenty times the total amount of information accessible through the Web. Furthermore, the two have grown at roughly the same rate over the past year, based on AltaVista statistics.

In addition to comparing the quantity of information on DIALOG and the Web, I compared the quality of search results for several topics using DIALOG and the AltaVista search engine. I’m sure it will come as no surprise that the DIALOG results were highly relevant, while the AltaVista results were, to be generous, somewhat encyclopedic in nature. I found that it was difficult and often impossible to do a comprehensive and in-depth review of a particular topic on the Web.

It’s somewhat ironic that with the phenomenal growth of the Web and concomitant advances in interface design, Web search engines lack even the most rudimentary features that were basic in the first online retrieval system we designed thirty years ago—such features as field specification, display of index terms, or options to allow one to refine a search.”

DIALOG has changed ownership over the years, but it remains an important research tool
Over the years, DIALOG has undergone changes in ownership and name. The following chronology is shown in slides 29-36 of [Ref1: Berg IEEE Presentation]:
• Lockheed spun off DIALOG in 1981 as the wholly-owned subsidiary Dialog Information Services, Inc., with Roger Summit as President and CEO until 1992.
• Knight-Ridder acquired DIALOG in 1987 for $353 million via a Goldman Sachs auction, and it was operated as Knight-Ridder Information until 1997.
• London-based M.A.I.D., LLC purchased DIALOG in 1997, and it was operated as The Dialog Corp. until 2000.
• Thomson acquired DIALOG in 2000, and it was operated as Thomson Dialog until 2008.
• ProQuest acquired DIALOG from Thomson Reuters in 2008. The service has been marketed as ProQuest Dialog™ since that time. [Ref13: ProQuest Dialog Brochure]

What obstacles (technical, political, geographic) needed to be overcome?

An early obstacle was the initial reluctance of scientists, engineers, attorneys, educators, librarians and others to trust the results of DIALOG searches. That reluctance faded quickly with time and experience.

Other obstacles included obtaining the necessary administrative support and access to the necessary computer hardware at a time when such resources were expensive, particularly for access by multiple users within a government entity.

In addition, users were unaccustomed to interacting with a computer. As described above, the DIALOG language and system supported this interactivity, and thereby provided the means for the "iterative refinement of results." This interactivity and the ability to iteratively refine search results are central to the widespread use of internet search today.

A case study on the use of DIALOG to search the Educational Resources Information Center (ERIC) "document file" (database) was performed at Stanford University in 1969. [Ref14: Stanford Study] DIALOG was used "to see if individuals could sit down at a terminal and, with little preliminary instruction, use such a system to locate relevant educational research documents." Stanford Study at p. 5. Upon first using the system, a Stanford professor stated "... to have right there at your fingertips all the volumes of Research in Education, rather than having first to find the right volume and then the right number -- just the physical juggling of those cumbersome volumes is obviated by this, so you can work a lot faster." Stanford Study at p. 3.

An August 26, 1969 newspaper article titled "Computer tells NASA scientists where to find space data" describes how DIALOG allowed NASA to overcome the problem of finding documents within its huge data catalog, noting that DIALOG "helps the scientist refine his query as he makes the search." [Ref15: NASA 1969 Story] This story was published about five weeks after the first lunar landing by Apollo 11.

What features set this work apart from similar achievements?

DIALOG was the pioneer in online database literature searching and retrieval. It gave scientists, engineers, attorneys and research librarians their first valuable tool, and their first experience, for examining literature quickly and at minimal cost. DIALOG contributed to the key technical structures and features of online information retrieval, including Boolean and proximity searching, field structure and field searching, specialized inverted and linear indexes, large-scale telecommunication front-ends, multiple state-of-the-art processors, and massive storage. [Ref2: ACM Paper] and [Ref3: King Book]
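The paragraph above lists Boolean and proximity searching over inverted indexes among DIALOG's contributions but does not say how they were implemented. As a generic illustration only (an assumption, not DIALOG's actual index format or operator syntax), the following Python sketch builds a positional inverted index and answers a proximity query: a Boolean AND on the two terms' document lists, followed by a check of word positions. The function names `build_positional_index` and `within` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [word positions]}: a positional inverted index."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


def within(index, a, b, k, ordered=True):
    """Return ids of docs where term `b` occurs within k words after term `a`.

    With ordered=False, `b` may occur on either side of `a`.
    """
    postings_a, postings_b = index.get(a, {}), index.get(b, {})
    hits = set()
    for doc_id in postings_a.keys() & postings_b.keys():  # Boolean AND on doc ids
        for i in postings_a[doc_id]:
            for j in postings_b[doc_id]:
                gap = j - i if ordered else abs(j - i)
                if 0 < gap <= k:
                    hits.add(doc_id)
    return hits


docs = {
    1: "solar energy storage",
    2: "energy from solar panels",
    3: "solar wind and energy",
}
index = build_positional_index(docs)
print(sorted(within(index, "solar", "energy", 1)))                 # [1]
print(sorted(within(index, "solar", "energy", 3)))                 # [1, 3]
print(sorted(within(index, "solar", "energy", 2, ordered=False)))  # [1, 2]
```

Intersecting the document lists first confines the position comparisons to documents that contain both terms, which is why positional postings are a natural fit for combining Boolean and proximity conditions in a sketch like this.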

DIALOG's Predecessors
Several computer-based search activities preceded DIALOG, but none of these went substantially beyond an experimental phase. The following excerpts from [Ref5: World Encyclopedia] describe several representative experiments:
• 1951-1954: “Charles Bourne observed that ‘an investigation of online bibliographic searching was first made by Bagley in 1951’ with the development of a program for a computer at the Massachusetts Institute of Technology ‘to search encoded abstracts.’ Bourne noted that ‘application of the computer to bibliographic searching was first demonstrated in 1954 in the form of batch searching.’”
• 1954-1964: “Over the next 10 years, many research and development efforts culminated in the development of ‘batch’ searches of bibliographic databases offered by a limited number of special libraries. Search analysts coded requests sent to them for literature searches. Several searches were then batched, or run consecutively, to make the most efficient use of the computer’s time. Several weeks generally passed before the requestor received any result. One batch retrospective search service, the Medical Literature Analysis and Retrieval System (MEDLARS) of the National Library of Medicine (NLM), was made available to the general public in 1964.”
• 1960: “Systems Development Corporation (SDC) demonstrated the first interactive online system, Protosynthex, developed by Robert Simons and John Olney, in 1960. Using a terminal wired directly to the computer, Protosynthex allowed access to the full text of the Golden Book Encyclopedia with the ability to search for the occurrence of terms in proximity with each other and to search for truncated forms of words, but not to combine terms with the use of Boolean logic.”
• 1964: “Another online retrieval system was developed at SDC in late 1964 by Harold Borko, H. P. Burnaugh, and W. H. Moore. The system, Bibliographic Organization for Library Display (BOLD), was developed for browsing literature citations on magnetic tapes. It was first publicly demonstrated about a year later and was one of the first systems capable of displaying an online thesaurus. In November 1964 SDC first demonstrated an online system that nearly achieved the interactive capability today’s users enjoy, Language Used to Communicate Information System Design (LUCID), developed for SDC by E. Franks and P. A. DeSimone.”
• 1965: “ ‘The first demonstration of an online retrieval network, on a national scale,’ according to Bourne, ‘was probably made in 1965 by SDC in an experiment ... to provide 13 organizations with access to some 200,000 bibliographic records on foreign technology.’ This work was done by SDC-Dayton for the Foreign Technology Division of Wright-Patterson Air Force Base, Ohio.”

Developed in parallel with DIALOG were the following non-commercial and non-interactive services:
• 1967-1972: “SDC was instrumental in the development of NLM’s online information service, MEDLINE (MEDLARS ON-LINE). In late 1967 NLM experimented with SDC’s Online Retrieval of Bibliographic Information Timeshared (ORBIT) retrieval language to search NLM’s database of 10,000 citations on neurology. In May 1970 SDC began operating the Abridged Index Medicus (AIM)/TWX online information system on behalf of NLM. In October 1970 NLM introduced MEDLINE as a free service on its own computer facilities with a database of more than 400,000 citations while allowing the AIM/TWX service to continue with SDC. In February 1972 NLM utilized TYMNET, the first public telecommunication network, for access to MEDLINE.” [Ref5: World Encyclopedia]

Supporting texts and citations to establish the dates, location, and importance of the achievement: Minimum of five (5), but as many as needed to support the milestone, such as patents, contemporary newspaper articles, journal articles, or chapters in scholarly books. 'Scholarly' is defined as peer-reviewed, with references, and published. You must supply the texts or excerpts themselves, not just the references. At least one of the references must be from a scholarly book or journal article. All supporting materials must be in English, or accompanied by an English translation.


Supporting materials (supported formats: GIF, JPEG, PNG, PDF, DOC): All supporting materials must be in English, or if not in English, accompanied by an English translation. You must supply the texts or excerpts themselves, not just the references. For documents that are copyright-encumbered, or which you do not have rights to post, email the documents themselves to ieee-history@ieee.org. Please see the Milestone Program Guidelines for more information.


Please email a jpeg or PDF of a letter in English, or with English translation, from the site owner(s) giving permission to place IEEE milestone plaque on the property, and a letter (or forwarded email) from the appropriate Section Chair supporting the Milestone application to ieee-history@ieee.org with the subject line "Attention: Milestone Administrator." Note that there are multiple texts of the letter depending on whether an IEEE organizational unit other than the section will be paying for the plaque(s).

Please recommend reviewers by emailing their names and email addresses to ieee-history@ieee.org. Please include the docket number and brief title of your proposal in the subject line of all emails.