The Web-Brain Hypothesis

By Kristina Lerman
First created: April 12, 1998

In a previous essay I argued that the Internet can and should be thought of as an organism. What organism, or part of an organism, does the Internet best resemble? At first glance, the Internet, and more specifically the Web, resembles the human brain. It is a surprisingly robust metaphor, one that could inform the way we treat and use the Web. The idea that the Internet can be thought of as a brain was first proposed a number of years ago (Source: Mayer-Kress and Barczys), when the Web was still in its infancy. The dramatic explosion in the size and complexity of the Internet in the intervening years, driven primarily by the growth of the Web, motivated me to reexamine the premise and to see what evidence might support it.

 

The Web and the Brain

It is generally accepted that most, if not all, functions of the brain, such as perception, thought, and learning, are a result of its architecture as a highly connected network of neurons. All communication between neurons occurs at the specialized connections between them called synapses. The number and density of synaptic connections is very high in the brain --- each of approximately one trillion (10^12) neurons in the human brain forms about 1000 synaptic connections and may receive even more. (Source: Kandel and Schwartz, Principles of Neural Science, 1985 (New York: Elsevier)) Though neurons themselves may be involved in information processing, most of what makes the brain perform its unique functions happens at the synapses. Synapses, the junctions between the axon of one neuron and the dendrites or cell body of another, mediate the signal transfer, or information transfer, between neurons through an electric current or the release of neurotransmitters. Input received by a neuron might in turn cause it to fire, that is, to trigger information to flow to the neurons it is connected to. More importantly, the chemical synapses, which constitute the majority of synapses in the brain, are plastic: their effectiveness at transmitting information from one neuron to the next may be modified for a long time by the history of previous activity. This kind of long-term modification, or plasticity, of synapses is believed to be the underlying basis of the brain's ability to learn (Source: Kandel and Schwartz, ibid.). Moreover, new research indicates that the adult brain is capable of growing new cells in response to environmental stimuli (Source: Gould, McEwen, Tanapat, Galea, and Fuchs, J. of Neuroscience, 17 (Number 7), 1997; also Holzenberger, Jarvis, Chong, Grossman, Nottebohm, and Scharff, J. of Neuroscience, 17 (Number 18), 1997). This mechanism could also play a role in learning.

 

The organization of neurons is important as well - there are regions of the brain dedicated to specific purposes or activities. In other words, neighboring neurons often perform the same function. In addition, these neurons are connected to others performing related functions. The evidence for this view is quite strong - localized damage to the brain, such as that caused by a stroke or a gunshot wound, can make a person lose a very specific function, e.g. the ability to name colors, while seemingly not affecting other faculties.

 

While the morphology and chemistry of neurons and synapses are indisputably complex, the details of structure and interactions are of less importance to the behavior of the brain as a whole. In other words, we may find other many-unit, highly interconnected systems that exhibit interesting, brain-like behaviors on a large scale. Coexisting populations of interacting species, an ecosphere, might be one example of such a system; the Internet could be another. The Internet is made up of tens of millions of computers (Source: http://www.bitwise.net/~maclearn/shistory.html) that communicate with each other through electrical cables or wireless signals. The Web is made up of Web servers, computers that deliver Web pages, images and movies to clients (surfers using Web browsers) using the Hypertext Transfer Protocol (HTTP). In 1996, there were an estimated 400,000 Web servers delivering about 50 million pages (Source: Internet Archive). As of the beginning of 1998, the size of the Web was estimated to be about 300 million pages (Source: Steve Lawrence and Lee Giles, as reported on http://www.news.com/News/Item/0,4,20728,00.html?st, also see K. Bharat and A. Broder in http://www.research.digital.com/SRC/whatsnew/sem.html). This number is increasing by about 20 million pages per month (Source: K. Bharat and A. Broder, ibid.) as more and more human knowledge and experience is digitized. Each Web page as a rule contains hyperlinks, which users click on to navigate to other Web pages. The average number of hyperlinks per page is not known, but it must be significant, because hyperlinks are the primary mode of navigating the Web.

 

The Web as a Brain

I propose that it is useful to imagine Web pages in the role of neurons, and hyperlinks as synapses that direct the flow of information from one page to the next - in short, the brain as a metaphor for the Web. Each neuron receives a stimulus - either a voltage spike across the electrical synapse, or some amount of neurotransmitter released across the chemical synapse - and depending on its state and processing capacity, the neuron may in turn generate a signal that will be received by other neurons. I will postulate that the signal received by a Web page is proportional to the number of "hits", or user views, it receives. Depending on the content of the page, for example the author's annotation of the outgoing hyperlinks, users may follow hyperlinks to other pages. The text (or pictures) in and around the hyperlink plays the role of the neuron's presynaptic environment. If the text indicates that the page pointed to by the hyperlink is considered important by the author, the user is more likely to follow that link than another. As the page changes with time, the author may attach less (or more) importance to a hyperlink. Plasticity of the presynaptic environment plays a big role in the brain's ability to learn.
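To make the analogy concrete, here is a minimal sketch of a toy model along these lines, written in Python; the pages, link weights and visitor numbers are all hypothetical illustrations, not measurements. Each page receives "hits", and each visitor follows an outgoing hyperlink with a probability weighted by how strongly the author emphasizes it, so activity propagates from page to page much as it does from neuron to neuron.

    import random

    # A toy "Web as brain" model: pages play the role of neurons and hyperlinks
    # the role of synapses.  The link weights stand in for how strongly the
    # author's annotation emphasizes each outgoing link.  All pages, weights
    # and visitor numbers here are hypothetical illustrations, not measurements.
    links = {
        "news":    {"chess": 0.7, "weather": 0.3},
        "chess":   {"news": 0.2, "archive": 0.8},
        "weather": {"news": 1.0},
        "archive": {},                      # a page with no outgoing links
    }

    def simulate(start_page, n_visitors=10000, follow_prob=0.6):
        """Send visitors to start_page; each follows an outgoing hyperlink with
        probability follow_prob, chosen in proportion to the link weights.
        Returns the number of "hits" each page receives."""
        hits = {page: 0 for page in links}
        for _ in range(n_visitors):
            page = start_page
            while True:
                hits[page] += 1
                out = links[page]
                if not out or random.random() > follow_prob:
                    break                   # the visitor leaves the Web
                pages, weights = zip(*out.items())
                page = random.choices(pages, weights=weights)[0]
        return hits

    # A burst of interest in the "chess" page (an external stimulus) raises the
    # hit counts of the pages it links to, much as a firing neuron passes
    # activity on to its neighbors.
    print(simulate("chess"))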

The organization of Web pages mirrors the structure of the brain. Pages on the same server, just like neurons in the same area of the brain, are more likely than not to deal with the same subject. Moreover, each page is connected through hyperlinks to other pages on similar subjects. As a matter of fact, the link structure that develops over time between Web pages on similar subjects is so information-rich that it has been used to identify authoritative information sources as well as communities of interest (Source: J. Kleinberg, Authoritative Sources in a Hyperlinked Environment; D. Gibson, J. Kleinberg and P. Raghavan, Inferring Web Communities from Link Topology.)

In addition to simply responding to the external environment, the Web could well be undergoing long-term changes in the number of pages, or in how the hyperlinks on a page are stressed - in other words, the Web could be learning. Hyperlinks on popular gateways to the Web, such as Yahoo, are plastic. Current events generate many hyperlinks, and if the event is especially newsworthy, the links will grow in number, their explanatory text might change, and with time they might be moved to a different portion of the site, though they usually do not disappear.

   

Is the Web a Brain? - Case Studies

What experiments can be constructed to support the "Web as a brain" hypothesis? While no metaphor can be proven to be true, it should at least be complete and self-consistent. I will try to present evidence to demonstrate that the Web acts in a manner consistent with how the brain acts. The direction I take is to examine how the Web reacts to external events. Neurons respond to stimuli with a characteristic spike; therefore, Web page activity, as measured by the number of hits, or user views, a page receives, should show a similar spike. Other important brain functions are the ability to learn, or adapt to the environment, and to remember. Below I propose a quantitative way to characterize learning and memory on the Web.

 

I. Case Studies of Web server traffic

The environment of the Web is also our environment. Therefore, events that affect us will be considered as stimuli for the Web. The best way to measure the "neuronal" activity of the Web is to monitor traffic to Web sites delivering information about a specific world event, for example, the NASA site during an historic space mission. Unfortunately, most sites, and especially commercial ones, consider server statistics to be proprietary information. Even with these barriers, there is enough information about Web server statistics in the course of outside events to draw some conclusions. I present below a few examples of Web server traffic patterns that seem to emulate the spiking behavior of a firing neuron.

The Oscars:

Traffic on the 1996 Academy Awards Web site grew from 150,000 hits a day when the server went online to more than 10 times that number, about 1.6 million, around the time of the Oscar award presentation.

(Source: http://www.intergraph.com/ics/interserve/oscars/stat.htm)

Likewise, for the official 1997 Academy Awards presentation web site, "... oscar.com is expected to average from 1 million to 2 million page views a day, before peaking at 10 million in the days immediately preceding the March 24 (1997) telecast." (Source: http://www.microsoft.com/sitebuilder/archive/features/Oscar0220.htm)

Kasparov vs Deep Blue:

The Kasparov chess rematch against IBM's Deep Blue supercomputer in May 1997 was one of the more popular web events. The official chess web site, www.chess.ibm.com was a treasure trove of game results, commentary and background information. The web site "...received more than 74 million hits representing more than 4 million user visits from 106 countries during the nine-day event. During Game 3 on May 6 the site got 21 million hits... Hits declined slowly right after the event ended, and hovered around 20,000 a day through the end of May or so." (Source: private communication)

 

Mars Pathfinder landing:

To date, the most popular Web event has been the Mars Pathfinder landing and its rover Sojourner's exploits on the surface of Mars. The official NASA Pathfinder site, mpfwww.jpl.nasa.gov, received a total of "566 million hits in 30 days with the greatest activity being 47 million hits on July 8." (Source: A Month on Mars and the Pathfinder is Declared a Total Success, August 9, 1997, New York Times) Just as in the two previous examples, the number of visits to the site grew rapidly on the days of and immediately following the event, in this case the Pathfinder landing, and dropped off afterwards. "On the day Pathfinder was launched, Dec. 4, 1996, the site recorded only 320,000 hits. But on this past July 7, the site experienced a record 80 million hits. As of Wednesday (7/9/97), traffic had dropped to 40 million hits a day." (Source: Mars Landing Is a Big Hit on the Web, July 10, 1997, New York Times) Another article reported that "the number of visitors (to the official NASA site) increased tenfold between 9 p.m. and 10 p.m. on July 4, the night of the Pathfinder landing, and stayed strong through the week." (Source: Mars Pathfinder Landing Was Defining Moment for Net, July 14, 1997, New York Times)

The increased levels of traffic were felt not only at the numerous mirror sites set up around the world (which grew from 24 on July 9 to 52 as of September 15), but also at other related sites. "Visitors to CNN's site jumped 40 percent; and ABC's two-month-old site recorded a 12 percent increase in traffic." (Source: Mars Pathfinder Landing Was Defining Moment for Net, July 14, 1997, New York Times) Discovery Channel Online also experienced an increased number of visits.

Discussion of results:

In every example, the relevant Web pages received a small number of hits during the time preceding the event of interest. The number of hits increased quickly during the high point of the event, and then decreased to background levels afterwards. Moreover, related sites, i.e. those providing similar information about the event, also recorded big increases in the number of hits. Though in most of the cases above I do not know the rate of increase of hits or the exact profile of the spike, qualitatively the response suggests the behavior of a firing neuron.
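As a rough illustration of how such a traffic spike could be characterized quantitatively, the sketch below fits the decay after the peak to an exponential and reports a characteristic relaxation time. It is written in Python, and the daily hit counts in it are invented placeholders loosely inspired by the Pathfinder numbers, not the actual statistics quoted above.

    import math

    # Hypothetical daily hit counts around an event; these are invented
    # placeholders for illustration, not the figures quoted in the case studies.
    hits = [3e5, 4e5, 8e5, 8e7, 6e7, 4e7, 2.5e7, 1.5e7, 9e6, 5e6]

    peak_day = max(range(len(hits)), key=lambda d: hits[d])

    # Fit log(hits) = a - t/tau on the decaying side with a least-squares line,
    # giving a characteristic decay time tau (in days) for the relaxation of
    # the response, analogous to a neuron returning to its resting state.
    decay = [(t, math.log(h)) for t, h in enumerate(hits[peak_day:])]
    n = len(decay)
    mean_t = sum(t for t, _ in decay) / n
    mean_y = sum(y for _, y in decay) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in decay)
             / sum((t - mean_t) ** 2 for t, _ in decay))
    tau = -1.0 / slope

    print("peak on day %d, rise factor %.0fx, decay time ~%.1f days"
          % (peak_day, hits[peak_day] / hits[0], tau))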

 

II. Growth in the number of pages

Another method to sample the response of the Web to environmental events is to keep track of the number of Web pages created about a particular subject. Querying search engines is one way to measure the number of pages dedicated to a topic. Search engines, such as Hotbot and AltaVista, use spiders, programs that are designed to follow links from page to page and download their contents to a central database, which may later be searched by users. Though each search engine's coverage of the Web is incomplete, the engines usually show similar trends. When looking at the number of pages, one has to take into account the time lag between when a page is created (or updated) and when the new content appears in the search engine's database. For the more popular search engines this time lag is at least one week, usually two.

The data presented here were collected by querying some of the most popular search engines: Hotbot, AltaVista, Excite and Lycos. The queries were expressed to require all of the words to be present in the documents. The following query syntax was used --- on Hotbot and Lycos, the "documents containing all words" option was selected; on AltaVista, the Boolean operator 'AND' was inserted between terms in the Advanced Search option; and for Excite searches, every word was prefixed with '+', meaning that the term is required in the document. During the course of the study, some search engines substantially changed their interfaces, and I suspect their Web page cataloguing methods changed as well. The most noticeable change in the number of pages retrieved occurred for Hotbot on Sep. 15. Data for Hotbot are not shown after that date.
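A minimal sketch of this kind of tracking, in Python, is shown below. The query URL template and the regular expression for the reported result count are assumptions: every engine formats its results differently (and the engines of that era have changed or disappeared), so both would have to be adapted to whichever engine is actually queried.

    import csv
    import datetime
    import re
    import urllib.parse
    import urllib.request

    # Hypothetical engine endpoint and result-count pattern; both are
    # placeholders that would need to be adapted to a real search engine.
    ENGINE_URL = "https://search.example.com/?q={query}"
    COUNT_RE = re.compile(r"about ([\d,]+) (?:results|pages)")

    def count_pages(terms):
        """Query the engine, requiring all terms (here each term is prefixed
        with '+', standing in for the 'all the words' option), and parse the
        reported number of matching pages."""
        query = urllib.parse.quote_plus(" ".join("+" + t for t in terms))
        with urllib.request.urlopen(ENGINE_URL.format(query=query)) as resp:
            page = resp.read().decode("utf-8", errors="replace")
        match = COUNT_RE.search(page)
        return int(match.group(1).replace(",", "")) if match else None

    def record(terms, path="page_counts.csv"):
        """Append today's count to a CSV file, building up the time series
        that gets plotted against time."""
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([datetime.date.today().isoformat(),
                                    count_pages(terms)])

    # Run periodically, e.g. record(["Kasparov", "Deep", "Blue", "rematch"])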

The Hotbot, Excite and Lycos catalogues of Web pages were searched using the keywords "Kasparov Deep Blue rematch", with all of the words required to be present. Lycos data are multiplied by 4 before being plotted, and Excite data are plotted against the right-hand y-axis. The famed match took place during the first week of May 1997.

Number of Web pages containing the term "Tamagotchi." This is the "virtual pet" made by the Japanese company Bandai, introduced in Japan in November 1996 and in the U.S. in May 1997. Lycos data are plotted against the right-hand y-axis.


The query "Mario Cuomo", all words required to be present, was used as a control. The number of Web pages returned by the engines in response to the query does not significantly vary with time. Had Cuomo announced his candidacy for president, the plot would have been very different.

 Discussion of results:

While the number of pages dealing with a particular topic shows the same trend as the Web server statistics, the results are not very conclusive. Some of the difficulties with the data arose because changing indexing algorithms for Hotbot and AltaVista made long-range comparisons of page counts impossible. Still, the query "Kasparov Deep Blue rematch" shows a significant rise in the number of pages following the match in early May. Curiously, all of the search engines show the same relative profile. Even after the drop-off, there is still a significant number of sites mentioning the match - an indication of long-term memory, perhaps? In contrast, the query about Mario Cuomo did not show significant variation in the number of pages over time. This is to be expected, since Cuomo did not make any important announcements; any trend visible in the data is the result of search engine idiosyncrasies. The growth in the number of pages dedicated to the virtual pet Tamagotchi suggests that it was a very important event, but then there is no reason the Web should not be as irrational at times as the human brain.

 

III. Growth in the number of hyperlinks between pages

Neurologists have strong evidence that learning is accompanied by changes in the brain. Specifically, the number of connections between neurons changes in response to external, and especially repeated, stimuli. Do the links pointing to interesting pages, i.e. pages with information about world events of interest, change with time, and if so, how do they change? Both the AltaVista and Hotbot engines support a search method that reports the number of pages that contain hyperlinks to a given URL. Among other things, it is a handy way to find out who is linking to your page from their site.
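The same count can be made by hand on a sample of crawled pages, as in the Python sketch below; the crawl sample is an assumption of the sketch, whereas the engines' own link searches, used here, perform this count over their entire indexes.

    from html.parser import HTMLParser

    # Count how many pages in a crawl sample contain a hyperlink to a target
    # URL.  The crawl sample itself is assumed to exist; the search engines'
    # link searches do the same thing over their full indexes.
    TARGET = "http://www.chess.ibm.com"

    class LinkCollector(HTMLParser):
        """Collects the href attributes of all anchor tags in a document."""
        def __init__(self):
            super().__init__()
            self.hrefs = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.hrefs.append(value)

    def pages_linking_to(documents, target=TARGET):
        """documents: iterable of (url, html) pairs from a crawl sample.
        Returns the number of distinct pages containing a link to target."""
        count = 0
        for url, html in documents:
            parser = LinkCollector()
            parser.feed(html)
            if any(href.rstrip("/").startswith(target) for href in parser.hrefs):
                count += 1
        return count

    # Repeating the count on crawl snapshots taken weeks apart yields the
    # growth curve of inbound links discussed below.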

http://www.chess.ibm.com is the official home page of the Kasparov Deep Blue rematch. The number of pages referencing this site grew immediately after the beginning of the match in early May 1997, but the interest dropped off by the end of that summer. Changes in the indexing methods of the search engines prevented me from seeing whether an equilibrium state was reached and, if so, whether it involved a non-trivial number of links. Though I cannot quantitatively compare this result with the number of synapses in the learning brain, the qualitative trend is correct.

IV. Usenet posts

Another interesting response metric to consider is the number of Usenet (newsgroup) messages posted on a subject. Usenet is a network of information parallel to the Web, though the distinctions are growing less clear. Both AltaVista and Hotbot archive Usenet messages (again, I am not sure of the degree of coverage). Dejanews is another search engine dedicated to archiving newsgroups and allowing users to search through its database. The query syntax is the same for Usenet searches as for Web searches, and for Dejanews I used the default search option, which was to require all words to be present. When plotted against time, the number of Usenet messages retrieved in response to a query shows the spiking signal seen in the Web server traffic plots. If the number of people who post messages is some set fraction of the number of people perusing Usenet groups, then an increase in the number of posts indicates an increase in the number of readers. This is consistent with an increase in server traffic on the Web.


Number of Usenet posts containing terms "Kasparov Deep Blue rematch."


Number of Usenet posts containing the term "http://www.chess.ibm.com".

Number of Usenet posts containing the term "Tamagotchi." Hotbot data are plotted against the right-hand y-axis.

Number of Usenet posts containing the terms "Mars Pathfinder."

 Discussion of results:

The Usenet data very closely resemble the spiking behavior observed in the Web server traffic statistics. It is interesting to compare the number of pages dedicated to the virtual pet Tamagotchi with the number of messages posted on the topic. Unlike the number of pages, the number of messages decreased shortly after the sharp rise that followed the introduction of the virtual pet in the U.S. on May 1, 1997. Usenet posts measure the direct but transient response to the outside world, while the number of pages measures something akin to long-term memory of the topic.

 

V. Other Metrics

The number of sites that mirror a specific site, or a page, may also be an interesting measure of the Web's response to external events. A mirror site is updated regularly to provide the same information as the parent site. Its purpose is to relieve congestion on the parent site or, if the parent site suddenly becomes unavailable because of server or network failure, to replace it in providing the necessary information. In other words, mirroring ensures that popular and important information can be accessed by all. This level of redundancy and duplication must occur in the brain to ensure that critical functions are not lost when some neurons are damaged. On the Web, the redundancy might be important for memory.

The chart below shows the number of mirror sites set up for the Mars Pathfinder mission. The number of sites grew, presumably from zero before the landing, to a maximum of over 50 in the weeks following the landing in the first week of July, then very slowly decreased over a period of seven months. Many Web pages are also unofficially mirrored: users, upon finding interesting information and fearing that the link will disappear and be replaced with the dreaded "404 Document Not Found" message, make a copy of the page available from their own Web servers.


Conclusions and the Future

The Web as a Brain metaphor holds that Web pages may be compared to neurons in the brain, and hyperlinks to synapses between neurons, if the users of the Web are included when talking about the Web as a whole. The metaphor is supported by such Web phenomena as the sudden growth and abrupt decay in the number of visits, or page hits, to sites dedicated to popular events, such as the official Mars Pathfinder page. It also helps us understand the patterns in the growth of Web pages dedicated to a topic, in the number of Usenet postings, and in Internet mirror sites, as cited in the case studies above. The metaphor may even be extended to higher brain functions, such as learning, which is based on synaptic modification in the brain. The changeable (pre-)synaptic environments of the "neuron" pages on the Web are the text and graphics surrounding hyperlinks. On directory sites such as Yahoo, and especially on current-events sites such as newspapers, some links are emphasized over time while others are de-emphasized, in a manner consistent with learning.

 

While there are many similarities between the structure and behavior of the brain and the Web, there are also some differences. The active nature of neurons is missing from Web pages at present. The neuron takes the information it receives, processes it in some way, and responds to it. I believe that the next logical step in the evolution of the Web is to make Web servers more intelligent and dynamic. If there is a sudden increase in the number of hits on a Web site, something should be done with this important information. For instance, if the Federal Emergency Management Agency receives a great number of hits on its earthquake information page for Southern California, it is probably a good idea to send users to a disaster relief page. Or, if users regularly follow only one link from a page, it might be a good idea for the Web server to suggest that newcomers follow other surfers to that page. Such dynamic response is being incorporated into applications (see, for instance, Alexa), but we are still a long way from the day when servers become smarter and are able to adapt quickly to patterns of use. Once the technology achieves this level of dynamism, the spark of life, which is missing from the present configuration of the network, will be ignited in the Web.
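A toy sketch of such a server is given below, in Python; the paths, thresholds and page content are all hypothetical illustrations of the idea, not a description of any existing system. The server tracks its recent hit rate and which outgoing links visitors follow, and injects suggestions accordingly.

    import time
    from collections import Counter, deque
    from http.server import BaseHTTPRequestHandler, HTTPServer

    RECENT = deque()        # timestamps of recent requests
    FOLLOWED = Counter()    # counts of outgoing links visitors follow

    class AdaptiveHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            now = time.time()
            RECENT.append(now)
            while RECENT and now - RECENT[0] > 60:
                RECENT.popleft()                  # keep a one-minute window

            if self.path.startswith("/follow/"):  # record which link was clicked
                FOLLOWED[self.path[len("/follow/"):]] += 1

            body = "<html><body><h1>Earthquake information</h1>"
            if len(RECENT) > 100:                 # sudden surge of interest
                body += '<p>High traffic: see the <a href="/relief">disaster relief page</a>.</p>'
            if FOLLOWED:
                favorite, _ = FOLLOWED.most_common(1)[0]
                body += ('<p>Most readers go on to '
                         f'<a href="/follow/{favorite}">{favorite}</a>.</p>')
            body += "</body></html>"

            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body.encode())

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), AdaptiveHandler).serve_forever()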

Consciousness is a feature of the human brain. As the Web becomes more complex and dynamic, I believe a kind of global consciousness will emerge from the interactions between Web servers and users; moreover, as various observation satellites, Web cams and the like get hooked in to the network, the world will acquire self-consciousness. The challenge will be to recognize its emergence and use it in constructive ways. Potentially, the Web could be a much richer "brain" than even the human brain will ever be. Web pages are extremely rich in information, and Web servers may be programmed to respond in far more complex ways to incoming signals than the brain does. Ultimately the Web may manifest far more nuanced learning and response behaviors, and perhaps even life.