Skill Level: Novice
Article by: eFactory
Google Dance and DNS
Google's index is not only spread over more than 10,000 servers;
these servers are also, as of now, placed in eight different data
centers. Most of these data centers are located in the US (e.g.
Santa Clara, California and Herndon, Virginia), but in June 2002
Google's first European data center went online in Zurich,
Switzerland. Very likely, there are more data centers to come, which
will perhaps be spread over the whole world. The data center that
Google brought online in January 2003, however, is again located
in the US.
In order to direct traffic to all these data centers, Google could
theoretically receive all queries centrally and then forward them to
the data centers. But this would obviously be inefficient. In fact,
each data center has its own IP address (numerical address on the
Internet), and the way these IP addresses are accessed is managed
by the Domain Name System (DNS).
Basically, the DNS works like this: On the Internet, data transfers
always take place between IP addresses. The information about
which domain resolves to which IP address is provided by the name
servers of the DNS. When a user enters a domain into a browser,
a locally configured name server obtains the IP address for that
domain by contacting the name server which is responsible for that
domain. (The DNS is structured hierarchically. Illustrating the
whole process would go beyond the scope of this paper.) The IP address
is then cached by the local name server, so that it is not necessary
to contact the responsible name server each time a connection to
the domain is established.
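The caching step described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `CachingNameServer`, the domain www.example.com and the address 192.0.2.10 (taken from a range reserved for documentation) are made up, and the hierarchical lookup is reduced to a single dictionary access.

```python
# Toy model of a caching name server; all names and addresses are made up.

# Hypothetical records held by the responsible name server: domain -> IP.
# 192.0.2.0/24 is reserved for documentation, so this is not a real host.
AUTHORITATIVE = {"www.example.com": "192.0.2.10"}


class CachingNameServer:
    def __init__(self, authoritative):
        self.authoritative = authoritative
        self.cache = {}
        self.upstream_lookups = 0  # how often the responsible server was asked

    def resolve(self, domain):
        if domain not in self.cache:
            # Cache miss: contact the responsible name server.
            self.upstream_lookups += 1
            self.cache[domain] = self.authoritative[domain]
        # Cache hit (or freshly filled entry): answer from the cache.
        return self.cache[domain]


local_ns = CachingNameServer(AUTHORITATIVE)
first = local_ns.resolve("www.example.com")
second = local_ns.resolve("www.example.com")
print(first, second, local_ns.upstream_lookups)
```

In this sketch the second lookup is answered from the cache, so the responsible name server is contacted only once.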
The records for a domain at the responsible name server specify
for how long a record may be cached by a caching name server.
This is the Time To Live (TTL) of a domain. As soon as the TTL expires,
the caching name server has to fetch the record for a domain again
from the responsible name server. Quite often, the TTL is set to
one or more days. In contrast, the Time To Live of the domain www.google.com
is only five minutes. So, a name server may cache Google's
IP address for only five minutes and then has to look up the IP address
again.
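How a TTL limits caching can be sketched as follows. The example is a simplified assumption, not a real resolver: `TtlCache` and `hypothetical_lookup` are invented names, the record is made up, and time is passed in explicitly as seconds so that the expiry is easy to follow.

```python
class TtlCache:
    """Caches one answer per domain until its TTL (in seconds) expires."""

    def __init__(self, lookup):
        self.lookup = lookup       # function: domain -> (ip, ttl_seconds)
        self.entries = {}          # domain -> (ip, expiry_time)
        self.upstream_lookups = 0  # fetches from the responsible server

    def resolve(self, domain, now):
        entry = self.entries.get(domain)
        if entry is None or now >= entry[1]:
            # No record yet, or the TTL has expired: fetch it again.
            ip, ttl = self.lookup(domain)
            self.upstream_lookups += 1
            entry = (ip, now + ttl)
            self.entries[domain] = entry
        return entry[0]


def hypothetical_lookup(domain):
    # Made-up record: documentation address with a five-minute TTL.
    return "192.0.2.10", 300


cache = TtlCache(hypothetical_lookup)
cache.resolve("www.example.com", now=0)    # miss: fetched upstream
cache.resolve("www.example.com", now=299)  # still cached
cache.resolve("www.example.com", now=300)  # TTL expired: fetched again
print(cache.upstream_lookups)
```

With a TTL of five minutes (300 seconds), a caching name server that is queried continuously would have to refetch the record up to 86,400 / 300 = 288 times a day, compared with once a day for a TTL of one day.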
Each time Google's name server is contacted, it sends back the
IP address of only one data center. In this way, Google queries
are directed to different data centers by changing DNS records.
On the one hand, the DNS records may be based on the load of the
individual data centers. In this way, Google would conduct a simple
form of load balancing by its use of the DNS. On the other hand,
the geographical location of a caching name server may influence
how often it receives each data center's IP address, so that
the distance for data transmissions can be reduced. In order to
show the DNS records of the domain www.google.com, we present them
here by the example of one caching name server.
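Independently of any real records, the idea of a name server that answers each query with the address of only one data center can be imitated with a toy model. The sketch below is an assumption-laden simplification: it uses plain round-robin rotation instead of load or geography, and `RotatingNameServer` as well as the addresses (documentation range, not Google's) are made up.

```python
import itertools

# Hypothetical data center addresses (documentation range, not Google's).
DATA_CENTERS = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]


class RotatingNameServer:
    """Answers each query with the address of only one data center."""

    def __init__(self, addresses):
        # Simplification: rotate evenly; a real setup might weight the
        # answers by data center load or by the location of the asker.
        self._next = itertools.cycle(addresses)

    def answer(self, domain):
        return next(self._next)


ns = RotatingNameServer(DATA_CENTERS)
answers = [ns.answer("www.example.com") for _ in range(6)]
print(answers)
```

Each caching name server that asks again after its cached record has expired may thus be handed a different data center than before.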
How data centers, DNS and the Google Dance are related is easily answered.
During the Google Dance, the data centers do not receive the new
index at the same time. Instead, the new index is transferred to
one data center after the other. When a user queries Google during
the Google Dance, he may get results from a data center which
still has the old index at one point in time and from a data center
which has the new index a few minutes later. From the user's perspective,
the index update took place within a few minutes. But of course,
this sequence may also be reversed, so that Google seemingly switches
back and forth between the old and the new index.
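The apparent switching between indexes can be illustrated with a small made-up scenario. The data center labels "A", "B" and "C", their update state and the sequence of DNS answers are all hypothetical.

```python
# Hypothetical state in the middle of an update: data centers "A" and "B"
# already serve the new index, "C" still serves the old one.
index_at = {"A": "new", "B": "new", "C": "old"}

# Data center that a user's caching name server is pointed to after each
# successive DNS refresh (made-up sequence).
dns_answers = ["C", "A", "C", "B"]

# Index version the user sees on each successive query.
seen = [index_at[dc] for dc in dns_answers]
print(seen)
```

In this scenario the user sees the old index, then the new one, then the old one again, although no data center ever went back to the old index.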