ZDNet Australia has a great article about the inner workings of Google: 30 clusters of up to 2,000 PCs each, churning away day and night, breaking down regularly but with such redundancy that we never see it. Go read "The magic that makes Google tick" to find out more about the numbers:

Over four billion Web pages, each an average of 10KB, all fully indexed.
Up to 2,000 PCs in a cluster.
Over 30 clusters.
104 interface languages, including Klingon and Tagalog.
One petabyte of data in a cluster, so much that hard disk error rates of 10^-15 begin to be a real issue.
Sustained transfer rates of 2Gbps in a cluster.
An expectation that two machines will fail every day in each of the larger clusters.
No complete system failure since February 2000.

As proof that everything old is new again, we have the new search engine clusty.com, which gives results clustered into folders on the left-hand side of the screen. Those who've worked the Web as long as I have will quickly recognize the resurrection of an idea pioneered so well by Northern Light, before they inexplicably chose to abandon the general search engine fray to concentrate on paid searches of archival material. The Clusty implementation seems to work reasonably well, but they have clearly indexed only a very small subset of the Web compared to their competitors: the results I have gotten have been pretty weak on numbers, and the clusters seem to have missed some significant sites. I'll keep trying them, as their categorization seems to work fairly well (although I think Northern Light's was better), and hopefully their index will continue to grow to the point where it becomes truly useful.
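
Going back to the Google figures for a moment, a quick back-of-envelope calculation shows why an error rate that small starts to matter at that scale. This is only a sketch under my own assumptions: I am reading the error rate as one bad bit per 10^15 bits read, and treating KB and petabyte as decimal units. The variable names are mine, not the article's.

```python
# Back-of-envelope check on the quoted Google figures.
# The numbers come from the article; the units and the reading of the
# error rate are assumptions made for this sketch.
PAGES = 4e9                # "over four billion Web pages"
AVG_PAGE_BYTES = 10e3      # "an average of 10KB" (decimal KB assumed)
CLUSTER_BYTES = 1e15       # "one petabyte of data in a cluster" (decimal PB assumed)
BIT_ERROR_RATE = 1e-15     # assumed: one bad bit per 10^15 bits read
MACHINES = 2000            # "up to 2,000 PCs in a cluster"
FAILURES_PER_DAY = 2       # "two machines will fail every day"

# Raw size of the crawled pages, in terabytes.
raw_web_tb = PAGES * AVG_PAGE_BYTES / 1e12

# Expected bad bits when reading a full cluster's data once.
expected_bit_errors = CLUSTER_BYTES * 8 * BIT_ERROR_RATE

# Average days between failures for any single machine.
days_per_machine_failure = MACHINES / FAILURES_PER_DAY

print(f"Raw page data: about {raw_web_tb:.0f} TB")
print(f"Expected bit errors per full read of a cluster: about {expected_bit_errors:.0f}")
print(f"Each machine fails about once every {days_per_machine_failure:.0f} days")
```

Under those assumptions, a single pass over a petabyte turns up a handful of bad bits, and two failures a day across 2,000 machines still means any one PC lasts years on average. That fits the article's picture of cheap hardware made reliable through redundancy.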
Posted by Ray Trygstad | Category: InfoTech | 12:18 PM