The table below shows that the 60 largest known Deep-Web sites together contain about 750 terabytes of data (on an HTML-included basis), or roughly 40 times the size of the known surface Web. These sites span a broad array of domains, from science to law to images and commerce. We estimate the total number of records or documents within this group at about 85 billion.
This listing is necessarily preliminary and likely incomplete, since we lack a complete census of Deep-Web sites. Our present inability to identify all of the largest Deep-Web sites should not be surprising: awareness of the Deep Web is new, and the subject has received little attention.
| Name | Type | Web Size (GB) | Records (thousands) |
|---|---|---|---|
| National Climatic Data Center (NOAA) | Public | 366,000 | 41,012,794 |
| NASA EOSDIS | Public | 219,600 | 24,607,676 |
| National Oceanographic (combined with Geophysical) Data Center (NOAA) | Public/Fee | 32,940 | 3,691,151 |
| Alexa | Public (partial) | 15,860 | 1,777,221 |
| Right-to-Know Network (RTK Net) | Public | 14,640 | 1,640,512 |
| MP3.com | Public | 4,300 | 481,844 |
| Terraserver | Public/Fee | 4,270 | 478,483 |
| HEASARC (High Energy Astrophysics Science Archive Research Center) | Public | 2,562 | 287,090 |
| US PTO - Trademarks + Patents | Public | 2,440 | 3,000 |
| Informedia (Carnegie Mellon Univ.) | Public (not yet) | 1,830 | 205,064 |
| Alexandria Digital Library | Public | 1,220 | 1,600 |
| JSTOR Project | Limited | 1,220 | 136,709 |
| 10K Search Wizard | Public | 769 | 1,068 |
| UC Berkeley Digital Library Project | Public | 766 | 2,403 |
| SEC Edgar | Public | 610 | 1,000 |
| US Census | Public | 610 | 68,355 |
| NCI CancerNet Database | Public | 488 | 54,684 |
| Amazon.com | Public | 461 | 18,000 |
| IBM Patent Center | Public/Private | 345 | 9,881 |
| NASA Image Exchange | Public | 337 | 306 |
| InfoUSA.com | Public/Private | 195 | 14,100 |
| Betterwhois (many similar) | Public | 152 | 11,900 |
| GPO Access | Public | 146 | 16,405 |
| Adobe PDF Search | Public | 143 | 1,678 |
| Internet Auction List | Public | 130 | 6,000 |
| Commerce, Inc. | Public | 122 | 12,000 |
| Library of Congress Online Catalog | Public | 116 | 12,000 |
| Sunsite Europe | Public | 98 | 10,937 |
| Uncover Periodical DB | Public/Fee | 97 | 8,800 |
| Astronomer's Bazaar | Public | 94 | 3 |
| eBay.com | Public | 82 | 4,076 |
| REALTOR.com Real Estate Search | Public | 60 | 1,300 |
| Federal Express | Public (if shipper) | 53 | 3,300 |
| Integrum | Public/Private | 49 | 20,000 |
| NIH PubMed | Public | 41 | 11,000 |
| Visual Human (NIH) | Public | 40 | 4,482 |
| AutoTrader.com | Public | 39 | 1,550 |
| UPS | Public (if shipper) | 33 | 2,050 |
| NIH GenBank | Public | 31 | 5,355 |
| AustLII (Australasian Legal Information Institute) | Public | 24 | 100 |
| Digital Library Program (UVa) | Public | 21 | 2,200 |
| Subtotal Public and Mixed Sources | | 673,035 | 74,628,077 |
| DBT Online | Fee | 30,500 | 4,000,000 |
| Lexis-Nexis | Fee | 12,200 | 2,600,000 |
| Dialog | Fee | 10,980 | 1,230,384 |
| Genealogy - ancestry.com | Fee | 6,500 | 500,000 |
| ProQuest Direct (incl. Digital Vault) | Fee | 3,172 | 50,000 |
| Dun & Bradstreet | Fee | 3,113 | 75,000 |
| Westlaw | Fee | 2,684 | 572,000 |
| Dow Jones News Retrieval | Fee | 2,684 | 55,000 |
| infoUSA | Fee/Public | 1,584 | 126,700 |
| Elsevier Press | Fee | 570 | 889 |
| EBSCO | Fee | 481 | 750 |
| Springer-Verlag | Fee | 221 | 344 |
| OVID Technologies | Fee | 191 | 298 |
| Investext | Fee | 157 | 2,474 |
| Blackwell Science | Fee | 146 | 227 |
| GenServ | Fee | 106 | 19,352 |
| Academic Press IDEAL | Fee | 104 | 162 |
| Tradecompass | Fee | 61 | 6,835 |
| INSPEC | Fee | 16 | 6,500 |
| Subtotal Fee-Based Sources | | 75,469 | 9,246,915 |
| TOTAL | | 748,504 | 83,874,991 |
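The headline figures in the text can be cross-checked against the subtotal and total rows of the table. The following is a minimal sketch of that arithmetic, not part of the original analysis. The function name `summarize` and the constant names are illustrative, and the conversion assumes decimal units (1 TB = 1,000 GB), which the source does not specify.

```python
# Figures copied from the subtotal and total rows of the table.
# Sizes are in gigabytes; record counts are in thousands.
PUBLIC_GB, PUBLIC_RECORDS_K = 673_035, 74_628_077
FEE_GB, FEE_RECORDS_K = 75_469, 9_246_915
TOTAL_GB, TOTAL_RECORDS_K = 748_504, 83_874_991

# Assumption: decimal units. The source does not say which convention it uses.
GB_PER_TB = 1_000

# Taken from the text: the group is "roughly 40 times" the known surface Web.
SURFACE_WEB_MULTIPLE = 40


def summarize():
    """Return consistency checks and unit conversions for the table totals."""
    total_tb = TOTAL_GB / GB_PER_TB
    return {
        # Difference between the sum of the two subtotals and the stated total.
        "size_gap_gb": PUBLIC_GB + FEE_GB - TOTAL_GB,
        "records_gap_k": PUBLIC_RECORDS_K + FEE_RECORDS_K - TOTAL_RECORDS_K,
        # Stated total expressed in terabytes and in billions of records.
        "total_tb": total_tb,
        "total_records_billions": TOTAL_RECORDS_K / 1_000_000,
        # Share of total size held by fee-based sources.
        "fee_share_of_size": FEE_GB / TOTAL_GB,
        # Surface Web size implied by the "roughly 40 times" statement.
        "implied_surface_web_tb": total_tb / SURFACE_WEB_MULTIPLE,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these assumptions the size subtotals add up exactly to the stated total, while the record subtotals exceed it by one (in thousands), presumably a rounding effect in the source figures. The total works out to about 748.5 TB and about 83.9 billion records, in line with the rounded figures of "about 750 terabytes" and "about 85 billion" quoted in the text.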