Records 1 5 10 – Innovative Personal Database Systems

We all collected things as children. Rocks, baseball cards, Barbies, perhaps even bugs -- we all tried to gather up as much stuff as possible and compile the biggest, most interesting collection we could. Some of you may even have amassed a collection numbering into the hundreds (or thousands) of items. As the story always goes, we got older, our collections got smaller, and eventually our interests died out...until now. There are currently organizations around the world in the business of amassing collections of things, and their collections number in the trillions and beyond. In many cases these collections, or databases, consist of items we use every day. In this list, we cover the top 10 largest databases in the world:

Library of Congress

Not even the digital age can prevent the world's largest library from ending up on this list. The Library of Congress (LC) boasts more than 130 million items ranging from cookbooks to colonial newspapers to U.S. government proceedings. It is estimated that the text portion of the Library of Congress alone would comprise 20 terabytes of data. The LC expands at a rate of 10,000 items per day and takes up close to 530 miles of shelf space -- talk about a lengthy search for a book.
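As a rough sanity check on that 20-terabyte estimate, spreading it across the roughly 29 million books in the collection works out to somewhere around 700 kilobytes of plain text per book, which is in the right ballpark for a full-length book stored as raw text. A minimal sketch of that back-of-envelope arithmetic (the per-book figure is our own illustration, not an official LC statistic):

```python
# Back-of-envelope check: how much plain text per book would 20 TB
# across ~29 million books imply? (Illustrative estimate only.)
TEXT_BYTES = 20 * 10**12      # 20 terabytes, decimal convention
BOOKS = 29_000_000            # books reported in the collection

bytes_per_book = TEXT_BYTES / BOOKS
print(f"~{bytes_per_book / 1000:.0f} KB of plain text per book")  # ~690 KB
```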

If you're researching a topic and cannot find the right information on the internet, the Library of Congress should be your destination of choice. For users researching U.S. history, around 5 million pieces from the LC's collection can be found online at American Memory.

Unfortunately for us, the Library of Congress has no plans to digitize the entirety of its contents, and it limits the people who can check out materials to Supreme Court Justices, members of Congress, their respective staff, and a select few other government officials; however, anyone with a valid Reader Identification Card (the LC's library card) can access the collection on site.

By the Numbers

  • 130 million items (books, photographs, maps, etc.)
  • 29 million books
  • 10,000 new items added each day
  • 530 miles of shelves
  • 5 million digital documents
  • 20 terabytes of text data

Central Intelligence Agency

The Central Intelligence Agency (CIA) is in the business of collecting and distributing information on people, places and things, so it should come as no surprise that they end up on this list. Although little is known about the overall size of the CIA's database, it is certain that the agency has amassed a great deal of information on both the public and private sectors via field work and digital intrusions.

Portions of the CIA database available to the public include the Freedom of Information Act (FOIA) Electronic Reading Room, The World Factbook, and various other intelligence-related publications. The FOIA library includes hundreds of thousands of official (and occasionally ultra-sensitive) U.S. government documents made available to the public electronically. The library grows at a rate of 100 articles per month and covers topics ranging from nuclear development in Pakistan to the type of beer available during the Korean War. The World Factbook offers general information on every country and territory in the world, including maps, population numbers, military capabilities and more.

By the Numbers

  • 100 FOIA items added each month
  • Comprehensive statistics on more than 250 countries and entities
  • Unknown amount of classified information

Amazon

Amazon, the world's biggest retail store, maintains extensive records on its 59 million active customers including general personal information (phone number, address, etc.), receipts, wish lists, and virtually any sort of data the website can extract from its users while they are logged on. Amazon also keeps more than 250,000 full-text books available online and allows users to comment and interact on virtually every page of the website, making Amazon one of the world's largest online communities.

This data, coupled with the millions of items Amazon sells each year -- and the millions more sold by Amazon associates -- makes for one very large database. Amazon's two largest databases combine for more than 42 terabytes of data, and that's only the beginning. If Amazon published the total number of databases it maintains and the volume of data each one contains, the amount of data we know Amazon houses would increase substantially.

But still, you say, 42 terabytes doesn't sound like all that much. In relative terms, 42 terabytes of text would work out to tens of billions of typical forum posts.
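For a hedged sense of that conversion, assume an average forum post runs about a kilobyte of text (our assumption for illustration, not a figure from Amazon):

```python
# Rough conversion of raw storage into an everyday unit.
# Assumes ~1 KB of text per forum post (illustrative assumption).
TOTAL_BYTES = 42 * 10**12      # 42 terabytes, decimal convention
BYTES_PER_POST = 1_000         # assumed average post size

posts = TOTAL_BYTES / BYTES_PER_POST
print(f"~{posts / 10**9:.0f} billion posts")  # ~42 billion
```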

By the Numbers

  • 59 million active customers
  • More than 42 terabytes of data

YouTube

After less than two years of operation, YouTube has amassed the largest video library (and consequently one of the largest databases) in the world. YouTube currently boasts a user base that watches more than 100 million clips per day, accounting for more than 60% of all videos watched online.

In August of 2006, the Wall Street Journal estimated YouTube's database at roughly 45 terabytes of videos. While that figure doesn't sound terribly high relative to the amount of data available on the internet, YouTube has been experiencing a period of substantial growth (more than 65,000 new videos per day) since that figure's publication, meaning that YouTube's database size has potentially more than doubled in the last five months.

Estimating the size of YouTube's database is particularly difficult due to the varying sizes and lengths of the videos. However, if one were truly ambitious (and a bit forgiving), one could project that the YouTube database will grow by something on the order of 20 terabytes of data in the next month.

Given: 65,000 videos per day x 30 days per month = 1,950,000 videos per month; 1 terabyte = 1,048,576 megabytes. If we assume that each video averages 1 MB, YouTube could expect to grow by roughly 1.86 terabytes next month. Similarly, if we assume that each video averages 10 MB, YouTube could expect to grow by roughly 18.6 terabytes next month.
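That projection is easy to reproduce. A minimal sketch of the estimate above (the 1 MB and 10 MB per-video sizes are illustrative assumptions, not measured averages):

```python
# Reproduce the YouTube growth projection from the figures above.
VIDEOS_PER_DAY = 65_000
DAYS_PER_MONTH = 30
MB_PER_TB = 1_048_576  # 1 terabyte = 1,048,576 megabytes (binary convention)

videos_per_month = VIDEOS_PER_DAY * DAYS_PER_MONTH  # 1,950,000

for mb_per_video in (1, 10):  # assumed average video sizes
    growth_tb = videos_per_month * mb_per_video / MB_PER_TB
    print(f"{mb_per_video} MB/video -> ~{growth_tb:.2f} TB of growth next month")
# 1 MB/video  -> ~1.86 TB
# 10 MB/video -> ~18.60 TB
```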

By the Numbers

  • 100 million videos watched per day
  • 65,000 videos added each day
  • 60% of all videos watched online
  • At least 45 terabytes of videos

ChoicePoint

Imagine having to search through a phone book containing a billion pages for a phone number. When the employees at ChoicePoint want to know something about you, they have to do just that. If printed out, the ChoicePoint database would extend to the moon and back 77 times.

ChoicePoint is in the business of acquiring information about the American population: addresses and phone numbers, driving records, criminal histories -- ChoicePoint has it all. For the most part, the data found in ChoicePoint's database is sold to the highest bidders, including the American government.

But how much does ChoicePoint really know? In 2002, ChoicePoint helped authorities solve a serial rapist case in Philadelphia and Fort Collins after producing a list of six potential suspects by mining its DNA and personal-records databases. In 2001, ChoicePoint helped identify the remains of World Trade Center victims by matching DNA found in bone fragments against information provided by victims' family members in conjunction with data found in its databases.

By the Numbers

  • 250 terabytes of personal data
  • Information on 250 million people

Sprint

Sprint is one of the world's largest telecommunications companies, offering mobile services to more than 53 million subscribers and, prior to spinning off its local telephone division in May of 2006, local and long-distance landline packages as well.

Large telecommunication companies like Sprint are notorious for having immense databases to keep track of all of the calls taking place on their networks. Sprint's database processes more than 365 million call detail records and operational measurements per day. The Sprint database is spread across 2.85 trillion rows, making it the database with the largest number of rows (data insertions, if you will) in the world. At its peak, the database is subjected to more than 70,000 call detail record insertions per second.
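Those figures also hint at how spiky the load is: 365 million records spread evenly across a day is only about 4,200 insertions per second, so the quoted 70,000-per-second peak is roughly seventeen times the average rate. A small sketch of that comparison, using only the numbers quoted above:

```python
# Compare Sprint's average and peak call-detail-record insert rates,
# using only the figures quoted in the article.
RECORDS_PER_DAY = 365_000_000
PEAK_INSERTS_PER_SECOND = 70_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

average_rate = RECORDS_PER_DAY / SECONDS_PER_DAY
print(f"average: ~{average_rate:,.0f} inserts/sec")  # ~4,225 inserts/sec
print(f"peak is ~{PEAK_INSERTS_PER_SECOND / average_rate:.0f}x the average")  # ~17x
```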

By the Numbers

  • 2.85 trillion database rows
  • 365 million call detail records processed per day
  • At peak, 70,000 call detail record insertions per second

Google

Although not much is known about the true size of Google's database (Google keeps its information locked away in a vault that would put Fort Knox to shame), a great deal is known about the amount and types of information Google collects.

On average, Google is subjected to 91 million searches per day, which accounts for close to 50% of all internet search activity. Google stores each and every search a user makes in its databases. After a year's worth of searches, this figure amounts to more than 33 billion database entries. Depending on the architecture of Google's databases, this figure could comprise hundreds of terabytes of information.
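The yearly figure follows directly from the daily one; a quick sketch of the arithmetic (each search is counted as a single database entry for simplicity):

```python
# Yearly search-log volume implied by the daily figure above.
SEARCHES_PER_DAY = 91_000_000
DAYS_PER_YEAR = 365

entries_per_year = SEARCHES_PER_DAY * DAYS_PER_YEAR
print(f"~{entries_per_year / 10**9:.1f} billion entries per year")  # ~33.2 billion
```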

Google is also in the business of collecting information on its users. Google combines the queries users search for with information provided by the Google cookies stored on a user's computer to create virtual profiles.

To top it off, Google is currently experiencing record expansion rates by assimilating into various realms of the internet including digital media (Google Video, YouTube), advertising (Google Ads), email (GMail), and more. Essentially, the more Google expands, the more information their databases will be subjected to.

In terms of internet databases, Google is king.

By the Numbers

  • 91 million searches per day
  • Accounts for close to 50% of all internet searches
  • Virtual profiles of countless users

AT&T

Similar to Sprint, the United States' oldest telecommunications company, AT&T, maintains one of the world's largest databases. Architecturally speaking, the largest AT&T database is the cream of the crop: it holds the titles for the largest volume of data in a single database (312 terabytes) and the second-largest number of rows in a single database (1.9 trillion), the latter comprising AT&T's extensive calling records.

The 1.9 trillion calling records include data on the number called, the time and duration of the call, and various other billing categories. AT&T is so meticulous with its records that it has retained calling data from decades ago -- long before the technology to store hundreds of terabytes of data ever became available. Chances are, if you're reading this and have ever made a call via AT&T, the company still has your call's information.
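Dividing the reported volume by the row count gives a feel for how compact each calling record is: a few hundred terabytes over 1.9 trillion rows lands in the range of 160 to 170 bytes per record, a plausible size for a phone number, a timestamp, a duration and a handful of billing fields. A hedged sketch of that division, using both storage figures quoted in this entry (the per-record number is our own derivation, not an AT&T statistic):

```python
# Implied size of a single AT&T calling record, using the two storage
# figures quoted in this entry (312 TB in the text, 323 TB in the list).
ROWS = 1.9 * 10**12  # 1.9 trillion calling records

for terabytes in (312, 323):
    bytes_per_row = terabytes * 10**12 / ROWS
    print(f"{terabytes} TB -> ~{bytes_per_row:.0f} bytes per record")
# 312 TB -> ~164 bytes per record
# 323 TB -> ~170 bytes per record
```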

By the Numbers

  • 323 terabytes of information
  • 1.9 trillion phone call records

National Energy Research Scientific Computing Center

The second largest database in the world belongs to the National Energy Research Scientific Computing Center (NERSC) in Oakland, California. NERSC is owned and operated by the Lawrence Berkeley National Laboratory and the U.S. Department of Energy. The database holds a host of information including atomic energy research, high-energy physics experiments, simulations of the early universe and more. Perhaps our best bet at traveling back in time is to fire up NERSC's supercomputers and observe the big bang.

The NERSC database encompasses 2.8 petabytes of information and is used by more than 2,000 computational scientists. To put the size of NERSC into perspective, the total amount of words ever spoken in the history of humanity is estimated at 5 exabytes; in relative terms, the NERSC database is equivalent to roughly 0.056% of that figure.

Although that may not seem like a lot at first glance, when you factor in that 6 billion humans around the globe speak more than 2,000 words a day, the sheer magnitude of that figure becomes apparent.
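Both comparisons can be reconstructed from the numbers quoted above (the word counts and the 5-exabyte estimate are rough, widely cited approximations rather than measurements):

```python
# How the NERSC-vs-spoken-words comparison above works out.
NERSC_BYTES = 2.8 * 10**15       # 2.8 petabytes
ALL_SPEECH_BYTES = 5 * 10**18    # estimated 5 exabytes for all words ever spoken

print(f"NERSC is ~{NERSC_BYTES / ALL_SPEECH_BYTES * 100:.3f}% of that estimate")  # ~0.056%

# Daily speech volume implied by the paragraph above
# (6 billion speakers, ~2,000 words per person per day).
words_per_day = 6 * 10**9 * 2_000
print(f"~{words_per_day / 10**12:.0f} trillion words spoken per day")  # ~12 trillion
```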

By the Numbers

  • 2.8 petabytes of data
  • Used by more than 2,000 computational scientists

World Data Centre for Climate

If you had a 35 million euro supercomputer lying around, what would you use it for? The stock market? Building your own internet? Try extensive climate research -- if there's a machine out there that holds the answer to global warming, this one might be it. Operated by the Max Planck Institute for Meteorology and the German Climate Computing Centre, the World Data Centre for Climate (WDCC) is the largest database in the world.

The WDCC boasts 220 terabytes of data readily accessible on the web, including information on climate research and anticipated climatic trends, as well as 110 terabytes (or 24,500 DVDs) worth of climate simulation data. To top it off, six petabytes worth of additional information are stored on magnetic tapes for easy access. How much data is six petabytes, you ask? Try three times the contents of all U.S. academic research libraries combined.
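The DVD comparison checks out: 110 terabytes over 24,500 discs is about 4.5 gigabytes apiece, close to the 4.7 GB capacity of a standard single-layer DVD. A quick sketch of that check, with the six-petabyte tape archive expressed in the same units:

```python
# Sanity check on the DVD comparison, plus the tape archive in the same units.
SIMULATION_BYTES = 110 * 10**12   # 110 terabytes of climate simulation data
DVD_COUNT = 24_500
DVD_CAPACITY = 4.7 * 10**9        # standard single-layer DVD

print(f"~{SIMULATION_BYTES / DVD_COUNT / 10**9:.1f} GB per DVD")  # ~4.5 GB

tape_bytes = 6 * 10**15           # 6 petabytes on magnetic tape
print(f"tape archive ~= {tape_bytes / DVD_CAPACITY / 10**6:.1f} million DVDs")  # ~1.3 million
```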

By the Numbers

  • 220 terabytes of web data
  • 6 petabytes of additional data

Additional Databases

The following databases are unique (and massive) in their own right, but just fell short of the cut for our top 10 list.

Nielsen Media Research / Nielsen Net Ratings

Best known for its television audience size and composition ratings, the U.S. firm Nielsen Media Research is in the business of measuring mass-media audiences across television, radio, print media and the internet. The databases required to produce statistics such as Google's daily internet search figures are nothing short of massive.

United States Customs

The U.S. Customs database is unique in that it must deliver information on the hundreds of thousands of people and objects entering and leaving the United States' borders in real time. For this to be possible, the database was specially programmed to process queries near-instantaneously.

HPSS

There are various databases around the world using technology similar to that found in our countdown's second-largest database, NERSC. The technology is known as the High Performance Storage System, or HPSS. Other massive HPSS deployments include those at Lawrence Livermore National Laboratory, Sandia National Laboratories, Los Alamos National Laboratory, the Commissariat a l'Energie Atomique Direction des Applications Militaires, and more.

Every company needs a database.

Whether it’s kept on the premises or off site, locally managed or handled by a third party, businesses need a reliable, searchable and adaptable database to handle the constant influx of information.

But databases don’t store, manage and analyze this information on their own. The right database software system — also called a database management system (DBMS) — is critical to maximize performance and minimize IT headaches.

Here’s a look at 10 of the best systems available for business professionals:

  1. Oracle. No surprise here. Oracle has been making database products since 1979 and is one of the most well-recognized manufacturers worldwide. Worth noting about this database management system: It’s powerful but complex. New users will want to invest in solid training to ensure they’re getting the most from the software. Oracle also is embracing the cloud. Its latest release, 12c, allows companies to consolidate and manage databases as cloud services.
  2. Microsoft SQL Server. Love it or hate it, Microsoft’s DBMS is one of the most popular in the world. It’s also one of the most enduring. Server 2008, 2012 and 2014 are still widely used even after the release of Server 2016. The SQL stands for “structured query language,” and although Microsoft was late to the database management party, this DBMS, which sports native BI tools and links with other popular Microsoft offerings such as Excel, Word and SharePoint, grabs a well-earned top spot (see the short SQL sketch after this list).
  3. MySQL. An open-source alternative to Microsoft’s offering that still uses structured query language, MySQL has gained traction as the go-to DBMS for web-based business applications, especially those running e-commerce sites or leveraging dynamic content. Tech enterprises such as Facebook, Google and Adobe use this database management tool. Although it now falls under the Oracle umbrella, the project remains an open-source resource.
  4. PostgreSQL. You probably haven’t heard much about PostgreSQL, but this open-source object-relational DBMS shows up in a lot of places — for example, online gaming apps, database automation tools and domain registries. Enjoying 25 years with an active, engaged community, PostgreSQL runs on a host of operating systems, including Windows, Linux, Solaris and now Mac OS X.
  5. Microsoft Access. Think of it as a lighter-weight version of SQL Server and you’re not far off. This desktop database application is quickly finding use as a database for e-commerce sites and content management systems. While it doesn’t offer the depth of features found in SQL Server proper, Access comes standard with the Microsoft Office Suite and is easy to get up and running.
  6. Teradata. If you’re dealing with big data, Teradata is the very large database (VLDB) system for you. Credited with creating some of the original data warehouses, Teradata also rolled out the very first terabyte database for Wal-Mart almost 25 years ago. Today, Teradata version 15.10 is a great choice for companies looking to handle high-volume big data, BI and the Internet of Things (IoT).
  7. IBM DB2. No surprise that IBM makes the list with its DB2 Universal Database (UDB) Enterprise Server Edition. Designed for high-load, high-availability enterprise workloads, DB2 is used by several global corporations to help improve database performance and lower costs.
  8. Informix. Another offering from IBM, Informix often is used by educational institutions, but recently made the jump to corporate databases. Described as an “intelligent database,” the solution integrates well with SQL, JSON and spatial data and often ranks first in terms of customer satisfaction.
  9. SAP ASE. Originally known as Sybase, SAP’s Adaptive Server Enterprise is designed to handle high-performance, transaction-based applications — such as those used in banking and finance — and support thousands of concurrent users.
  10. Amazon’s SimpleDB. Looking for a solid DBMS starting point? Amazon’s offering comes free with an EC2 deployment and provides the ability to store and query data items via web services requests along with true cloud integration.
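For a concrete taste of the structured query language that the relational systems above share, here is a minimal, self-contained sketch using Python’s built-in sqlite3 module; SQLite simply stands in for any SQL-based DBMS on the list, and the table and rows are invented for illustration:

```python
# Minimal illustration of structured query language (SQL), using Python's
# built-in sqlite3 module as a stand-in for any relational DBMS above.
# The "customers" table and its rows are invented for this example.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "Boston"), ("Grace", "New York"), ("Linus", "Boston")],
)

# A declarative query: describe the rows you want, not how to fetch them.
for (name,) in conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Boston",)
):
    print(name)  # Ada, then Linus

conn.close()
```

The same statements, give or take dialect differences, would run on SQL Server, MySQL, PostgreSQL, DB2 and the other relational systems listed above.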

Rocco Lungariello is Marketing and Social Media Content Creator at New Horizons, the largest group of New Horizons training centers in America. He has been generating content surrounding the IT Industry for more than four years.