A data warehouse is a collection and summarization of information from multiple databases and database tables. The primary purpose of a data warehouse is not data storage, but the collection of information for decision-making. Typically, a data warehouse extracts updated information from operational databases on a regular basis (nightly, hourly, etc.). This forms a snapshot of collected data that can be organized into a logical structure based on your analytical needs.
Data warehouses allow you to express your information needs logically, without being constrained to database fields and records. Using the correct data mining tools, it is possible to display information from a data warehouse in ways that are not possible using SQL or other basic query languages. Unlike a relational database, a data warehouse can present information in multidimensional format. This representation is called a hypercube, and contains layers of rows and columns. Using this model a company could, for instance, track sales of multiple products in multiple regions over a given period of time, all in the same view.
A data warehouse can contain extremely large amounts of information, and many users will only need to access a portion of this. Information in a data warehouse can be organized into data marts, which are subsets of data with a specific focus. Data marts can provide an analyst with a more efficient set of working data relevant to, for instance, a specific business process or unit of the company.
When data is being gathered from diverse sources, it is often necessary to clean and transform the data before it can be used. Cleaning refers to the process of correcting or discarding erroneous data such as duplicate or incomplete items. Data must be transformed when its format is incompatible for instance, a nine-digit ZIP code entry versus a five-digit entry. When this is complete, the data warehouse documents its data with metadata, also called a data dictionary, which defines the data sources and creates the structure and organization of the data warehouse. |
|
Most relational databases support online transaction processing (OLTP). Databases of this type are operational databases, used for storage and maintenance of data. Data warehouses, however, support online analytical processing (OLAP), collecting data in support of decision-making processes. |
A data warehouse is an efficient way to bring together large bodies of data. A data warehouse: