Getting Performance from a Relational
Data Warehouse

 

Michael P. Howe
Product Manager
Oracle Corporation

In today’s fast-paced business world, strategic decisions must be made quickly, so companies can react and adapt to reorganizations, process changes, regulation revisions, and competitors’ directions. Strategic decisions need to be based on accurate data and thorough analysis. The ability to analyze and exploit operational data in a data warehouse becomes a key competitive weapon.

As strategic decisions are made, it is imperative that they are implemented quickly and efficiently so that the competitive advantage is not lost. Operational systems today need to be flexible and adaptable to change without incurring delays or cost overruns due to application or hardware issues. Organizations require enterprise systems that allow them to make strategic decisions based on information collected and analyzed, and then implement these decisions quickly and cost effectively.

1. 0 Enterprise Class Data Warehousing with Oracle7, Release 7.3

Data warehouse applications require different processing techniques than OLTP applications due to the complex, ad hoc queries running against large amounts of data. To address these special requirements, Oracle7 offers a rich variety of query processing techniques, sophisticated query optimizations choose the most efficient data access path, and a scaleable architecture that takes full advantage of all parallel hardware configurations.

1.1 Rich Query Processing Techniques

Successful data warehouse applications rely on superior performance when accessing potentially gigabytes and terabytes of information. Oracle7 provides a rich variety of integrated indexing schemes and join methods to deliver answers quickly to data warehouse users.

1.1.1 Bitmapped Indexes

Bitmapped indexes deliver dramatic performance benefits to data warehouse applications. Bitmapped indexes are fully integrated into the Oracle7 Server, and coexist with and complement other available indexing schemes including standard B-tree indexes, clustered table indexes, and hash clusters. Since the bitmapped indexes are integrated within the Oracle7 Server, they are automatically maintained even when the indexed values in tables are inserted, updated, or deleted, making index "refreshes" unnecessary, unlike other implementations of bitmapped indexes.

1.1.2 Hash Joins

Hash joins deliver dramatic performance improvements over other join methods in many complex queries, especially those queries where existing indexes cannot be leveraged in join processing, a common occurrence in ad hoc query environments. Oracle7’s hash joins eliminate the need to perform sorts by using an in-memory hash table constructed at run-time. They are also ideally suited for scaleable parallel execution.

1.1.3 Partitioned Data

Oracle7 supports efficient management and access of very large databases objects through partition views. Partition views allow a large table to be divided into smaller, more manageable sub-tables or partitions. A partition view simplifies data management operations by allowing data loads, index creation, and data purges to be done at the partition level, rather than on the entire table. Partition elimination and intelligent index utilization provide excellent query performance on partition views.

1.2 Sophisticated SQL Optimizer

Oracle7’s numerous, powerful query processing techniques are completely transparent to the end user. The Oracle7 cost-based optimizer dynamically determines the most efficient access paths and join methods for every query. To choose the most efficient query execution strategy, the Oracle7 cost-based optimizer takes into account statistics, such as the size of each table and the selectivity of each query condition. Histograms provide the cost-based optimizer with more detailed statistics for skewed, non-uniform data distribution.

The cost-based optimizer considers many other constraints when choosing an execution strategy. The user or application can specify whether it is more desirable to return the first row of a query quickly, or to complete the entire query before returning the data. The cost-based optimizer is also "parallel aware," which means it considers the availability of parallel resources when choosing the most efficient execution strategy.

The cost-based optimizer has been tuned for specialized data warehouse queries, such as star queries. The cost-based optimizer selects the most efficient execution strategies for all star queries, even complex "snowflake" queries. Because the cost-based optimizer considers not only query syntax, but also dynamic database statistics and parallel capabilities, Oracle7 efficiently processes ad hoc and complex queries common in data warehouse applications.

1.3 Scaleable Parallel Query Architecture

In addition to efficient query processing techniques and sophisticated query optimization techniques, data warehouse servers must also provide scaleable performance for processing large amounts of data. Oracle7’s superior, integrated parallel query architecture provides excellent scalability on SMP, MPP, and hybrid hardware platforms. Oracle7’s parallel query architecture increases the performance of database operations by dynamically subdividing these operations into distinct tasks, and distributing the workload across multiple processors. Oracle7 parallelizes more operations than any other database product. Oracle7’s adaptive parallel architecture combines the best elements of "shared disk" and "share nothing" approaches and provides excellent load balancing and dynamic load distribution. Oracle7 is a proven solution that enables customers to realize the maximum potential of parallel performance.