Data Warehousing with Oracle Discoverer R3.

Data Warehousing with Oracle Discoverer R3.0

Helen Dwight
Product Manager
Oracle Corporation

1.0 Introduction

More and more mid to large sized organizations worldwide are recognizing the value of storing information regarding the organization's operations, sales figures, cost of manufacture, cost of sale, cost of personnel, competitive data and so on in a data warehouse.

The size of this warehouse may exceed tens of gigabytes. However, this information is essentially worthless unless we can provide a mechanism by which any skill level of end user is able to analyze the information in business terms and use it to provide competitive advantage.

This paper examines how Oracle's next generation business intelligence tool, Oracle Discoverer R3.0 supports fast and easy access to data warehouses with specific reference to the Oracle Warehouse. It also describes how it may be deployed within an organization to provide competitive edge and empower users to extract the information they need from the complex data warehouse structures.

2.0 Data Warehouses

A recent survey indicated that over 90% of mid to large sized organizations will set up a data warehouse this year. According to IDC, approximately 80% of organizations who have already invested in data warehousing, view them as a major success. Why? The benefit of a data warehouse is to provide a distinct centralized repository from the OLTP systems that contains extracts of vital business data from a variety of corporate databases which can be analyzed and used as a strategic competitive weapon.

Fast, accurate analysis of business issues can affect long term survival, for example, analyzing trends in geographic demand patterns allowed one organization to regulate supplies accordingly and increase sales by over $2 Million.

Data structures in a data warehouse are optimized for rapid retrieval and analysis. The data itself is usually historical and is updated in time periods from a day to a year, depending on the application, and usually has the following key features:

Source data is usually mainframe based
The data warehouse is usually based in a relational database. Some OLAP (On-Line AnalyticalProcessing) functionality is provided using multi-dimensional databases
Increasing volumes of data (in some cases gigabytes per month)
Data structures that are denormalized and summarized (approximately 10-30% of the data is summarized)
Analysis requirements (the degree of analysis required depends very much on the user’s job function)

There are 4 steps to defining, creating and using a data warehouse:

2.1 Extracting and Scrubbing the Data from the OLTP Systems

Data is extracted from multiple systems within a company and placed in one centralized data warehouse. As part of the extraction process, the data needs to be scrubbed. (For example, in one source system, personnel data may be stored with the letters ‘M’ and ‘F’ for male and female; in a second source system, they may be referred to using the words ‘male’ and ‘female’ and in a third source system, they may be the numbers, ‘1’ and ‘0’. The process of translating these three definitions to a common denominator is called scrubbing).

2.2 Designing the Data Warehouse Structures

The IS department interviews users to find their requirements and create an initial warehouse. Users access the warehouse and request new information. New data is loaded and so the process continues. At present, most of the information contained in these warehouses is atomic with about 10-30% being summarized.

2.3 Using the Data Warehouse Structures

Oracle provides a rich variety of techniques that ensure users obtain maximum performance from their Oracle warehouse such as: bit-mapped indexes, hash joins, partitioned data, SQL optimization and star schemas.

Bitmapped Indexes are fully integrated into the Oracle7 server and coexist with and complement other available indexing schemes including B-tree indexes, clustered table indexes and hash clusters. Bitmapped indexes arrange data more efficiently in bits rather than whole names. They are used to index data that has a low number of distinct values in relation to the number of rows in the table. For example, sex only has 2 values - male and female, states of the USA have 50 values. Queries using the bitmapped index read directly from the indices rather than the data which dramatically speeds up processing.

Hash Joins can deliver dramatic performance improvements especially for queries where existing indexes cannot be leveraged in join processing which is a common occurrence in ad hoc query environments. Oracle7 hash joins eliminate the need to perform sorts by using an in-memory hash table constructed at run-time.

Partition Views allow a large table to be divided into smaller, more manageable sub-tables or partitions and simplify data management operations by allowing data loads, index creation and data purges to be done at the partition level instead of on the entire table. Partition elimination and intelligent index utilization provide excellent query performance on partition views.

SQL Optimization The cost-based optimizer dynamically determines the most efficient access paths and join methods for every query by taking into account statistics, such as the size of each table and the selectivity of each query condition. Histograms provide the cost-based optimizer with more detailed statistics for skewed, non-uniform data distribution. The cost-based optimizer has been tuned for many specialized data warehouse queries such as star queries based on a star schema. Query syntax and dynamic database statistics and parallel capabilities are taken into account when optimizing SQL which enables Oracle7 to efficiently process ad hoc and complex queries which are common in data warehouse applications.

Star and Snowflake Schemas allow for specialized data organization which makes multi-dimensional analysis easier in relational databases. Data is organized into a main fact table with dimension tables supporting it. For example, a star schema in a retailing environment could contain the following fact and dimension tables: Sales, Customers, SKUs, Stores, Suppliers - all of which allow a user to analyze sales by customer, products etc. Oracle has multiple methods for executing queries on a star schema and Oracle's cost-based optimizer will always consider many possible execution methods when evaluating a query.

It is essential for any data access tool to take advantage of such server side performance to gain optimum benefit from the data warehouse.

2.4 Data Access Tools

In a recent survey, organizations found the selection of a data access tool to be the hardest decision they had to make when considering a data warehouse project. This is simply because the data access tool is what makes or breaks the success of the data warehouse project. Of the Organizations surveyed, several key criteria emerged for selection of a data warehouse tool :

Ease of Use, Flexibility (this is the most important test - if users cannot access data easily and flexibly, then the data warehouse project is doomed to failure from the start)
Performance (this is another usability issue, if users cannot access the data quickly, they soon give up)
Established supplier
Centralized Administration (this is important for reducing meta layer setup and maintenance time - meta layers are a common feature in all decision support tools)

This rest of this paper focuses on how Discoverer meets the key criteria for a data warehouse tool and provides organizations with an easy to use, flexible and performant tool that can be employed by any skill level of user against the data warehouse. Discoverer takes full advantage of the performance capabilities and data warehouse support in the Oracle server, but also accesses non-Oracle data warehouses using ODBC.

3.0 Oracle Discoverer R3.0

Discoverer is a breakthrough development which resolves the issues common with decision support tools in the market today. It is developed using C++ and is targeted for the Windows 95 and NT client markets. Production availability is scheduled for Q1 1997. Discoverer was designed and developed with 4 aims :

To provide minimal meta layer setup and maintenance
To provide the easiest-to-use, most tested user interface in the market
To provide the best performance for a decision support tool in the market
To provide the most flexible analysis model in the market

3.1 Minimal Meta Layer Setup and Maintenance

All tools in the decision support market have the concept of a meta layer which hides underlying complex database structures from end users and makes the data easily accessible.

One issue with data warehouses is support for the meta layer. Administrators are looking for a tool that will give them maximum control, security and flexibility whilst at the same time requiring very little work in terms of initial setup and on-going maintenance. A typical data warehouse today contains two separate meta-layers:

The meta layer that defines the relationship between the source (OLTP) systems and the Warehouse itself, (which contains the models of both systems and any relationships/transformation processes between them)
The front end meta layer for an end user perspective of the data structures.
Maintaining two meta-layers is a time-consuming and error-prone process. To resolve this, integration between Discoverer and Oracle Designer/2000^TM enables you to populate the front-end meta-layer directly from Oracle Designer/2000™, thus reducing the need to duplicate the models and effort required for maintenance of the Warehouse.

Another separate issue with first generation query tools is the amount of effort required for set up and maintenance of their meta layers. Many tools in this market require heavy administration, which becomes increasingly expensive and intensive as the number of users accessing the system increases. With Discoverer, one-third of the development effort was focused on the design of the End User Layer and Administration Utility. Feedback from usability tests was also incorporated into the final product. Both set-up and maintenance are minimal and achieved through an extremely intuitive Windows 95 ‘wizard’ interface. When a Business Area is put into production, it can be altered to accommodate changes in the business or unanticipated requests with little effort.

3.1.1 End User Layer Overview

The End User Layer for Discoverer is a server-based meta layer which simplifies and enhances the user’s view of the data. It provides a business perspective by grouping information into logical Business Areas. You see complex concepts and data models as a list of familiar objects (things) such as "Invoice" or "Annual Salary." The End User Layer automatically handles any relationships (joins), tables, views, columns, default formatting, default drill relationships, default user access and more.

The End User Layer provides:

Business Areas to group related information
Renamed fields and tables and associated descriptions
Creation of user defined folders and items
Relationships (joins) between any schema objects
Drill Relationships between schema objects
Derived (dynamically calculated) values
Dynamic Aggregation
Default formatting for reports
Security and control
Automatic Summary maintenance
Control of external summary tables
Control for reactive and predictive query governing

3.2 Easiest to Use, Most Tested User Interface

From the outset, development focus was placed on the usability of Discoverer.

3.2.1 Usability Testing

Usability testing is an integral part of the development process. The interface has been designed by a team of experienced User Interface professionals recognized worldwide for their expertise. Storyboards were created, revised and repeatedly tested with customers before any code was produced. Focus groups were recruited to give direct feedback on the process of creating complex queries and performing powerful analysis. The emphasis on usability continues throughout the development process, with extensive testing being carried out at Oracle’s sophisticated usability laboratory.

As a result of the Usability focus, Discoverer is recognized as the easiest to use ad-hoc query, reporting and analysis tool in the market today.

3.2.2 Wizards, Cue Cards and Quick Tour (CBT)

Discoverer uses a modern wizard-based approach which leads the user through each step of the query building process thus making them immediately more productive. Additional cue cards and computer based training ensure the training curve is minimal, enabling users to produce results immediately with no supervision.

eporting and analysis tool in the market today.

3.2.3 Single User Interface for Queries and Multi-dimensional Analysis

Discoverer provides one consistent interface for querying, reporting and analytical functionality. It blurs the distinction between these terms, enabling an inexperienced user to retrieve and analyze information without understanding the conceptual differences and terminology of these technologies. The ability to publish reports on the World Wide Web extends the availability of information to every user in the organization regardless of their technical expertise.

3.3 Best Performance

One clear result from the usability tests and focus group feedback was that performance is really another usability issue. If a query, reporting and analysis tool cannot provide rapid response, it is rejected by users. Discoverer is designed to provide maximum performance for query and relational on-line analytical processing capabilities. The expert SQL query engine dynamically generates performance optimized SQL queries.

3.3.1 Summarized Tables

Discoverer creates and maintains summarized tables in your data warehouse to enable rapid performance. Using a wizard-based interface, the administrator specifies which measures and dimensions are to be summarized. A intelligence routine calculates the optimum storage method for table creation and advises the administrator on the database space required. If summarized tables already exist, they are also incorporated and utilized.

When a user summarizes detailed rows, Discoverer intelligence algorithms check to see if that information has been pre-summarized. If so, the appropriate SQL is issued by the query engine and the query is run automatically against the pre-summarized data. This capability provides rapid response times and removes load from the server.

3.3.2 Client-Side Cubic Cache

As data is retrieved from the server, it is stored in a client-side cubic cache known as the ResultsBase to enable extremely rapid multi-dimensional analysis with any number of dimensions and measures, rotation, drill down, drill up and drill across functionality. An expert SQL query engine dynamically generates performance optimized SQL queries.

Discoverer automatically decides when to perform queries and drills locally, and when to go to the server for improved performance. This true client-server architecture provides maximum benefit from your database and hardware investment.

3.3.3 Query Prediction

Most decision support tools today provide a query governor to cancel long running queries. Discoverer extends this capability to give users what they really need. As well as the ability to cancel a query, we provide an estimate of the retrieval time before the query is run. This feedback enables the user to make sensible decisions on whether to run a query immediately using the non-blocking capabilities of Oracle7, or refer it to a batch process for later calculation. Non-blocking query execution allows users to perform other tasks on their PC whilst running a Discoverer query thus increasing productivity.

3.3.4 Designed for Data Warehouses

Discoverer utilizes the bitmapped indexing and star query functionality of Oracle7 release 7.3. Support for SQL hints, which maximizes parallel processing performance, further increases the power of Discoverer and allows users to gain the maximum benefit from the Oracle server. Discoverer also supports ODBC access to non-Oracle data warehouses.

3.4 Flexible Analysis

Discoverer provides full query building capabilities through a tabbed Windows95 wizard. Build complex filters easily or simply attach pre-defined filters using full drag and drop capability. Banded reports supported include tabular, break tabular, matrix and master-detail, with complete font and color control. Create exceptions to automatically highlight values out of range. Take full advantage of OLE2 integration to add bitmap and video to enhance the display of your report output.

Discoverer also enables users to perform many different kinds of analysis. The ‘drill’ functionality is unique in that it employs a ‘just-in-time’ strategy that is analogous to browsing the World Wide Web. It efficiently allows each user to request and access only the information that he/she needs rather like downloading a single web page.

3.4.1 Summary to Detail Drill (Hyper-Drill^TM)

One of the biggest issues in the decision support and OLAP world today is the inability to drill through different levels of summarized data down to detailed rows. Discoverer resolves this issue. It provides a unique Hyper-Drill^TMcapability which enables users to drill rapidly from summary information to detail rows at the click of a button. Use this unique and powerful feature to drill down a variance report in your general ledger to find the detail invoices that cause anomalies.

In an extension of the Hyper-Drill™ capability, Discoverer also supports drill out to external applications. Plug-In a video device to drill to a video clip; Hyper-Drill to an exact location on the Worldwide Web or a separate Developer/2000 application. The possibilities for seamless drilling out to other supporting applications are endless.

4.0 Summary

Discoverer’s Ease of Use, Performance, Low Maintenance, Flexible Analysis and support for Data Warehouses make it the leading tool for today’s data warehouse.

Discoverer is a tool that will empower end users and provide real competitive edge to any organization.