Nykredit Center for Database Research

Title: The Data Warehouse Configuration Problem
By: Michael O. Akinde
Advisor:  Michael H. Böhlen
Status: Defended February 6, 2003

Description

On-line analytical processing (OLAP), business intelligence, and multi-dimensional analysis have been the focus of intense research activity over the last few years. Early OLAP research focused primarily on simple aggregate queries, particularly the evaluation, usage, and maintenance of summary-table views. As data warehousing and analytical applications have gained ground in industry, the challenges facing OLAP technology have increased in scale and complexity. Many applications now require the evaluation of very complex aggregate queries.

This Ph.D. thesis presents a general algebraic operator for the expression and evaluation of complex aggregate queries and considers two relevant research questions within the field of complex OLAP (i.e., aggregation queries that require expressions more complex than simple summary-table views): the evaluation of subquery predicates in the presence of complex aggregation, and the distributed evaluation of complex OLAP queries.

The thesis formalizes the generalized multi-dimensional join (GMD-join), an algebraic operator for complex OLAP, and presents a set of algebraic transformation rules demonstrating how the operator interacts with the other operators of a multi-set algebra. Techniques for evaluating the GMD-join efficiently are considered, and cost formulas for estimating the cost of that evaluation are presented. The algebraic transformations, techniques, and cost model presented in this thesis provide a foundation for incorporating the GMD-join, or a similar segmented evaluation operator, into a conventional DBMS.
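
To make the operator concrete, the following is a minimal Python sketch of the general idea behind an MD-join-style operator as described in the published MD-join work: for every row of a base-values table, aggregates are computed over those detail rows that satisfy a condition involving both rows, and a generalized form allows several (condition, aggregate list) pairs in one operator. The function name `gmd_join`, the dictionary-based row representation, and the naive nested-loop evaluation are assumptions made for illustration; they are not the formal definition or the efficient evaluation techniques developed in the thesis.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
# A named aggregate: (output column, function over the matching detail rows).
Aggregate = Tuple[str, Callable[[List[Row]], object]]
# A component pairs a condition theta(base_row, detail_row) with its aggregates.
Component = Tuple[Callable[[Row, Row], bool], List[Aggregate]]


def gmd_join(base: Iterable[Row], detail: Iterable[Row],
             components: List[Component]) -> List[Row]:
    """Naive illustration: extend each base row with aggregate columns.

    Hypothetical helper, not the thesis's algorithm. Every base row appears
    exactly once in the output, even if no detail row matches it.
    """
    detail_rows = list(detail)
    result = []
    for b in base:
        out = dict(b)
        for theta, aggregates in components:
            matching = [r for r in detail_rows if theta(b, r)]
            for name, fn in aggregates:
                out[name] = fn(matching)
        result.append(out)
    return result


# Illustrative data (made up for this sketch).
sales = [
    {"prod": "a", "amt": 10},
    {"prod": "a", "amt": 30},
    {"prod": "b", "amt": 5},
]
base = [{"prod": "a"}, {"prod": "b"}]

components = [
    # Total sales of the same product.
    (lambda b, r: r["prod"] == b["prod"],
     [("total", lambda rows: sum(r["amt"] for r in rows))]),
    # Number of sales of all other products.
    (lambda b, r: r["prod"] != b["prod"],
     [("others", len)]),
]

result = gmd_join(base, sales, components)
```

The point of the sketch is that the base-values table, not the detail table, determines the shape of the output, and that several differently conditioned aggregates can be attached to it in one pass over the operator.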

Subqueries are a common feature of complex OLAP queries. Despite this, no previous research work has considered the evaluation of subquery predicates in the presence of complex aggregation. The thesis presents a general algorithm that allows subquery predicates to be expressed as GMD-join expressions, thereby enabling them to be evaluated efficiently.
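
As a rough illustration of why subquery predicates can be phrased as aggregation, the sketch below uses a well-known rewriting: an EXISTS predicate holds when a count of correlated rows is positive, and a "greater than ALL" predicate holds when the count of violating rows is zero. This is an assumption-laden simplification (it ignores SQL NULL semantics, and the helper names are hypothetical); it is not the general algorithm presented in the thesis.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def count_matching(base: Iterable[Row], detail: Iterable[Row],
                   theta: Callable[[Row, Row], bool]) -> List[tuple]:
    """Pair each base row with the number of detail rows satisfying theta."""
    detail_rows = list(detail)
    return [(b, sum(1 for r in detail_rows if theta(b, r))) for b in base]


def where_exists(base, detail, theta) -> List[Row]:
    """Rows b for which EXISTS (detail rows r with theta(b, r))."""
    return [b for b, cnt in count_matching(base, detail, theta) if cnt > 0]


def where_greater_than_all(base, detail, column: str) -> List[Row]:
    """Rows b with b[column] > ALL (r[column] for r in detail).

    Rewritten as: the number of detail rows with r[column] >= b[column] is 0.
    NULL handling is deliberately ignored in this sketch.
    """
    violates = lambda b, r: r[column] >= b[column]
    return [b for b, cnt in count_matching(base, detail, violates) if cnt == 0]


# Illustrative data (made up for this sketch).
customers = [{"cust": 1}, {"cust": 2}]
orders = [{"cust": 1, "amt": 10}]
with_orders = where_exists(customers, orders,
                           lambda b, r: r["cust"] == b["cust"])

items = [{"id": 1, "v": 5}, {"id": 2, "v": 9}]
others = [{"v": 7}, {"v": 3}]
dominating = where_greater_than_all(items, others, "v")
```

Both predicates reduce to a per-base-row count followed by a simple selection, which is the kind of shape an aggregation operator evaluated over a base-values table can handle.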

Many of the new applications for complex OLAP involve huge amounts of highly distributed data. For such data to be queried, a distributed data warehouse must be developed and maintained. This thesis develops a framework and describes a prototype for the distributed processing of complex OLAP queries. A general strategy for the distributed evaluation of complex OLAP queries expressed using GMD-joins is presented, and optimization strategies are developed both for the case where knowledge of the data distribution is available and for the case where it is not. A series of experiments is presented to evaluate the performance of these strategies and validate the distributed processing algorithm. Finally, the architecture and algorithms of Skalla, a prototype system for the distributed evaluation of complex OLAP queries implemented during the Ph.D. project, are documented.
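
One common way to evaluate aggregates over distributed data, sketched below, is to have each site compute partial aggregates over its local partition and let a coordinator merge them. The sketch assumes a single grouping key and an average computed from mergeable (sum, count) pairs; the function names and the data are hypothetical, and this is not a description of Skalla's actual protocol or of the optimization strategies developed in the thesis.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]
Partial = Dict[str, Tuple[float, int]]  # key -> (running sum, running count)


def site_partial(base_keys: Iterable[str], partition: Iterable[Row],
                 key: str, value: str) -> Partial:
    """Run at each site: partial (sum, count) per base key over local rows."""
    partial: Partial = {k: (0, 0) for k in base_keys}
    for r in partition:
        k = r[key]
        if k in partial:  # rows outside the base-values table are ignored
            s, c = partial[k]
            partial[k] = (s + r[value], c + 1)
    return partial


def merge_partials(base_keys: Iterable[str],
                   partials: List[Partial]) -> Partial:
    """Run at the coordinator: add up the per-site sums and counts."""
    merged: Partial = {k: (0, 0) for k in base_keys}
    for p in partials:
        for k, (s, c) in p.items():
            ms, mc = merged[k]
            merged[k] = (ms + s, mc + c)
    return merged


def finalize_avg(merged: Partial) -> Dict[str, Optional[float]]:
    """Turn merged (sum, count) pairs into averages; None if no rows matched."""
    return {k: (s / c if c else None) for k, (s, c) in merged.items()}


# Illustrative partitions held at two sites (made up for this sketch).
site1 = [{"region": "n", "amt": 10}, {"region": "s", "amt": 4}]
site2 = [{"region": "n", "amt": 20}]
base_keys = ["n", "s", "w"]

partials = [site_partial(base_keys, p, "region", "amt") for p in (site1, site2)]
merged = merge_partials(base_keys, partials)
averages = finalize_avg(merged)
```

Only the small per-key partial results travel to the coordinator, not the detail rows themselves, which is the main reason this style of evaluation suits large, highly distributed data.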

Further reading:

M.O.Akinde. Skalla. Internal Technical Report, AT&T Shannon Labs Research, Florham Park, New Jersey, USA, 5 pages, August 2000.

D.Chatziantoniou, M.O.Akinde, T.Johnson, and S.Kim. MD-join: An operator for complex OLAP. In Proceedings of the 17th International Conference on Data Engineering (ICDE'2001), Heidelberg, Germany, pages 524--533, April 2001.

M.O.Akinde and M.H.Böhlen. Generalized MD-joins: Evaluation and Reduction to SQL. In Databases in Telecommunications II, International Workshop Co-located with VLDB-2001, Rome, Italy, pages 52--67, September 2001 (LNCS 2209).

M.O.Akinde, M.H.Böhlen, T.Johnson, L.V.S.Lakshmanan, and D.Srivastava. Efficient OLAP Query Processing in Distributed Data Warehouses. In Advances in Database Technology - EDBT'02, 8th International Conference on Extending Database Technology, Prague, Czech Republic, pages 336--353, March 2002.

M.O.Akinde, M.H.Böhlen, T.Johnson, L.V.S.Lakshmanan, and D.Srivastava. Efficient OLAP Query Processing in Distributed Data Warehouses. To appear in Information Systems, 28(1), 25 pages, March 2003.

M.O.Akinde and M.H.Böhlen. Efficient Computation of Subqueries in Complex OLAP. To appear in Proceedings of the 19th International Conference on Data Engineering (ICDE'2003), Bangalore, India, 12 pages, March 2003.
