Column-oriented GPU-accelerated Database Management System
CoGaDB
Concepts

In this section, we describe important concepts and design decisions in CoGaDB. We start with one of the most important building blocks of the query processor, the LookupTable. Then, we discuss the design and capabilities of CoGaDB's optimizer, which is divided into a logical and a physical optimizer.

Lookup Tables

A LookupTable is a view on one or multiple tables. LookupTables are the bridge between the table-based operators and the internal column-based operators.

Internally, each operator returns its result as a list of TIDs (tuple identifiers). A LookupTable essentially consists of three parts: a pointer to a table, a pointer to a TID list that indicates which tuples of the underlying table belong to the LookupTable, and an attribute list that specifies which columns of the table are included in the LookupTable. Therefore, LookupTables are a cheap mechanism to store intermediate results. Furthermore, they behave as if they were "normal" tables, with the exception that LookupTables cannot be updated. Columns of LookupTables are LookupArrays, which consist of a pointer to a materialized column from a materialized table and a pointer to a TID list. To keep track of which LookupArray indexes a column from which table, we use a helper data structure called LookupColumn. A LookupColumn describes which part of one materialized table is part of a LookupTable, which can be the result of an arbitrary sequence of operators, including binary operators such as joins.
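The following is a minimal sketch of the idea described above, not CoGaDB's actual class definitions. All type and function names (Column, Table, LookupArray, LookupTable, select_less_than) are hypothetical and simplified to integer columns and a single underlying table. It illustrates why such a view is cheap: a selection yields only TIDs, and values are resolved through the TID list on access.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only.
using TID = std::size_t;
using TIDList = std::vector<TID>;

// A materialized column of integers.
struct Column {
    std::string name;
    std::vector<int> values;
};

// A materialized table: a set of columns of equal length.
struct Table {
    std::vector<Column> columns;
};

// View on one materialized column: the column plus the selected TIDs.
struct LookupArray {
    const Column* column;                 // not owned
    std::shared_ptr<const TIDList> tids;  // shared with the other columns of the view

    std::size_t size() const { return tids->size(); }

    // Resolve the i-th row of the view through the TID list.
    int at(std::size_t i) const {
        assert(i < tids->size());
        return column->values[(*tids)[i]];
    }
};

// View on a table: table pointer, TID list, and attribute list.
struct LookupTable {
    const Table* table;                   // not owned
    std::shared_ptr<const TIDList> tids;  // which tuples belong to the view
    std::vector<std::size_t> attributes;  // indices into table->columns

    // The j-th column of the view, exposed as a LookupArray.
    LookupArray column(std::size_t j) const {
        assert(j < attributes.size());
        return LookupArray{&table->columns[attributes[j]], tids};
    }
};

// A selection that returns only TIDs; no column values are copied.
inline TIDList select_less_than(const Column& col, int bound) {
    TIDList result;
    for (TID tid = 0; tid < col.values.size(); ++tid) {
        if (col.values[tid] < bound) result.push_back(tid);
    }
    return result;
}
```

In this sketch, a LookupTable built from the TID list of select_less_than shares the underlying columns with the base table, so creating the intermediate result costs only the size of the TID list. The read-only pointers mirror the restriction that LookupTables cannot be updated.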

Logical Optimization

CoGaDB implements a simple logical optimizer. It performs two of the most basic optimizations: pushing down selections, and resolving cross products by merging them with join conditions into natural joins. To achieve this, CoGaDB currently has four optimizer rules, which are applied in the following order:

  1. Break complex selection expressions in conjunctive normal form into a sequence of selections, each consisting of at most one disjunction.
  2. Push down the simplified selections as far as possible. (Either to a SCAN operator, or to a binary operator where the conditions in the disjunction do not all fit completely on one subtree, which is typically the case for join conditions.)
  3. At this point, the join conditions have been pushed down far enough that they sit directly above their respective CROSS_JOIN operators. Therefore, the optimizer removes the join condition, expressed by the selection, together with the cross product and replaces them with a semantically equivalent JOIN operator. This process is repeated until all CROSS_JOIN operators are resolved.
  4. In the final step, the optimizer combines succeeding selections (each consisting of only one disjunction) into complex selections in conjunctive normal form. This allows for certain optimizations in case two-phase physical optimization is used.
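Rules 1 to 3 can be sketched on a toy plan tree as follows. This is an illustration under stated assumptions, not CoGaDB's optimizer code: the Node and Disjunction types and all function names are hypothetical, each disjunction simply records which tables it references, and every disjunction is assumed to reference only tables that occur in the plan.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plan representation for illustration only.
struct Disjunction {
    std::string text;              // e.g. "a.id = b.id"
    std::set<std::string> tables;  // tables referenced by its predicates
};

enum class Op { Scan, Selection, CrossJoin, Join };

struct Node {
    Op op;
    std::string table;             // used by Scan only
    std::vector<Disjunction> cnf;  // Selection predicate or Join condition
    std::unique_ptr<Node> left, right;
};
using Plan = std::unique_ptr<Node>;

Plan make(Op op, Plan l = nullptr, Plan r = nullptr) {
    return Plan(new Node{op, "", {}, std::move(l), std::move(r)});
}

Plan scan(std::string t) {
    Plan n = make(Op::Scan);
    n->table = std::move(t);
    return n;
}

// All base tables below a node.
std::set<std::string> tables_of(const Node& n) {
    if (n.op == Op::Scan) return {n.table};
    std::set<std::string> out = tables_of(*n.left);
    if (n.right) {
        std::set<std::string> r = tables_of(*n.right);
        out.insert(r.begin(), r.end());
    }
    return out;
}

// True if the subtree provides every table the disjunction needs.
bool covers(const Node& n, const Disjunction& d) {
    std::set<std::string> t = tables_of(n);
    return std::includes(t.begin(), t.end(), d.tables.begin(), d.tables.end());
}

// Rule 2: push one single-disjunction selection as far down as possible.
Plan push_down(Plan node, Disjunction d) {
    if (node->op == Op::Selection) {  // slide below an existing selection
        node->left = push_down(std::move(node->left), std::move(d));
        return node;
    }
    if (node->right) {  // binary operator: descend if one side suffices
        if (covers(*node->left, d)) {
            node->left = push_down(std::move(node->left), std::move(d));
            return node;
        }
        if (covers(*node->right, d)) {
            node->right = push_down(std::move(node->right), std::move(d));
            return node;
        }
    }
    // Reached a SCAN, or the condition spans both subtrees: place it here.
    Plan sel = make(Op::Selection, std::move(node));
    sel->cnf.push_back(std::move(d));
    return sel;
}

// Rule 1 (combined with rule 2): one selection per disjunction of the CNF.
Plan split_and_push(Plan input, std::vector<Disjunction> cnf) {
    for (Disjunction& d : cnf) input = push_down(std::move(input), std::move(d));
    return input;
}

// Rule 3: a selection directly above a CROSS_JOIN becomes a JOIN.
Plan resolve_cross_joins(Plan node) {
    if (!node) return node;
    node->left = resolve_cross_joins(std::move(node->left));
    node->right = resolve_cross_joins(std::move(node->right));
    if (node->op == Op::Selection && node->left->op == Op::CrossJoin) {
        Plan join = std::move(node->left);
        join->op = Op::Join;
        join->cnf = std::move(node->cnf);
        return join;
    }
    return node;
}
```

For a plan that cross-joins tables a, b, and c under the condition (a.x < 5) AND (a.id = b.id) AND (b.k = c.k), this sketch pushes the filter on a down to its SCAN and turns both cross products into JOIN operators. Rule 4, which merges succeeding selections back into one complex selection, is omitted here for brevity.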

Physical Optimization

The core of CoGaDB's physical optimization is the HyPE Library, our Hybrid Query Processing Engine. For each operator in a query plan, HyPE assigns a processing device and selects the most suitable algorithm for that device. Thus, HyPE takes care of the complete physical optimization in CoGaDB.
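To make the per-operator decision more concrete, here is a small sketch that picks a device and algorithm by comparing estimated execution times. It does not use or reproduce the HyPE API: the Algorithm type, the choose function, and the toy cost formulas are all hypothetical, and the assumption that the decision is driven by estimated execution time is ours rather than something stated in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

enum class Device { CPU, GPU };

// Hypothetical candidate: one algorithm for an operator on one device.
struct Algorithm {
    std::string name;
    Device device;
    std::function<double(std::size_t)> estimated_ms;  // toy cost model over input rows
};

// Pick the candidate with the lowest estimated time for the given input size.
const Algorithm& choose(const std::vector<Algorithm>& candidates, std::size_t rows) {
    assert(!candidates.empty());
    const Algorithm* best = &candidates.front();
    for (const Algorithm& a : candidates) {
        if (a.estimated_ms(rows) < best->estimated_ms(rows)) best = &a;
    }
    return *best;
}

// Made-up cost formulas: the GPU variant pays a fixed transfer overhead
// but has a lower per-row cost.
std::vector<Algorithm> example_selection_algorithms() {
    return {
        {"cpu_scan", Device::CPU, [](std::size_t n) { return 0.010 * n; }},
        {"gpu_scan", Device::GPU, [](std::size_t n) { return 5.0 + 0.001 * n; }},
    };
}
```

With these made-up numbers, a small input stays on the CPU because the fixed GPU overhead dominates, while a large input is assigned to the GPU variant.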
