Hybrid Query Processing Engine for Coprocessing in Database Systems
HyPE
Tutorial: How to use HyPE

HyPE is organized as a library to allow easy integration into existing applications. You can choose between a dynamic and a static version of the library. Note that if you use the static version, you also have to link against the libraries HyPE itself uses.

ATTENTION: HyPE uses the Boost thread library for its advanced features. Applications compiled with g++ can run into a bug because Boost Thread does not properly export all symbols in versions up to 1.48; the bug was fixed in Boost 1.49. The workaround for older versions is to link statically against Boost Thread.

To integrate HyPE in your project, you have to include the header file

#include <hype.hpp>

and link against hype:

g++ -g -Wl,-rpath,${PATH_TO_HYPE_LIB}/lib -Wall -Werror -o <your application's name> <object files> -I${PATH_TO_HYPE_LIB}/include -Bstatic -lboost_thread -pthread -Bdynamic -L${PATH_TO_HYPE_LIB}/lib -lhype -lboost_system -lboost_filesystem -lboost_program_options-mt -lloki -lrt

The general concept of HyPE is to decide for your application which algorithm (and hence which processing device) should be used to perform an operation. Therefore, you first have to specify the operations you wish to make decisions for, and second, you have to register your available algorithms for these operations. First, we need a reference to the global Scheduler:
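The scheduler is a process-wide object. A minimal sketch for obtaining it, assuming the singleton accessor hype::Scheduler::instance() used in the bundled examples:

hype::Scheduler& scheduler = hype::Scheduler::instance(); //global scheduler (assumed singleton accessor)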

HyPE uses two major abstractions. First, a DeviceSpecification, which describes a processing device, e.g., a CPU or a GPU. Second, an AlgorithmSpecification, which encapsulates algorithm-specific information, e.g., the algorithm's name, the name of the operation the algorithm belongs to, as well as the learning method and the load adaptation strategy.

As an example, we will create the configuration for the most common case: a system with one CPU and one dedicated GPU:

DeviceSpecification cpu_dev_spec(hype::PD0, //by convention, the first CPU has Device ID: PD0  (any system has at least one)
                                 hype::CPU, //a CPU is from type CPU
                                 hype::PD_Memory_0); //by convention, the host main memory has ID PD_Memory_0

DeviceSpecification gpu_dev_spec(hype::PD1, //different processing device (naturally)
                                 hype::GPU, //Device Type
                                 hype::PD_Memory_1); //separate device memory

Now, we have to define the algorithms. Note that an algorithm may utilize only one processing device at a time (e.g., the GPU).
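The following sketch defines one AlgorithmSpecification per processing device for our "SORT" operation. The algorithm and operation names match the ones used later in this tutorial; the learning method and load adaptation (recomputation) constants hype::Least_Squares_1D and hype::Periodic are assumptions and may be named differently in your HyPE version:

AlgorithmSpecification cpu_alg("CPU_Algorithm",        //unique algorithm name
                               "SORT",                 //name of the operation the algorithm belongs to
                               hype::Least_Squares_1D, //assumed learning method
                               hype::Periodic);        //assumed load adaptation (recomputation) strategy

AlgorithmSpecification gpu_alg("GPU_Algorithm",
                               "SORT",
                               hype::Least_Squares_1D,
                               hype::Periodic);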

Note that the GPU algorithm is only executable on the GPU and hence should be assigned only to DeviceSpecifications of ProcessingDeviceType GPU. ATTENTION: the algorithm name in the AlgorithmSpecification has to be unique! Let's assume that our CPU algorithm runs only on the CPU and our GPU algorithm runs only on the GPU. We define this by calling the method Scheduler::addAlgorithm:

scheduler.addAlgorithm(cpu_alg, cpu_dev_spec); //add CPU Algorithm to CPU Processing Device
scheduler.addAlgorithm(gpu_alg, gpu_dev_spec); //add GPU Algorithm to GPU Processing Device

We are now ready to use the scheduling functionality of HyPE. First, we have to identify the parameters of a data set that have a high impact on the algorithms' execution time. In the case of our sorting example, we identify the size of the input array as the most important feature value. Note that HyPE supports n feature values (n>=1).

To tell HyPE the feature value(s) of the data set that is to be processed, we have to store them in a hype::Tuple object. By convention, the first entry quantifies the size of the input data, and the second (if any) should contain the selectivity of a database operator.

hype::Tuple t;
t.push_back(Size_of_Input_Dataset);//for our sort operation, we only need the data size
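For an operator whose runtime also depends on selectivity (e.g., a selection), the convention above leads to a two-entry feature vector. A hypothetical sketch, where estimated_selectivity is an application-provided value between 0 and 1 (not part of the sort example):

hype::Tuple selection_features;
selection_features.push_back(Size_of_Input_Dataset); //first feature: input data size
selection_features.push_back(estimated_selectivity); //second feature: operator selectivity (hypothetical)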

Now HyPE knows about your hardware and your algorithms, and we can let it make scheduling decisions. HyPE needs two pieces of information to perform a scheduling decision. The first is an OperatorSpecification, which defines the operation that should be executed ("SORT") and the feature vector of the input data (t). Furthermore, we have to specify the location of the input data as well as the desired location of the output data, so HyPE can take the cost of possible copy operations into account.

OperatorSpecification op_spec("SORT", 
                              t,
                              hype::PD_Memory_0, //input data is in CPU RAM
                              hype::PD_Memory_0); //output data has to be stored in CPU RAM

The second piece of information HyPE needs is a specification of constraints on the processing devices. For some applications, operations cannot be executed on all processing devices for arbitrary data. For example, if a GPU does not have enough memory to process a data set, the operation will fail (and will probably slow down other operations). Since HyPE cannot know this (it does not know the semantics of the operations), the user can specify constraints on which type of processing device the operation may be executed. In our case, we have no constraints and just default-construct a DeviceConstraint object.

DeviceConstraint dev_constr;

Now we can ask HyPE where to execute our operation:

SchedulingDecision sched_dec = scheduler.getOptimalAlgorithm(op_spec, dev_constr);

Note that the application always has to execute the algorithm HyPE chooses; otherwise, all following calls to scheduler.getOptimalAlgorithm() will have undefined behavior. Since HyPE uses a feedback loop to refine its estimates of algorithm execution times, you have to measure the execution times of your algorithms and pass them back to HyPE. HyPE provides a high-level interface for algorithm measurement:

AlgorithmMeasurement alg_measure(sched_dec); //has to be created directly before algorithm execution
   //execute the chosen algorithm
alg_measure.afterAlgorithmExecution();  //has to be called directly after algorithm termination

The AlgorithmMeasurement object starts a timer and afterAlgorithmExecution() stops the timer. Note that the constructor of the AlgorithmMeasurement object needs a SchedulingDecision as a parameter. When we put the usage of the SchedulingDecision together, we get the following code skeleton:

if(sched_dec.getNameofChoosenAlgorithm()=="CPU_Algorithm"){
   AlgorithmMeasurement alg_measure(sched_dec);
      //execute "CPU_Algorithm"
   alg_measure.afterAlgorithmExecution(); 
}else if(sched_dec.getNameofChoosenAlgorithm()=="GPU_Algorithm"){
   AlgorithmMeasurement alg_measure(sched_dec);
      //execute "GPU_Algorithm"
   alg_measure.afterAlgorithmExecution(); 
}

Some applications have their own time measurement routines and wish to use their own timer framework. To support such applications, HyPE offers a direct way to add a measured execution time in nanoseconds for a corresponding SchedulingDecision:

uint64_t begin=hype::core::getTimestamp();
CPU_algorithm(t[0]);
uint64_t end=hype::core::getTimestamp();
//pass the scheduling decision and the measured execution time (in nanoseconds!)
scheduler.addObservation(sched_dec,end-begin);

The complete source code of this example can be found in the documentation (online_learning.cpp) and in the examples directory of HyPE (examples/use_as_online_framework/online_learning.cpp).
