Hybrid Query Processing Engine for Coprocessing in Database Systems
HyPE

A Statistical Method learns the relation between the feature values of the input dataset and an algorithm's execution time. It is a central part of HyPE, implementing the learning-based execution time estimation. Hence, it is crucial to select the appropriate statistical method depending on the algorithm and the application environment. Currently, HyPE supports the one-dimensional Least Square Method and Multi Linear Fitting. Statistical methods are defined in the type hype::StatisticalMethods.
HyPE uses the least square solver of the ALGLIB Project. It is usually the candidate to choose if an algorithm depends on only one input feature, e.g., sorting, which depends only on the data size.
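To illustrate the underlying idea (this is a minimal sketch of one-dimensional least squares fitting, not HyPE's actual ALGLIB-based implementation; all names are hypothetical), a linear model t = a + b * x can be fitted to observed (feature value, execution time) pairs with the closed-form normal equations:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: fit the line t = a + b * x to observed
// (feature value, execution time) pairs via the least squares
// normal equations in closed form.
struct LinearModel {
    double a; // intercept
    double b; // slope

    // Estimated execution time for a new feature value x.
    double estimate(double x) const { return a + b * x; }
};

LinearModel fitLeastSquares(const std::vector<double>& x,
                            const std::vector<double>& t) {
    double sx = 0, st = 0, sxx = 0, sxt = 0;
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        sx  += x[i];
        st  += t[i];
        sxx += x[i] * x[i];
        sxt += x[i] * t[i];
    }
    const double b = (n * sxt - sx * st) / (n * sxx - sx * sx);
    const double a = (st - b * sx) / n;
    return {a, b};
}
```

After the initial training phase has collected enough measurement pairs, such a model answers estimation requests with a single function evaluation, which is why the fitted approximation function is cheap to query at scheduling time.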
HyPE uses the multi linear fitting functionality of the ALGLIB Project. You should choose Multi Linear Fitting if an algorithm depends on multiple input features, such as selections (data size, selectivity). Note that Multi Linear Fitting is currently limited to two features, but will support more in future versions of HyPE.
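The two-feature case can be sketched as follows (again a hypothetical illustration, not HyPE's ALGLIB call): fit the plane t = c0 + c1 * x1 + c2 * x2 by solving the 3x3 least squares normal equations.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: fit t = c0 + c1 * x1 + c2 * x2 to observations
// with two input features (e.g., data size and selectivity) by solving
// the normal equations (A^T A) c = A^T t with Gauss-Jordan elimination.
std::array<double, 3> fitTwoFeatures(
        const std::vector<std::array<double, 2>>& x,
        const std::vector<double>& t) {
    // Build the augmented system [A^T A | A^T t]; each row of A is (1, x1, x2).
    double M[3][4] = {};
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double row[3] = {1.0, x[i][0], x[i][1]};
        for (int r = 0; r < 3; ++r) {
            for (int c = 0; c < 3; ++c) M[r][c] += row[r] * row[c];
            M[r][3] += row[r] * t[i];
        }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int p = 0; p < 3; ++p) {
        int pivot = p;
        for (int r = p + 1; r < 3; ++r)
            if (std::fabs(M[r][p]) > std::fabs(M[pivot][p])) pivot = r;
        for (int c = 0; c < 4; ++c) std::swap(M[p][c], M[pivot][c]);
        for (int r = 0; r < 3; ++r) {
            if (r == p) continue;
            const double f = M[r][p] / M[p][p];
            for (int c = 0; c < 4; ++c) M[r][c] -= f * M[p][c];
        }
    }
    return {M[0][3] / M[0][0], M[1][3] / M[1][1], M[2][3] / M[2][2]};
}
```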
A Recomputation Heuristic implements the load adaption functionality of HyPE. If the load situation of a system changes dramatically, then it is very likely that the execution times of algorithms will change as well. To ensure sufficiently exact estimations, the learned approximation functions have to be updated. However, there is no 'perfect' point in time at which to recompute the approximation functions. Therefore, the user can select a Recomputation Heuristic that is appropriate for the application. Recomputation heuristics are defined in the type hype::RecomputationHeuristics.
The Periodic Recomputation Heuristic recomputes the approximation function of an algorithm after every X executions of this algorithm. X is called the Recomputation Period and can be configured as well (see Configure HyPE for details). You should use this Recomputation Heuristic if you want HyPE to refine its estimations at runtime to adapt to changing data, load, etc.
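The mechanism behind the periodic heuristic can be sketched as a simple per-algorithm execution counter (a hypothetical illustration; the class and method names are not HyPE's actual API):

```cpp
#include <cassert>

// Hypothetical sketch of the periodic idea: count executions of an
// algorithm and signal a refit of its approximation function every
// `period` runs (the Recomputation Period X).
class PeriodicRecomputation {
public:
    explicit PeriodicRecomputation(unsigned period) : period_(period) {}

    // Called after each execution of the algorithm; returns true when
    // the approximation function should be recomputed from the
    // measurement pairs collected so far.
    bool recordExecutionAndCheck() {
        if (++counter_ >= period_) {
            counter_ = 0;
            return true;
        }
        return false;
    }

private:
    unsigned period_;
    unsigned counter_ = 0;
};
```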
The Oneshot Recomputation Heuristic computes the approximation functions once, after the initial training phase. You should choose this heuristic if significant changes in the load of your system are seldom or have little impact on algorithm execution times.
An Optimization Criterion specifies what an "optimal" algorithm for your application is. Should it be the fastest? Or would you like to select algorithms in a way that optimizes the throughput of your system? Therefore, we implemented several strategies to make HyPE configurable and usable for a wide range of applications. Optimization criteria are defined in the type hype::OptimizationCriterions.
The idea of Response Time optimization is to reduce the execution time of one operation by selecting the (estimated) fastest algorithm.
Waiting Time Aware Response Time (WTAR) is an extension of the simple response time algorithm. WTAR takes into account the load on all processing devices and allocates for an operation O the processing device where the sum of the waiting time, until the previously assigned operators finish, and the estimated execution time of O is minimal. This is the recommended optimization algorithm for HyPE.
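The WTAR decision rule can be sketched as follows (a hypothetical illustration of the cost formula described above, not HyPE's actual scheduler code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical WTAR sketch: each processing device tracks the summed
// estimated runtime of its queued operators; an operation is assigned
// to the device minimizing (waiting time + estimated execution time).
struct Device {
    double queued_time; // estimated time until queued operators finish
};

// est[i] is the estimated execution time of the operation on device i.
// Returns the index of the chosen device and enqueues the operation there.
std::size_t scheduleWTAR(std::vector<Device>& devices,
                         const std::vector<double>& est) {
    std::size_t best = 0;
    double best_cost = devices[0].queued_time + est[0];
    for (std::size_t i = 1; i < devices.size(); ++i) {
        const double cost = devices[i].queued_time + est[i];
        if (cost < best_cost) { best_cost = cost; best = i; }
    }
    devices[best].queued_time += est[best]; // operation joins that queue
    return best;
}
```

Note how the queue update makes the strategy load-aware: once the estimated fastest device accumulates enough queued work, subsequent operations spill over to slower but idle devices.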
The round robin strategy allocates processing devices for operations in turns, distributing a workload of operations over all processing devices. This approach works well in case operations need roughly the same time on all processing devices (e.g., on homogeneous hardware). However, in case one processing device is significantly faster than the others, the round robin strategy will underutilize the faster processing device and overutilize the slower ones.
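Round robin ignores execution time estimates entirely; it is just a rotating counter over the available devices (a minimal hypothetical sketch):

```cpp
#include <cassert>
#include <cstddef>

// Minimal round robin sketch: devices are assigned in turns,
// independent of estimated execution times or current load.
class RoundRobin {
public:
    explicit RoundRobin(std::size_t num_devices) : n_(num_devices) {}
    std::size_t nextDevice() { return next_++ % n_; }

private:
    std::size_t n_;
    std::size_t next_ = 0;
};
```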
Threshold-based Outsourcing is an extension of Response Time. The idea is to force the use of a slower processing device to relieve the fastest processing device and distribute the workload over all available processing devices. However, the algorithm has to ensure that the operation's response time does not increase significantly. Therefore, an operation may be executed on a slower processing device if and only if the expected slowdown is below a certain threshold W.
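One way to realize this rule (a hypothetical sketch; the concrete tie-breaking by queued load is an assumption, not taken from HyPE) is to admit only devices whose estimated time stays within factor W of the fastest, and among those prefer the least loaded:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of threshold-based outsourcing: an operation may
// run on a slower device only if its estimated slowdown relative to the
// fastest algorithm stays below the threshold W (e.g., W = 1.3 tolerates
// a 30% slowdown). Among admissible devices, the least loaded one is
// chosen (an illustrative assumption).
// est[i]: estimated execution time on device i; load[i]: queued work.
std::size_t scheduleThreshold(const std::vector<double>& est,
                              const std::vector<double>& load,
                              double W) {
    std::size_t fastest = 0;
    for (std::size_t i = 1; i < est.size(); ++i)
        if (est[i] < est[fastest]) fastest = i;
    std::size_t best = fastest;
    for (std::size_t i = 0; i < est.size(); ++i) {
        const bool acceptable = est[i] <= W * est[fastest];
        if (acceptable && load[i] < load[best]) best = i;
    }
    return best;
}
```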
Probability-based Outsourcing computes for each scheduling decision the estimated execution times of the available algorithms. Then, each algorithm is assigned a probability depending on its estimated execution time. Faster algorithms (on faster processing devices) get a higher probability to be executed than slower algorithms. An algorithm is then chosen randomly for execution according to these probabilities.
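A common way to derive such probabilities (a hypothetical sketch; the source does not specify HyPE's exact weighting, so weighting each algorithm by the inverse of its estimated execution time is an assumption) is:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of probability-based outsourcing: each algorithm
// gets a selection weight proportional to the inverse of its estimated
// execution time (an assumed weighting), so faster algorithms are
// chosen more often while slower devices still receive part of the
// workload.
std::size_t scheduleProbabilistic(const std::vector<double>& est,
                                  std::mt19937& rng) {
    std::vector<double> weights;
    weights.reserve(est.size());
    for (double t : est) weights.push_back(1.0 / t);
    // discrete_distribution normalizes the weights into probabilities.
    std::discrete_distribution<std::size_t> pick(weights.begin(),
                                                 weights.end());
    return pick(rng);
}
```

Unlike threshold-based outsourcing, no device is ever categorically excluded; the randomization alone spreads the workload.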