A Classification of Skew Effects in Parallel Database Systems

Skew effects are a serious problem in parallel database systems, but the relationship between different skew types and load balancing methods is still not fully understood. We develop two classifications, one of skew effects and one of load balancing strategies, and compare them to match their relevant properties. Our conclusions highlight the importance of highly dynamic scheduling for optimizing both the complexity and the success of load balancing. We also suggest tuning database schemata as a new anti-skew measure.

