Reading digest: Basic Algorithms in Skyline query

42 views
Skip to first unread message

Jarod Wen

unread,
Oct 16, 2006, 10:03:04 PM10/16/06
to DB_Sk...@googlegroups.com, Donghui Zhang
Oct 16th, 2006 from Jarod Wen:

Reading digest: Basic Algorithms in Skyline query and analysis on their
performance

Paper:
[BKS01] S. Borzsonyi, D. Kossmann, and K. Stocker. The Skyline Operator.
In Proc. IEEE Conf. on Data Engineering, pages 421--430, Heidelberg,
Germany, 2001
[GSG05] Parke Godfrey, Ryan Shipley, Jarek Gryz Maximal Vector
Computation in Large Data Sets VLDB'2005, Norway

The aim of Skyline research is on the optimal solution in
multi-dimension data set, which also needs the query has a better
performance to show the result object for actual applications. It is
hard to be a clearly "better" performance, however, since there are
different aspects for the algorithm to focusing on, and different
environment always have different preference on such aspects.

[BKS01] is the first introduction on skyline operator, although the
related concepts have already been seen in other terms such as convex
hall, maximal vector, Pareto set and so on. [BKS01] mainly focuses on
the application of Skyline as an extensions of SQL and also provides a
implementation based on nested SQL query, which has very poor
performance. So two algorithms, D&C and BNL, are introduced for better
performance. D&C is aimed to use main memory for better query speed, but
it is not fit to the big database because the partition process will
take a high cost on IO. BNL is straightforward solution for skyline
query, and it always show good performance on high dimension compared to
D&C. (Recently we have tried to implemented BNL on both desktop and B/S
environment, using NBA player dataset, experiment shows that BNL can
show good performance in desktop environment, but not very good in B/S.)

[GSG05]/ /focuses on the external efficiency, which refers to the
limitation on buffer in main-memory. The most important part of this
paper, I think, is the analysis part on the different algorithms for
Maximal Vector and Skyline query. The analysis make good summary on the
time consume and the space requirement(focusing on the external disk
performance). The new algorithm in the paper, LESS, is mainly focusing
on improving the progressive performance in general environment(common
data set without index or sort), which can be considered as a mixture of
BNL, SFS and external sort for large dataset. The biggest advantage of
LESS compared with other progressive algorithms, as far as I know, is
the general dataset environment, which will have less limitation on the
data storage. A flaw of it, I think, may be on the dynamic dataset,
where new data may change the skyline set, and LESS should re-run the
algorithm for new one. On this point, LESS is not very progressive for
online skyline query. Another paper "Shooting star in the sky: An online
algorithm for skyline queries" pays some attention on this point.

In the future research on Skyline, in my opinion, the improvement on
progressive performance is a keystone. Progressive may include the fast
query on the fly, and the flexibility of algorithm for dynamic data, and
also the users' control on query. And another important aspect is to
improve the value of the skyline query result, which may match to the
users' requirement, for example, certain number of skyline objects on
some of the attributes. In recent skyline project, both of these
function will be considered in the B/S environment.

Reply all
Reply to author
Forward
0 new messages