An efficient stream-based join to process end user transactions in real-time data warehousing

Jamil, Noreen
Journal Article
Ngā Upoko Tukutuku (Māori subject headings)
real-time data warehousing
semi-stream processing
join operator
performance measurement
data processing
information resources management
ANZSRC Field of Research Code (2020)
Jamil, N. (2014). An Efficient Stream-based Join to Process End User Transactions in Real-Time Data Warehousing. Journal of Digital Information Management, 12, pp. 201-215.
In the field of real-time data warehousing, semi-stream processing has become a promising area of research over the last decade. One important operation in semi-stream processing is joining stream data with slowly changing disk-based master data. A join operator is usually required to implement this operation. This join operator typically works under limited main memory, which is generally not large enough to hold the whole disk-based master data. Recently, a seminal join algorithm called MESHJOIN (Mesh Join) has been proposed in the literature to process semi-stream data. MESHJOIN is a candidate for a resource-aware system setup. However, MESHJOIN is not very selective. In particular, MESHJOIN does not consider the characteristics of stream data, and its performance is suboptimal for skewed stream data. In this paper we propose a novel Semi-Stream Join (SSJ) using a new cache module. The algorithm is more appropriate for skewed distributions, and we present results for Zipfian distributions of the type that appear in many applications. We present the cost model for our SSJ and validate it with experiments. Based on the cost model, we also tune the algorithm for maximum performance. We conduct a rigorous experimental study to test our algorithm. Our experiments show that SSJ significantly outperforms MESHJOIN.
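To make the idea of a cache-assisted semi-stream join concrete, the following is a minimal sketch and not the SSJ algorithm from the paper. It assumes a plain LRU cache as a stand-in for the paper's cache module, whose actual design is not described in this record. The names `CachedSemiStreamJoin`, `master_lookup`, and `disk_reads` are hypothetical and chosen only for illustration. The sketch shows why skewed streams benefit: frequently repeated keys are served from memory instead of the disk-based master data.

```python
from collections import OrderedDict


class CachedSemiStreamJoin:
    """Illustrative semi-stream join: stream tuples are joined against
    master data that is assumed too large to hold in memory.
    Assumption: an LRU cache stands in for the paper's cache module."""

    def __init__(self, master_lookup, cache_size):
        self.master_lookup = master_lookup  # hypothetical stand-in for a disk read
        self.cache_size = cache_size
        self.cache = OrderedDict()          # key -> master row, in LRU order
        self.disk_reads = 0                 # counts simulated disk accesses

    def join(self, stream):
        for key, payload in stream:
            if key in self.cache:
                # Cache hit: mark the key as most recently used.
                self.cache.move_to_end(key)
                row = self.cache[key]
            else:
                # Cache miss: fall back to the (simulated) disk lookup.
                self.disk_reads += 1
                row = self.master_lookup(key)
                if row is None:
                    continue  # no matching master row, so no join output
                self.cache[key] = row
                if len(self.cache) > self.cache_size:
                    self.cache.popitem(last=False)  # evict least recently used
            yield key, payload, row


# Toy master data and a small stream in which key 1 dominates.
master = {k: f"product-{k}" for k in range(1000)}
stream = [(1, "a"), (2, "b"), (1, "c"), (1, "d"), (3, "e"), (2, "f")]

joiner = CachedSemiStreamJoin(master.get, cache_size=2)
results = list(joiner.join(stream))
```

In this toy run, the repeated occurrences of key 1 are answered from the cache, so `joiner.disk_reads` ends up smaller than the number of stream tuples. With a uniform key distribution the cache would help far less, which is the intuition behind targeting skewed (for example Zipfian) streams.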
Copyright holder
Copyright notice
All rights reserved