This semester I’m taking a course in database management systems. For this course, we have to work on a mini-research project in groups. I’m in a group with 2 other students, and the project we decided on is an experimental evaluation of the mJoin operator. This will involve surveying the prior work on the mJoin operator and implementing the operator in an open-source DBMS.
The mJoin operator is essentially an n-ary symmetric hash join operator. For each relation to be joined, a hash table is built on each of its join attributes. When a new tuple arrives, it is inserted into the appropriate hash table(s) for its relation and then used to probe the hash tables of the other relations. Intermediate tuples are never stored anywhere. One of the issues we will be investigating in this experimental evaluation is whether an operator like the mJoin is more or less efficient than a tree of binary joins. Conventional wisdom says that a tree of binary joins is typically more efficient.
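To make the insert-then-probe behaviour concrete, here is a minimal in-memory sketch in Python. It is not taken from any existing mJoin implementation or from PostgreSQL. It assumes equi-join predicates written as hypothetical `(rel_a, attr_a, rel_b, attr_b)` tuples, tuples represented as dicts, and a connected join graph. The names `MJoin`, `insert`, `_probe` and `_consistent` are made up for illustration. The probe order is naive: it takes the first unbound relation connected to one already matched. Choosing a good probing order is an optimization question in its own right.

```python
from collections import defaultdict


class MJoin:
    """Hypothetical sketch of an n-ary symmetric hash join (mJoin).

    relations:  list of relation names.
    predicates: list of (rel_a, attr_a, rel_b, attr_b), meaning
                rel_a.attr_a == rel_b.attr_b (equi-joins only).
    """

    def __init__(self, relations, predicates):
        self.relations = list(relations)
        self.predicates = list(predicates)
        # One hash table per (relation, join attribute): value -> list of tuples.
        self.tables = {}
        for ra, aa, rb, ab in self.predicates:
            self.tables.setdefault((ra, aa), defaultdict(list))
            self.tables.setdefault((rb, ab), defaultdict(list))

    def insert(self, rel, tup):
        """Insert a new tuple into rel's hash tables, then probe the others.

        Returns the complete join results produced by this tuple, each a dict
        mapping relation name -> tuple. Partial matches exist only on the
        recursion stack; they are never stored in a hash table.
        """
        for (r, attr), table in self.tables.items():
            if r == rel:
                table[tup[attr]].append(tup)
        return list(self._probe({rel: tup}))

    def _probe(self, bound):
        if len(bound) == len(self.relations):
            yield dict(bound)  # copy, since `bound` is mutated during backtracking
            return
        # Naive probe order: first unbound relation linked to a bound one.
        for ra, aa, rb, ab in self.predicates:
            if ra in bound and rb not in bound:
                nxt, key, attr = rb, bound[ra][aa], ab
                break
            if rb in bound and ra not in bound:
                nxt, key, attr = ra, bound[rb][ab], aa
                break
        else:
            raise ValueError("join graph is not connected")
        for cand in self.tables[(nxt, attr)].get(key, []):
            if self._consistent(bound, nxt, cand):
                bound[nxt] = cand
                yield from self._probe(bound)
                del bound[nxt]

    def _consistent(self, bound, rel, tup):
        """Check every predicate between `rel` and already-bound relations."""
        for ra, aa, rb, ab in self.predicates:
            if ra == rel and rb in bound and tup[aa] != bound[rb][ab]:
                return False
            if rb == rel and ra in bound and bound[ra][aa] != tup[ab]:
                return False
        return True
```

For a three-way join R ⋈ S ⋈ T on `R.a = S.a` and `S.b = T.b`, a result is emitted only when the last of its matching tuples arrives, whatever the arrival order:

```python
m = MJoin(["R", "S", "T"], [("R", "a", "S", "a"), ("S", "b", "T", "b")])
m.insert("R", {"a": 1})          # no S tuples yet, so no output
m.insert("T", {"b": 10})         # no S tuples yet, so no output
m.insert("S", {"a": 1, "b": 10}) # completes one R-S-T match
```

A binary join tree for the same query would instead keep the R ⋈ S (or S ⋈ T) intermediate results in a hash table of their own; avoiding that state is exactly the trade-off the evaluation is meant to measure.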
The first thing we will be doing in the next week or two is looking at various open-source databases and deciding which one is best suited for this project. The main criteria will be how easy the runtime engine is to work with and how easy it will be to add a new operator. We’ll look at a lot of databases, but at the moment it’s looking like PostgreSQL is the one we will work with for the semester. We’ll also be looking into related work. The survey on adaptive query processing looks like a good starting point for this.
Some other interesting aspects of the mJoin operator which we hope to investigate are:
- query optimization with the mJoin operator
- what applications would benefit from an operator such as this
- what kinds of scenarios the operator is suited for (and not suited for)
- how difficult it is to add the operator to an existing DBMS