Measuring Precision and Recall

Before measuring precision and recall, you must materialize the "baseline" join, which gives the exact answer to the text-join. We provide two scripts for this. baseline.sql materializes the join by executing the query of Figure 2; this script, however, may crash on large datasets if disk space is limited. baseline2.sql "forces" a "block nested loop"-style execution, which may be useful for big datasets; the size of the block is controlled by the "step" variable in the script. The result of these scripts is inserted into the Baseline table. An auxiliary BaselineNumbers table is also created, which is used to speed up the calculation of precision and recall.
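The block-by-block idea can be sketched as follows. This is only an illustration in Python with an in-memory SQLite database, not the contents of baseline2.sql. The table names R1Weights and R2Weights, their columns, the similarity threshold, and the join condition (a sum of products of token weights) are all assumptions made for the example and may differ from the query of Figure 2. Only the Baseline table name and the "step" variable come from the text above.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with hypothetical weight tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE R1Weights (tid INTEGER, token TEXT, weight REAL);
        CREATE TABLE R2Weights (tid INTEGER, token TEXT, weight REAL);
        INSERT INTO R1Weights VALUES
            (1, 'a', 1.0), (2, 'b', 1.0), (3, 'a', 0.6), (3, 'b', 0.8);
        INSERT INTO R2Weights VALUES (10, 'a', 1.0), (11, 'b', 1.0);
        """
    )
    return conn


def materialize_baseline(conn, threshold, step):
    """Fill Baseline one block of `step` outer tuple ids at a time.

    Each INSERT joins only the outer tuples whose tid lies in
    [start, start + step), so the intermediate result of a single
    statement stays bounded by the block size.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE Baseline (tid1 INTEGER, tid2 INTEGER, sim REAL)")
    lo, hi = cur.execute("SELECT MIN(tid), MAX(tid) FROM R1Weights").fetchone()
    if lo is None:  # empty outer relation: nothing to join
        return []
    start = lo
    while start <= hi:
        cur.execute(
            """
            INSERT INTO Baseline (tid1, tid2, sim)
            SELECT r1.tid, r2.tid, SUM(r1.weight * r2.weight)
            FROM R1Weights r1 JOIN R2Weights r2 ON r1.token = r2.token
            WHERE r1.tid >= ? AND r1.tid < ?
            GROUP BY r1.tid, r2.tid
            HAVING SUM(r1.weight * r2.weight) >= ?
            """,
            (start, start + step, threshold),
        )
        conn.commit()  # each block is committed before the next one starts
        start += step
    return cur.execute(
        "SELECT tid1, tid2 FROM Baseline ORDER BY tid1, tid2"
    ).fetchall()


baseline_pairs = materialize_baseline(demo_connection(), threshold=0.7, step=2)
print(baseline_pairs)  # [(1, 10), (2, 11), (3, 11)]
```

The block size should not change the answer, only the amount of work done per statement: a smaller "step" means more, smaller joins.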

Finally, the SQL script PrecRecall.sql creates a table PrecisionRecall that holds the precision and recall of the different algorithms under the different settings.
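For orientation, here is a minimal sketch of how precision and recall can be computed against a materialized baseline. It uses the standard set-based definitions over the whole join result, which may not match what PrecRecall.sql does (for example, the script may compute the measures per tuple with the help of BaselineNumbers). The Approx table and the (tid1, tid2) column layout are hypothetical, and the sketch assumes each pair appears at most once per table.

```python
import sqlite3


def precision_recall(conn):
    """Return (precision, recall) of the Approx table against Baseline.

    precision = |Approx ∩ Baseline| / |Approx|
    recall    = |Approx ∩ Baseline| / |Baseline|
    A measure is None when its denominator is zero.
    """
    cur = conn.cursor()
    n_baseline = cur.execute("SELECT COUNT(*) FROM Baseline").fetchone()[0]
    n_approx = cur.execute("SELECT COUNT(*) FROM Approx").fetchone()[0]
    n_hits = cur.execute(
        """
        SELECT COUNT(*)
        FROM Approx a JOIN Baseline b
          ON a.tid1 = b.tid1 AND a.tid2 = b.tid2
        """
    ).fetchone()[0]
    precision = n_hits / n_approx if n_approx else None
    recall = n_hits / n_baseline if n_baseline else None
    return precision, recall


# Hypothetical data: four exact matches, three reported by an approximate join.
pr_conn = sqlite3.connect(":memory:")
pr_conn.executescript(
    """
    CREATE TABLE Baseline (tid1 INTEGER, tid2 INTEGER);
    CREATE TABLE Approx (tid1 INTEGER, tid2 INTEGER);
    INSERT INTO Baseline VALUES (1, 10), (2, 11), (3, 11), (4, 12);
    INSERT INTO Approx VALUES (1, 10), (2, 11), (5, 13);
    """
)
precision, recall = precision_recall(pr_conn)
print(precision, recall)
```

In this example two of the three reported pairs are correct and two of the four true pairs are found, so precision is 2/3 and recall is 1/2.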