The XTree 1.0 Decision Tree Learning Toolkit

Introduction

The eXtended decision TREE learning toolkit (XTree) offers a collection of non-standard decision tree learners specifically designed for probability-based ranking. Compared to standard decision tree learners based on the information gain metric, the XTree toolkit offers the following advantages:
The theory underlying the XTree learners and the kMRR/kMAP metrics are described in:
The co-learning approach is explained in:

Obtaining the XTree toolkit

The XTree toolkit, including this file, can be downloaded from the website of the LogAnswer project, http://www.loganswer.de/resources/xtree-1.0.tgz.

Copyright, License, and Disclaimer of Warranty

The XTree toolkit was written by Ingo Glöckner, Intelligent Information & Communication Systems Group at FernUniversität in Hagen, Copyright (C) 2012.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Requirements

You need a working installation of Bigloo Scheme in order to compile the XTree toolkit. Bigloo is freely available from http://www-sop.inria.fr/indes/fp/Bigloo/.

Installation / How to build the XTree toolkit

Make sure that Bigloo is properly installed.
The environment variable BIGLOODIR should be set to the location where Bigloo is installed (default: /usr/local). Moreover, the environment variable BIGLOO_LIB should contain the path to the Bigloo libraries (default: /usr/local/lib/bigloo/4.0b).
Once your environment settings are correct, you can build the XTree executable bin/xtree by typing make.
In order to call XTree functions from applications written in Bigloo Scheme, simply add the XTree-related modules to your .afile and add an (import xtree) clause to the module header of each Bigloo module that needs access to XTree functions, as sketched below.
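A minimal sketch of such a client module follows. The module name my-application, the main procedure my-main, and the exact source file name in the .afile entry are assumptions; adjust them to the module name and file layout of your checkout.

    ;; .afile entry (one s-expression per module); the file name below is
    ;; an assumption -- point it at the actual XTree source file:
    (xtree "src/xtree.scm")

    ;; Client module importing the XTree bindings:
    (module my-application
       (import xtree)
       (main my-main))

    (define (my-main argv)
       ;; ... call XTree functions here ...
       0)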

Running the XTree toolkit

The XTree toolkit provides a collection of Scheme functions for learning decision trees and bags of decision trees, for evaluating these models, and for applying them to data items in an application. When starting the executable bin/xtree, you can issue commands in the read-eval-print loop (REPL) of the Bigloo interpreter. (It is recommended to use a readline wrapper such as rlwrap in order to improve the user experience.)
The following list of exported bindings documents the main functions supplied by the XTree toolkit. You can also invoke these functions in your own compiled code if you import these bindings from the module file src/xtree.mod.

XTree function documentation

Preliminaries

Learning models

Parallel learning of models (co-training)

Evaluating models based on standard metrics

Evaluating models based on ranking metrics

The following functions assume that the data is organized into groups. As described in the preliminaries section, the ARFF files used for evaluation must contain an integer-valued group attribute which must be the second-last attribute in the attribute list.
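For illustration, a grouped ARFF file could look like the sketch below. The relation name, feature attributes, and values are made up, and a binary YES/NO class is assumed; the essential point is that the integer-valued group attribute comes directly before the class attribute.

    @relation example-data
    @attribute score    numeric
    @attribute overlap  numeric
    @attribute group    numeric
    @attribute class    {YES,NO}
    @data
    0.92,3,1,YES
    0.41,1,1,NO
    0.77,2,2,NO
    0.80,5,2,YES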
For the rank-based evaluation, all data items with the same group number are collected and sorted in decreasing order of the YES probability estimated by the given model. Standard ranking metrics such as the MRR (mean reciprocal rank) can then be calculated. See Reference [1] for an explanation of the MRR-k metric (which corresponds to kMRR in the paper) and of ANS-k, the total number of YES items found on the top k ranks across all groups.
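As a rough illustration of these metrics (not the toolkit's implementation; see Reference [1] for the authoritative definitions), the following Scheme sketch computes MRR-k and ANS-k from groups that are already sorted by decreasing YES probability, under the usual reading that a first YES below rank k contributes 0 to the reciprocal rank.

    ;; Each group is a list of class labels (YES / NO), sorted in
    ;; decreasing order of the YES probability estimated by the model.

    (define (reciprocal-rank-at-k group k)
       ;; 1/r for the first YES within the top k ranks, else 0.
       (let loop ((items group) (rank 1))
          (cond ((or (null? items) (> rank k)) 0)
                ((eq? (car items) 'YES) (/ 1 rank))
                (else (loop (cdr items) (+ rank 1))))))

    (define (mrr-at-k groups k)
       ;; Mean reciprocal rank at cutoff k, averaged over all groups.
       (/ (apply + (map (lambda (g) (reciprocal-rank-at-k g k)) groups))
          (length groups)))

    (define (ans-at-k groups k)
       ;; Total number of YES items on the top k ranks across all groups.
       (apply +
              (map (lambda (g)
                      (let loop ((items g) (rank 1) (n 0))
                         (if (or (null? items) (> rank k))
                             n
                             (loop (cdr items) (+ rank 1)
                                   (if (eq? (car items) 'YES) (+ n 1) n)))))
                   groups)))

    ;; Example: first YES at rank 2 in group 1 and at rank 1 in group 2:
    ;; (mrr-at-k '((NO YES NO) (YES NO)) 3)  =>  3/4
    ;; (ans-at-k '((NO YES NO) (YES NO)) 3)  =>  2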
The group-based evaluation functions accept all the options described for the non-grouped versions eval-dt, co-eval-dt etc.
Moreover, they accept an additional DSSSL key :window which controls the number of ranking metrics shown.
Grouped evaluation functions:

Applying models to data

Modifying models

Displaying basic information about models

Reproducing tree generation

Displaying attribute usage statistics

Visualizing models

Miscellaneous functions for working with ARFF files

The following two functions compute basic analyses of grouped data (histogram and median):
For testing the treatment of missing values, the following function can be used; it intentionally degrades the quality of training or test files by 'forgetting' some of the attribute values:
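As an illustration of the idea behind such a degradation step (not the toolkit's actual function, which operates on ARFF files), the sketch below replaces each attribute value of an in-memory instance by the ARFF missing-value marker ? with a given probability. A tiny deterministic pseudo-random generator is included so the sketch does not depend on an implementation-specific random procedure.

    ;; Small linear congruential generator returning values in [0,1).
    (define *seed* 1)
    (define (next-random)
       (set! *seed* (modulo (+ (* 75 *seed*) 74) 65537))
       (/ *seed* 65537.0))

    ;; Replace each attribute value by the symbol ? with probability p.
    (define (forget-values instance p)
       (map (lambda (value) (if (< (next-random) p) '? value)) instance))

    ;; Example: (forget-values '(0.92 3 1 YES) 0.3)
    ;; might yield (0.92 ? 1 YES).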
(C) 2012 Ingo Glöckner, iglockner@web.de