{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using the MEKA wrapper\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The [MEKA](http://waikato.github.io/meka/) project provides an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, we want to predict multiple output variables for each input instance. \n",
    "\n",
    "MEKA is based on the [WEKA](http://www.cs.waikato.ac.nz/ml/weka/) Machine Learning Toolkit; it includes dozens of multi-label methods from the scientific literature, as well as a wrapper to the related [MULAN](http://mulan.sourceforge.net/) framework.\n",
    "\n",
    "An introduction to multi-label classification and MEKA is given in a [JMLR MLOSS-track paper](http://jmlr.org/papers/volume17/12-164/12-164.pdf). Note that while MEKA is GPL-licensed, using this wrapper does not incur GPL limitations on your code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting up MEKA\n",
    "\n",
    "In order to use the scikit-multilearn interface to MEKA you need to have JAVA and MEKA installed. Paths to both are passed to the class's constructor. **The current version supports meka 1.9.1+**\n",
    "\n",
    "The currently officially supported MEKA version is [1.9.2](https://github.com/Waikato/meka/releases/tag/meka-1.9.2). \n",
    "\n",
    "You can download it using the :fun:`download_meka`, the function returns path to MEKA classes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MEKA 1.9.2 found, not downloading\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'/home/niedakh/scikit_ml_learn_data/meka/meka-release-1.9.2/lib/'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from skmultilearn.ext import download_meka\n",
    "\n",
    "meka_classpath = download_meka()\n",
    "meka_classpath"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to use a different version, just pass the version number as an argument to `download_meka`.\n",
    "\n",
    "Note that you will need to have ``liac-arff`` installed if you want to use the MEKA wrapper, you can get them using: ``pip install liac-arff``.\n",
    "\n",
    "You will also need Java. You can pass a path to the Java binary in MEKA wrapper constructor. In Python 2.7 the pip package ``whichcraft`` is used to detect the location of java executables if no path is provided to the constructor. You can install it via ``pip install whichcraft``. In Python 3 ``whichcraft`` is not used, java path will be found using the standard library.\n",
    "\n",
    "## Using MEKA via scikit-multilearn \n",
    "\n",
    "Starting from scikit-multilearn ``0.0.2`` the meka wrapper is available from ``skmultilearn.ext`` (ext as in external) and is a fully scikit-compatible multi-label classifier.\n",
    "\n",
    "To use the interface class start with importing skmultilearn's module, then create an object of the ``Meka`` class using the constructor and perform the standard fit & predict scenario. \n",
    "\n",
    "Let's load up some data to see how it works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "scene - exists, not redownloading\n",
      "scene - exists, not redownloading\n"
     ]
    }
   ],
   "source": [
    "from skmultilearn.dataset import load_dataset\n",
    "\n",
    "X_train, y_train, _, _ = load_dataset('scene', 'train')\n",
    "X_test,  y_test, _, _ = load_dataset('scene', 'test')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have a data set let's classify it using MEKA and WEKA! If you are new to the MEKA and WEKA stack you can find available classifiers under the following links:\n",
    "\n",
    "- [WEKA base classifier list](http://weka.sourceforge.net/doc.dev/weka/classifiers/Classifier.html)\n",
    "- [MEKA multi-label classifier list](http://waikato.github.io/meka/methods/)\n",
    "\n",
    "Let's start by importing :class:`Meka` and constructing a MEKA wrapper classifier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Meka(java_command='/usr/bin/java',\n",
       "   meka_classifier='meka.classifiers.multilabel.BR',\n",
       "   meka_classpath='/home/niedakh/scikit_ml_learn_data/meka/meka-release-1.9.2/lib/',\n",
       "   weka_classifier='weka.classifiers.bayes.NaiveBayesMultinomial')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from skmultilearn.ext import Meka\n",
    "\n",
    "meka = Meka(\n",
    "        meka_classifier = \"meka.classifiers.multilabel.BR\", # Binary Relevance\n",
    "        weka_classifier = \"weka.classifiers.bayes.NaiveBayesMultinomial\", # with Naive Bayes single-label classifier\n",
    "        meka_classpath = meka_classpath, #obtained via download_meka\n",
    "        java_command = '/usr/bin/java' # path to java executable\n",
    ")\n",
    "meka"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Where:\n",
    "\n",
    "- ``meka_classifier`` is the MEKA classifier class \n",
    "- ``weka_classifier`` is the WEKA base classifier class if used\n",
    "- ``java_command`` is the path to java\n",
    "- ``meka_classpath`` is the path to where meka.jar and weka.jar are located, usually they come together in meka releases, so this points to the ``lib`` subfolder of the folder where ``meka-<version>-realease.zip`` file was unzipped. If not provided the path is taken from environmental variable: ``MEKA_CLASSPATH``\n",
    "\n",
    "Now let's train and test the classifier - we'll se what level of hamming loss do we get?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<1211x294 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 351805 stored elements in LInked List format>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "meka.fit(X_train, y_train)\n",
    "predictions = meka.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import hamming_loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.14659977703455965"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hamming_loss(y_test, predictions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Citing meka\n",
    "\n",
    "```latex\n",
    "@article{MEKA,\n",
    "    author = {Read, Jesse and Reutemann, Peter and Pfahringer, Bernhard and Holmes, Geoff},\n",
    "    title = {{MEKA}: A Multi-label/Multi-target Extension to {Weka}},\n",
    "    journal = {Journal of Machine Learning Research},\n",
    "    year = {2016},\n",
    "    volume = {17},\n",
    "    number = {21},\n",
    "    pages = {1--5},\n",
    "    url = {http://jmlr.org/papers/v17/12-164.html},\n",
    "}\n",
    "\n",
    "@article{Hall:2009:WDM:1656274.1656278,\n",
    "    author = {Hall, Mark and Frank, Eibe and Holmes, Geoffrey and Pfahringer, Bernhard and Reutemann, Peter and Witten, Ian H.},\n",
    "    title = {The WEKA Data Mining Software: An Update},\n",
    "    journal = {SIGKDD Explor. Newsl.},\n",
    "    issue_date = {June 2009},\n",
    "    volume = {11},\n",
    "    number = {1},\n",
    "    month = nov,\n",
    "    year = {2009},\n",
    "    issn = {1931-0145},\n",
    "    pages = {10--18},\n",
    "    numpages = {9},\n",
    "    url = {http://doi.acm.org/10.1145/1656274.1656278},\n",
    "    doi = {10.1145/1656274.1656278},\n",
    "    acmid = {1656278},\n",
    "    publisher = {ACM},\n",
    "    address = {New York, NY, USA},\n",
    "}\n",
    "\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
