Metadata-Version: 2.0
Name: revscoring
Version: 2.2.0
Summary: A set of utilities for generating quality scores for MediaWiki revisions
Home-page: https://github.com/wiki-ai/revscoring
Author: Aaron Halfaker
Author-email: ahalfaker@wikimedia.org
License: MIT
Description-Content-Type: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Environment :: Other Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3
Requires-Dist: deltas (>=0.4.6,<0.4.999)
Requires-Dist: docopt (>=0.6.2,<0.6.999)
Requires-Dist: flake8 (>=3.3.0,<3.3.999)
Requires-Dist: gensim (>=2.3.3,<3.3.999)
Requires-Dist: mmh3 (>=2.3.1,<2.3.999)
Requires-Dist: more-itertools (==2.2)
Requires-Dist: mwapi (>=0.5.0,<0.5.999)
Requires-Dist: mwparserfromhell (>=0.3.3,<0.4.999)
Requires-Dist: mwtypes (>=0.2.0,<0.3.999)
Requires-Dist: mysqltsv (<0.0.999,>=0.0.7)
Requires-Dist: nltk (<3.0.999,>=3.0.0)
Requires-Dist: numpy (<1.14.999,>=1.8.2)
Requires-Dist: pyenchant (<1.6.999,>=1.6.6)
Requires-Dist: pytest (>=3.2.2,<3.2.999)
Requires-Dist: pytz (>=2017.2)
Requires-Dist: pywikibase (<0.0.999,>=0.0.3)
Requires-Dist: requests (<2.999.999,>=2.0.0)
Requires-Dist: scikit-learn (>=0.17.0,<0.17.999)
Requires-Dist: scipy (>=0.13.3,<1.0.999)
Requires-Dist: tabulate (>=0.7.5,<0.7.999)
Requires-Dist: textstat (<0.3.999,>=0.3.1)
Requires-Dist: tqdm (>=4.15.0,<4.15.9999)
Requires-Dist: yamlconf (<0.2.999,>=0.2.0)

[![Build Status](https://travis-ci.org/wiki-ai/revscoring.svg?branch=master)](https://travis-ci.org/wiki-ai/revscoring)
[![Test coverage](https://codecov.io/gh/wiki-ai/revscoring/branch/master/graph/badge.svg)](https://codecov.io/gh/wiki-ai/revscoring)
# Revision Scoring

A generic, machine learning-based revision scoring system designed to
automatically differentiate damaging edits from productive contributions on
Wikipedia.

## Example


Using a scorer model to score a revision:

```python
import mwapi
from revscoring import Model
from revscoring.extractors.api.extractor import Extractor

# Load a serialized scorer model from disk
with open("models/enwiki.damaging.linear_svc.model") as f:
    scorer_model = Model.load(f)

# Extract feature values for a revision via the MediaWiki API
extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
                                    user_agent="revscoring demo"))

feature_values = list(extractor.extract(123456789, scorer_model.features))

print(scorer_model.score(feature_values))
# {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
```
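
The score is a plain dictionary, so you can apply your own threshold to the `True` probability instead of relying on `prediction`. This is a minimal sketch that assumes the score has the shape printed in the example; `is_damaging` and its `threshold` parameter are hypothetical and not part of revscoring's API.

```python
def is_damaging(score, threshold=0.5):
    """Hypothetical helper (not part of revscoring).

    Returns True when the probability of the True class in a score dict
    shaped like {'prediction': bool, 'probability': {False: p, True: p}}
    is at least `threshold`.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return score["probability"][True] >= threshold


# The score printed in the example above
example = {'prediction': True,
           'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}

print(is_damaging(example))                 # True  (0.53 >= 0.5)
print(is_damaging(example, threshold=0.7))  # False (0.53 < 0.7)
```

Raising the threshold flags fewer edits: fewer good edits are mistakenly flagged, at the cost of letting more damage through.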


## Installation

The easiest way to install is with the Python package installer, pip:

``pip install revscoring``

You may find that some of the dependencies (namely `scipy`, `numpy` and
`scikit-learn`) fail to compile. In that case, you'll need to install some
build dependencies through your operating system's package manager, as
described below.
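
A quick way to see which of these packages are actually usable is to try importing them. This is a minimal sketch, not part of revscoring; `missing_modules` is a hypothetical helper, and it assumes the import names `numpy`, `scipy` and `sklearn` (scikit-learn's import name).

```python
import importlib

# Import names of the compiled dependencies; scikit-learn imports as `sklearn`.
COMPILED_DEPENDENCIES = ["numpy", "scipy", "sklearn"]


def missing_modules(names=COMPILED_DEPENDENCIES):
    """Hypothetical helper: return the names in `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("numpy, scipy and sklearn are all importable.")
```

Actually importing a package, rather than only locating it, usually also catches builds whose compiled extensions fail to load.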

### Ubuntu & Debian:
  *  Run ``sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev``
  *  Run ``sudo apt-get install aspell-ar aspell-bn aspell-is myspell-cs myspell-nl myspell-en-us myspell-en-gb myspell-en-au myspell-et voikko-fi myspell-fr myspell-de-at myspell-de-ch myspell-de-de myspell-he myspell-hr myspell-hu aspell-id myspell-it myspell-nb myspell-fa aspell-pl myspell-pt myspell-es hunspell-sr aspell-sv aspell-ta myspell-ru myspell-uk hunspell-vi aspell-el myspell-lv aspell-ro myspell-ca``
### Windows:
*TODO*
### MacOS:
Using Homebrew and pip, you can install `revscoring` and `enchant` as follows:

* `brew install aspell --with-all-languages`
* `brew install enchant`
* `pip install --no-binary pyenchant revscoring`

#### Adding languages in aspell (MacOS only)
```shell
cd /tmp
wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
cd aspell-pt-0.50-2
./configure
make
sudo make install
```
**Caveat:** Differences between the `aspell` and `myspell` dictionaries can
cause some of the tests to fail.


To make use of language features, you'll need to download some NLTK data.
The following command downloads the necessary corpora:

``python -m nltk.downloader omw sentiwordnet stopwords wordnet``
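
To check which of these corpora are already present before downloading, a sketch like the following could be used. It relies on `nltk.data.find`, which raises `LookupError` for resources it cannot locate; the `missing_corpora` helper itself is hypothetical and not part of revscoring or NLTK.

```python
def missing_corpora(names=("omw", "sentiwordnet", "stopwords", "wordnet")):
    """Hypothetical helper: return the NLTK corpora in `names` not found locally.

    If NLTK itself is not installed, every corpus is reported as missing.
    """
    try:
        import nltk
    except ImportError:
        return list(names)
    missing = []
    for name in names:
        try:
            # Raises LookupError when the resource is not in any NLTK data path
            nltk.data.find("corpora/" + name)
        except LookupError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_corpora()
    if missing:
        print("Run: python -m nltk.downloader " + " ".join(missing))
    else:
        print("All required NLTK corpora are present.")
```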

You'll also need to install [Enchant](https://en.wikipedia.org/wiki/Enchant_(software))-compatible
dictionaries for the languages you'd like to use. We recommend the following packages:

* languages.arabic: aspell-ar
* languages.bengali: aspell-bn
* languages.bosnian: hunspell-bs
* languages.catalan: myspell-ca
* languages.czech: myspell-cs
* languages.croatian: myspell-hr
* languages.dutch: myspell-nl
* languages.english: myspell-en-us myspell-en-gb myspell-en-au
* languages.estonian: myspell-et
* languages.finnish: voikko-fi
* languages.french: myspell-fr
* languages.german: myspell-de-at myspell-de-ch myspell-de-de
* languages.greek: aspell-el
* languages.hebrew: myspell-he
* languages.hungarian: myspell-hu
* languages.icelandic: aspell-is
* languages.indonesian: aspell-id
* languages.italian: myspell-it
* languages.latvian: myspell-lv
* languages.norwegian: myspell-nb
* languages.persian: myspell-fa
* languages.polish: aspell-pl
* languages.portuguese: myspell-pt
* languages.serbian: hunspell-sr
* languages.spanish: myspell-es
* languages.swedish: aspell-sv
* languages.tamil: aspell-ta
* languages.russian: myspell-ru
* languages.ukrainian: aspell-uk
* languages.vietnamese: hunspell-vi
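
On Debian or Ubuntu, these recommendations can be turned into a single `apt-get` command. This is a minimal sketch: `apt_install_command` is a hypothetical helper, not part of revscoring, and the mapping below covers only a few of the languages listed; extend it with the remaining entries as needed.

```python
# Subset of the recommended packages above, keyed by the `languages.<name>`
# module name. Hypothetical table; extend with the remaining entries.
RECOMMENDED_DICTIONARIES = {
    "english": ["myspell-en-us", "myspell-en-gb", "myspell-en-au"],
    "french": ["myspell-fr"],
    "german": ["myspell-de-at", "myspell-de-ch", "myspell-de-de"],
    "portuguese": ["myspell-pt"],
    "spanish": ["myspell-es"],
}


def apt_install_command(languages, table=RECOMMENDED_DICTIONARIES):
    """Build one `sudo apt-get install` command for the given language names."""
    packages = []
    for language in languages:
        if language not in table:
            raise KeyError("no dictionary recommendation for " + repr(language))
        for package in table[language]:
            if package not in packages:  # keep order, drop duplicates
                packages.append(package)
    return "sudo apt-get install " + " ".join(packages)


print(apt_install_command(["english", "portuguese"]))
# sudo apt-get install myspell-en-us myspell-en-gb myspell-en-au myspell-pt
```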

## Authors

  *   [Aaron Halfaker](http://halfaker.info)
  *   [Helder](https://github.com/he7d3r)
  *   [Adam Roses Wight](https://mediawiki.org/wiki/User:Adamw)
  *   [Amir Sarabadani](https://github.com/Ladsgroup)

