Metadata-Version: 2.1
Name: drafttopic
Version: 0.4.1
Summary: A library for automatic detection of topics of new drafts on Wikipedia based on WikiProjects.
Home-page: https://github.com/wikimedia/drafttopic
Author: Aaron Halfaker, Sumit Asthana
Author-email: ahalfaker@wikimedia.org, asthana.sumit23@gmail.com
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Utilities
Classifier: Topic :: Scientific/Engineering
License-File: LICENSE

# Draft topic

Predicts topics for new drafts on English Wikipedia, based on WikiProjects.

## Setting up

Make sure you have a working Python 3 environment.
Install the requirements using:

```
pip install -r requirements.txt
```

Install the library using:

```
python setup.py install
```

## Generating machine-readable WikiProjects data

Use the following utility from the root directory to generate machine-readable WikiProjects data:

```
./utility fetch_wikiprojects --output <output_file_name.json>
```

## Generating mid-level category to WikiProjects mapping

Use the following utility from the root directory to generate a mapping from each mid-level topic category to the list of WikiProjects it contains:

```
./utility trim_wikiprojects --wikiprojects wp --output outmid
```

## Labeling a list of page-ids with the WikiProjects and mid-level categories each page belongs to

Use the following utility from the root directory to label a list of page-ids with the WikiProjects and the mid-level categories each page belongs to:

```
./utility fetch_page_wikiprojects --api-host=https://en.wikipedia.org/ --input=wikiproject_page_ids.json --output=enwiki.labeled_wikiprojects.json --mid_level_wp=outmid.json --verbose
```

In the command above, the input to the script should be a JSON file containing
a list of observations, each with a **page\_id: \<page-id\>** mapping.
Also pass the mid-level WikiProjects JSON so that the script can generate the
mapping from WikiProjects to mid-level categories. The script augments the
given list with these fields and writes the result to the new file specified by
**"output"**.
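
The input file described above can be produced with a few lines of Python. This is a minimal sketch, not part of drafttopic: the helper names `build_observations` and `write_observations` are hypothetical, and the page IDs are placeholders. It assumes the file is a single JSON array by default; if the utility turns out to expect one JSON object per line, pass `json_lines=True`.

```python
import json


def build_observations(page_ids):
    """Return one observation dict per page ID, each with a page_id key."""
    return [{"page_id": int(page_id)} for page_id in page_ids]


def write_observations(path, page_ids, json_lines=False):
    """Write the observations to path and return them.

    json_lines=False writes a single JSON array (an assumption about the
    expected layout); json_lines=True writes one JSON object per line.
    """
    observations = build_observations(page_ids)
    with open(path, "w", encoding="utf-8") as f:
        if json_lines:
            for observation in observations:
                f.write(json.dumps(observation) + "\n")
        else:
            json.dump(observations, f)
    return observations


if __name__ == "__main__":
    # Placeholder page IDs; replace them with the pages you want to label.
    write_observations("wikiproject_page_ids.json", [12, 25, 39])
```

The resulting file name matches the `--input` value used in the command above.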

## Generating predictions for a set of page-ids on Wikipedia

To generate topic predictions for a set of revision-ids, download the relevant model and use revscoring's [score](https://github.com/wikimedia/revscoring/blob/master/revscoring/utilities/score.py) utility
to generate predictions. The revision-ids need to be in a file in the format that utility expects. For a good prediction, use the revision ID of the most recent revision of each page.
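
One way to look up the most recent revision ID for each page is the MediaWiki Action API (`action=query` with `prop=revisions`). The sketch below is not part of drafttopic or revscoring: the function names and the User-Agent string are hypothetical, and it assumes the wiki serves its API at `/w/api.php`, as English Wikipedia does. It does not write the file that the score utility reads; check that utility for the expected format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def latest_revisions_url(api_host, page_ids):
    """Build a query URL asking for the current revision ID of each page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids",
        "pageids": "|".join(str(page_id) for page_id in page_ids),
        "format": "json",
    }
    return api_host.rstrip("/") + "/w/api.php?" + urlencode(params)


def parse_latest_revisions(response):
    """Map page ID to latest revision ID from a decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    latest = {}
    for page in pages.values():
        revisions = page.get("revisions")
        if revisions:  # missing pages come back without a revisions list
            latest[page["pageid"]] = revisions[0]["revid"]
    return latest


def fetch_latest_revisions(api_host, page_ids):
    """Query the API over HTTP and return {page_id: latest_revision_id}."""
    request = Request(
        latest_revisions_url(api_host, page_ids),
        # Hypothetical User-Agent; use one that identifies your own tool.
        headers={"User-Agent": "drafttopic-readme-example/0.1"},
    )
    with urlopen(request) as resp:
        return parse_latest_revisions(json.load(resp))


if __name__ == "__main__":
    # Placeholder page IDs; requires network access.
    print(fetch_latest_revisions("https://en.wikipedia.org/", [12, 25]))
```

The API limits how many page IDs one request may carry (50 for ordinary clients), so longer lists need to be split into batches.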
