Metadata-Version: 2.1
Name: gatenlp
Version: 1.0.8.dev2
Summary: GATE NLP implementation in Python.
Home-page: https://github.com/GateNLP/python-gatenlp
Author: Johann Petrak
Author-email: johann.petrak@gmail.com
License: Apache License 2.0
Keywords: nlp,text processing
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: sortedcontainers (>=2.0.0)
Provides-Extra: all
Requires-Dist: beautifulsoup4 (>=4.9.3) ; extra == 'all'
Requires-Dist: conllu ; extra == 'all'
Requires-Dist: elg ; extra == 'all'
Requires-Dist: google-cloud-language ; extra == 'all'
Requires-Dist: ibm-watson ; extra == 'all'
Requires-Dist: matchtext ; extra == 'all'
Requires-Dist: msgpack ; extra == 'all'
Requires-Dist: nltk (>=3.5) ; extra == 'all'
Requires-Dist: py4j ; extra == 'all'
Requires-Dist: pyyaml (>=5.2) ; extra == 'all'
Requires-Dist: ray[default] ; extra == 'all'
Requires-Dist: recordclass ; extra == 'all'
Requires-Dist: requests ; extra == 'all'
Requires-Dist: sortedcontainers (>=2.0.0) ; extra == 'all'
Requires-Dist: spacy (>=2.2) ; extra == 'all'
Requires-Dist: stanza (>=1.3.0) ; extra == 'all'
Provides-Extra: alldev
Requires-Dist: RISE ; extra == 'alldev'
Requires-Dist: bandit ; extra == 'alldev'
Requires-Dist: black[d] ; extra == 'alldev'
Requires-Dist: flake8 ; extra == 'alldev'
Requires-Dist: ipykernel ; extra == 'alldev'
Requires-Dist: ipython ; extra == 'alldev'
Requires-Dist: ipywidgets ; extra == 'alldev'
Requires-Dist: jupyterlab ; extra == 'alldev'
Requires-Dist: mypy ; extra == 'alldev'
Requires-Dist: notebook ; extra == 'alldev'
Requires-Dist: pdoc3 ; extra == 'alldev'
Requires-Dist: prospector[with_bandid,with_frosted,with_mypy,with_pyroma,with_vulture] ; extra == 'alldev'
Requires-Dist: pytest ; extra == 'alldev'
Requires-Dist: pytest-cov ; extra == 'alldev'
Requires-Dist: pytest-pep8 ; extra == 'alldev'
Requires-Dist: pytest-runner ; extra == 'alldev'
Requires-Dist: pytest-tornasync ; extra == 'alldev'
Requires-Dist: sphinx ; extra == 'alldev'
Requires-Dist: tox ; extra == 'alldev'
Requires-Dist: voila ; extra == 'alldev'
Provides-Extra: base
Requires-Dist: sortedcontainers (>=2.0.0) ; extra == 'base'
Provides-Extra: clientelg
Requires-Dist: requests ; extra == 'clientelg'
Requires-Dist: elg ; extra == 'clientelg'
Provides-Extra: clientgatecloud
Requires-Dist: requests ; extra == 'clientgatecloud'
Provides-Extra: clientgooglenlp
Requires-Dist: google-cloud-language ; extra == 'clientgooglenlp'
Provides-Extra: clientibm
Requires-Dist: ibm-watson ; extra == 'clientibm'
Provides-Extra: clients
Requires-Dist: requests ; extra == 'clients'
Requires-Dist: elg ; extra == 'clients'
Requires-Dist: ibm-watson ; extra == 'clients'
Requires-Dist: google-cloud-language ; extra == 'clients'
Provides-Extra: clienttagme
Requires-Dist: requests ; extra == 'clienttagme'
Provides-Extra: clienttextrazor
Requires-Dist: requests ; extra == 'clienttextrazor'
Provides-Extra: dev
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: pytest-pep8 ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: pytest-runner ; extra == 'dev'
Requires-Dist: sphinx ; extra == 'dev'
Requires-Dist: pdoc3 ; extra == 'dev'
Requires-Dist: tox ; extra == 'dev'
Requires-Dist: mypy ; extra == 'dev'
Requires-Dist: bandit ; extra == 'dev'
Requires-Dist: prospector[with_bandid,with_frosted,with_mypy,with_pyroma,with_vulture] ; extra == 'dev'
Requires-Dist: pytest-tornasync ; extra == 'dev'
Requires-Dist: flake8 ; extra == 'dev'
Requires-Dist: black[d] ; extra == 'dev'
Provides-Extra: formats
Requires-Dist: msgpack ; extra == 'formats'
Requires-Dist: pyyaml (>=5.2) ; extra == 'formats'
Requires-Dist: beautifulsoup4 (>=4.9.3) ; extra == 'formats'
Requires-Dist: requests ; extra == 'formats'
Requires-Dist: conllu ; extra == 'formats'
Provides-Extra: gazetteers
Requires-Dist: matchtext ; extra == 'gazetteers'
Requires-Dist: recordclass ; extra == 'gazetteers'
Provides-Extra: github
Requires-Dist: flake8 ; extra == 'github'
Requires-Dist: pytest ; extra == 'github'
Requires-Dist: pytest-cov ; extra == 'github'
Requires-Dist: pytest-pep8 ; extra == 'github'
Provides-Extra: java
Requires-Dist: py4j ; extra == 'java'
Provides-Extra: nltk
Requires-Dist: nltk (>=3.5) ; extra == 'nltk'
Provides-Extra: notebook
Requires-Dist: ipython ; extra == 'notebook'
Requires-Dist: ipykernel ; extra == 'notebook'
Requires-Dist: jupyterlab ; extra == 'notebook'
Requires-Dist: notebook ; extra == 'notebook'
Requires-Dist: voila ; extra == 'notebook'
Requires-Dist: RISE ; extra == 'notebook'
Requires-Dist: ipywidgets ; extra == 'notebook'
Provides-Extra: ray
Requires-Dist: ray[default] ; extra == 'ray'
Provides-Extra: spacy
Requires-Dist: spacy (>=2.2) ; extra == 'spacy'
Provides-Extra: stanza
Requires-Dist: stanza (>=1.3.0) ; extra == 'stanza'

# Python library gatenlp

[![PyPi version](https://img.shields.io/pypi/v/gatenlp.svg)](https://pypi.python.org/pypi/gatenlp/)
[![Python compatibility](https://img.shields.io/pypi/pyversions/gatenlp.svg)](https://pypi.python.org/pypi/gatenlp/)
[![Downloads](https://static.pepy.tech/personalized-badge/gatenlp?period=week&units=none&left_color=blue&right_color=yellow&left_text=Downloads/week)](https://pepy.tech/project/gatenlp)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/gatenlp)](https://pypistats.org/packages/gatenlp)
[![License](https://img.shields.io/github/license/GateNLP/python-gatenlp.svg)](LICENSE)
[![GitHub Build Status](https://github.com/GateNLP/python-gatenlp/actions/workflows/python-package.yml/badge.svg?branch=main)](https://github.com/GateNLP/python-gatenlp/actions/workflows/python-package.yml)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![Join the chat at https://gitter.im/GateNLP/python-gatenlp](https://badges.gitter.im/GateNLP/python-gatenlp.svg)](https://gitter.im/GateNLP/python-gatenlp?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/ccc55f10e7f5479e9a882ec3aee3222a)](https://www.codacy.com/gh/GateNLP/python-gatenlp/dashboard?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=GateNLP/python-gatenlp)
[![Updates](https://pyup.io/repos/github/GateNLP/python-gatenlp/shield.svg)](https://pyup.io/repos/github/GateNLP/python-gatenlp/)
[![Python 3](https://pyup.io/repos/github/GateNLP/python-gatenlp/python-3-shield.svg)](https://pyup.io/repos/github/GateNLP/python-gatenlp/)
[![Documentation Status](https://readthedocs.org/projects/gatenlp/badge/?version=latest)](https://gatenlp.readthedocs.io/en/latest/?badge=latest)


![Python GateNLP](https://github.com/GateNLP/python-gatenlp/blob/main/docs/logo/gateNLP-423x145.png)

Python package for:
* Representing documents, annotations, annotation and document features etc.
* Processing documents with powerful NLP libraries like Stanza, spaCy and NLTK, and representing the results as gatenlp annotations and features
* Visualizing annotations and features using interactive HTML, and including those visualizations in Jupyter or Colab notebooks
* Using a powerful pattern matching library (PAMPAC) to match complex patterns based on text, annotations and features, and to create new annotations
* Interacting with and using Java GATE from Python
* Using Python processing from Java GATE via the GATE Python plugin
* Building complex processing pipelines from the provided abstractions

## Documentation and feedback

* Documentation:
  * [GitHub](https://gatenlp.github.io/python-gatenlp/) 
  * [ReadTheDocs](https://gatenlp.readthedocs.io/en/latest/)
* PythonDoc:
  * [GitHub](https://gatenlp.github.io/python-gatenlp/pythondoc/gatenlp/)
  * [ReadTheDocs](https://gatenlp.readthedocs.io/en/latest/pythondoc/gatenlp/)

If you find a bug or want to request a feature or change, please use the [issue tracker](https://github.com/GateNLP/python-gatenlp/issues).

For more general discussions about the package and for communication among current and future users, please use the [Discussions](https://github.com/GateNLP/python-gatenlp/discussions).


## Overview

Python GateNLP is an NLP and text processing framework implemented in Python. 

Python GateNLP represents documents and stand-off annotations in a way very similar to
the [Java GATE framework](https://gate.ac.uk/): annotations describe arbitrary character ranges in the text, and each annotation can have an arbitrary number of _features_. Documents can have arbitrary features and an arbitrary number of named _annotation sets_, where each annotation set can hold an arbitrary number of annotations which can overlap in any way. Python GateNLP documents can be exchanged with Java GATE by using the bdocjs/bdocym/bdocmp formats, which are supported in Java GATE via the [Format Bdoc Plugin](https://gatenlp.github.io/gateplugin-Format_Bdoc/).
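
To make the stand-off model concrete, here is a minimal sketch in plain Python. The `Annotation` and `Doc` classes are hypothetical illustrations of the concepts described above, not the gatenlp API; please refer to the documentation linked above for the actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A stand-off annotation: a character range plus arbitrary features."""
    start: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


@dataclass
class Doc:
    """Text, document features and named annotation sets (illustrative only)."""
    text: str
    features: dict = field(default_factory=dict)
    annsets: dict = field(default_factory=dict)  # set name -> list of annotations

    def add(self, setname, start, end, anntype, **features):
        # Annotations only reference offsets; the text itself is never modified.
        ann = Annotation(start, end, anntype, features)
        self.annsets.setdefault(setname, []).append(ann)
        return ann

    def covered_text(self, ann):
        return self.text[ann.start:ann.end]


doc = Doc("Barack Obama visited New York.", features={"lang": "en"})
person = doc.add("Entities", 0, 12, "Person", gender="male")
city = doc.add("Entities", 21, 29, "Location")
# Overlap is unrestricted: this annotation covers both entities.
sentence = doc.add("Structure", 0, 30, "Sentence")
```

Because annotations live outside the text, any number of them can cover the same characters, and each named set can be processed independently of the others.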

Unlike many other Python NLP tools, GateNLP does not prescribe how text is split into tokens: tokens can be represented by annotations in any way, and a document can have several different tokenizations simultaneously if needed. Similarly, entities can be represented by annotations without restriction: they do not need to start or end at token boundaries and can overlap arbitrarily.
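
As a rough illustration of this idea (again in plain Python, not using the gatenlp API), the sketch below keeps two tokenizations of the same text side by side as offset pairs. The set names and the example entity span are made up for this example.

```python
import re

text = "state-of-the-art NLP"

# Two tokenizations of the same text, each stored as (start, end) offsets
# under its own name; neither changes the text or excludes the other.
tokenizations = {
    "WhitespaceTokens": [m.span() for m in re.finditer(r"\S+", text)],
    "WordTokens": [m.span() for m in re.finditer(r"\w+", text)],
}

# A span that starts and ends inside the first whitespace token, so it does
# not line up with the boundaries of that tokenization.
entity = (6, 12)
entity_text = text[entity[0]:entity[1]]
```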

GateNLP provides ways to process text and create annotations using annotating pipelines, which are sequences of one or more annotators. 
There are annotators for matching text against gazetteer lists and annotators for complex matching of annotation and text sequences (see [PAMPAC](pampac)).
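
The pipeline idea can be sketched in a few lines of plain Python. The functions `make_gazetteer_annotator` and `make_pipeline` and the dictionary-based document are hypothetical and only illustrate the concept of running annotators in sequence; the actual gatenlp annotator and gazetteer classes are described in the documentation.

```python
import re


def make_gazetteer_annotator(entries, anntype):
    """Return an annotator that marks every occurrence of a gazetteer entry."""
    # Try longer entries first so that they win over shorter ones.
    ordered = sorted(entries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in ordered))

    def annotate(doc):
        for m in pattern.finditer(doc["text"]):
            doc["anns"].append({"start": m.start(), "end": m.end(), "type": anntype})
        return doc

    return annotate


def make_pipeline(*annotators):
    """Chain annotators: each receives the document returned by the previous one."""
    def run(doc):
        for annotator in annotators:
            doc = annotator(doc)
        return doc
    return run


pipeline = make_pipeline(
    make_gazetteer_annotator(["London", "New York"], "Location"),
    make_gazetteer_annotator(["GATE"], "Software"),
)
doc = pipeline({"text": "GATE was developed in Sheffield, not London.", "anns": []})
found = [(a["type"], doc["text"][a["start"]:a["end"]]) for a in doc["anns"]]
```

Each annotator only adds annotations and passes the document on, so annotators can be reordered or reused in other pipelines.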

There is also support for creating GateNLP annotations with other NLP packages like spaCy or Stanford Stanza.

The GateNLP document representation also optionally allows tracking all changes
made to the document in a "change log" (a `gatenlp.ChangeLog` instance).
Such changes can later be applied to other Python GateNLP documents or to Java GATE documents.
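
The following plain-Python sketch shows the general record-and-replay idea behind a change log. The `LoggedDoc` class, the `apply_changes` function and the command names are assumptions made for this illustration; they do not reproduce `gatenlp.ChangeLog` or its data format.

```python
class LoggedDoc:
    """Toy document that optionally records each modification (illustrative only)."""

    def __init__(self, text, changelog=None):
        self.text = text
        self.features = {}
        self.anns = []
        self.changelog = changelog  # a plain list, or None to disable tracking

    def _log(self, change):
        if self.changelog is not None:
            self.changelog.append(change)

    def set_feature(self, name, value):
        self.features[name] = value
        self._log({"command": "feature:set", "name": name, "value": value})

    def add_annotation(self, start, end, anntype):
        self.anns.append((start, end, anntype))
        self._log({"command": "annotation:add",
                   "start": start, "end": end, "type": anntype})


def apply_changes(doc, changelog):
    """Replay recorded changes on another document with the same text."""
    for change in changelog:
        if change["command"] == "feature:set":
            doc.set_feature(change["name"], change["value"])
        elif change["command"] == "annotation:add":
            doc.add_annotation(change["start"], change["end"], change["type"])


log = []
original = LoggedDoc("Some text", changelog=log)
original.set_feature("lang", "en")
original.add_annotation(0, 4, "Token")

copy = LoggedDoc("Some text")  # tracking is off for the copy
apply_changes(copy, log)
```

Since the log contains only the changes, it can be much smaller than the document and can be sent to another process that holds its own copy of the same text.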

This library also implements the functionality for interacting with
a Java GATE process in two different ways:
* The Java GATE Python plugin can invoke a Python process to annotate GATE documents
  with Python code
* The Python code can remote-control a Java GATE instance

## Versions and Roadmap

* Versions 0.x are unpublished
* Versions 1.0.x are public releases; feedback on them may still lead to changes in APIs and in major parts of the software
* Versions 1.x are public stable releases

## Default branch renamed to "main"

If you have a cloned copy of the repository, you need to rename the branch in your local copy as well:
```
git branch -m master main
git fetch origin
git branch -u origin/main main
```



---

**NOTE: The previous PyPI project "gatenlp" has moved to [gatenlphiltlab](https://github.com/nickwbarber/gatenlphiltlab)**



