Metadata-Version: 2.1
Name: nerblackbox
Version: 1.0.0
Summary: a high-level library for named entity recognition in python
Home-page: https://pypi.org/project/nerblackbox
Author: Felix Stollenwerk
Author-email: felix.stollenwerk@ai.se
License: Apache 2.0
Keywords: NLP,NER,named entity recognition,BERT,transformer,pytorch
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: Unix
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
License-File: LICENSE.txt
Requires-Dist: matplotlib (~=3.5.2)
Requires-Dist: mlflow (~=2.6.0)
Requires-Dist: pytorch-lightning (~=1.9.3)
Requires-Dist: omegaconf (~=2.2.2)
Requires-Dist: seqeval (~=1.2.2)
Requires-Dist: scikit-learn (~=1.1.1)
Requires-Dist: tensorboard (~=2.13.0)
Requires-Dist: tensorboardX (~=2.5.1)
Requires-Dist: transformers (~=4.31.0)
Requires-Dist: numpy (~=1.23.4)
Requires-Dist: datasets (~=2.9.0)
Requires-Dist: evaluate (~=0.4.0)
Requires-Dist: torchmetrics (~=0.11.1)
Requires-Dist: notebook (~=6.5.3)
Requires-Dist: ipywidgets (~=8.0.4)
Requires-Dist: doccano-client (~=1.2.7)
Requires-Dist: label-studio-sdk (~=0.0.19)
Requires-Dist: sentencepiece (~=0.1.99)
Provides-Extra: dev
Requires-Dist: black ; extra == 'dev'
Requires-Dist: coverage ; extra == 'dev'
Requires-Dist: jupyter ; extra == 'dev'
Requires-Dist: mkdocs-click ; extra == 'dev'
Requires-Dist: mkdocs-material ; extra == 'dev'
Requires-Dist: mkdocstrings[python] ; extra == 'dev'
Requires-Dist: mypy (~=1.0.0) ; extra == 'dev'
Requires-Dist: pip-chill ; extra == 'dev'
Requires-Dist: pylint ; extra == 'dev'
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: twine ; extra == 'dev'
Requires-Dist: tox ; extra == 'dev'
Requires-Dist: types-requests (~=2.28.11.13) ; extra == 'dev'
Requires-Dist: types-setuptools (~=67.3.0.1) ; extra == 'dev'

===========
nerblackbox
===========

A High-level Library for Named Entity Recognition in Python.

.. image:: https://img.shields.io/pypi/v/nerblackbox
    :target: https://pypi.org/project/nerblackbox
    :alt: PyPI

.. image:: https://img.shields.io/pypi/pyversions/nerblackbox
    :target: https://www.python.org/doc/versions/
    :alt: PyPI - Python Version

.. image:: https://github.com/flxst/nerblackbox/actions/workflows/python-package.yml/badge.svg
    :target: https://github.com/flxst/nerblackbox/actions/workflows/python-package.yml
    :alt: CI

.. image:: https://coveralls.io/repos/github/flxst/nerblackbox/badge.svg?branch=master
    :target: https://coveralls.io/github/flxst/nerblackbox?branch=master
    :alt: Coverage

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/psf/black
    :alt: Code style: black

.. image:: https://img.shields.io/pypi/l/nerblackbox
    :target: https://github.com/flxst/nerblackbox/blob/latest/LICENSE.txt
    :alt: PyPI - License

Resources
=========

- Source Code: https://github.com/flxst/nerblackbox
- Documentation: https://flxst.github.io/nerblackbox
- PyPI: https://pypi.org/project/nerblackbox

Installation
============

::

    pip install nerblackbox

About
=====

.. image:: https://raw.githubusercontent.com/flxst/nerblackbox/master/docs/docs/images/nerblackbox_sources.png

Take a dataset from one of many available sources.
Then train, evaluate and apply a language model
in a few simple steps.

1. Data
"""""""

- Choose a dataset from **HuggingFace (HF)**, the **Local Filesystem (LF)**, an **Annotation Tool (AT)** server, or a **Built-in (BI)** dataset

::

    from nerblackbox import Dataset

    # pick one of the following sources
    dataset = Dataset("conll2003",  source="HF")  # HuggingFace
    dataset = Dataset("my_dataset", source="LF")  # Local Filesystem
    dataset = Dataset("swe_nerc",   source="BI")  # Built-in

- Set up the dataset

::

    dataset.set_up()


2. Training
"""""""""""

- Define the training by choosing a pretrained model and a dataset

::

    from nerblackbox import Training
    training = Training("my_training", model="bert-base-cased", dataset="conll2003")

- Run the training and get the performance of the fine-tuned model

::

    training.run()
    training.get_result(metric="f1", level="entity", phase="test")
    # 0.9045


3. Evaluation
"""""""""""""

- Load the model

::

    from nerblackbox import Model
    model = Model.from_training("my_training")

- Evaluate the model

::

    results = model.evaluate_on_dataset("ehealth_kd", phase="test")
    results["micro"]["entity"]["f1"]
    # 0.9045
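
The ``f1`` value above is an entity-level, micro-averaged score. A common convention, used by the CoNLL evaluation script and by seqeval, counts a predicted entity as correct only if both its span and its tag exactly match a gold entity. Micro-averaging pools these counts over all entity types. The following is a minimal pure-Python sketch of that computation. It assumes entities are represented as hypothetical ``(sentence_index, char_start, char_end, tag)`` tuples and is not nerblackbox's own implementation.

::

    from typing import Set, Tuple

    # Hypothetical representation: (sentence_index, char_start, char_end, tag)
    Entity = Tuple[int, int, int, str]


    def micro_entity_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
        """Return micro-averaged entity-level (precision, recall, f1).

        A prediction is a true positive only if span and tag both match a gold entity.
        """
        true_positives = len(gold & pred)
        precision = true_positives / len(pred) if pred else 0.0
        recall = true_positives / len(gold) if gold else 0.0
        if precision + recall == 0:
            return precision, recall, 0.0
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1


    # Toy example: the second predicted entity has the right span but the wrong tag
    gold = {(0, 4, 18, "ORG"), (0, 40, 47, "LOC"), (1, 0, 6, "PER")}
    pred = {(0, 4, 18, "ORG"), (0, 40, 47, "ORG"), (1, 0, 6, "PER")}
    p, r, f1 = micro_entity_f1(gold, pred)
    print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
    # precision=0.6667 recall=0.6667 f1=0.6667

Because the counts are pooled over all tags, frequent entity types dominate the micro average. A macro average would instead weight each tag equally.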


4. Inference
""""""""""""

- Load the model

::

    from nerblackbox import Model
    model = Model.from_training("my_training")

- Let the model predict

::

    model.predict("The United Nations has never recognised Jakarta's move.")
    # [[
    #  {'char_start': '4', 'char_end': '18', 'token': 'United Nations', 'tag': 'ORG'},
    #  {'char_start': '40', 'char_end': '47', 'token': 'Jakarta', 'tag': 'LOC'}
    # ]]
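
The entity-level output above can be thought of as word-level tags merged into spans. The sketch below illustrates this under several assumptions that do not come from nerblackbox:

- a simple regex tokenizer;
- hand-written BIO word tags standing in for model predictions;
- ``char_end`` being exclusive, which matches the offsets in the example, since the slice ``[4:18]`` of the sentence is ``United Nations``.

The function names ``tokenize_with_offsets`` and ``merge_bio_words`` are hypothetical.

::

    import re
    from typing import Dict, List, Tuple

    Word = Tuple[str, int, int]  # (word, char_start, char_end), char_end exclusive


    def tokenize_with_offsets(text: str) -> List[Word]:
        """Split text into words and punctuation marks, keeping character offsets."""
        return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


    def merge_bio_words(text: str, words: List[Word], tags: List[str]) -> List[Dict[str, str]]:
        """Merge word-level BIO tags into entity dicts shaped like the example output."""
        spans: List[Tuple[str, int, int]] = []  # finished entities as (tag, start, end)
        current = None  # entity being built, as [tag, start, end]
        for (_, start, end), tag in zip(words, tags):
            prefix, _, label = tag.partition("-")
            if prefix == "I" and current is not None and current[0] == label:
                current[2] = end  # continue the open entity
                continue
            if current is not None:  # any other tag closes the open entity
                spans.append((current[0], current[1], current[2]))
                current = None
            if prefix in ("B", "I"):  # an I- without a matching open entity starts a new one
                current = [label, start, end]
        if current is not None:
            spans.append((current[0], current[1], current[2]))
        return [
            {"char_start": str(s), "char_end": str(e), "token": text[s:e], "tag": label}
            for label, s, e in spans
        ]


    text = "The United Nations has never recognised Jakarta's move."
    words = tokenize_with_offsets(text)
    # Hand-written stand-in for model predictions, one BIO tag per word
    tags = ["O", "B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O", "O", "O", "O"]
    print(merge_bio_words(text, words, tags))

With these inputs, the sketch reproduces the two entity dictionaries shown above. In the example, ``model.predict`` additionally wraps them in an outer list, presumably with one entry per input text.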

There is much more to it than that! See the `documentation <https://flxst.github.io/nerblackbox>`__ to get started.

Features
========

*Data*

* Integration of Datasets from Multiple Sources (HuggingFace, Annotation Tools, ...)
* Support for Multiple Dataset Types (Standard, Pretokenized)
* Support for Multiple Annotation Schemes (IO, BIO, BILOU)
* Text Encoding
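
The three annotation schemes differ in how they mark entity boundaries:

- IO tags every entity word with ``I-``.
- BIO additionally marks the first word of an entity with ``B-``.
- BILOU further distinguishes the last word (``L-``) and single-word entities (``U-``).

As a minimal sketch, a BIO sequence can be converted into the other two schemes as follows. It uses hypothetical function names ``bio_to_io`` and ``bio_to_bilou`` and assumes well-formed BIO input.

::

    from typing import List


    def bio_to_io(tags: List[str]) -> List[str]:
        """IO: every entity word becomes I-<label>."""
        return ["I-" + tag[2:] if tag.startswith("B-") else tag for tag in tags]


    def bio_to_bilou(tags: List[str]) -> List[str]:
        """BILOU: Begin, Inside, Last, Outside, Unit (single-word entity).

        Assumes well-formed BIO input, i.e. every entity starts with B-.
        """
        bilou = []
        for i, tag in enumerate(tags):
            if tag == "O":
                bilou.append(tag)
                continue
            prefix, label = tag.split("-", 1)
            next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
            continues = next_tag == "I-" + label  # does the entity go on after this word?
            if prefix == "B":
                bilou.append(("B-" if continues else "U-") + label)
            else:  # prefix == "I"
                bilou.append(("I-" if continues else "L-") + label)
        return bilou


    bio = ["B-ORG", "I-ORG", "O", "B-LOC", "O", "B-PER", "I-PER", "I-PER"]
    print(bio_to_io(bio))
    # ['I-ORG', 'I-ORG', 'O', 'I-LOC', 'O', 'I-PER', 'I-PER', 'I-PER']
    print(bio_to_bilou(bio))
    # ['B-ORG', 'L-ORG', 'O', 'U-LOC', 'O', 'B-PER', 'I-PER', 'L-PER']

IO is the most compact scheme but cannot separate two adjacent entities of the same type. BIO and BILOU add boundary tags to resolve that ambiguity.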

*Training*

* Adaptive Fine-tuning
* Hyperparameter Search
* Multiple Runs with Different Random Seeds
* Detailed Analysis of Training Results

*Evaluation*

* Evaluation of Any Model on Any Dataset

*Inference*

* Versatile Model Inference (Entity/Word Level, Probabilities, ...)

*Other*

* Full Compatibility with HuggingFace
* GPU Support
* Language Agnosticism

See the `documentation <https://flxst.github.io/nerblackbox>`__ for details.

Citation
========

::

    @misc{nerblackbox,
      author = {Stollenwerk, Felix},
      title  = {nerblackbox: a high-level library for named entity recognition in python},
      year   = {2021},
      url    = {https://github.com/flxst/nerblackbox},
    }


