Metadata-Version: 2.1
Name: spacy-data-debug
Version: 0.0.3
Summary: spaCy Data Debug has utilities to help you debug your custom NER data. It checks for inconsistencies in labels for the same text.
Home-page: https://github.com/kabirkhan/spacy_data_debug
Author: Kabir Khan
Author-email: kakh@microsoft.com
License: Apache Software License 2.0
Description: <!--
        
        #################################################
        ### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
        #################################################
        # file to edit: index.ipynb
        # command to build the docs after a change: nbdev_build_docs
        
        -->
        
        # spaCy Data Debug
        
        > spaCy Data Debug has utilities to help you debug your custom NER data. It checks for inconsistencies in labels for the same text.
        
        
        ## Install
        
        `pip install spacy-data-debug`
        
        ## How to use
        <div class="codecell" markdown="1">
        <div class="input_area" markdown="1">
        
        ```
        from pathlib import Path
        import srsly
        from spacy_data_debug.core import *
        from spacy_data_debug.pipeline import *
        ```
        
        </div>
        
        </div>
        
        ### Load your data in the Prodigy annotation format
        <div class="codecell" markdown="1">
        <div class="input_area" markdown="1">
        
        ```
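        # Assumption: base_dir points at the folder holding your JSONL files; adjust the path.
        base_dir = Path("data")
        train = list(srsly.read_jsonl(base_dir / "train.jsonl"))
        dev = list(srsly.read_jsonl(base_dir / "dev.jsonl"))
        test = list(srsly.read_jsonl(base_dir / "test.jsonl"))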
        ```
        
        </div>
        
        </div>
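        
        For readers unfamiliar with the Prodigy annotation format, here is a minimal sketch of one NER record and a JSONL round trip. It uses only the standard library `json` module rather than `srsly`, and the example sentence, labels, and temporary directory are assumptions made for illustration.
        
        ```python
        import json
        import tempfile
        from pathlib import Path
        
        # One Prodigy-style NER example: raw text plus character-offset spans.
        example = {
            "text": "Ada works at Acme",
            "spans": [
                {"start": 0, "end": 3, "label": "PERSON"},
                {"start": 13, "end": 17, "label": "ORG"},
            ],
        }
        
        # Assumption: a temporary directory stands in for your real base_dir.
        base_dir = Path(tempfile.mkdtemp())
        with open(base_dir / "train.jsonl", "w", encoding="utf8") as f:
            f.write(json.dumps(example) + "\n")
        
        # JSONL holds one JSON object per line.
        with open(base_dir / "train.jsonl", encoding="utf8") as f:
            train = [json.loads(line) for line in f if line.strip()]
        
        # Each span's offsets should slice the entity text out of "text".
        entities = [
            (train[0]["text"][s["start"]:s["end"]], s["label"]) for s in train[0]["spans"]
        ]
        print(entities)
        ```
        
        Checking that every span's offsets slice out the text you expect is a quick sanity test before running any further cleanup.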
        
        ### Clean, format and filter overlapping entities
        On a large annotation project, the format of your data can drift across annotation sessions run by different people.
        This step puts your data into the format that the other functions in `spacy-data-debug` expect.
        <div class="codecell" markdown="1">
        <div class="input_area" markdown="1">
        
        ```
        train = fix_annotations_format(train)
        dev = fix_annotations_format(dev)
        test = fix_annotations_format(test)
        ```
        
        </div>
        
        </div>
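        
        To illustrate what filtering overlapping entities can mean, the sketch below keeps the longest span from each group of overlapping spans. This is not the implementation of `fix_annotations_format`, whose actual behavior may differ; `drop_overlapping_spans` and the sample spans are hypothetical.
        
        ```python
        def drop_overlapping_spans(spans):
            """Hypothetical helper: keep the longest span where spans overlap."""
            # Consider longer spans first; break ties by earliest start offset.
            ordered = sorted(spans, key=lambda s: (-(s["end"] - s["start"]), s["start"]))
            kept = []
            for span in ordered:
                # A span is kept only if it ends before, or starts after, every kept span.
                if all(span["end"] <= k["start"] or span["start"] >= k["end"] for k in kept):
                    kept.append(span)
            # Return the survivors in reading order.
            return sorted(kept, key=lambda s: s["start"])
        
        
        spans = [
            {"start": 0, "end": 8, "label": "ORG"},
            {"start": 4, "end": 8, "label": "GPE"},  # nested inside the first span
            {"start": 18, "end": 24, "label": "GPE"},
        ]
        filtered = drop_overlapping_spans(spans)
        print(filtered)
        ```
        
        Preferring the longest span is only one possible policy; a project might instead prefer the most recent annotation or a fixed label priority.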
        
        ### Or construct a Pipeline
        A `Pipeline` holds your datasets together and runs `spacy_data_debug` functions across all datasets.
        This helps keep annotations consistent across your dataset splits.
        <div class="codecell" markdown="1">
        <div class="input_area" markdown="1">
        
        ```
        pipeline = Pipeline(train, dev, test)
        pipeline.apply(fix_annotations_format)
        ```
        
        </div>
        
        </div>
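        
        The idea of applying one function across every split can be sketched in a few lines. `SketchPipeline` and `uppercase_labels` below are hypothetical stand-ins written for illustration; they are not the library's `Pipeline` class, and the real class may store or return data differently.
        
        ```python
        class SketchPipeline:
            """Illustrative stand-in, not spacy_data_debug's Pipeline."""
        
            def __init__(self, train, dev, test):
                self.splits = {"train": train, "dev": dev, "test": test}
        
            def apply(self, func):
                # Run the same function on every split and keep the results.
                self.splits = {name: func(data) for name, data in self.splits.items()}
                return self
        
        
        def uppercase_labels(examples):
            """Hypothetical cleanup step: normalize span labels to upper case."""
            return [
                {**ex, "spans": [{**s, "label": s["label"].upper()} for s in ex["spans"]]}
                for ex in examples
            ]
        
        
        train = [{"text": "Acme", "spans": [{"start": 0, "end": 4, "label": "org"}]}]
        dev = [{"text": "Ada", "spans": [{"start": 0, "end": 3, "label": "Person"}]}]
        test = [{"text": "Oslo", "spans": [{"start": 0, "end": 4, "label": "GPE"}]}]
        
        pipeline = SketchPipeline(train, dev, test)
        pipeline.apply(uppercase_labels)
        labels = {name: data[0]["spans"][0]["label"] for name, data in pipeline.splits.items()}
        print(labels)
        ```
        
        Because every split passes through the same function, a label written as `org` in one split and `ORG` in another ends up identical everywhere.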
        
Keywords: spacy,data,machine learning,nlp,natural language processing,ner,named entity recognition
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
