Metadata-Version: 2.1
Name: implicit-word-network
Version: 0.0.4
Summary: A Python package for extracting and exploring context-enriched word networks from corpora
Home-page: https://gitlab.inf.uni-konstanz.de/julian.schelb/implicit-word-network
Author: Julian Schelb
Author-email: julian.schelb@uni-konstanz.de
License: UNKNOWN
Project-URL: Bug Tracker, https://gitlab.inf.uni-konstanz.de/julian.schelb/implicit-word-network/-/issues
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# Implicit Word Network

## Introduction
This Python package extracts context-enriched implicit word networks from text corpora, as described by Spitz and Gertz. The theoretical background is explained in the following publications:

   1. Spitz, A. (2019). Implicit Entity Networks: A Versatile Document Model. Heidelberg University Library. https://doi.org/10.11588/HEIDOK.00026328
   2. Spitz, A., & Gertz, M. (2018). Exploring Entity-centric Networks in Entangled News Streams. In Companion of The Web Conference 2018 (WWW ’18). ACM Press. https://doi.org/10.1145/3184558.3188726

## Dependencies

This project uses models from the spaCy and sentence_transformers packages. These packages are not installed automatically; install them with the following commands:

```console
pip install sentence_transformers
pip install spacy
python -m spacy download en_core_web_sm
```
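
Because these dependencies are not installed automatically, it can help to confirm that they are available before running the package. The following is a minimal sketch of such a check using only the standard library. The helper name `missing_modules` is hypothetical and not part of this package, and the check assumes the `en_core_web_sm` model is installed as a regular importable Python package, which is how `spacy download` normally installs it.

```python
from importlib.util import find_spec

# Top-level module names matching the install commands above.
REQUIRED = ["spacy", "sentence_transformers", "en_core_web_sm"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment.

    Hypothetical helper for illustration; not part of implicit_word_network.
    """
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies found.")
```

The check only looks up the modules without importing them, so it stays fast even though the libraries themselves are large.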

## Example Usage

```python

import spacy as sp
import implicit_word_network as wn

# Path to text file
path = "data.txt"

# Entities to search for in corpus
entity_types = ["PERSON", "LOC", "NORP", "ORG", "WORK_OF_ART"]

c = 2  # Cut-off parameter

# Importing data ...
D = wn.readDocuments(path)

# Parsing data ...
nlp = sp.load("en_core_web_sm")
D_parsed = wn.parseDocuments(D, entity_types, nlp=nlp)

# Converting parsing results ...
D_mat = wn.createCorpMat(D_parsed)

# Building graph ...
V, Ep = wn.buildGraph(D_mat, c)

```
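
To give an intuition for what a cut-off parameter can do in a network of this kind, the toy sketch below links terms that occur within a limited sentence distance of each other. It is not this package's implementation. It assumes that the cut-off is a maximum sentence distance and that edge weights decay exponentially with that distance; both are assumptions made for illustration, and the function name `toy_cooccurrence_graph` is hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def toy_cooccurrence_graph(sentences, c):
    """Toy illustration only, NOT the package's implementation.

    sentences: list of term lists, one list per sentence of a document.
    c: assumed here to be the maximum sentence distance for linking terms.
    """
    # Flatten the document into (sentence index, term) occurrences.
    occurrences = [(i, term) for i, sent in enumerate(sentences) for term in sent]

    nodes = {term for _, term in occurrences}
    edges = defaultdict(float)
    for (i, a), (j, b) in combinations(occurrences, 2):
        distance = abs(i - j)
        if a != b and distance <= c:
            # Assumed weighting: closer mentions contribute more.
            edges[tuple(sorted((a, b)))] += math.exp(-distance)
    return nodes, dict(edges)


sentences = [["Alice", "Paris"], ["Bob"], ["Carol"]]
nodes, edges = toy_cooccurrence_graph(sentences, c=1)

for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight:.3f}")
```

With `c=1`, terms in the same or adjacent sentences are connected, while "Alice" and "Carol", which are two sentences apart, are not. Raising `c` in this sketch yields a denser graph.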

