Metadata-Version: 2.1
Name: ddhi-encoder
Version: 1.3.0
Summary: Encoding tools for DDHI
Home-page: https://github.com/agile-humanities/ddhi-encoder/
Author: Clifford Wulfman
Author-email: cliff@agilehumanities.ca
License: mit
Project-URL: Documentation, https://github.com/agile-humanities/ddhi-encoder/blob/master/README.rst
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Requires-Python: >=3.6
Description-Content-Type: text/x-rst; charset=UTF-8
License-File: LICENSE.txt
License-File: AUTHORS.rst
Requires-Dist: docx2python (>=1.19.0)
Requires-Dist: lxml (>=4.6.2)
Requires-Dist: spacy (>=3.1.0)
Provides-Extra: testing
Requires-Dist: pytest ; extra == 'testing'
Requires-Dist: pytest-cov ; extra == 'testing'

A collection of command-line utilities to assist in the creation of
TEI-encoded oral history interviews for the Dartmouth Digital
History Initiative.

.. _ddhi-encoder-1:

Installation
============

Use pip to install this package:

.. code:: bash

   pip install ddhi-encoder

To perform named-entity tagging with ``ddhi_tag``, you need a spaCy
language model. Before running ``ddhi_tag``, install spaCy's small English model:

.. code:: bash

   python -m spacy download en_core_web_sm

See `the spaCy documentation <https://spacy.io/models>`__ for more
information.
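To confirm that the installation put the command-line tools on your ``PATH``, a quick check like the one below may help. It is a sketch and not part of the package: the function name ``check_ddhi_tools`` is hypothetical, and it assumes the commands are installed as executables named exactly as in this README.

.. code:: bash

   # Hypothetical helper (not shipped with ddhi-encoder): report which of the
   # DDHI commands described in this README can be found on PATH.
   # Returns a non-zero status if any of them is missing.
   check_ddhi_tools() {
     local cmd missing=0
     for cmd in ddhi_convert ddhi_tag ddhi_generate_standoff \
                ddhi_mentioned_places ddhi_update_places \
                ddhi_mentioned_events ddhi_update_events; do
       if command -v "$cmd" >/dev/null 2>&1; then
         echo "found:   $cmd"
       else
         echo "missing: $cmd"
         missing=$((missing + 1))
       fi
     done
     [ "$missing" -eq 0 ]
   }

   check_ddhi_tools || echo "Some DDHI tools are missing; is the right environment active?" >&2

To check the language model, spaCy's own ``python -m spacy validate`` command lists the pipeline packages installed in the current environment, which should include ``en_core_web_sm``.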

Use
===


Use ``ddhi_convert`` to transform a DOCX-encoded transcription into a
simply structured TEI document.

.. code:: bash

   ddhi_convert ~/Desktop/transcripts/zien_jimmy_transcript_final.docx -o tmp.tei.xml
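With many transcripts, a small loop can run ``ddhi_convert`` over a whole directory. This sketch uses only the ``ddhi_convert INPUT -o OUTPUT`` form shown above; the function name ``convert_all`` and the directory layout are hypothetical.

.. code:: bash

   # Hypothetical helper: convert every .docx file in SRC_DIR to
   # OUT_DIR/NAME.tei.xml, stopping at the first failure.
   convert_all() {
     local src_dir="$1" out_dir="$2" docx name
     mkdir -p "$out_dir" || return 1
     for docx in "$src_dir"/*.docx; do
       [ -e "$docx" ] || continue   # no .docx files: the glob stays unexpanded
       name=$(basename "$docx" .docx)
       echo "converting $docx" >&2
       ddhi_convert "$docx" -o "$out_dir/$name.tei.xml" || return 1
     done
   }

For example, ``convert_all ~/Desktop/transcripts ~/Desktop/tei`` would write ``zien_jimmy_transcript_final.tei.xml``, among others, into ``~/Desktop/tei``.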

Use ``ddhi_tag`` to add named-entity tags to a TEI-encoded
transcription:

.. code:: bash

   ddhi_tag -o zien.tei.xml tmp.tei.xml
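The conversion and tagging steps can also be chained for a single transcript. The sketch below is hypothetical (``convert_and_tag`` is not part of the package). It uses only the two invocations shown above, keeps the untagged intermediate file in a temporary directory, and deletes that directory only if both steps succeed.

.. code:: bash

   # Hypothetical helper: DOCX -> untagged TEI -> tagged TEI in one call.
   convert_and_tag() {
     local docx="$1" out="$2" workdir
     workdir=$(mktemp -d) || return 1
     if ddhi_convert "$docx" -o "$workdir/tmp.tei.xml" &&
        ddhi_tag -o "$out" "$workdir/tmp.tei.xml"; then
       rm -rf "$workdir"
     else
       echo "convert_and_tag: failed; intermediate files kept in $workdir" >&2
       return 1
     fi
   }

For example: ``convert_and_tag ~/Desktop/transcripts/zien_jimmy_transcript_final.docx zien.tei.xml``. The tagged output still needs the manual review described next.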

Encoders are then expected to edit the text of the interview,
correcting automatically generated named-entity tags and adding new
ones.

Use ``ddhi_generate_standoff`` to create a ``<standOff>`` element in the
interview and link its entities to the corresponding names in the text.

Use ``ddhi_mentioned_places`` to extract the places in a TEI file's
standoff markup and print them as tab-separated values:

.. code:: bash

   ddhi_mentioned_places lovely.tei.xml > lovely.tsv

Then use OpenRefine or another tool to refine this list with
identifiers and other metadata.

Use ``ddhi_update_places`` to update the places in a TEI file's
standoff markup with identifiers and geo-coordinates obtained via
OpenRefine or another procedure:

.. code:: bash

   ddhi_update_places lovely.tei.xml lovely_updates.tsv \
       > updated_lovely.tei.xml
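The output must go to a different file: redirecting back onto the input (``> lovely.tei.xml``) makes the shell truncate that file before ``ddhi_update_places`` reads it. To update a file in place, one approach is to write to a temporary file and replace the original only when the command succeeds. The helper name ``update_places_in_place`` is hypothetical.

.. code:: bash

   # Hypothetical helper: apply place updates to TEI_FILE in place.
   # Output goes to a temporary file in the same directory, which replaces
   # TEI_FILE only if ddhi_update_places exits successfully.
   update_places_in_place() {
     local tei="$1" updates="$2" tmp
     tmp=$(mktemp "$tei.XXXXXX") || return 1
     if ddhi_update_places "$tei" "$updates" > "$tmp"; then
       mv "$tmp" "$tei"
     else
       rm -f "$tmp"
       return 1
     fi
   }

For example, ``update_places_in_place lovely.tei.xml lovely_updates.tsv``. Because ``mktemp`` creates files readable only by their owner, the updated file may need its permissions adjusted afterwards.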
	  
Similarly, use ``ddhi_mentioned_events`` and ``ddhi_update_events`` to
perform the same operations for events.
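A helper that extracts both places and events from one interview might look like the following. It is a sketch: ``extract_mentions`` is hypothetical, and it assumes that ``ddhi_mentioned_events`` takes a single TEI file and writes TSV to standard output, exactly like ``ddhi_mentioned_places``.

.. code:: bash

   # Hypothetical helper: write NAME_places.tsv and NAME_events.tsv next to
   # NAME.tei.xml. Assumes ddhi_mentioned_events is invoked the same way as
   # ddhi_mentioned_places (one TEI file in, TSV on standard output).
   extract_mentions() {
     local tei="$1" base
     base="${tei%.tei.xml}"
     ddhi_mentioned_places "$tei" > "${base}_places.tsv" &&
       ddhi_mentioned_events "$tei" > "${base}_events.tsv"
   }

Under that assumption, ``extract_mentions lovely.tei.xml`` would produce ``lovely_places.tsv`` and ``lovely_events.tsv``, ready for refinement in OpenRefine.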


