Metadata-Version: 1.1
Name: SPLAT-library
Version: 0.1.0
Summary: Speech Processing & Linguistic Analysis Tool
Home-page: http://splat-library.org
Author: Benjamin S. Meyers
Author-email: ben@splat-library.org
License: - - - -
## SPLAT
The MIT License (MIT)

Copyright (c) 2015 Benjamin S. Meyers < <ben@splat-library.org> >

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

- - - -
## NLTK
The Natural Language Toolkit (NLTK) is a Python library that provides a wide selection of natural language processing functions.
For documentation and other information, please visit [www.nltk.org](http://www.nltk.org/).

- - - -
## CORPORA
### Brown Corpus
* Corpus: brown-corpus.pkl
* Project Site: <http://clu.uni.no/icame/brown/bcm.html>
* Word Count: ~1,000,000

The Brown University Standard Corpus of Present-Day American English (or just Brown Corpus) was compiled in the 1960s by Henry Kučera and W. Nelson Francis at Brown University, Providence, Rhode Island, as a general corpus (text collection) in the field of corpus linguistics. It contains 500 samples of English-language text, totaling roughly one million words, compiled from works published in the United States in 1961.
### English Open Word List
* Corpus: EOWL-corpus.pkl
* Project Site: <http://dreamsteep.com/projects/the-english-open-word-list.html>
* Word Count: 128,985

This corpus was created using Ken Loge's English Open Word List.
### Google 20,000 Most Common English Words
* Corpus: google-20000-english.pkl
* Project Site: <https://github.com/first20hours/google-10000-english>
* Word Count: 20,000

This corpus was created using the 20,000 most common English words from the Google Trillion-Word Corpus.
### Gutenberg Corpus
* Corpus: gutenberg-corpus.pkl
* Project Site: <https://www.gutenberg.org/wiki/Gutenberg:Terms_of_Use>
* Word Count:

This corpus was created using the Gutenberg corpus available in NLTK.
### NLTK Stopwords
* Corpus: stopwords-corpus.pkl
* Project Site: <https://github.com/nltk/nltk>
* Word Count:

This corpus was created using the Stopwords corpus available in NLTK.

- - - -
## The Berkeley Parser
The files listed below are part of the [Berkeley Parser](https://github.com/slavpetrov/berkeleyparser):
* /parse/BerkeleyParser-1.7.jar
* /parse/eng_sm6.gr

For questions regarding the Berkeley Parser, please contact Slav Petrov < <petrov@cs.berkeley.edu> >.

- - - -
## Graham Neubig
The functions listed below were adapted from [this script](https://github.com/neubig/util-scripts/blob/96c91e43b650136bb88bbb087edb1d31b65d389f/syntactic-complexity.py):
* /complexity/Util.get_word_score()
* /complexity/Util.is_sentence()
* /complexity/Util.calc_yngve_score()
* /complexity/Util.calc_frazier_score()
* /complexity/Util.get_yngve_score()
* /complexity/Util.get_frazier_score()

Permission to use this script was granted by the code owner, Graham Neubig. For related questions, you may contact him by email < <neubig@is.naist.jp> > or visit his [website](http://www.phontron.com/index.php).
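
The Yngve score behind `Util.calc_yngve_score()` and `Util.get_yngve_score()` measures how left-branching a parse tree is. The block below is a minimal sketch of the usual definition. It is not SPLAT's code or Neubig's script.

The sketch assumes a hypothetical tree encoding: a word is a string, and a phrase is a tuple `(label, child, ...)`. The helper names `word_yngve_depths` and `mean_yngve_score` are illustrative only. Each node's branches are numbered from right to left starting at 0. A word's depth is the sum of the branch numbers on its path from the root.

```python
from typing import List, Tuple, Union

# Hypothetical tree encoding (not SPLAT's): a word is a str and a phrase
# is a tuple (label, child_1, child_2, ...).
Tree = Union[str, Tuple]


def word_yngve_depths(tree: Tree, depth: int = 0) -> List[int]:
    """Return the Yngve depth of every word in the tree, left to right."""
    if isinstance(tree, str):
        return [depth]
    children = tree[1:]
    n = len(children)
    depths: List[int] = []
    for i, child in enumerate(children):
        # Branches are numbered right to left: the rightmost child adds 0,
        # its left neighbour adds 1, and so on.
        depths.extend(word_yngve_depths(child, depth + (n - 1 - i)))
    return depths


def mean_yngve_score(tree: Tree) -> float:
    """Average Yngve depth over all words (one common sentence-level summary)."""
    depths = word_yngve_depths(tree)
    return sum(depths) / len(depths)


# "The dog barked": (S (NP (DT The) (NN dog)) (VP (VBD barked)))
parse = ("S", ("NP", ("DT", "The"), ("NN", "dog")), ("VP", ("VBD", "barked")))
print(word_yngve_depths(parse))  # [2, 1, 0]
print(mean_yngve_score(parse))   # 1.0
```

Left-branching material piles up larger branch numbers, so the left-branching subject "The dog" scores higher than the verb at the right edge. Summing the depths instead of averaging them gives a total score for the sentence.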
Download-URL: https://github.com/meyersbs/SPLAT/tarball/0.1.0
Description: SPLAT is a command-line application designed to make it easy for linguists (both computer-oriented and non-computer-oriented) to use the Natural Language Toolkit (NLTK) to analyze virtually any text file. SPLAT is designed to help you gather linguistic features from text files, and it assumes that most input files are not already annotated. For SPLAT to function properly, make sure that the input files you provide do not contain any annotations. Because there are so many variations of linguistic annotation schemes, it would be impossible to account for all of them in the initial parsing of input files; it is easier for you to remove any existing annotations than it is for me to do so.
Platform: UNKNOWN
