Metadata-Version: 2.1
Name: liwc
Version: 0.4.2
Summary: Linguistic Inquiry and Word Count (LIWC) analyzer (proprietary data not included)
Home-page: https://github.com/chbrown/liwc-python
Author: Christopher Brown
Author-email: chrisbrown@utexas.edu
License: MIT
Platform: UNKNOWN
Description-Content-Type: text/markdown

# `liwc`

[![PyPI version](https://badge.fury.io/py/liwc.svg)](https://pypi.org/project/liwc/)
[![Travis CI Build Status](https://travis-ci.org/chbrown/liwc-python.svg?branch=master)](https://travis-ci.org/chbrown/liwc-python)

Linguistic Inquiry and Word Count (LIWC) analyzer.

The LIWC lexicon is proprietary, so it is _not_ included in this repository,
but this Python package requires it.
The lexicon data can be acquired (purchased) from [liwc.net](http://liwc.net/).
This package reads from the `LIWC2007_English100131.dic` (MD5: `2a8c06ee3748218aa89b975574b4e84d`) file,
which must be available on any system where this package is used.

The LIWC2007 `.dic` format looks like this:

    %
    1   funct
    2   pronoun
    [...]
    %
    a   1   10
    abdomen*    146 147
    about   1   16  17
    [...]


## Setup

Install from [PyPI](https://pypi.python.org/pypi/liwc):

    pip install -U liwc


## Example

```python
import re
from collections import Counter

def tokenize(text):
    # you may want to use a smarter tokenizer
    for match in re.finditer(r'\w+', text, re.UNICODE):
        yield match.group(0)

import liwc
parse, category_names = liwc.load_token_parser('LIWC2007_English100131.dic')
```

* `parse` is a function from a token of text (a string) to a list of matching LIWC categories (a list of strings)
* `category_names` is all LIWC categories in the lexicon (a list of strings)

```python
gettysburg = '''Four score and seven years ago our fathers brought forth on
  this continent a new nation, conceived in liberty, and dedicated to the
  proposition that all men are created equal. Now we are engaged in a great
  civil war, testing whether that nation, or any nation so conceived and so
  dedicated, can long endure. We are met on a great battlefield of that war.
  We have come to dedicate a portion of that field, as a final resting place
  for those who here gave their lives that that nation might live. It is
  altogether fitting and proper that we should do this.'''
gettysburg_tokens = tokenize(gettysburg)
# now flatmap over all the categories in all of the tokens using a generator:
gettysburg_counts = Counter(category for token in gettysburg_tokens for category in parse(token))
# and print the results:
print(gettysburg_counts)
```


## License

Copyright (c) 2012-2019 Christopher Brown.
[MIT Licensed](LICENSE.txt).


