Metadata-Version: 2.1
Name: unicodecheck
Version: 1.0.2
Summary: Check if Unicode text files are Unicode-normalized
Maintainer-email: curegit <contact@curegit.jp>
License: MIT
Project-URL: homepage, https://github.com/curegit/unicodecheck
Project-URL: repository, https://github.com/curegit/unicodecheck.git
Keywords: Unicode,character encoding
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Processing
Classifier: License :: OSI Approved :: MIT License
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: chardet
Requires-Dist: rich
Provides-Extra: dev
Requires-Dist: pip; extra == "dev"
Requires-Dist: setuptools; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: mypy; extra == "dev"

# Unicodecheck

A simple tool that checks whether Unicode text files are Unicode-normalized.

## Install

```sh
pip3 install unicodecheck
```

## Usage

### Quickstart

```sh
unicodecheck -iv SPAM.txt
```

To check files in a directory recursively:

```sh
unicodecheck -ivr Ham/Eggs/
```

### Synopsis

Run the program either with the `unicodecheck` command or as a Python module with `python3 -m unicodecheck`.

```txt
usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i] [-v]
                    PATH [PATH ...]

positional arguments:
  PATH                  describe input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -v, --verbose         report non-essential logs (default: False)
```
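
To illustrate what the four `--mode` choices mean, here is a minimal sketch using only the standard library's `unicodedata` module. It is not taken from unicodecheck's source; the helper name `normalization_report` is hypothetical.

```python
import unicodedata

# The four normalization forms accepted by --mode.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_report(text: str) -> dict[str, bool]:
    """Map each normalization form to whether `text` is already in that form.

    Hypothetical helper for illustration; not part of unicodecheck.
    """
    return {form: unicodedata.is_normalized(form, text) for form in FORMS}


composed = "caf\u00e9"     # "é" as a single code point (U+00E9)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (U+0301)

print(normalization_report(composed))
print(normalization_report(decomposed))
```

The two strings render identically, but the composed one is in NFC and NFKC only, while the decomposed one is in NFD and NFKD only.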

## Tips

### Check whether filenames are normalized

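The `convmv` command is suitable for this. By default it only prints the renames it would perform; nothing is changed unless `--notest` is given.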

#### NFC

```sh
convmv -f utf8 -t utf8 --nfc -r ./
```

#### NFD

```sh
convmv -f utf8 -t utf8 --nfd -r ./
```
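
If `convmv` is not available, filenames can also be checked for either form with a few lines of Python. This is a rough sketch under that assumption, not a feature of unicodecheck; `unnormalized_names` and `find_unnormalized` are hypothetical helpers.

```python
import os
import unicodedata
from collections.abc import Iterable, Iterator


def unnormalized_names(names: Iterable[str], form: str = "NFC") -> list[str]:
    """Return the names that are not already in the given normalization form."""
    return [name for name in names if not unicodedata.is_normalized(form, name)]


def find_unnormalized(root: str, form: str = "NFC") -> Iterator[str]:
    """Yield paths under `root` whose final component is not normalized.

    Only the basename is tested, so an unnormalized parent directory
    does not cause every entry below it to be reported.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in unnormalized_names(dirnames + filenames, form):
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in find_unnormalized(".", "NFC"):
        print(path)
```

Some filesystems normalize names themselves, so the names reported by `os.walk` may not match the bytes originally used to create the files.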

## Notes

- This tool does not normalize files in place, because Unicode normalization does not guarantee content equivalence.
- Binary files are detected with a procedure based on Git's algorithm.
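
Both notes can be illustrated with a short sketch. It assumes that "Git's algorithm" means Git's usual heuristic of treating content as binary when a NUL byte appears within the first 8000 bytes; unicodecheck's actual implementation may differ, and `looks_binary` is a hypothetical name.

```python
import unicodedata

# Size of the leading window inspected for NUL bytes (assumed, following Git).
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes) -> bool:
    """Guess whether `data` is binary: a NUL byte in the leading window."""
    return b"\0" in data[:FIRST_FEW_BYTES]


# Normalization can replace characters irreversibly.
angstrom_sign = "\u212b"  # ANGSTROM SIGN
nfc = unicodedata.normalize("NFC", angstrom_sign)
print(f"U+{ord(nfc):04X}")  # the code point after NFC normalization

ligature = "\ufb01"  # LATIN SMALL LIGATURE FI
print(unicodedata.normalize("NFKC", ligature))  # compatibility form drops the ligature
```

After NFC, the Angstrom sign becomes U+00C5, and no normalization form maps it back to U+212B. This is why rewriting files automatically could silently change their content.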

## License

MIT
