Metadata-Version: 2.1
Name: stt-listen
Version: 3.0.2a3
Summary: Transcribe long audio files with ASR or use the streaming interface
Home-page: https://gitlab.com/waser-technologies/technologies/listen
Author: Danny Waser
License: LICENSE
Requires-Python: >=3.8,<3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torchaudio
Requires-Dist: transformers
Requires-Dist: numpy>=1.15.1
Requires-Dist: pyaudio>=0.2.12
Requires-Dist: webrtcvad>=2.0.10
Requires-Dist: scipy>=1.1.0
Requires-Dist: toml>=0.10.2
Requires-Dist: sanic>=22.6.0
Requires-Dist: ffmpeg-python>=0.2.0
Requires-Dist: websockets>=10.3
Requires-Dist: deepmultilingualpunctuation
Requires-Dist: pyctcdecode
Requires-Dist: kenlm
Requires-Dist: fastapi
Requires-Dist: accelerate

# Listen: STT Services

This program is composed of two parts:

- A server aimed to be runned as a background service to serve ASR models within the bounds of a socket.
- A client to query the models to transcribe audio from files or directly from a live microphone stream.

The outputed wav file can be stored for later use.

You can then use the `data.helper` script to verify the transcription of every wav file and update the CSV training register before you start training a model.

## Requirements

- [`python-pyaudio`](https://people.csail.mit.edu/hubert/pyaudio/)

## Installation

Once you have a working `pyaudio` for your version of python, install `listen`.

```zsh
pip install stt-listen
# Or from source
pip install git+https://gitlab.com/waser-technologies/technologies/listen.git
```

## Usage

```zsh
❯ listen --help
usage: listen [-h] [-f FILE] [--aggressive {0,1,2,3}] [-d MIC_DEVICE]
                   [-w SAVE_WAV]

Transcribe long audio files using webRTC VAD or use the streaming interface
from a microphone

options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  Path to the audio file to run (WAV format)
  --aggressive {0,1,2,3}
                        Determines how aggressive filtering out non-speech is.
                        (Integer between 0-3)
  -d MIC_DEVICE, --mic_device MIC_DEVICE
                        Device input index (Int) as listed by
                        pyaudio.PyAudio.get_device_info_by_index(). If not
                        provided, falls back to PyAudio.get_default_device().
  -w SAVE_WAV, --save_wav SAVE_WAV
                        Path to directory where to save recorded sentences
  --debug               Show debug info
```

## Start the server

To use `listen`, you need a socket with STT models at the ready.

Example to enable as service.

```zsh
cp ./listen.service.example /usr/lib/systemd/user/listen.service
systemctl --user enable --now listen.service
```

Models for STT and punctuation will be downloaded the first time your run the server.

Or manually with `uvicorn`.

```zsh
uvicorn listen.Wav2Vec.as_service:app --reload --port 5063
```

### Get authorization to listen

You need to authorize the system to listen first. Change the service configuration as follows.

```toml
# ~/.assistant/stt.toml
...
[stt]
is_allowed = true
...
```

Then [start the server](#start-the-server) and use `listen` to start [transcribing audio](#use-the-client).

## Use the client

### Transcribe a file

You can quickly transcribe a wav file.

```zsh
❯ listen -f audio.wav
Filename                       Duration(s)         
audio.wav                      3.580               

❯ cat audio.txt
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: audio.txt
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ Bonjour.
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────
```

### Transcribe from a live microphone stream

You can also query the models in real time from a microphone.

```zsh
❯ listen
You can speak now.
Bonjour.
^C
Stopped listening.
```

## Supported languages

By default, the server uses the system's language according to the environment variable `$LANG`.

You can manually specify a supported language for the server to use.

```zsh
LANG="en_US.UTF-8" uvicorn listen.Wav2Vec.as_service:app --reload --port 5063
```

This will look for a good model for this language.

You can also directly specify a model to load.

```zsh
ASR_MODEL_ID="patrickvonplaten/wav2vec2-base-100h-with-lm" uvicorn listen.Wav2Vec.as_service:app --reload --port 5063
```
