Metadata-Version: 2.3
Name: wft
Version: 0.0.1
Summary: Run Whisper fine-tuning with ease.
Project-URL: Homepage, https://github.com/jacoblincool/wft
Project-URL: Issues, https://github.com/jacoblincool/wft/issues
Author-email: Jacob Lin <jacob@csie.cool>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: accelerate
Requires-Dist: datasets>=3.0.0
Requires-Dist: evaluate
Requires-Dist: peft
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.0.0
Description-Content-Type: text/markdown

# wft

## Prepare dataset

```py
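# Assumption: WhisperFineTuner is exported at the top level of the wft package.
from wft import WhisperFineTuner

outdir = "output"
ft = (
    WhisperFineTuner(outdir)
    # Preprocess the zh-TW subset of Common Voice 16.1.
    .prepare_dataset(
        "mozilla-foundation/common_voice_16_1",
        src_subset="zh-TW",
        src_audio_column="audio",
        src_transcription_column="sentence",
    )
    # Push the preprocessed dataset so later runs can load it directly.
    .push_dataset("JacobLinCool/mozilla-foundation-common_voice_16_1-zh-TW-preprocessed")
)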
```

## Fine-tune

```py
# Assumption: WhisperFineTuner is exported at the top level of the wft package.
from wft import WhisperFineTuner

outdir = "output"
ft = (
    WhisperFineTuner(outdir)
    # Choose the base model to fine-tune, plus the target language and task.
    .set_baseline("openai/whisper-large-v3-turbo", language="zh", task="transcribe")
    .then(lambda ft: print(ft.baseline_model))
    # Load the dataset pushed in the "Prepare dataset" step.
    .load_dataset(
        "JacobLinCool/mozilla-foundation-common_voice_16_1-zh-TW-preprocessed"
    )
    .then(lambda ft: print(ft.dataset))
    .set_metric("cer")
    .set_lora_config()  # use the default config; see ft.default_lora_config for details
    .train()  # use the default config; see ft.default_training_args for details
    .merge_and_save(f"{outdir}/merged_model")
)
```
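The `"cer"` passed to `set_metric` presumably stands for character error rate, which suits languages such as Chinese where words are not separated by spaces. The following is a minimal pure-Python sketch of how CER is commonly defined (character-level edit distance divided by the total reference length). It is not the implementation used by wft or by the `evaluate` package, and the function names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(references: list[str], predictions: list[str]) -> float:
    """Total character edits divided by total reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    return total_edits / total_chars


# One substituted character out of six in the reference.
score = cer(["今天天氣很好"], ["今天天气很好"])
print(f"CER: {score:.3f}")
```

A lower value is better: 0.0 means every prediction matches its reference exactly, and values above 1.0 are possible when predictions contain many extra characters.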

## Chain everything together

```py
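# Assumption: WhisperFineTuner is exported at the top level of the wft package.
from wft import WhisperFineTuner

outdir = "output"
ft = (
    WhisperFineTuner(outdir)
    # Preprocess the dataset in the same chain; no push_dataset/load_dataset
    # step appears here, so the prepared dataset is used for training directly.
    .prepare_dataset(
        "mozilla-foundation/common_voice_16_1",
        src_subset="zh-TW",
        src_audio_column="audio",
        src_transcription_column="sentence",
    )
    .set_baseline("openai/whisper-large-v3-turbo", language="zh", task="transcribe")
    .set_metric("cer")
    .set_lora_config()  # use the default config; see ft.default_lora_config for details
    .train()  # use the default config; see ft.default_training_args for details
    .merge_and_save(f"{outdir}/merged_model")
)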
```
