CHANGELOG

2.17.5

added dynamic_heads to transcribe() and align() (32235fa)
added pipeline_kwargs to load_hf_whisper() (024d7dc)
added "large-v3-turbo" and "turbo" to HF_MODELS (024d7dc)
updated Whisper requirement to >=20230314,<=20240930 (453013c, df8dace)
updated Whisper compatibility warning message (453013c, df8dace)
updated compatibility with Whisper v20240930 (df8dace)
updated align() and transcribe_stable() compatibility with latest Faster-Whisper commit (024d7dc)

2.17.4

deprecated vad_onnx (b309530)
added optional dependencies for Faster Whisper and Hugging Face (c541169)
added nonspeech_skip (888181f)
fixed #393 (1ee47ce)
fixed stabilization.utils.mask2timing() to handle edge cases (e0e7183)
fixed suppress_silence=False performing unnecessary compute when vad=True (888181f)
fixed typos in docstrings (e0e7183)
updated refine() docstring in README (3bc76b9)
updated vad to accept a dict of keyword arguments for loading VAD (b309530)

2.17.3

added pad() to result.WhisperResult (689fe5e)
added newline to merge_by_gap() and merge_by_punctuation() (689fe5e)
fixed verbose for adjust_by_silence() (f53f2ee)
fixed adjustment progress bar in non_whisper.transcribe_any() (48d70a8)
fixed error from using tag/--tag when output format is VTT and word_level=True (3997ef1)
fixed segment merging methods not working when the result contains only segment-level timestamps (689fe5e)
updated merge_by_gap() and merge_by_punctuation() docstrings with newline (3ab74e7)

2.17.2

changed SRT to start from index 1 (9f8db52)
changed reset() to be consistent for results produced by all transcribe() variants (864b76c)
fixed #357 (98923ea)
fixed refine() not working when verbose is not True (864b76c)
fixed progress bar warning for refine() (864b76c)

2.17.1

fixed #353 (66f8d13)
fixed align() error when audio segment contains no detectable nonspeech/silent sections (6d9a1ef)
fixed gap_padding causing unpredictable gaps or delays in the final timestamps for align() (6d9a1ef)
updated align() (6d9a1ef)

2.17.0

added min_silence_dur to align() and all variants of transcribe() (e2f9458)
added pad_or_trim() to whisper_compatibility (c4d42f2)
changed align() to ignore compatibility issues for Faster-Whisper models (c4d42f2)
changed align() to prioritize new timestamps within rounding error (5ca7ca5)
changed align() to prioritize timestamps that least overlap nonspeech timings (e2f9458)
changed silence suppression to be less aggressive (e2f9458)
changed silence suppression to treat nonspeech sections that overlap a word as individual sections (5ca7ca5)
dropped Whisper dependency for stable-ts-whisperless (c4d42f2)
fixed result.WordTiming.suppress_silence() by undoing changes in e2f9458 (0546d76)
fixed discrepancy between text and output for align() (e2f9458)
changed default of align() to presplit=False on faster-whisper models (850a19f)
updated README.md with setup instructions for stable-ts-whisperless (c4d42f2)
updated use_word_position=True to also take into account the index of each word (5ca7ca5)

2.16.0

deprecated suppress_attention (5513609)
deprecated ts_num and ts_noise (5513609)
added noisereduce as a supported denoiser (03bb83b)
added engine to load_model() (5513609)
added extra_models to align() and transcribe() (5513609)
added presplit and gap_padding to align() (5513609)
fixed docstring of adjust_by_silence() (5513609)
fixed dfnet denoiser model to use specified device (5513609)
fixed error from progress=True when denoiser='noisereduce' (5513609)
fixed incorrect titles when downloading audio with yt-dlp (5513609)
changed 'demucs' and 'dfnet' denoisers to denoise in 2 channels when stream=False (5513609)
improved word timing by making gap_padding more effective (5513609)

2.15.11

fixed inaccurate progress bar in result.WhisperResult.suppress_silence() (ad013d7)
replaced update_all_segs_with_words() in refine() with reassign_ids() (ad013d7)
updated --align to treat the argument as plain-text if the argument starts with 'text=' (ad013d7)

2.15.10

added --persist / -p to CLI (177bcc4)
added suppress_attention to transcribe() and align() for original Whisper (177bcc4)
fixed align() failing to predict nonspeech timings after skipping a nonspeech section (424f484)
fixed typo (#324) (dbee5c5)

2.15.9

changed WhisperResult to allow initialization without data (00ad4b4)
fixed Segment.copy() failing to initialize WordTiming when new_words=None and copy_words=False (00ad4b4)
fixed WhisperResult.duration to return 0.0 if result contains no segments (00ad4b4)
fixed WhisperResult.has_words to return False if result contains no segments (00ad4b4)

2.15.8

fixed WhisperResult.fill_in_gaps() (cbbad76)
removed end >= start requirement for Segment (cbbad76)
updated warning message for out of order timestamps (cbbad76)

2.15.7

deprecated Segment.update_seg_with_words() and WhisperResult.update_all_segs_with_words() (ff89e53)
changed start, end, text, tokens of Segment to properties (ff89e53)
deprecated and replaced WordTiming.round_all_timestamps() with round_ts=True at initialization (ff89e53)
added progress bar for timestamps adjustments (ff89e53)
sped up splitting and merging of segments (ff89e53)
removed redundant parts of the default regrouping algorithm (ff89e53)

2.15.6

added pipeline to stable_whisper.load_hf_whisper() (c356491)
changed language, task, batch_size to optional parameters for WhisperHF.transcribe() (c356491)
fixed English models not working for WhisperHF (c356491)
fixed get_device() for 'mps' (53272cb)

2.15.5

WhisperHF.transcribe() can now take generation parameters supported by Transformers (133f323)
added logic to replace None timestamps returned by Hugging Face Whisper models (8bbe0c5)
changed whisper_word_level.hf_whisper.load_hf_pipe() model loading method (a684fb4)

2.15.4

added DeepFilterNet (https://github.com/Rikorose/DeepFilterNet) as supported denoiser (3fafd04)
added Whisper on Hugging Face Transformers to CLI (3fafd04)
fixed CLI throwing OSError when input is a URL and --output is not specified (3fafd04)
fixed WhisperHF.transcribe() unable to load when audio is URL or certain formats (3fafd04)

2.15.3

added support for Whisper on Hugging Face Transformers (9197b5c)
fixed non-speech suppression not working properly for transcribe_any() (9197b5c)

2.15.2

changed default to dtype=numpy.int32 for all Numpy int arrays (3886bc6)

2.15.1

removed shell=True in .audio.utils.get_metadata() (e8f72a3)

2.15.0

added "「" to prepend_punctuations and "」" to append_punctuations (9968a45)
added AudioLoader class for handling general audio loading (9968a45)
added NonSpeechPredictor class for handling non-speech detection (9968a45)
added default.py to hold global default states (9968a45)
added failure_threshold to align() (9968a45)
added stream to functions that use AudioLoader internally (9968a45)
added progress bars for VAD and Demucs operations (9968a45)
changed text normalization for align() (6d0746c)
changed WhisperResult to ignore segments with no words (6d0746c)
changed nonspeech_error default from 0.3 to 0.1 for all functions (9968a45)
changed nonspeech_skip default from 3.0 to 5.0 for align() (9968a45)
changed use_word_position behavior (9968a45)
changed to load Demucs into cache for reuse by default (9968a45)
deprecated and replaced demucs and demucs_options with denoiser and denoiser_options (9968a45)
dropped ffmpeg-python dependency (9968a45)
dropped dependencies: more-itertools, transformers (9968a45)
fixed align() producing empty word slices (6d0746c)
fixed refine() exceeding the max token count (#297) (f6d61c2)
fixed issues in transcribe_any() caused by unspecified samplerate (9968a45)
fixed vad=True causing first word of segment to be grouped with previous segment (9968a45)
refactored audio.py, stabilization.py, whisper_word_level.py into subpackages (9968a45)
removed demucs_output (9968a45)

2.14.4

added output_demo.mp4 (395c8a9)
fixed align() throwing UnsortedException (f9ca03b)
fixed original_split=True failing when there is more than one consecutive newline (f9ca03b)
fixed align() IndexError (#292 (comment)) (f9ca03b)

2.14.3

added trust_repo=True for loading Silero-VAD (a6b2b05)
added 'master' to the branch for loading Silero-VAD (a6b2b05)
fixed align() failing for faster whisper with certain languages (677f233)
fixed result.WhisperResult.apply_min_dur() and result.Segment.apply_min_dur() to work as intended (be2985e)
removed resampling_method="kaiser_window" for all calls of torchaudio.functional.resample() (a6b2b05)

2.14.2

updated align() logic (738fd98)
added nonspeech_skip to align() (738fd98)
added show_unsorted to result.WhisperResult.__init__() and result.WhisperResult.raise_for_unsorted() (738fd98)
added use_word_position to methods that support non-speech/silence suppression (738fd98)
fixed result.WhisperResult.force_order() to handle data with multiple consecutive unsorted timestamps (738fd98)
fixed empty segment removal to work as intended for result.WhisperResult (ef0a87e)
updated README.md to directly include the docstrings instead of hyperlinks (738fd98)
updated result.save_as_json() to include ensure_ascii=False as default (738fd98)
added kwargs to result.save_as_json() (738fd98)
updated demo videos (3524aa2)

2.14.1

fixed result.WhisperResult.force_order() causing IndexError (0430a31)
updated README.md (bc4601f)

2.14.0

added nonspeech_sections property to result.WhisperResult (191674b)
added nonspeech_error for silence suppression (191674b)
changed min_word_dur behavior for silence suppression (191674b)
changed silence suppression behavior (191674b)
updated README.md (191674b)

2.13.7

fixed result.WhisperResult.split_by_punctuation() not working if min_words/min_chars/min_dur are unspecified (d51edb6)

2.13.6

added show_regroup_history() to result.WhisperResult (df4a199)
added new attribute, regroup_history, to .result.WhisperResult (df4a199)
added min_words, min_chars, min_dur to result.WhisperResult.split_by_punctuation() (df4a199)
updated README.md (e86c571)

2.13.5

added get_content_by_time() to result.WhisperResult (900797a)
added get_result() to result.Segment (900797a)
added get_segment() to result.WordTiming (900797a)
added text_output.result_to_txt()/result.WhisperResult.to_txt() (900797a)
added editing methods to result.WhisperResult: remove_word(), remove_segment(), remove_repetition(), remove_words_by_str(), fill_in_gaps() (900797a)
added editing methods to list of 'method keys' in result.WhisperResult.regroup() (900797a)
changed result.Segment.to_display_str() to enclose segment text in double quotes (900797a)
implemented __getitem__ and __delitem__ for result.Segment and result.WhisperResult (900797a)
updated docstrings of whisper_word_level.load_model() and whisper_word_level.load_faster_whisper() (900797a)

2.13.4

added result.WhisperResult.split_by_duration() (71b9f1f)
fixed newline=True for result.WhisperResult._split_segments() (71b9f1f)
fixed docstring of result.WhisperResult.split_by_length() (71b9f1f)
updated Whisper to v20231117 (71b9f1f)

2.13.3

added --faster_whisper, -fw to CLI (a038ad1)
added --locate, -lc to CLI (a038ad1)
changed alignment.align() to be compatible with faster-whisper (a038ad1)
changed verbose behavior for alignment.locate() (a038ad1)
fixed inconsistent syntax and typo in docstrings (a038ad1)
removed assertions for checking timestamp order when using __add__() with result.Segment or result.WordTiming (a038ad1)

2.13.2

added newline to split_by_gap(), split_by_punctuation(), split_by_length() (b336735)
added progress_callback to whisper_word_level.load_faster_whisper.faster_transcribe() (b336735)
fixed #241 (5c512a1)
refactored _COMPATIBLE_WHISPER_VERSIONS, _required_whisper_ver, warn_compatibility_issues() (b336735)
updated README.md (3dfbd72)
updated --model for CLI to be compatible with checkpoint paths (b336735)
updated merge_all_segments() with faster logic (b336735)
updated verbose for .whisper_word_level.load_faster_whisper.faster_transcribe() (b336735)
updated whisper version to v20231106 (b336735)

2.13.1

added avg_prob_threshold to whisper_word_level.transcribe_stable() (58ece35)
added fast_mode to alignment.align() (58ece35)
added utils.UnsortedException (eb00d29)
added word_dur_factor and max_word_dur to alignment.align() (58ece35)
changed check_sorted for result.WhisperResult to also accept a path (eb00d29)
changed clip_start default to None for result.WhisperResult.clamp_max() (58ece35)
corrected docstrings of suppress_silence and suppress_word_ts (58ece35)
fixed timing.find_alignment_stable() returning negative timestamps (58ece35)

2.13.0

added alignment.locate() (a777206)
added utils.format_timestamp() and utils.make_safe() (a777206)
added utils.safe_print() (a777206)
added demucs, demucs_options, only_voice_freq to alignment.refine() (a777206)
added to_display_str() to result.Segment (a777206)
added demucs_options to whisper_word_level.load_faster_whisper.faster_transcribe() (a777206)
updated --output / -o (a777206)
changed audio to always be expected to be 16kHz for torch.Tensor or numpy.ndarray (a777206)
fixed alignment.align() failing if text is a result.WhisperResult without tokens (a777206)
fixed original_split=True by replacing line breaks with space (97a316d)
fixed result_to_ass() failing to return to base color when using tag (83ae509)
improved efficiency of segment splitting for alignment.align() when original_split=True (a777206)
refactored the audio preprocessing into audio.prep_audio() (a777206)
removed _is_whisper_repo_version from utils.py (a777206)
renamed original_spit to original_split for alignment.align() (a777206)
set action="extend" for all CLI keyword arguments that take multiple values (a777206)
changed demucs to also accept a Demucs model instance (a777206)
deprecated time_scale, input_sr, demucs_output, demucs_device (a777206)
updated docstrings (a777206)

2.12.3

updated alignment.align() to raise warning on failure (b9ac041)
changed language into a required parameter (b9ac041)
fixed alignment.align() endlessly looping (b9ac041)

2.12.2

added original_spit to alignment.align() (45bd3bc)
ignore DecodingOptions for alignment (1fb3009)

2.12.1

changed abs_dur_change default to None (dd1452e)
changed abs_prob_decrease default to 0.5 (dd1452e)
changed alignment.refine() to allow durations to increase (dd1452e)
changed rel_prob_decrease default to 0.3 (dd1452e)
changed rel_rel_prob_decrease to optional (dd1452e)
changed the usage of original probability in alignment.refine() (dd1452e)
fixed CLI not using decode_options (9aba3dc)
fixed adjust_by_silence() throwing TypeError (92d51b9)
updated README.md (3643092)

2.12.0

added --align to CLI (c90ff06)
added alignment.refine() for refining timestamps (138cb6b)
added --refine and --refine_option to CLI (138cb6b)
added segment_id and id to result.WordTiming (138cb6b)
added description to transcription progress bar (138cb6b)
fixed align() not working when text is a result.WhisperResult (138cb6b)
fixed transcribe() throwing error if suppress_silence=False (138cb6b)
updated README.md (c90ff06)

2.11.7

fixed --debug not showing the first option (857df9a)
fixed demucs and only_voice_freq for transcribe_stable() (7f62a9d)
fixed demucs for transcribe_minimal() (857df9a)
fixed only_voice_freq for transcribe_minimal() (7f62a9d)
fixed progress bar for faster-whisper (7f62a9d)
updated transcribe_minimal() to accept more options (857df9a)
updated transcribe_stable() for faster-whisper models to accept more options (7f62a9d)

2.11.6

deleted _demo directory (66f4376)
fixed #216 (1732ac0)

2.11.5

added 'us' as method key to WhisperResult.regroup() (da33bf5)
added --demucs_option, --model_option, --transcribe_option, --save_option to CLI (da33bf5)
added --transcribe_method to CLI (da33bf5)
added Segment.words_by_lock(), WhisperResult.all_words_by_lock() (da33bf5)
added strip to WhisperResult.lock() (e98c3d6)
fixed docstring of WhisperResult.lock() (05bba74)
improved --debug for CLI (da33bf5)
improved even_split=True for WhisperResult.split_by_length() (da33bf5)
updated docstring of WhisperResult.split_by_length() (da33bf5)

2.11.4

added lock() to WhisperResult (384fc3c)
added 'l' as method key to WhisperResult.regroup() (384fc3c)
added progress bar to transcription with faster-whisper (5ac6f5e)
updated --output_format to accept multiple formats (384fc3c)
updated WhisperResult.reset() to match its initialization (384fc3c)
updated regroup() to parse regroup_algo into dict (384fc3c)

2.11.3

added check_sorted to WhisperResult (4054ca1)
added check_sorted to transcribe_any() (07eaf9e)
added round_all_timestamps() to result.Segment and result.WordTiming (4a7e52b)
changed default to word_timestamps=True for faster_transcribe() (4a7e52b)
changed raise_for_unsorted() logic (4a7e52b)
fixed WhisperResult.force_order() to work as intended (4a7e52b)

2.11.2

fixed condition_on_previous_text (641cce7)
updated Whisper version to v20230918 (641cce7)

2.11.1

added token_step to align() (ac3b38c)
deleted _demo directory (b592731)
fixed #205 (ac3b38c)
updated README.md (d0340ef, ffa05a4)

2.11.0

added Whisper.adjust_by_result() (6da3dd8)
added alignment.align() (6da3dd8)
added load_faster_whisper() (6da3dd8)
fixed encode_video_comparison() unable to encode more than two subtitle files (6da3dd8)
fixed verbose not working for transcribe_minimal() (6da3dd8)
refactored compatibility warning into warn_compatibility_issues() in utils.py (6da3dd8)
refactored post-inference silence suppress into WhisperResult.adjust_by_silence() (6da3dd8)

2.10.1

added demucs_options to transcribe() (91cf2b1)
added ignore_compatibility to transcribe() (91cf2b1)
changed compatibility warning to distinguish between mismatched version number and repo version (91cf2b1)
changed heuristic for identifying Whisper version number to avoid false positives (91cf2b1)

2.10.0

added transcribe_minimal() (ef8a7f1)
added force_order to result.WhisperResult (ef8a7f1)
added max_instant_words to transcribe() (ef8a7f1)
added progress_callback to transcribe() (ef8a7f1)
changed default to clip_start=True for WhisperResult.clamp_max() (ef8a7f1)
added logic to check if the installed Whisper version is compatible (e53f4be)
fixed tag for result_to_ass() to work as intended (ea8cac8)

2.9.0

added logic to ensure ascending timestamps in result.WhisperResult (fd78cd7)
updated default regroup algorithm (fd78cd7, 77dcfdf)
updated long form transcription logic (fd78cd7)
fixed skipping words (77dcfdf)
avoid computing higher temperatures on no_speech segments (fd78cd7)
removed any segments that contain only punctuation (fd78cd7)
removed segments with 50%+ instantaneous words (fd78cd7)
updated README.md (f5b4c22)

2.8.1

allow regroup_algo to be bool for regroup() (4984163)

2.8.0

added even_split to split_by_length() (7b867d6)
changed default behavior of split_by_length() (7b867d6)
changed default to verbose=False for clamp_max() (7b867d6)

2.7.2

ignore min_word_dur when word timestamps are missing (e93c280)
fixed min_word_dur not working for word timestamps (e93c280)

2.7.1

added verbose to clamp_max() (70f092f)
fixed typo in examples\non-whisper.ipynb (70f092f)

2.7.0

added clamp_max() to WhisperResult and WordTiming (bfe93ab)
added cm as method key for clamp_max() (bfe93ab)
added non_whisper.transcribe_any() (789bb54)
changed default to suppress_ts_tokens=False (789bb54)
fixed hyperlinks in README.md not linking to the latest commit (87636ef)
fixed incorrect line numbers for docstring hyperlinks (52b8b7a)

2.6.4

fixed --regroup default (af5579e)

2.6.3

added string form custom regrouping algorithm (cc352cd)

2.6.2

fixed #153 (9e3ba72)
removed max limit on audio threshold (9e3ba72)
updated non-whisper.ipynb (da3721b, 7866462)

2.6.1

changed result.WhisperResult to only require necessary data to initialize (cdf3ea9)
added --karaoke to CLI (cdf3ea9)
updated README.md (0635e15, 2f094f8, fb23c27)

2.6.0

added support for TSV output format (d30d0d1)
changed default VTT and ASS output to use more efficient formats (d30d0d1)
fixed non-VAD suppression not working properly (d30d0d1)
improved language detection (d30d0d1)

2.5.3

fixed #145 (efbe6b6)

2.5.2

re-added #143 (5ea52b2)

2.5.1

added logic for loading audio with yt-dlp (8960922)
added only_ffmpeg to transcribe() and CLI (8960922)
added shell=True to subprocess call (a8df3b5)

2.5.0

added classes: SegmentMatch and WhisperResultMatches (1eabb37)
added fallback logic to word alignment (1eabb37)
added find() to result.WhisperResult (1eabb37)
added suppress_ts_tokens and gap_padding to transcribe() and CLI (1eabb37)
added shell=True to is_ytdlp_available() (d2b7f3f)
fixed NaN values in the logits (1eabb37)

2.4.1

added result_to_any() (eab8319)
changed rtl to reverse_text (eab8319)

2.4.0

added offset_time() to WhisperResult, Segment, WordTiming (1447a66)
added support for audio as URLs (1447a66)
fixed language detection for English models (1447a66)

2.3.1

added split_callback (44af5c4)
changed parameters of split_callback (c003ce4)
corrected the docstring for rtl (169e014)
fixed punctuation split/merge to work as intended (a84a346)

2.3.0

added regrouping list (a0021bd)
added --max_chars and --max_words to CLI (f913d6f)
added rtl #116 (f913d6f)
corrected VAD pytorch requirement (60f668d)
fixed visualize_suppression() error when max_width=-1 (918e3ba)
fixed out of range error (918e3ba)

2.2.0

added merge_all_segments() to result.WhisperResult (7c69535)
added split_by_length() to result.WhisperResult (7c69535)

2.1.3

fixed transcription logic (d44d287)

2.1.2

added Tips to README.md (c21e198)
added new token splitting method (fa813fe)
fixed #112 (3985791)
fixed #117 (3985791)
added instructions for installing demucs via error message (de3c812)
added encoding='utf-8' to read_me() in setup.py (ff34b27)
updated README.md (dfb147e)

2.1.1

added mel_first (8fa5670)
fixed min_dur being applied to words when segments contain no words (8fa5670)
updated regroup demo video (e9932fe)

2.1.0

added 1.x to 2.x guide to README.md (19ba449)
added min_dur (8c62ee1)
fixed regroup (8c62ee1)

2.0.4

fixed timestamps jumping backwards (26918d5)

2.0.3

changed default strip=True for result_to_srt_vtt() (ce4c7b3)
keep segments if the segment has no words from the start (6ccfa17)
improved stabilization.audio2loudness() efficiency (db99d6b)
fixed regroup=True when word_timestamps=False (6ccfa17)
fixed word_level=False failing output when word_timestamps=False (ce4c7b3)
fixed ASS output formatting (ce4c7b3)
updated README.md (f9f7c51)

2.0.2

fixed wav2mask() when suppress_silence=True (e884e38)
fixed typo (58006ec)

2.0.1

added example videos/images (75611f7)
updated README.md (0f2f699)

2.0.0

added segment-level and word-level support to SRT/VTT/ASS outputs (2248087)
added result.WhisperResult (2248087)
added Silero VAD support (2248087)
added visualize_suppression() (2248087)
added regrouping methods (2248087)
changed python requirement from 3.7+ to 3.8+ (2248087)
improved non-VAD suppression (2248087)
improved word-level timestamps reliability (2248087)
updated README.md (eb5e68c)
