- added `dynamic_heads` to `transcribe()` and `align()` (32235fa)
- added `pipeline_kwargs` to `load_hf_whisper()` (024d7dc)
- added `"large-v3-turbo"` and `"turbo"` to `HF_MODELS` (024d7dc)
- updated Whisper requirement to >=20230314,<=20240930 (453013c, df8dace)
- updated Whisper compatibility warning message (453013c, df8dace)
- updated compatibility with Whisper v20240930 (df8dace)
- updated `align()` and `transcribe_stable()` compatibility with latest Faster-Whisper commit (024d7dc)
- deprecated `vad_onnx` (b309530)
- added optional dependencies for Faster Whisper and Hugging Face (c541169)
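The Whisper requirement above pins the dependency by its date-style version number. A minimal sketch of such a range check, assuming versions are plain `YYYYMMDD` strings; the helper name `is_supported_whisper` is hypothetical and is not the library's actual compatibility check:

```python
# Hypothetical helper for illustration; not part of stable-ts.
MIN_WHISPER = 20230314  # lower bound from the requirement above
MAX_WHISPER = 20240930  # upper bound from the requirement above


def is_supported_whisper(version: str) -> bool:
    """Return True if a date-style version falls inside the pinned range."""
    digits = version.lstrip("v")  # tolerate a leading "v", e.g. "v20231117"
    if not digits.isdigit():
        return False
    return MIN_WHISPER <= int(digits) <= MAX_WHISPER
```

Because the version numbers are dates, integer comparison is enough here; a general-purpose version parser would only be needed for dotted version schemes.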
- added `nonspeech_skip` (888181f)
- fixed #393 (1ee47ce)
- fixed `stabilization.utils.mask2timing()` to handle edge cases (e0e7183)
- fixed `suppress_silence=False` performing unnecessary compute when `vad=True` (888181f)
- fixed typos in docstrings (e0e7183)
- updated `refine()` docstring in `README` (3bc76b9)
- updated `vad` to accept a `dict` of keyword arguments for loading VAD (b309530)
- added `pad()` to `result.WhisperResult` (689fe5e)
- added `newline` to `merge_by_gap()` and `merge_by_punctuation()` (689fe5e)
- fixed `verbose` for `adjust_by_silence()` (f53f2ee)
- fixed adjustment progress bar in `non_whisper.transcribe_any()` (48d70a8)
- fixed error from using `tag`/`--tag` when output format is VTT and `word_level=True` (3997ef1)
- fixed segment merging methods not working when the result contains only segment-level timestamps (689fe5e)
- updated `merge_by_gap()` and `merge_by_punctuation()` docstrings with `newline` (3ab74e7)
- changed SRT to start from index 1 (9f8db52)
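SRT cues are conventionally numbered from 1, which is what the change above aligns with. A minimal, self-contained sketch of 1-based cue numbering; the functions `format_srt_time` and `to_srt` are hypothetical illustrations and not the library's own writer:

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start, end, text) tuples, numbering cues from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)
```

Using `enumerate(..., start=1)` keeps the numbering in one place rather than adding an offset wherever the index is written.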
- changed `reset()` to be consistent for results produced by all `transcribe()` variants (864b76c)
- fixed #357 (98923ea)
- fixed `refine()` not working when `verbose` is not `True` (864b76c)
- fixed progress bar warning for `refine()` (864b76c)
- fixed #353 (66f8d13)
- fixed `align()` error when audio segment contains no detectable nonspeech/silent sections (6d9a1ef)
- fixed `gap_padding` causing unpredictable gaps or delays in the final timestamps for `align()` (6d9a1ef)
- updated `align()` (6d9a1ef)
- added `min_silence_dur` to `align()` and all variants of `transcribe()` (e2f9458)
- added `pad_or_trim()` to `whisper_compatibility` (c4d42f2)
- changed `align()` to ignore compatibility issues for Faster-Whisper models (c4d42f2)
- changed `align()` to prioritize new timestamps within rounding error (5ca7ca5)
- changed `align()` to prioritize timestamps that least overlap nonspeech timings (e2f9458)
- changed silence suppression to be less aggressive (e2f9458)
- changed silence suppression to treat nonspeech sections that overlap a word as individual sections (5ca7ca5)
- dropped Whisper dependency for `stable-ts-whisperless` (c4d42f2)
- fixed `result.WordTiming.suppress_silence()` by undoing changes in e2f9458 (0546d76)
- fixed discrepancy between `text` and output for `align()` (e2f9458)
- changed default of `align()` to `presplit=False` on faster-whisper models (850a19f)
- updated `README.md` with setup instructions for `stable-ts-whisperless` (c4d42f2)
- updated `use_word_position=True` to also take into account the index of each word (5ca7ca5)
- deprecated `suppress_attention` (5513609)
- deprecated `ts_num` and `ts_noise` (5513609)
- added noisereduce as a supported denoiser (03bb83b)
- added `engine` to `load_model()` (5513609)
- added `extra_models` to `align()` and `transcribe()` (5513609)
- added `presplit` and `gap_padding` to `align()` (5513609)
- fixed docstring of `adjust_by_silence()` (5513609)
- fixed `dfnet` denoiser model to use specified `device` (5513609)
- fixed error from `progress=True` when `denoiser='noisereduce'` (5513609)
- fixed incorrect titles when downloading audio with yt-dlp (5513609)
- changed `'demucs'` and `'dfnet'` denoisers to denoise in 2 channels when `stream=False` (5513609)
- improved word timing by making `gap_padding` more effective (5513609)
- fixed inaccurate progress bar in `result.WhisperResult.suppress_silence()` (ad013d7)
- replaced `update_all_segs_with_words()` in `refine()` with `reassign_ids()` (ad013d7)
- updated `--align` to treat the argument as plain-text if the argument starts with `'text='` (ad013d7)
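The `'text='` prefix rule for `--align` can be pictured with a small sketch. This assumes that an argument without the prefix is a path to a text file, which the entry above does not state; the helper `resolve_align_input` is hypothetical and not the CLI's actual code:

```python
from pathlib import Path

TEXT_PREFIX = "text="


def resolve_align_input(arg: str) -> str:
    """Return the text to align for a given command-line argument."""
    if arg.startswith(TEXT_PREFIX):
        # Everything after the prefix is taken literally as the text.
        return arg[len(TEXT_PREFIX):]
    # Assumption: anything else is a path to a UTF-8 text file.
    return Path(arg).read_text(encoding="utf-8")
```

An explicit prefix avoids guessing whether a short string is a filename or the text itself.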
- added `--persist`/`-p` to CLI (177bcc4)
- added `suppress_attention` to `transcribe()` and `align()` for original Whisper (177bcc4)
- fixed `align()` failing to predict nonspeech timings after skipping a nonspeech section (424f484)
- fixed typo (#324) (dbee5c5)
- changed `WhisperResult` to allow initialization without data (00ad4b4)
- fixed `Segment.copy()` failing to initialize `WordTiming` when `new_words=None` and `copy_words=False` (00ad4b4)
- fixed `WhisperResult.duration` to return `0.0` if result contains no segments (00ad4b4)
- fixed `WhisperResult.has_words` to return `False` if result contains no segments (00ad4b4)
- fixed `Whisper.fill_in_gaps()` (cbbad76)
- removed `end` >= `start` requirement for `Segment` (cbbad76)
- updated warning message for out-of-order timestamps (cbbad76)
- deprecated `Segment.update_seg_with_words()` and `WhisperResult.update_all_segs_with_words()` (ff89e53)
- changed `start`, `end`, `text`, `tokens` of `Segment` to properties (ff89e53)
- deprecated and replaced `WordTiming.round_all_timestamps()` with `round_ts=True` at initialization (ff89e53)
- added progress bar for timestamp adjustments (ff89e53)
- sped up splitting and merging of segments (ff89e53)
- removed redundant parts of the default regrouping algorithm (ff89e53)
- added `pipeline` to `stable_whisper.load_hf_whisper()` (c356491)
- changed `language`, `task`, `batch_size` to optional parameters for `WhisperHF.transcribe()` (c356491)
- fixed English models not working for `WhisperHF` (c356491)
- fixed `get_device()` for `'mps'` (53272cb)
- `WhisperHF.transcribe()` can now take generation parameters supported by Transformers (133f323)
- added logic to replace `None` timestamps returned by Hugging Face Whisper models (8bbe0c5)
- changed `whisper_word_level.hf_whisper.load_hf_pipe()` model loading method (a684fb4)
- added DeepFilterNet (https://github.com/Rikorose/DeepFilterNet) as supported denoiser (3fafd04)
- added Whisper on Hugging Face Transformers to CLI (3fafd04)
- fixed CLI throwing OSError when input is a URL and --output is not specified (3fafd04)
- fixed `WhisperHF.transcribe()` unable to load when audio is URL or certain formats (3fafd04)
- added support for Whisper on Hugging Face Transformers (9197b5c)
- fixed non-speech suppression not working properly for `transcribe_any()` (9197b5c)
- changed default to `dtype=numpy.int32` for all NumPy int arrays (3886bc6)
- removed `shell=True` in `.audio.utils.get_metadata()` (e8f72a3)
- added "「" to `prepend_punctuations` and "」" to `append_punctuations` (9968a45)
- added `AudioLoader` class for handling general audio loading (9968a45)
- added `NonSpeechPredictor` class for handling non-speech detection (9968a45)
- added `default.py` to hold global default states (9968a45)
- added `failure_threshold` to `align()` (9968a45)
- added `stream` to functions that use `AudioLoader` internally (9968a45)
- added progress bars for VAD and Demucs operations (9968a45)
- changed text normalization for `align()` (6d0746c)
- changed `WhisperResult` to ignore segments with no words (6d0746c)
- changed `nonspeech_error` default from 0.3 to 0.1 for all functions (9968a45)
- changed `nonspeech_skip` default from 3.0 to 5.0 for `align()` (9968a45)
- changed `use_word_position` behavior (9968a45)
- changed to load Demucs into cache for reuse by default (9968a45)
- deprecated and replaced `demucs` and `demucs_options` with `denoiser` and `denoiser_options` (9968a45)
- dropped `ffmpeg-python` dependency (9968a45)
- dropped dependencies: more-itertools, transformers (9968a45)
- fixed `align()` producing empty word slices (6d0746c)
- fixed `refine()` exceeding the max token count (#297) (f6d61c2)
- fixed issues in `transcribe_any()` caused by unspecified samplerate (9968a45)
- fixed `vad=True` causing first word of segment to be grouped with previous segment (9968a45)
- refactored `audio.py`, `stabilization.py`, `whisper_word_level.py` into subpackages (9968a45)
- removed `demucs_output` (9968a45)
- added `output_demo.mp4` (395c8a9)
- fixed `align()` throwing `UnsortedException` (f9ca03b)
- fixed `original_split=True` failing when there is more than one consecutive newline (f9ca03b)
- fixed `align()` IndexError (#292 (comment)) (f9ca03b)
- added `trust_repo=True` for loading Silero-VAD (a6b2b05)
- added `'master'` to the branch for loading Silero-VAD (a6b2b05)
- fixed `align()` failing for faster whisper with certain languages (677f233)
- fixed `result.WhisperResult.apply_min_dur()` and `result.Segment.apply_min_dur()` to work as intended (be2985e)
- removed `resampling_method="kaiser_window"` for all calls of `torchaudio.functional.resample()` (a6b2b05)
- updated `align()` logic (738fd98)
- added `nonspeech_skip` to `align()` (738fd98)
- added `show_unsorted` to `result.WhisperResult.__init__()` and `result.WhisperResult.raise_for_unsorted()` (738fd98)
- added `use_word_position` to methods that support non-speech/silence suppression (738fd98)
- fixed `result.WhisperResult.force_order()` to handle data with multiple consecutive unsorted timestamps (738fd98)
- fixed empty segment removal to work as intended for `result.WhisperResult` (ef0a87e)
- updated `README.md` to directly include the docstrings instead of hyperlinks (738fd98)
- updated `result.save_as_json()` to include `ensure_ascii=False` as default (738fd98)
- added `kwargs` to `result.save_as_json()` (738fd98)
- updated demo videos (3524aa2)
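The effect of `ensure_ascii=False` is easiest to see with the standard library's `json` module directly. The sketch below does not call `result.save_as_json()`; the `segment` dictionary is made-up sample data:

```python
import json

# Made-up sample data standing in for one transcribed segment.
segment = {"text": "こんにちは", "start": 0.0, "end": 1.2}

# Default behavior: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(segment)

# With ensure_ascii=False the characters are written as-is, which keeps
# saved files readable for non-English transcripts.
readable = json.dumps(segment, ensure_ascii=False)
```

Both strings decode to the same data; only the on-disk representation differs. Files written with `ensure_ascii=False` should be opened with an explicit UTF-8 encoding.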
- added `nonspeech_sections` property to `result.WhisperResult` (191674b)
- added `nonspeech_error` for silence suppression (191674b)
- changed `min_word_dur` behavior for silence suppression (191674b)
- changed silence suppression behavior (191674b)
- updated `README.md` (191674b)
- fixed `result.WhisperResult.split_by_punctuation()` not working if `min_words`/`min_chars`/`min_dur` are unspecified (d51edb6)
- added `show_regroup_history()` to `result.WhisperResult` (df4a199)
- added new attribute, `regroup_history`, to `.result.WhisperResult` (df4a199)
- added `min_words`, `min_chars`, `min_dur` to `result.WhisperResult.split_by_punctuation()` (df4a199)
- updated `README.md` (e86c571)
- added `get_content_by_time()` to `result.WhisperResult` (900797a)
- added `get_result()` to `result.Segment` (900797a)
- added `get_segment()` to `result.WordTiming` (900797a)
- added `text_output.result_to_txt()`/`result.WhisperResult.to_txt()` (900797a)
- added editing methods to `result.WhisperResult`: `remove_word()`, `remove_segment()`, `remove_repetition()`, `remove_words_by_str()`, `fill_in_gaps()` (900797a)
- added editing methods to list of 'method keys' in `result.WhisperResult.regroup()` (900797a)
- changed `result.Segment.to_display_str()` to enclose segment text in double quotes (900797a)
- implemented `__getitem__` and `__delitem__` for `result.Segment` and `result.WhisperResult` (900797a)
- updated docstrings of `whisper_word_level.load_model()` and `whisper_word_level.load_faster_whisper()` (900797a)
- added `result.WhisperResult.split_by_duration()` (71b9f1f)
- fixed `newline=True` for `result.WhisperResult._split_segments()` (71b9f1f)
- fixed docstring of `result.WhisperResult.split_by_length()` (71b9f1f)
- updated Whisper to v20231117 (71b9f1f)
- added `--faster_whisper`, `-fw` to CLI (a038ad1)
- added `--locate`, `-lc` to CLI (a038ad1)
- changed `alignment.align()` to be compatible with faster-whisper (a038ad1)
- changed `verbose` behavior for `alignment.locate()` (a038ad1)
- fixed inconsistent syntax and typo in docstrings (a038ad1)
- removed assertions for checking timestamp order when using `__add__()` with `result.Segment` or `result.WordTiming` (a038ad1)
- added `newline` to `split_by_gap()`, `split_by_punctuation()`, `split_by_length()` (b336735)
- added `progress_callback` to `whisper_word_level.load_faster_whisper.faster_transcribe()` (b336735)
- fixed #241 (5c512a1)
- refactored `_COMPATIBLE_WHISPER_VERSIONS`, `_required_whisper_ver`, `warn_compatibility_issues()` (b336735)
- updated `README.md` (3dfbd72)
- updated `--model` for CLI to be compatible with checkpoint paths (b336735)
- updated `merge_all_segments()` with faster logic (b336735)
- updated `verbose` for `.whisper_word_level.load_faster_whisper.faster_transcribe()` (b336735)
- updated whisper version to `v20231106` (b336735)
- added `avg_prob_threshold` to `whisper_word_level.transcribe_stable()` (58ece35)
- added `fast_mode` to `alignment.align()` (58ece35)
- added `utils.UnsortedException` (eb00d29)
- added `word_dur_factor` and `max_word_dur` to `alignment.align()` (58ece35)
- changed `check_sorted` for `result.WhisperResult` to also accept a path (eb00d29)
- changed `clip_start` default to `None` for `result.WhisperResult.clamp_max()` (58ece35)
- corrected docstrings of `suppress_silence` and `suppress_word_ts` (58ece35)
- fixed `timing.find_alignment_stable()` returning negative timestamps (58ece35)
- added `alignment.locate()` (a777206)
- added `utils.format_timestamp()` and `utils.make_safe()` (a777206)
- added `utils.safe_print()` (a777206)
- added `demucs`, `demucs_options`, `only_voice_freq` to `alignment.refine()` (a777206)
- added `to_display_str()` to `result.Segment` (a777206)
- added `demucs_options` to `whisper_word_level.load_faster_whisper.faster_transcribe()` (a777206)
- updated `--output`/`-o` (a777206)
- changed `audio` to always be expected to be 16kHz for `torch.Tensor` or `numpy.ndarray` (a777206)
- fixed `alignment.align()` failing if `text` is a `result.WhisperResult` without tokens (a777206)
- fixed `original_split=True` by replacing line breaks with space (97a316d)
- fixed `result_to_ass()` failing to return to base color when using `tag` (83ae509)
- improved efficiency of segment splitting for `alignment.align()` when `original_split=True` (a777206)
- refactored the audio preprocessing into `audio.prep_audio()` (a777206)
- removed `_is_whisper_repo_version` from `utils.py` (a777206)
- renamed `original_spit` to `original_split` for `alignment.align()` (a777206)
- set `action="extend"` for all CLI keyword arguments that take multiple values (a777206)
- changed `demucs` to also accept a Demucs model instance (a777206)
- deprecated `time_scale`, `input_sr`, `demucs_output`, `demucs_device` (a777206)
- updated docstrings (a777206)
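For readers unfamiliar with `action="extend"` (available in `argparse` since Python 3.8), the sketch below shows how it lets a multi-value option be repeated on the command line. The `--option` flag is a hypothetical stand-in, not one of the CLI's real arguments:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag; "extend" appends every value from every occurrence
# to a single list instead of keeping only the last occurrence.
parser.add_argument("--option", nargs="+", action="extend")

args = parser.parse_args(["--option", "a=1", "b=2", "--option", "c=3"])
print(args.option)
```

With the default `store` action, the second `--option` would overwrite the first, so only `c=3` would survive.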
- updated `alignment.align()` to raise warning on failure (b9ac041)
- changed `language` into a required parameter (b9ac041)
- fixed `alignment.align()` endlessly looping (b9ac041)
- changed `abs_dur_change` default to `None` (dd1452e)
- changed `abs_prob_decrease` default to `0.5` (dd1452e)
- changed `alignment.refine()` to allow durations to increase (dd1452e)
- changed `rel_prob_decrease` default to `0.3` (dd1452e)
- changed `rel_rel_prob_decrease` to optional (dd1452e)
- changed the usage of original probability in `alignment.refine()` (dd1452e)
- fixed CLI not using `decode_options` (9aba3dc)
- fixed `adjust_by_silence()` throwing `TypeError` (92d51b9)
- updated `README.md` (3643092)
- added `--align` to CLI (c90ff06)
- added `alignment.refine()` for refining timestamps (138cb6b)
- added `--refine` and `--refine_option` to CLI (138cb6b)
- added `segment_id` and `id` to `result.WordTiming` (138cb6b)
- added description to transcription progress bar (138cb6b)
- fixed `align()` not working when `text` is a `result.WhisperResult` (138cb6b)
- fixed `transcribe()` throwing error if `suppress_silence=False` (138cb6b)
- updated `README.md` (c90ff06)
- fixed `--debug` not showing the first option (857df9a)
- fixed `demucs` and `only_voice_freq` for `transcribe_stable()` (7f62a9d)
- fixed `demucs` for `transcribe_minimal()` (857df9a)
- fixed `only_voice_freq` for `transcribe_minimal()` (7f62a9d)
- fixed progress bar for faster-whisper (7f62a9d)
- updated `transcribe_minimal()` to accept more options (857df9a)
- updated `transcribe_stable()` for faster-whisper models to accept more options (7f62a9d)
- added `'us'` as method key to `WhisperResult.regroup()` (da33bf5)
- added `--demucs_option`, `--model_option`, `--transcribe_option`, `--save_option` to CLI (da33bf5)
- added `--transcribe_method` to CLI (da33bf5)
- added `Segment.words_by_lock()`, `WhisperResult.all_words_by_lock()` (da33bf5)
- added `strip` to `WhisperResult.lock()` (e98c3d6)
- fixed docstring of `WhisperResult.lock()` (05bba74)
- improved `--debug` for CLI (da33bf5)
- improved `even_split=True` for `WhisperResult.split_by_length()` (da33bf5)
- updated docstring of `WhisperResult.split_by_length()` (da33bf5)
- added `lock()` to `WhisperResult` (384fc3c)
- added `'l'` as method key to `WhisperResult.regroup()` (384fc3c)
- added progress bar to transcription with faster-whisper (5ac6f5e)
- updated `--output_format` to accept multiple formats (384fc3c)
- updated `WhisperResult.reset()` to match its initialization (384fc3c)
- updated `regroup()` to parse `regroup_algo` into dict (384fc3c)
- added `check_sorted` to `WhisperResult` (4054ca1)
- added `check_sorted` to `transcribe_any()` (07eaf9e)
- added `round_all_timestamps()` to `result.Segment` and `result.WordTiming` (4a7e52b)
- changed default to `word_timestamps=True` for `faster_transcribe()` (4a7e52b)
- changed `raise_for_unsorted()` logic (4a7e52b)
- fixed `WhisperResult.force_order()` to work as intended (4a7e52b)
- added `token_step` to `align()` (ac3b38c)
- deleted `_demo` directory (b592731)
- fixed #205 (ac3b38c)
- updated `README.md` (d0340ef, ffa05a4)
- added `Whisper.adjust_by_result()` (6da3dd8)
- added `alignment.align()` (6da3dd8)
- added `load_faster_whisper()` (6da3dd8)
- fixed `encode_video_comparison()` unable to encode more than two subtitle files (6da3dd8)
- fixed `verbose` not working for `transcribe_minimal()` (6da3dd8)
- refactored compatibility warning into `warn_compatibility_issues()` in `utils.py` (6da3dd8)
- refactored post-inference silence suppression into `WhisperResult.adjust_by_silence()` (6da3dd8)
- added `demucs_options` to `transcribe()` (91cf2b1)
- added `ignore_compatibility` to `transcribe()` (91cf2b1)
- changed compatibility warning to distinguish between mismatched version number and repo version (91cf2b1)
- changed heuristic for identifying Whisper version number to avoid false positives (91cf2b1)
- added `transcribe_minimal()` (ef8a7f1)
- added `force_order` to `result.WhisperResult` (ef8a7f1)
- added `max_instant_words` to `transcribe()` (ef8a7f1)
- added `progress_callback` to `transcribe()` (ef8a7f1)
- changed default to `clip_start=True` for `WhisperResult.clamp_max()` (ef8a7f1)
- added logic to check if the installed Whisper version is compatible (e53f4be)
- fixed `tag` for `result_to_ass()` to work as intended (ea8cac8)
- added logic to ensure ascending timestamps in `result.WhisperResult` (fd78cd7)
- updated default regroup algorithm (fd78cd7, 77dcfdf)
- updated long form transcription logic (fd78cd7)
- fixed skipping words (77dcfdf)
- avoid computing higher temperatures on `no_speech` segments (fd78cd7)
- removed any segments that contain only punctuation (fd78cd7)
- removed segments with 50%+ instantaneous words (fd78cd7)
- updated `README.md` (f5b4c22)
- allow `regroup_algo` to be bool for `regroup()` (4984163)
- added `even_split` to `split_by_length()` (7b867d6)
- changed default behavior of `split_by_length()` (7b867d6)
- changed default to `verbose=False` for `clamp_max()` (7b867d6)
- ignore `min_word_dur` when word timestamps are missing (e93c280)
- fixed `min_word_dur` not working for word timestamps (e93c280)
- added `clamp_max()` to `WhisperResult` and `WordTiming` (bfe93ab)
- added `cm` as method key for `clamp_max()` (bfe93ab)
- added `non_whisper.transcribe_any()` (789bb54)
- changed default to `suppress_ts_tokens=False` (789bb54)
- fixed hyperlinks in `README.md` not linking to the latest commit (87636ef)
- fixed incorrect line numbers for docstring hyperlinks (52b8b7a)
- fixed `--regroup` default (af5579e)
- added string form custom regrouping algorithm (cc352cd)
- fixed #153 (9e3ba72)
- removed max limit on audio threshold (9e3ba72)
- updated `non-whisper.ipynb` (da3721b, 7866462)
- changed `result.WhisperResult` to only require necessary data to initialize (cdf3ea9)
- added `--karaoke` to CLI (cdf3ea9)
- updated `README.md` (0635e15, 2f094f8, fb23c27)
- added support for TSV output format (d30d0d1)
- changed default VTT and ASS output to use more efficient formats (d30d0d1)
- fixed non-VAD suppression not working properly (d30d0d1)
- improved language detection (d30d0d1)
- added logic for loading audio with yt-dlp (8960922)
- added `only_ffmpeg` to `transcribe()` and CLI (8960922)
- added `shell=True` to subprocess call (a8df3b5)
- added classes: `SegmentMatch` and `WhisperResultMatches` (1eabb37)
- added fallback logic to word alignment (1eabb37)
- added `find()` to `result.WhisperResult` (1eabb37)
- added `suppress_ts_tokens` and `gap_padding` to `transcribe()` and CLI (1eabb37)
- added `shell=True` to `is_ytdlp_available()` (d2b7f3f)
- fixed `NaN` values in the logits (1eabb37)
- added `offset_time()` to `WhisperResult`, `Segment`, `WordTiming` (1447a66)
- added support for audio as URLs (1447a66)
- fixed `language` detection for English models (1447a66)
- added `split_callback` (44af5c4)
- changed parameters of `split_callback` (c003ce4)
- corrected the docstring for `rtl` (169e014)
- fixed punctuation split/merge to work as intended (a84a346)
- added regrouping list (a0021bd)
- added `--max_chars` and `--max_words` to CLI (f913d6f)
- added `rtl` #116 (f913d6f)
- corrected VAD PyTorch requirement (60f668d)
- fixed `visualize_suppression()` error when `max_width=-1` (918e3ba)
- fixed out of range error (918e3ba)
- added `merge_all_segments()` to `result.WhisperResult` (7c69535)
- added `split_by_length()` to `result.WhisperResult` (7c69535)
- fixed transcription logic (d44d287)
- added Tips to `README.md` (c21e198)
- added new token splitting method (fa813fe)
- fixed #112 (3985791)
- fixed #117 (3985791)
- added instructions for installing demucs via error (de3c812)
- added `encoding='utf-8'` to `read_me()` in `setup.py` (ff34b27)
- updated `README.md` (dfb147e)
- added `mel_first` (8fa5670)
- fixed `min_dur` being applied to words when segments contain no words (8fa5670)
- updated regroup demo video (e9932fe)
- fixed timestamps jumping backwards (26918d5)
- changed default to `strip=True` for `result_to_srt_vtt()` (ce4c7b3)
- keep segments if the segment has no words from the start (6ccfa17)
- improved `stabilization.audio2loudness()` efficiency (db99d6b)
- fixed `regroup=True` when `word_timestamps=False` (6ccfa17)
- fixed `word_level=False` failing output when `word_timestamps=False` (ce4c7b3)
- fixed ASS output formatting (ce4c7b3)
- updated `README.md` (f9f7c51)
- added segment-level and word-level support to SRT/VTT/ASS outputs (2248087)
- added `result.WhisperResult` (2248087)
- added Silero VAD support (2248087)
- added `visualize_suppression()` (2248087)
- added regrouping methods (2248087)
- changed Python requirement from 3.7+ to 3.8+ (2248087)
- improved non-VAD suppression (2248087)
- improved word-level timestamp reliability (2248087)
- updated `README.md` (eb5e68c)