Release Version 0.4: Enhanced STJ Format with New Validation and Features #5
yaniv-golan announced in Announcements
This update introduces significant enhancements to the specification, schema, tools, and documentation to improve the flexibility, interoperability, and robustness of the STJ format.
What's New
1. Addition of `word_timing_mode`
- New `word_timing_mode` field in the `segments` object to indicate the completeness of word-level timing data.
  - `"complete"`: All words in the `text` are included in the `words` array.
  - `"partial"`: Only some words are included in the `words` array.
  - `"none"`: No word-level timing data is provided.
- Clarified handling of the `words` array within a segment, especially when dealing with incomplete or absent word-level timing information.
2. Updated JSON Schema (`stj-schema.json`)
- Added the `word_timing_mode` field to the schema.
- Validation of `start` and `end` times.
- Confidence scores restricted to the range `[0.0, 1.0]`.
- Zero-duration items (`start` equals `end`) must include the appropriate duration flag set to `"zero"` in `additional_info`.
3. Enhanced Validators
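A minimal sketch of the kind of `word_timing_mode` consistency check a validator might run on one segment. It is not the code in `stj_validator.py`. It assumes each word item stores its text under a `text` key, that words are joined with single spaces before being compared with the segment `text`, and that a missing `word_timing_mode` counts as an error, which may be stricter than the real validator. The function name `check_word_timing_mode` is hypothetical.

```python
import re

VALID_MODES = {"complete", "partial", "none"}


def _normalize(text):
    """Collapse runs of whitespace so spacing differences are ignored."""
    return re.sub(r"\s+", " ", text).strip()


def check_word_timing_mode(segment):
    """Return a list of problems found in one segment (empty if none)."""
    mode = segment.get("word_timing_mode")
    words = segment.get("words") or []

    # Assumption: a missing or unknown mode is reported and nothing else is checked.
    if mode not in VALID_MODES:
        return [f"invalid word_timing_mode: {mode!r}"]

    problems = []
    if mode == "none" and words:
        problems.append("mode is 'none' but a words array is present")
    if mode in ("complete", "partial") and not words:
        problems.append(f"mode is {mode!r} but the words array is empty or missing")

    if mode == "complete" and words:
        # Assumption: words are joined with single spaces before comparing.
        joined = _normalize(" ".join(w["text"] for w in words))
        if joined != _normalize(segment.get("text", "")):
            problems.append("text does not match the concatenated words")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a file in one pass.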
Python Validator (`stj_validator.py`)
- Checks `word_timing_mode` consistency with the presence and completeness of the `words` array.
- Checks consistency between the `text` field and the concatenated `words` array when `word_timing_mode` is `"complete"`.
- Uses the `iso639-lang` library (version 2.4.2).
JavaScript Validator (`stj-validator.js`)
- Uses the `iso-639-1` package to validate language codes.
4. Updated Conversion Tools
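To illustrate how a converter could honor `word_timing_mode`, here is a rough sketch that turns one segment into an SRT cue. It is not taken from `stj_to_srt`. The helper names are hypothetical, and it assumes word items keep their text under a `text` key and that reconstructed words are joined with single spaces.

```python
def format_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segment_text(segment):
    """Pick the text to render for one segment."""
    words = segment.get("words") or []
    if segment.get("word_timing_mode") == "complete" and words:
        # Assumption: each word item carries its text under a "text" key.
        return " ".join(w["text"] for w in words)
    # For "partial" or "none", fall back to the segment-level text.
    return segment["text"]


def segment_to_srt_cue(index, segment):
    """Build one SRT cue: index line, time range line, text line."""
    start = format_srt_time(segment["start"])
    end = format_srt_time(segment["end"])
    return f"{index}\n{start} --> {end}\n{segment_text(segment)}\n"
```

Falling back to `text` for `"partial"` and `"none"` keeps the cue complete even when the `words` array covers only part of the segment.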
- Conversion tools (`stj_to_srt`, `stj_to_vtt`, `stj_to_ass`):
  - Handle the `word_timing_mode` field appropriately.
  - With `word_timing_mode` set to `"complete"`, the text is reconstructed from the `words` array.
5. Comprehensive Test Coverage
overlapping_segments.stj.json
invalid_language.stj.json
invalid_word_timing_mode.stj.json
zero_duration_word_without_flag.stj.json
invalid_confidence_scores.stj.json
invalid_speaker_id.stj.json
word_outside_segment_timings.stj.json
words_overlap_or_out_of_order.stj.json
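Several of the test file names above point at specific word-timing rules. The sketch below is a guess at what such checks could look like, not the project's test or validator code. The function name is hypothetical, and the zero-duration flag is assumed to live under a `word_duration` key in `additional_info`; the announcement only says "the appropriate duration flag".

```python
def check_word_timings(segment):
    """Return a list of timing problems for the words in one segment."""
    problems = []
    prev_end = None
    for i, word in enumerate(segment.get("words") or []):
        start, end = word["start"], word["end"]
        if start > end:
            problems.append(f"word {i}: start is after end")
        if start < segment["start"] or end > segment["end"]:
            problems.append(f"word {i}: outside segment timings")
        if prev_end is not None and start < prev_end:
            problems.append(f"word {i}: overlaps or precedes the previous word")
        if start == end:
            # Assumption: the flag is stored under a hypothetical "word_duration" key.
            flag = (word.get("additional_info") or {}).get("word_duration")
            if flag != "zero":
                problems.append(f"word {i}: zero duration without flag")
        confidence = word.get("confidence")
        if confidence is not None and not 0.0 <= confidence <= 1.0:
            problems.append(f"word {i}: confidence outside [0.0, 1.0]")
        prev_end = end
    return problems
```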
6. Documentation Updates
- Specification (`stj-specification.md`): updated to cover the `word_timing_mode` field and new validation requirements.
Breaking Changes
- Make sure `word_timing_mode` is correctly set in segments.
How to Upgrade
Update Your STJ Files:
- Add the `word_timing_mode` field to segments as appropriate.
Update Tools and Dependencies:
- Python: get the latest `stj_validator.py` and conversion scripts from the repository, and install `iso639-lang` version 2.4.2.
- JavaScript: get the latest `stj-validator.js` and conversion scripts, and install the `iso-639-1` package.
Run Validation:
Update Integrations:
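For the first upgrade step (adding `word_timing_mode` to existing segments), a migration could be sketched roughly as follows. The inference rule is an assumption, not something the specification is known to prescribe: no words means `"none"`, words that join to the segment `text` mean `"complete"`, and anything else means `"partial"`. The function names are hypothetical, and word items are assumed to store their text under a `text` key.

```python
import re


def _normalize(text):
    """Collapse runs of whitespace so spacing differences are ignored."""
    return re.sub(r"\s+", " ", text).strip()


def infer_word_timing_mode(segment):
    """Guess a word_timing_mode for a segment that does not declare one."""
    words = segment.get("words") or []
    if not words:
        return "none"
    # Assumption: words are joined with single spaces before comparing.
    joined = " ".join(w["text"] for w in words)
    if _normalize(joined) == _normalize(segment.get("text", "")):
        return "complete"
    return "partial"


def add_word_timing_mode(segments):
    """Set word_timing_mode on every segment that lacks it, in place."""
    for segment in segments:
        if "word_timing_mode" not in segment:
            segment["word_timing_mode"] = infer_word_timing_mode(segment)
    return segments
```

The function takes a plain list of segments so that it makes no assumption about where that list sits inside an STJ file. Segments that already declare a mode are left unchanged, and the result is worth running through the validator afterwards.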
Acknowledgments
This release was prompted by mluggy's comments on what word-by-word caption tools expect. Thanks for spotting this!
Feedback and Contributions
If you encounter any issues, have suggestions, or would like to contribute to the project, please:
Links
This discussion was created from the release "Release Version 0.4: Enhanced STJ Format with New Validation and Features".