0.89 (TBD)
-----------------
- Fix: Broken links in README
- Fix: Timing in DVB, sub duration check for timeout.
- New: Added support for SCC and CCD encoder formats
- New: Added support to output captions to MCC file (#733).
- New: Add support for censoring words ("Kid Friendly") (#1139)
- New: Extend support of capitalization for all BITMAP and 608 subtitles (#1214)
- New: Added an option to disable timestamps for WebVTT (In response to issue #1127)
- Fix: Change inet_ntop to inet_ntoa for Windows XP compatibility
- Fix: Added italics, underline, and color rendering support for -out=spupng with EIA608/teletext
- Fix: ccx_demuxer_mxf.c: Parse framerate from MXF captions to fix caption timings.
- Fix: hardsubx_decoder.c: Fix memory leaks using Leptonica API.
- Fix: linux/Makefile.am: added some sources to enable rpms to be created.
- Fix: Crash when using -sc (sentence case) option (#1115)
- Fix: Segmentation fault on VOB (#1128)
- Fix: Hang while processing video (#1121)
- Fix: lib_ccx.c: Initialize fatal error logging function before first usage in init_libraries
- Fix: A few (minor) memory leaks around the code.
- Fix: General code clean up / reformatting
- Fix: Multiple definitions with new -fno-common default in GCC 10
- Fix: Mac now builds reproducibly again without errors on the date command (#1230)
- Fix: Allow all oem modes with tesseract v4 (#1264)
- Doc: Updated ccextractor.cnf.sample.
- Update: Updated LibPNG to 1.6.37
- Remove: Python API (since no one cares about it and it's unmaintained)
- Remove: -cf. Just use FFmpeg if you want an ES from a TS or PS; CCExtractor is a bad tool
for this.
- Fix: Segmentation fault on Windows
- Update: Updated libGPAC to 1.0.1
- Fix: Segmentation fault with unsupported and multitrack file reports
- Fix: Write subtitle header to multitrack outputs
- Fix: Write multitrack files to the output file directory
0.88 (2019-05-21)
-----------------
- New: More tapping points for debug image in ccextractor.
- New: Add support for tesseract 4.0
- Optimize: Remove multiple RGB to grey conversion in OCR.
- Fix: Update UTF8Proc to 2.2.0
- Fix: Update LibPNG to 1.6.35
- Fix: Update Protobuf-c to 1.3.1
- Fix: Warn instead of fatal when a 0xFF marker is missing
- Fix: Segfault in general_loop.c due to null pointer dereference (case of no encoder)
- Fix: Enable printing hdtv stats to console.
- Fix: Many typos in comments and output messages
- Fix: Ignore Visual Studio temporary project files
- New: Add support for non-Latin characters in stdout
- Fix: Check whether stream is empty
- New: Add support for EIA-608 inside .mkv
- New: Add support for DVB inside .mkv
- Fix: Added -latrusmap to map Latin symbols to Cyrillic ones in special cases
of Russian Teletext files (issue #1086)
- Fix: Several OCR crashes
0.87 (2018-10-23)
-----------------
- New: Upgrade libGPAC to 0.7.1.
- New: mp4 tx3g & multitrack subtitles.
- New: Guide to update dependencies (docs/Updating_Dependencies.txt).
- New: Add LICENSE File (#959).
- New: Display quantisation mode in info box (#954).
- New: Add instruction required to build ccextractor with HARDSUBX support (#946).
- New: Added version no. of libraries to --version.
- New: Added -quant (OCR quantization function).
- New: Python API now compatible with Python 3.
- Fix: linux/builddebug: Added non-local directories to the include search path so we don't
require a locally compiled tesseract or leptonica.
- Fix: Correct -HARDSUBX bug in CMake, allow building with hardsubx using cmake (#966).
- Fix: possible segfaults in hardsubx_classifier.c due to strdup (#963).
- Fix: Improve the start and end timestamps of extracted burned-in captions (#962).
- Fix: Update COMPILATION.md (#960).
- Fix: Fixed crash with "-out=report" and "-out=null".
- Fix: -nocf not working with OCR'ing (#958).
- Fix: segfault in add_cc_sub_text and initialize to NULL in init_encoder (#950).
- Fix: ccx_decoders_common.c: Copy data type when creating a copy of the subtitle structure.
- Fix: Implicit declaration of these functions throws warning during build (#948).
- Fix: ccx_decoders_common.c: Properly release allocated resources on free_subtitle().
- Fix: Added a datatype member to struct cc_subtitle - needed so we can properly free all
memory when void *data points to a structure that has its own pointers.
- Fix: dvb_subtitle_decoder.c: When combining image regions verify that the offset is
never negative.
- Fix: Updated .travis.yml to fix osx build (#947).
- Fix: Add utf8proc src file to cmake, updated header file (#944).
- Fix: Added required pointers on freep() calls.
- Fix: Removed dvb_debug_traces_to_stdout and used the usual dbg_print instead.
- Fix: Additional debug traces for DVB.
- Fix: Minor memory leak in ocr.c.
- Fix: Issue with displaying utf8proc version.
- Fix: Failing cmake due to liblept/tesseract header files.
- Fix: Added missing \n in params.c.
- Fix: builddebug: Use -fsanitize=address -fno-omit-frame-pointer.
- Fix: ccx_decoders_common.c: Removed trivial memory leak.
- Fix: ccx_encoders_srt.c: Made sure a pointer is non-NULL before dereferencing.
- Fix: dvb_subtitle_decoder.c: Initialize pointer members to NULL when creating a structure.
- Fix: lib_ccx.c: Initialize (memset 0) structure cc_subtitle after memory allocation.
- Fix: Added verboseness to error/warnings in dvb_subtitle_decoder.c.
- Fix: dvb_subtitle_decoder.c: Work on passing invalid streams errors upstream (plus some
warning messages) so we can eventually recover from this situation instead of crashing.
- Fix: telxcc.c: Currently setting a colour doesn't necessarily add a space even though the
specifications mandate it. (#930).
- Fix: dvb_subtitle_decoder.c: Fix null pointer dereference when region==NULL in write_dvb_sub.
- Fix: DVB Teletext subtitle incomplete.
- Fix: Replace all 0xA characters within startbox with 0x20.
- Fix: DVB Teletext subtitle incomplete (#922).
- Fix: Add missing return value to one of the returns in process_tx3g().
- Fix: Typos and other minor bugs.
- Fix: Tidy CMakeLists & vcxproj (#920).
- Fix: Added m2ts and -mxf to help screen.
- Fix: Added MKV to demuxer_print_cfg.
- Fix: Added MXF to demuxer_print_cfg.
- Fix: "Out of order packets" error had wrong print() parameters.
- Fix: Updated Python documentation.
- Fix: Fix incorrect path in XML (#904).
- Fix: linux build script (non-debug): Don't hide warnings from compiler.
- Fix: linux build script (debug): Display which step of the build script we're in.
- Fix: Make the build reproducible (#976).
- Fix: Remove instances of -o1 and -o2 from help.
- Fix: Colors of DVB subtitles with depth 2 broken due to a missing break.
- Fix: CEA-708: Caption loss due to CW command (#991).
- Fix: CEA-708: Update patch for windows priority with functions (#990).
0.86 (2018-01-09)
-----------------
- New: Preliminary MXF support
- New: Added a histogram in one-minute increments of the number of lines in a subtitle.
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (mac).
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (linux).
- New: Added .rpm package generation script.
- New: Added build/installation script for .pkg.tar.xz (Arch Linux).
- New: Added tarball generation script.
- New: Added --analyzevideo. If present the video stream will be processed even if the
subtitles are in a different stream. This is useful when we want video information
(resolution, frame type, etc). -vides now implies this option too.
[Note: Tentative - some possibly breaking changes were made for this, so if you
use it, validate the results]
- New: Added a GUI in the main CCExtractor binary (separate from the external GUIs
such as CCExtractorGUI).
- New: A Python binding extension so it's possible to use CCExtractor's tools from
Python.
- New: Added -nospupngocr (don't OCR bitmaps when generating spupng, faster)
- New: Add support for file split on keyframe (-segmentonkeyonly)
- New: Added WebVTT output from Matroska.
- New: Support for source-specific multicast.
- New: FreeType-based text renderer (-out=spupng with teletext/EIA608).
- New: Upgrade library UTF8proc
- New: Upgrade library win_iconv
- New: Upgrade library zlib
- New: Upgrade library LibPNG
- New: Added Travis CI support
- New: Made error messages clearer, less ambiguous
- Fix: Prevent the OCR being initialized more than once (happened on multiprogram and
PAT changes)
- Fix: Makefiles, build scripts, etc... everything updated and corrected for all
platforms.
- Fix: Proper line ending for .srt files from bitmaps.
- Fix: OCR corrections using grayscale before extracting texts.
- Fix: End timestamps in transcripts from DVB.
- Fix: Forcing -noru to cause deduplication in ISDB
- Fix: TS: Skip NULL packets
- Fix: When NAL decoding fails, don't dump the whole decoded thing, limit to 160 bytes.
- Fix: Modify Autoconf scripts to generate tarball for mac from `/package_creators/tarball.sh`
and include GUI files in tarball
- Fix: Started work on libGPAC upgrade.
- Fix: DVB subtitle not extracted if there's no display segment
- Fix: Heap corruption in add_ocrtext2str
- Fix: Bug that sometimes caused -out=spupng to crash
- Fix: Checks for text before newlines on DVB subtitles
- Fix: OCR issue caused by separated dvb subtitle regions
- Fix: DVB crash on specific condition (!rect->ocr_text)
- Fix: DVB bug (Multiple-line subtitle; Missing last line)
- Fix: --sentencecap for teletext samples
- Fix: Crash when image passed into OCR is empty
- Fix: Temporarily wrapped the Python API, not production ready yet
- Fix: -delay option in DVB
0.85b (2017-01-26)
------------------
- Fix: Base Windows binary (without OCR) compiled without DLL dependencies.
0.85 (2017-01-23)
-----------------
- New: Added FFMPEG 3.0 to Windows build - last one that is XP compatible.
- New: Major improvements in CEA-608 to WebVTT (styles, etc).
- New: Return a non-zero return code if no subtitles are found.
- New: Windows build files updated to Visual Studio 2015, new target platform is v140_xp.
- New: Added basic support of Tesseract 4.0.0.
- New: Added build script for .deb.
- New: Updated -debugdvbsub parameter to get the most relevant DVB traces for debugging.
- New: SMPTE-TT files are now compatible with Adobe Premiere.
- New: Updated libpng.
- New: Added 3rd party (Tracy from archive.org) static linux build script.
- New: Add chapter extraction for MP4 files.
- New: Return code 10 if no captions are found at all.
- Fix: Teletext duplicate lines in certain cases.
- Fix: Improved teletext timing.
- Fix: DVB timing is finally good.
- Fix: A few minor memory leaks.
- Fix: tesseract library file included in mac build command.
- Fix: Bad WTV timings in some cases.
- Fix: Mac build script.
- Fix: Memory optimization in HARDSUBX edit_distance.
- Fix: SubStation Alpha subtitles in bitmap.
- Fix: lept msg severity in linux.
- Fix: SSA, SPUPNG and VTT timing and skipping of subtitles for SAMI and TTML.
- Fix: SMPTE-TT: Added support for font color.
- Fix: SAMI unnecessary empty subtitle when extracting DVB subs.
- Fix: Skip the packet if the adaptation field length is broken.
- Fix: 708 - lots of work done in the decoder. Implementation of more commands. Better timing.
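The TS-related entries (skipping packets whose adaptation field length is broken in 0.85, and skipping NULL packets in 0.86) come down to a few header checks. A minimal C sketch, assuming standard 188-byte MPEG-TS packets as defined in ISO/IEC 13818-1; the helper name `ts_payload_offset` is hypothetical and this is not CCExtractor's actual demuxer code:

```c
#include <stdint.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE 0x47
#define TS_NULL_PID 0x1FFF

/* Hypothetical helper: returns the payload offset inside a 188-byte TS
 * packet, or -1 if the packet should be skipped (bad sync byte, NULL
 * packet, no payload, or an adaptation_field_length that cannot fit). */
static int ts_payload_offset(const uint8_t pkt[TS_PACKET_SIZE])
{
    if (pkt[0] != TS_SYNC_BYTE)
        return -1;

    uint16_t pid = (uint16_t)(((pkt[1] & 0x1F) << 8) | pkt[2]);
    if (pid == TS_NULL_PID)
        return -1; /* stuffing packet, carries no data */

    int afc = (pkt[3] >> 4) & 0x03; /* adaptation_field_control */
    switch (afc) {
    case 0x1: /* payload only, right after the 4-byte header */
        return 4;
    case 0x3: { /* adaptation field followed by payload */
        uint8_t af_len = pkt[4];
        /* With a payload present the length must be 0..182, otherwise
         * the field runs past the end of the packet: treat as broken. */
        if (af_len > 182)
            return -1;
        return 4 + 1 + af_len; /* header + length byte + field */
    }
    default: /* 0x0 is reserved, 0x2 means adaptation field only */
        return -1;
    }
}
```

A caller would skip the packet whenever the function returns -1 instead of trying to parse a payload at a nonsense offset.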
0.84 (2016-12-16)
-----------------
- New: In Windows, both the OCR and non-OCR binaries are bundled, since the OCR one causes problems due to
dependencies on some systems. So unless you need the OCR, just use the non-OCR version.
- New: Added -sbs (sentence by sentence) for DVB output. Each frame in the output file contains a complete
sentence (experimental).
- New: Added -curlposturl. If used, each output frame will be sent with libcurl by doing a POST to that URL.
- Fix: More code consistency checking in function names.
- Fix: linux build script now tries to verify dependencies.
- Fix: Mac build script was missing a directory.
0.83 (2016-12-13)
-----------------
- Fix: Duplicate lines in mp4 (specifically affects itunes).
- Fix: Timing in .mp4, timing now calculated for each CC pair instead of per atom.
- Fix: Typos everywhere in the documentation and source code.
- Fix: CMakeLists for build in cmake.
- Fix: -unixts option.
- Fix: FPS switching messages.
- Fix: Removed ugly debug statement with local path in HardsubX.
- Fix: Changed platform target to v120_xp in Visual Studio (so XP is supported again).
- Fix: Added detail in many error messages.
- Fix: Memory leaks in videos with XDS.
- Fix: Makefile compatibility issues with Raspberry Pi.
- Fix: missing separation between WebVTT header and body.
- Fix: Stupid bug in M2TS that prevented it from working.
- Fix: OCR libraries dependencies for the release version in Windows.
- Fix: non-buffered reading from pipes.
- Fix: --stream option with stdin.
- New: terminate_asap to buffered_read_opt
- New: Added some TV-show specific spelling dictionaries.
- New: Updated GPAC library.
- New: ASS/SSA.
- New: Capture sigterm to do some clean up before terminating.
- New: Work on 708: Changed DefineWindow behavior, only clear text of an existing window if style has changed.
0.82 (2016-08-15)
-----------------
- New: HardsubX - Burned in subtitle extraction subsystem.
- New: Color Detection in DVB Subtitles
- Fix: Corrected sentence capitalization
- Fix: Skipping redundant bytes at the end of tx3g atom in MP4
- Fix: Illegal SRT files being created from DVB subtitles
- Fix: Incorrect Progress Display
0.81 (2016-06-13)
-----------------
- New: --version parameter for extensive version information (version number, compile date, executable hash, git commit (if appropriate))
- New: Add -sem (semaphore) to create a .sem file when an output file is open and delete it when it's closed.
- New: Add --append parameter. This will prevent overwriting of existing files.
- New: File Rotation support added. The user has to send a USR1 signal to rotate.
- Fix: Issues with files <1 Mb
- Fix: Preview of generated transcript.
- Fix: Statistics were not generated anymore.
- Fix: Correcting display of sub mode and info in transcripts.
- Fix: Teletext page number displayed in -UCLA.
- Fix: Removal of excessive XDS notices about aspect ratio info.
- Fix: Force Flushing of file buffers works for all files now.
- Fix: mp4 void atoms that were causing some .mp4 files to fail.
- Fix: Memory usage caused by EPG processing was high due to many non-dynamic buffers.
- Fix: Project files for Visual Studio now include OCR support in Windows.
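File rotation on USR1 (0.81) is typically done by having the signal handler only set a flag that the main loop checks between writes, since almost nothing is safe to call inside a handler. A POSIX-only sketch under that assumption; `install_rotation_handler`, `maybe_rotate` and the `<base>_<n>.srt` naming are hypothetical, not CCExtractor's actual behaviour:

```c
#define _POSIX_C_SOURCE 200809L /* expose sigaction() under strict C modes */
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t rotate_requested = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    rotate_requested = 1; /* only async-signal-safe work in the handler */
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on error. */
static int install_rotation_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call between writes. If a rotation was requested, close the current file
 * and open "<base>_<n>.srt" (hypothetical naming). Returns the file to keep
 * writing to; NULL means opening the new file failed. */
static FILE *maybe_rotate(FILE *out, const char *base, int *counter)
{
    if (!rotate_requested)
        return out;
    rotate_requested = 0;
    if (out != NULL)
        fclose(out);

    char name[512];
    snprintf(name, sizeof name, "%s_%d.srt", base, ++*counter);
    return fopen(name, "w");
}
```

Running `kill -USR1 <pid>` from a shell would then make the next `maybe_rotate` call switch to a new file.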
0.80 (2016-04-24)
-----------------
- Fix: "Premature end of file" (one of the scenarios)
- Fix: XDS data is always parsed again (needed to extract information such as program name)
- Fix: Teletext parsing: @ was incorrectly exported as * - X/26 packet specifications in ETS 300 706 v1.2.1 now better followed
- Fix: Teletext parsing: Latin G2 subsets and accented characters not displaying properly
- Fix: Timing in -ucla
- Fix: Timing in ISDB (some instances)
- Fix: "mfra" mp4 box weight changed to 1 (this helps with correct file format detection)
- Fix: Fix for TARGET File is null.
- Fix: Fixed SegFaults while parsing parameters (if mandatory parameter is not present in -outinterval, -codec or -nocodec)
- Fix: Crash when input file is too small
- Fix: Update some URLs in code (references to docs)
- Fix: -delay now updates final timestamp in ISDB, too
- Fix: Removed minor compiler warnings
- Fix: Visual Studio solution files working again
- Fix: ffmpeg integration working again
- New: Added --forceflush (-ff). If used, output file descriptors will be flushed immediately after being written to
- New: Hexdump XDS packets that we cannot parse (shouldn't be many of those anyway)
- New: If input file cannot be opened, provide a decent human readable explanation
- New: GXF support
0.79 (2016-01-09)
-----------------
- Support for Grid Format (g608)
- Show correct number of teletext packets processed
- Removed Segfault on incorrect mp4 detection
- Remove xml header from transcript format
- Help message updated for Teletext
- Added --help and -h for help message
- Added --nohtmlescape option
- Added --noscte20 option
0.78 (2015-12-12)
-----------------
- Support for extracting closed captions from multiple programs (multiprogram streams) at once.
- CEA-708: exporting to SAMI (.smi), Transcript (.txt), Timed Transcript (ttxt) and SubRip (.srt).
- CEA-708: 16 bit charset support (tested on Korean).
- CEA-708: Roll Up captions handling.
- Changed TCP connection protocol (BIN data is now wrapped in packets, added EPG support and keep-alive packets).
- TCP connection password prompt is removed. To set connection password use -tcppassword argument instead.
- Support ISDB Closed Caption.
- Added a new output format, simplexml (used internally by a CCExtractor user, may or may not be useful for
anyone else).
0.77 (2015-06-20)
-----------------
- Fixed bug in capitalization code ('I' was not being capitalized).
- GUI should now run in Windows 8 (using the included .NET runtime, since
.NET 3.5 apparently cannot be installed in Windows 8).
- Fixed Mac build script, binary is now compiled with support for
files over 2 GB.
- Fixed bug in PMT code, damaged PMT sections could make CCExtractor
crash.
0.76 (2015-03-28)
-----------------
- Added basic M2TS support
- Added EPG support - you can now export the Program Guide to XML
- Some bug fixes
0.75 (2015-01-15)
-----------------
- Fixed issue with teletext to formats other than srt.
- CCExtractor can be used as library if compiled using cmake
- By default the Windows version adds BOM to generated UTF files (this is
because it's needed to open the files correctly) while all other
builds don't add it (because it messes with text processing tools).
You can use -bom and -nobom to change the behaviour.
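The UTF-8 BOM is just the three bytes EF BB BF written at the very start of the file. A small C sketch of the default described above, which -bom and -nobom would override; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>

/* UTF-8 byte order mark: U+FEFF encoded as EF BB BF. */
static const unsigned char UTF8_BOM[3] = {0xEF, 0xBB, 0xBF};

/* Write the BOM at the start of a new output file when enabled.
 * Returns true on success (or when nothing had to be written). */
static bool write_utf8_bom(FILE *out, bool enabled)
{
    if (!enabled)
        return true;
    return fwrite(UTF8_BOM, 1, sizeof UTF8_BOM, out) == sizeof UTF8_BOM;
}

/* Platform default: BOM on Windows builds, none elsewhere.
 * A -bom or -nobom option would replace this value. */
static bool default_bom_enabled(void)
{
#ifdef _WIN32
    return true;
#else
    return false;
#endif
}
```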
0.74 (2014-09-24)
-----------------
- Fixed issue with -o1 -o2 and -12 parameters (where it would write output only in the o2 file)
- Fixed UCLA parameter issue. Now the UCLA parameter settings can't be overwritten anymore by later parameters that affect the custom transcript
- Switched order around for TLT and TT page number in custom transcript to match UCLA settings
- Added nobom parameter, for when files are processed by tools that can't handle the BOM. If using this, files might not be readable under Windows.
- Segfault fix when no input files were given
- No more bin output when sending to server + possibility to send TT to server for processing
- Windows: Added the Microsoft redistributable MSVCR120.DLL to both the installation package and the application zip.
0.73 - GSOC (2014-08-19)
------------------------
- Added support of BIN format for Teletext
- Added start of librarization. This will allow in the future for other programs to use encoder/decoder functions and more.
0.72 - GSOC (2014-08-12)
------------------------
- Fix for WTV files with incorrect timing
- Added support for fps change using data from AVC video track in a H264 TS file.
- Added FFmpeg support to enable all the container formats and decoders provided by FFmpeg
0.71 - GSOC (2014-07-31)
------------------------
- Added feature to receive captions in BIN format according to CCExtractor's own
protocol over TCP (-tcp port [-tcppassword password])
- Added ability to send captions to the server described above or to the
online repository (-sendto host[:port])
- Added -stdin parameter for reading input stream from standard input
- Compilation in Cygwin using linux/Makefile
- Fix for .bin files when not using latin1 charset
- Correction of mp4 timing when one timestamp points to the timing of two atoms
0.70 - GSOC (2014-07-06)
------------------------
This is the first release that is part of Google's Summer of Code.
Anshul, Ruslan and Willem joined CCExtractor to work on a number of things
over the summer, and their work is already reaching the mainstream
version of CCExtractor.
- Added a huge dictionary submitted by Matt Stockard.
- Added DVB subtitles decoder, spupng in output
- Added support for cdt2 media atoms in QT video files. Now multiple atoms in
a single sample sequence are supported.
- Changed Makefile.
- Fixed some bugs.
- Added feature to print info about file's subtitles and streams (-out=report).
- Support Long PMT.
- Support Configuration file.
- There is a sample configuration file in the doc/ folder named
ccextractor.cnf.sample
- For now, only a file named ccextractor.cnf placed beside the ccextractor
executable is supported
- for details of which options can be set using configuration file,
please look at sample file.
- Added options for custom transcript output:
new parameter (-customtxt format), where the format must be like this: 1100100 (7 digits).
These indicate whether the next things should be displayed or not in the (timed) transcript:
- Display start time
- Display end time
- Display caption mode
- Display caption channel
- Use a relative timestamp (relative to the sample)
- Display XDS info
- Use colors
Examples:
0000101 is the default setting for transcripts
1110101 is the default for timed transcripts
1111001 is the default setting for -ucla
Make sure you use this parameter after others that might affect these
settings (-out, -ucla, -xds, -txt, -ttxt, ...)
- Fixed negative timing bug
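The -customtxt digit string maps one-to-one onto the seven settings listed above. A C sketch of parsing it; the struct and function names are hypothetical, and rejecting anything other than exactly seven 0/1 digits is an assumption rather than documented CCExtractor behaviour:

```c
#include <stdbool.h>
#include <string.h>

/* One flag per digit of the -customtxt argument, in the documented order. */
struct custom_transcript_settings {
    bool show_start_time;    /* digit 1 */
    bool show_end_time;      /* digit 2 */
    bool show_mode;          /* digit 3: caption mode */
    bool show_channel;       /* digit 4: caption channel */
    bool relative_timestamp; /* digit 5: relative to the sample */
    bool xds;                /* digit 6: XDS info */
    bool use_colors;         /* digit 7 */
};

/* Parse a string such as "1110101". Returns false, leaving *out untouched,
 * if the string is not exactly seven '0'/'1' characters. */
static bool parse_customtxt(const char *fmt, struct custom_transcript_settings *out)
{
    if (fmt == NULL || strlen(fmt) != 7)
        return false;
    for (int i = 0; i < 7; i++)
        if (fmt[i] != '0' && fmt[i] != '1')
            return false;

    out->show_start_time = fmt[0] == '1';
    out->show_end_time = fmt[1] == '1';
    out->show_mode = fmt[2] == '1';
    out->show_channel = fmt[3] == '1';
    out->relative_timestamp = fmt[4] == '1';
    out->xds = fmt[5] == '1';
    out->use_colors = fmt[6] == '1';
    return true;
}
```

Under this reading, 0000101 (the transcript default) enables only the relative timestamp and colors.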
0.69 (2014-04-05)
-----------------
- A few patches from Christopher Small, including proper support
for multiple multicast clients listening on the same port.
- GUI: Fixed teletext preview.
- GUI: Added a small indicator of data being received when reading from
UDP.
- GUI: Added UTF-8 support to preview Window (used for teletext).
- Fixes in Makefile and build script, compilation in linux and OSX failed
if another libpng was found in the system.
- WTV support directly in CCExtractor (no need for wtvccdump any more).
- Started refactoring and clean-up.
- Fix: MPEG clock rollover (happens each 26 hours) caused a time
discontinuity.
- Windows GUI: Started work on HDHomeRun support. For now it just looks
for HDHomeRun devices. Lots of other things will arrive in the next
versions.
- Windows GUI: Some code refactoring, since the HDHomeRun support makes
the code large enough to require more than one source file :-)
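The 26-hour figure comes from the MPEG system clock: PTS values are 33-bit counters at 90 kHz, so they wrap after 2^33 / 90000 ≈ 95,444 s, roughly 26.5 hours. One common way to remove the resulting discontinuity is to unwrap timestamps. A C sketch, assuming that a backward jump larger than half the counter range means a rollover (a heuristic, not necessarily what CCExtractor does; names are hypothetical):

```c
#include <stdint.h>

#define PTS_BITS 33
#define PTS_MODULUS ((int64_t)1 << PTS_BITS) /* 2^33 ticks */
#define PTS_HZ 90000                         /* 90 kHz clock */

/* Turns raw 33-bit PTS values into a monotonically increasing timeline. */
struct pts_unwrapper {
    int64_t last_raw;    /* previous raw PTS, or -1 before the first one */
    int64_t wrap_offset; /* accumulated multiples of 2^33 */
};

/* Returns the unwrapped PTS. A backward jump of more than half the modulus
 * is treated as a rollover rather than a real discontinuity. */
static int64_t pts_unwrap(struct pts_unwrapper *u, int64_t raw)
{
    if (u->last_raw >= 0 && u->last_raw - raw > PTS_MODULUS / 2)
        u->wrap_offset += PTS_MODULUS;
    u->last_raw = raw;
    return raw + u->wrap_offset;
}
```

Two samples 10 ms apart on either side of the wrap (raw values 2^33 - 900 and 900) come out 1800 ticks apart instead of jumping back about 26 hours.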
0.68 (2013-12-24)
-----------------
- A couple of shared variables between 608 decoders were causing
problems when both fields were processed at the same time with
-12, fixed.
- Added BOM for UTF-8 files.
- Corrected a few extended characters in the UTF-8 encoding,
probably never used in real world captioning but since we got
a good test sample file...
- Color and fonts in PAC commands were ignored, fixed (Helen Buus).
- Added a new output format, spupng. It consists of one .png file
for each subtitle frame and one .xml with all the timing
(Heleen Buus).
- Some fixes (Chris Small).
0.67 (2013-10-09)
-----------------
- Padding bytes were being discarded early in the process in 0.66,
which is convenient for debugging, but it messes with timing in
.raw, which depends on padding. Fixed.
- MythTV's branch had a fixed size buffer that was sometimes not
large enough. Made it dynamic.
- Better support for PAT changing mid-stream.
- Removed quotes in Start in .smi (format fix).
- Added multicast support (Chris Small)
- Added ability to select IP address to bind in UDP (Chris Small)
- Fixes in -unixts and -delay for teletext.
- Added -autodash : When two people are talking, add a dash as
needed (this is based on subtitle position). Only in .srt and
with -trim. Quite experimental, feedback appreciated.
- Added -latin1 to select Latin 1 as encoding. Default is now
UTF-8 (-utf8 still exists but it's not needed).
- Added -ru1, which emulates a (non-existing in real life) 1 line
roll-up mode.
0.66 (2013-07-01)
-----------------
- Fixed bug in auto detection code that triggered a message
about file being out of sync.
- Added -investigate_packets
The PMT is used to select the most promising elementary stream
to get captions from. Sometimes captions are where you least
expect it, so -datapid allows you to select an elementary stream
manually, in case the CC location is not obvious from the PMT
contents. To assist looking for the right stream, the parameter
"-investigate_packets" will have CCExtractor look inside each
stream, looking for CC markers, and report the streams that
are likely to contain CC data even if it can't be determined from
their PMT entry.
- Added -datastreamtype to manually select a stream based on
its type instead of its PID. Useful if your recording program
always hides the caption under the stream type.
- Added -streamtype so if an elementary stream is selected manually
for processing, the streamtype can be selected too. This can be
needed if you process, for example a stream that is declared as
"private MPEG" in the PMT, so CCExtractor can't tell what it is.
Usually you'll want -streamtype 2 (MPEG video) or -streamtype 6
(MPEG private data).
- PMT content listing improved, it now shows the stream type for
more types.
- Fixes in roll-up, cursor was being moved to column 1 if a
RU2, RU3 or RU4 was received even if already in roll-up mode.
- Added -autoprogram. If a multiprogram TS is processed and
-autoprogram is used, CCExtractor will analyze all PMTs and use
the first program that has a suitable data stream.
- Timed transcript (ttxt) now also exports the caption mode
(roll-up, paint-on, etc.) next to each line, as it's useful to
detect things like commercials.
- Content Advisory information from XDS is now decoded if it's
transmitted in "US TV parental guidelines" or "MPA".
Other encodings, such as Canada's, are not supported yet due
to lack of samples.
- Copy Management information from XDS is now decoded.
- Added -xds. If present and export format is timed transcript
(only), XDS information will be saved to file (same file as the
transcript, with XDS being clearly marked). Note that for now
all XDS data is exported even if it doesn't change, so the
transcript file will be significantly larger.
- Added some PaintOn support, at least enough to prevent it
from breaking things when the other modes are used.
- Removed afd_data() warning. AFD doesn't carry any caption related
data. AFD still detected in code in case we want to do something
with it later anyway.
- Ported last changes from Petr Kutalek's telxcc. Current version
is 2.4.4.
- In teletext mode when exporting to transcript (not .srt), an effort
is made to detect and merge line duplicates. This is done by using
the Levenshtein distance, which is the number of changes required
to convert one string to another. To simplify things, strings are
compared up to the length of the shortest one.
There are 3 parameters that can be used to tweak the thresholds:
-deblev: Enable debug so the calculated distance for each two
strings is displayed. The output includes both strings, the
calculated distance, the maximum allowed distance, and whether
the strings are ultimately considered equivalent or not, i.e.
the calculated distance is less than or equal to the max allowed.
-levdistmincnt value: Minimum distance we always allow
regardless of the length of the strings. Default 2. This means
that if the calculated distance is 0, 1 or 2, we consider the
strings to be equivalent.
-levdistmaxpct value: Maximum distance we allow, as a
percentage of the shortest string length. Default 10%. For
example, consider a comparison of one string of 30 characters
and one of 60 characters. We want to determine whether the
first 30 characters of the longer string are more or less the
same as the shortest string, i.e. whether the longest string
is the shortest one plus new characters and maybe some
corrections. Since the shortest string is 30 characters and
the default percentage is 10%, we would allow a distance of
up to 3 between the first 30 characters.
- Added -lf : Use UNIX line terminator (LF) instead of Windows (CRLF).
- Added -noautotimeref: Prevent UTC reference from being auto set from
the stream data.
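The teletext duplicate detection described for 0.66 (the -levdistmincnt and -levdistmaxpct thresholds) can be sketched in C. The function names are hypothetical, and combining the two thresholds by taking the larger one is our reading of the description, not code taken from CCExtractor:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Levenshtein distance between the first n characters of a and b, using two
 * rolling rows of the dynamic-programming table. Returns (size_t)-1 if
 * memory allocation fails. */
static size_t levenshtein_prefix(const char *a, const char *b, size_t n)
{
    size_t *prev = malloc((n + 1) * sizeof *prev);
    size_t *cur = malloc((n + 1) * sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return (size_t)-1;
    }

    for (size_t j = 0; j <= n; j++)
        prev[j] = j; /* distance from the empty prefix */
    for (size_t i = 1; i <= n; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= n; j++) {
            size_t cost = a[i - 1] == b[j - 1] ? 0 : 1;
            size_t del = prev[j] + 1, ins = cur[j - 1] + 1;
            size_t sub = prev[j - 1] + cost;
            size_t best = del < ins ? del : ins;
            cur[j] = best < sub ? best : sub;
        }
        size_t *tmp = prev; /* current row becomes the previous one */
        prev = cur;
        cur = tmp;
    }
    size_t d = prev[n];
    free(prev);
    free(cur);
    return d;
}

/* Compare up to the length of the shorter string. The allowed distance is
 * max(min_cnt, shortest_len * max_pct / 100), e.g. max(2, 30 * 10 / 100). */
static bool lines_equivalent(const char *a, const char *b,
                             size_t min_cnt, size_t max_pct)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = la < lb ? la : lb;
    size_t allowed = n * max_pct / 100;
    if (allowed < min_cnt)
        allowed = min_cnt;
    return levenshtein_prefix(a, b, n) <= allowed;
}
```

With the defaults (2 and 10%), the 30-character example above gets an allowed distance of max(2, 3) = 3.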
0.65 (2013-03-14)
-----------------
- Minor GUI changes for teletext
- Added end timestamps in timed transcripts
- Added support for SMPTE (patch by John Kemp)
- Initial support for MPEG2 video tracks inside MP4 files (thanks a
lot to GPAC's Jean who assisted in analyzing the sample and
doing the required changes in GPAC).
- Improved MP4 auto detection
- Support for PCR if PTS is not available (needed for some teletext
samples, and probably useful for everything else).
- Support for UDP streaming - finally. Use "-udp $port" to have
CCExtractor listen for a stream. I've only been able to test it
with a European HDHomeRun, but it should work fine with any other
tuner.
- Refactored PMT / PAT processing in transport streams. Their contents
can now be displayed (-parsePAT and -parsePMT), which makes
troubleshooting easier.
0.64 (2012-10-29)
-----------------
- Changed Windows GUI size (larger).
- Added Teletext options to GUI.
- Added -teletext to force teletext mode even if not detected
- Added -noteletext to disable teletext detection. This can be needed
for streams that have both 608 data and teletext packets if you
need to process the 608 data (if teletext is detected it will
take precedence otherwise).
- Added -datapid to force a specific elementary stream to be used for
data (bypassing detections).
- Added -ru2 and -ru3 to limit the number of visible lines in roll-up
captions (bypassing whatever the broadcast says).
- Added support for a .hex (hexadecimal) dump of data.
- Added support for wtv in Windows. This is done by using a new program
(wtvccdump.exe) and a new DirectShow filter (CCExtractorDump.dll) that
process the .wtv using DirectShow's filters and export the line 21 data
to a .hex file. The GUI calls wtvccdump.exe as needed.
- Added --nogoptime to force PTS timing even when CCExtractor would
use GOP timing otherwise.
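The effect of -ru2 / -ru3 can be pictured as a fixed-depth window that scrolls up when a new row arrives. The sketch below is hypothetical (struct rollup and rollup_add_line are illustrative names, not CCExtractor identifiers); it only relies on the CEA-608 facts that roll-up captions use at most 4 rows of up to 32 characters.

```c
#include <string.h>

#define MAX_ROWS 4  /* CEA-608 roll-up uses at most 4 rows */
#define ROW_LEN 33  /* 32 caption columns plus the terminator */

/* Hypothetical roll-up window that never shows more than `limit` rows,
   as -ru2 (limit 2) or -ru3 (limit 3) request regardless of the depth
   signalled by the broadcast. The caller must keep 1 <= limit <= MAX_ROWS. */
struct rollup {
    char rows[MAX_ROWS][ROW_LEN];
    int count;
    int limit;
};

void rollup_add_line(struct rollup *r, const char *text)
{
    if (r->count == r->limit) {
        /* Window full: scroll up by dropping the oldest row. */
        memmove(r->rows, r->rows + 1, (size_t)(r->limit - 1) * ROW_LEN);
        r->count--;
    }
    strncpy(r->rows[r->count], text, ROW_LEN - 1);
    r->rows[r->count][ROW_LEN - 1] = '\0'; /* strncpy may not terminate */
    r->count++;
}
```

With limit set to 2, adding three lines leaves only the last two visible, which is what a viewer would see with -ru2.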
0.63 (2012-08-17)
-----------------
- Teletext support added, by integrating Petr Kutalek's telxcc. Integration is
still quite basic (there's equivalent code in both CCExtractor and
telxcc) and some clean up is needed, but it works. Petr has announced that
he's abandoning telxcc so further development will happen directly in
CCExtractor.
- Some bug fixes, as usual.
0.62 (2012-05-23)
-----------------
- Corrected Mac build "script" (needed to add GPAC includes). Thanks to the
Mac users that sent this.
- Hauppauge mode now uses PES timing, needed for files that don't have
caption data during all the video (such as in commercial breaks).
- Added -mp4 and -in:mp4 to force the input to be processed as MP4.
- CC608 data embedded in a separate stream (as opposed to inside the video
stream itself) in MP4 files is now supported (not heavily tested).
This should be rather useful since closed captioned files from iTunes
use this format.
- More CEA-708 work. The debugger is now able to dump the "TV" contents for
the first time. Also, a .srt can be generated, however timing is not quite
good yet (still need to figure out why).
- Added -svc (or --service) to select the CEA-708 services to be processed.
For example, -svc 1,2 will process the primary and secondary language
services. Valid values are 1-63, where 1 is the primary language, 2 is
the secondary language (this is part of the specification) and 3-63 are
provider defined.
- Rajesh Hingorani sent a fix for the MPEG decoder that fixes garbled output
for certain samples (we had none like this in our test collection). Thanks,
Rajesh.
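A possible way to parse a -svc list such as 1,2 into per-service flags is sketched below. The function parse_svc_list and the decision to reject anything outside 1-63 (or any separator other than a comma) are assumptions for illustration, not CCExtractor's real argument parser.

```c
#include <stdlib.h>

#define MAX_708_SERVICES 63 /* valid CEA-708 service numbers are 1-63 */

/* Hypothetical parser for a -svc argument such as "1,2".
   Sets enabled[n] = 1 for each listed service n; enabled must have room
   for MAX_708_SERVICES + 1 entries (index 0 is unused).
   Returns the number of distinct services enabled, or -1 if the list
   contains something that is not a number in 1-63. */
int parse_svc_list(const char *arg, int enabled[MAX_708_SERVICES + 1])
{
    const char *p = arg;
    int count = 0;
    int i;

    for (i = 0; i <= MAX_708_SERVICES; i++)
        enabled[i] = 0;

    while (*p != '\0') {
        char *end;
        long svc = strtol(p, &end, 10);

        if (end == p || svc < 1 || svc > MAX_708_SERVICES)
            return -1; /* not a number, or outside 1-63 */
        if (!enabled[svc]) {
            enabled[svc] = 1;
            count++;
        }
        if (*end == ',')
            end++; /* skip the separator */
        else if (*end != '\0')
            return -1; /* unexpected character */
        p = end;
    }
    return count;
}
```

For example, "1,2" enables the primary and secondary language services, while "0" or "64" is rejected.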
0.61 (2012-03-08)
-----------------
- Fix: GCC 3.4.4 can now build CCExtractor.
- Fix: Damaged TS packets (those that come with 'error in transport' bit
on) are now skipped.
- Fix: Part of the changes for MP4 support (CC packet buffering in
particular) broke some stuff for other files, causing at least very
annoying character duplication. We hope we've fixed it without breaking
anything else, but please report any problems.
- Some non-interesting cleanup.
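Detecting a damaged TS packet only needs the packet header: in MPEG-2 transport streams the transport_error_indicator is the most significant bit of the byte that follows the 0x47 sync byte. The check below is a sketch; the function name is hypothetical, and also rejecting packets with a missing sync byte or a short length is an extra assumption beyond what this entry describes.

```c
#include <stddef.h>
#include <stdint.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE 0x47

/* Returns 1 if a transport stream packet should be skipped: it is too
   short, it does not start with the sync byte (extra check, assumed),
   or its transport_error_indicator (top bit of byte 1) is set. */
int ts_packet_is_damaged(const uint8_t *pkt, size_t len)
{
    if (len < TS_PACKET_SIZE || pkt[0] != TS_SYNC_BYTE)
        return 1;
    return (pkt[1] & 0x80) != 0;
}
```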
0.60 (unreleased)
-----------------
- Add: MP4 support, using GPAC (a media library). Integration is currently
"enough so it works", but needs some more work. There's some duplicate
code, the stream must be a file (no streaming), etc.
- Fix: The Windows version was writing text files with double \r.
- Fix: Closed caption blocks with no data could cause a crash.
- Fix: -noru (to generate files without duplicate lines in
roll-up) was broken, with complete lines being missing.
- Fix: bin format not working as input.
0.59 (2011-10-07)
-----------------
- More AVC/H.264 work. pic_order_cnt_type != 0 will be processed now.
- Fix: Roll-up captions with interruptions for Text (with ResumeTextDisplay
in the middle of the caption data) were missing complete lines.
- Added a timed text transcript output format, probably only useful for
roll-up captions. Use --timedtranscript or -ttxt. Output is like this:
00:01:25,485 | HOST: LAST NIGHT THE REPUBLICAN
00:01:29,522 | HOPEFULS INTRODUCE THEMSELVES TO
00:01:30,623 | PRIMARY VOTERS.
- XDS parser. Not complete (no point in dealing with V-Chip stuff for
example), but enough to extract program and station information.
- Input streams can now come from standard input using - (a single hyphen)
as the parameter.
- Added a new output format called 'null' (use -null or -out=null). This
format means "Don't produce any file", and is useful to have CCExtractor
process the stream (for XDS messages, debugging, etc) without actually
generating anything.
- Updated Windows GUI.
- Added -quiet => If used, CCExtractor will not write any message.
- Added -stdout => If used, the captions will be sent to stdout (console)
instead of file. Combined with -, CCExtractor can work as a filter in
a larger process, receiving the stream from stdin and sending the
captions to stdout.
- Some code clean up, minor refactoring.
- Teletext detection (not yet processing).
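The timestamp column in the timed transcript sample above is a millisecond count rendered as HH:MM:SS,mmm. A hypothetical formatter producing one such line might look like this (format_ttxt_line is an illustrative name; CCExtractor's own output code may differ).

```c
#include <stdio.h>

/* Writes one timed-transcript line in the "HH:MM:SS,mmm | text" layout
   into buf. ms is the caption start time in milliseconds. Returns the
   snprintf result (the length the full line would have). */
int format_ttxt_line(char *buf, size_t size, long long ms, const char *text)
{
    long long hh = ms / 3600000;     /* whole hours */
    int mm = (int)(ms / 60000 % 60); /* minutes within the hour */
    int ss = (int)(ms / 1000 % 60);  /* seconds within the minute */
    int mss = (int)(ms % 1000);      /* leftover milliseconds */

    return snprintf(buf, size, "%02lld:%02d:%02d,%03d | %s",
                    hh, mm, ss, mss, text);
}
```

For example, a start time of 85485 ms yields the first line of the sample, 00:01:25,485 | HOST: LAST NIGHT THE REPUBLICAN.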
0.58 (2011-08-21)
-----------------
- Implemented new PTS based mode to order the caption information
of AVC/H.264 data streams. The old pic_order_cnt_lsb based method
is still available via the -poc or --usepicorder command switches.
- Removed a couple of those annoying "Impossible!" error messages
that appeared when processing some (possibly broken, unsure) files.
- Added -nots (--notypesettings) to prevent italics and underline
codes from being displayed.
- Note to those not liking the paragraph symbol being used for the
music note: Submit a VALID replacement in latin-1.
- Added preliminary support for multiple program TS files. The
parameter --program-number (or -pn) will let you choose which
program number to process. If no number is passed and the TS
file contains more than one, CCExtractor will display a list of
found programs and terminate.
- Added support (basic, because I only received one sample) for some
Hauppauge cards that save CC data in their own format. Use the
parameter -haup to enable it (CCExtractor will display a notice
if it thinks that it's processing a Hauppauge capture anyway).
- Fixed bug in roll-up.
- More AVC work: TS files from EchoStar that produced garbled
output are now processed OK.
- Updated Windows GUI.
0.57 (2010-12-16)
-----------------
- Bug fixes in the Windows version. Some debug code was unintentionally
left in the released version.
0.56 (2010-12-09)
-----------------
- H264 support
- Other, much less important, minor changes
0.55 (2009-08-09)
-----------------
- Replace pattern matching code with improved parser for MPEG-2 elementary
streams.
- Fix parsing of ReplayTV 5000 captions.
- Add ability to decode SCTE 20 encoded captions.
- Make decoding of TS files more error tolerant.
- Start implementation of EIA-708 decoding (not active yet).
- Add -gt / --goptime switch to use GOP timing instead of PTS timing.
- Start implementation of AVC/H.264 decoding (not active yet).
- Fixed: The basic problem is that when 24fps movie film gets converted to
30fps NTSC, every 4th frame is repeated. Some pictures then carry 3 fields
of CC data, and the field 3 CC data belongs to the same channel as field 1.
The following pictures have the fields reversed because of the odd number
of fields. I used top_field_first to tell when the channels are reversed.
See Table 6-1 of SCTE 20. [Paul Fernquist]
0.54 (2009-04-16)
-----------------
- Add -nosync and -fullbin switches for debugging purposes.
- Remove -lg (--largegops) switch.
- Improve synchronization of captions for source files with
jumps in their time information or gaps in the caption
information.
- [R. Abarca] Changed Mac script, it now compiles/links
everything from the /src directory.
- It's now possible to have CCExtractor add credits
automatically.
- Added a feature to add start and end messages (for credits).
See help screen for details.
0.53 (2009-02-24)
-----------------
- Force generated RCWT files to have the same length as source file.
- Fix documentation for -startat / -endat switches.
- Make -startat / -endat work with all output formats.
- Fix sync check for raw/rcwt files.
- Improve timing of dvr-ms NTSC captions.
- Add -in=bin switch to read CCExtractor's own binary format.
- Fix problem with short input files (smaller than 1MB).
- Clean up regular and debug output.
- Add -out=bin switch to write RCWT data.
- Remove -bo/--bufferoutput switch and functionality.
- [Volker] Added new generic binary format (RCWT
for Raw Captions With Time). This new format
allows one file to contain all the available
closed caption data instead of just one stream.
- Added --no_progress_bar to disable status
information (mostly used when debugging, as the
progress information is annoying in the middle
of debug logs).
- The Windows GUI was reported to freeze in some
conditions. Fixed.
- The Windows GUI is now targeted for .NET 2.0
instead of 3.5. This allows Windows 2000 to run
it (there's no .NET 3.5 for Windows 2000), as
requested by a couple of key users.
0.51 (unreleased)
-----------------
- Removed -autopad and -goppad, no longer needed.
- In preparation for a new binary format we have
renamed the current .bin to .raw. Raw files
have only CC data (with no header, timing, etc.).
- The input file format (when forced) is now
specified with
-in=format
such as -in=ts, -in=raw, -in=ps ...
The old switches (-ts, -ps, etc.) still work.
The only exception is -bin which has been removed
(reserved for the new binary format). Use
-in=raw to process a raw file.
- Removed -d, which, when producing a raw file, used
a DVD format. This has been merged into a new
output type "dvdraw". So now instead of using
-raw -d as before, use -out=dvdraw if you need
this.
- Removed --noff
- Added gui_mode_reports for frontend communications,
see related file.
- Windows GUI rewritten. Source code now included,
too.
- [Volker] Dish Network clean-up
0.50 (2008-12-12)
-----------------
- [Volker] Fix in DVR-MS NTSC timing
- [Volker] More clean-up
- Minor fixes
0.49 (2008-12-10)
-----------------
- [Volker] Major MPEG parser rework. Code much
cleaner now.
- Some stations transmit broken roll-up captions,
and for some reason don't send CRs but RUs...
Added work-around code to make captions readable.
- Started work on EIA-708 (DTV). Right now you can
add -debug-708 to get a dump of the 708 data.
An actually useful decoder will come soon.
- Some of the changes MIGHT HAVE BROKEN MythTV's
code. I don't use MythTV myself so I rely on
other people's samples and reports. If MythTV
is broken please let me know.
- Added new debug options.
- [Volker] Added support for DVR-MS NTSC files.
- Other minor bug fixes and changes.
0.46 (2008-11-24)
-----------------
- Added support for live streaming, CCExtractor
can now process files that are being recorded
at the same time.
- [Volker] Added a new DVR-MS loop - this is
completely new, DVR-MS specific code, so we no
longer use the generic MPEG code for DVR-MS.
DVR-MS should now be (or at least eventually will be)
as reliable as TS.
Note: For now, it's only ATSC recordings, not
NTSC (analog) recordings.
0.45 (2008-11-14)
-----------------
- Added auto-detection of DVR-MS files.
- Added -asf to force DVR-MS mode.
- Added some specific support for DVR-MS
files. This format used to work
correctly in 0.34 (pure luck) but the
MPEG code rework broke it. It should
work as it used to.
- Updated Windows GUI to support the
new options.
- Added -lg --largegops
From the help screen:
Each Group-of-Pictures comes with timing
information. When these timestamps are too far apart
(for example because there are a lot of
frames in a GOP) ccextractor may prefer not
to use GOP timing. Use this option if you
need ccextractor to use GOP timing in large
GOPs.
0.44 (2008-09-10)
-----------------
- Added an option to the GUI to process
individual files in batch, i.e. call
ccextractor once per file. Use it if you
want to process several unrelated files
in one go.
- Added an option to prevent duplicate
lines in roll-up captions.
- Several minor bug fixes.
- Updated the GUI to add the new options.
0.43 (2008-06-20)
-----------------
- Fixed a bug in the read loop (no less)
that caused some files to fail when
reading without buffering (which is
the default in the Linux build).
- Several improvements in the GUI, such as
saving current options as default.
0.42 (2008-06-17)
-----------------
- The option switch "-transcript" has been
changed to "--transcript". Also, "-txt"
has been added as the short alias.
- Windows GUI
- Updated help screen
0.41 (2008-06-15)
-----------------
- Default output is now .srt instead of .bin,
use -raw if you need the data dump instead of
.srt.
- Added -trim, which removes blank spaces at
the left and right of each line in .srt.
Note that those spaces are there to help
deaf people know if the person talking is
at the left or the right of the screen, i.e.
they aren't useless. But if they annoy
you, go ahead...
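What -trim does to each line can be approximated by an in-place trim of leading and trailing whitespace. This is a sketch, not the real implementation: trim_line is a hypothetical name, and treating every isspace() character (not just the space character) as trimmable is an assumption.

```c
#include <ctype.h>
#include <string.h>

/* Removes leading and trailing whitespace from a caption line in place.
   Returns line for convenience. */
char *trim_line(char *line)
{
    size_t start = 0, len = strlen(line);

    while (start < len && isspace((unsigned char)line[start]))
        start++; /* skip left padding */
    while (len > start && isspace((unsigned char)line[len - 1]))
        len--; /* drop right padding */
    memmove(line, line + start, len - start); /* shift text to the front */
    line[len - start] = '\0';
    return line;
}
```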
0.40 (2008-05-20)
-----------------
- Fixed a bug in the sanity check function
that caused the Myth branch to abort.
- Fixed the OSX build script, it needed a
new #define to work.
0.39 (2008-05-11)
-----------------
- Added -transcript. If used, the output will
have no time information. Also, if in roll-up
mode there will be no repeated lines.
- Lots of changes in the MPEG parser, most of
them submitted by Volker Quetschke.
- Fixed a bug in the CC decoder that could cause
the first line not to be cleared in roll-up
mode.
- CCExtractor can now follow number sequences in
file names, by suffixing the name with +.
For example,
DVD0001.VOB+
means DVD0001.VOB, DVD0002.VOB, etc. This works
for all files, so part001.ts+ does what you
would expect.
- Added -90090 which changes the clock frequency
from the MPEG standard 90000 to 90090. It
*could* (remains to be seen) help if there are
timing issues.
- Better support for Tivo files.
- By default ccextractor now considers the whole
input file list as one large file, instead of
several, independent, video files. This has
been changed because most programs (for example
DVDDecrypt) just cut the files by size.
If you need the old behaviour (because you
actually edited the video files and want to
join the subs), use -ve.
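Following a numbered sequence such as DVD0001.VOB+ amounts to incrementing the digits in the file name while keeping their zero padding. The sketch below assumes the trailing + has already been stripped, that the name is a bare file name (no directories containing dots), and that the counter is the last run of digits before the extension; next_in_sequence is a hypothetical helper, not CCExtractor's function.

```c
#include <ctype.h>
#include <string.h>

/* Rewrites name in place to the next file of a numbered sequence,
   e.g. "DVD0001.VOB" -> "DVD0002.VOB", keeping the zero padding.
   Returns 1 on success, 0 if there are no digits before the extension
   or the counter overflows its width (the digits are then left as 0s). */
int next_in_sequence(char *name)
{
    const char *dot = strrchr(name, '.');
    size_t i = dot ? (size_t)(dot - name) : strlen(name);

    /* Walk back to the end of the last run of digits before the extension. */
    while (i > 0 && !isdigit((unsigned char)name[i - 1]))
        i--;
    if (i == 0)
        return 0;

    /* Add one, propagating the carry leftwards through the digit run. */
    while (i > 0 && isdigit((unsigned char)name[i - 1])) {
        if (name[i - 1] != '9') {
            name[i - 1]++;
            return 1;
        }
        name[i - 1] = '0';
        i--;
    }
    return 0; /* every digit was 9: the counter overflowed */
}
```

Skipping the extension matters for names like capture.mp4, where naively incrementing the last digit would produce capture.mp5.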
0.36 (unreleased)
-----------------
- Fixed bug in SMI, nbsp was missing a ;.
- Footer for SAMI files was incorrect (<body> and
<sami> tags were being opened again instead of
being closed).
- Displayed memory is now written to disk at end
of stream even if there is no command requesting
so (may prevent losing the last screen-full).
- Important change that could break scripts, but
that has been made because the old behaviour was
annoying to most people: _1 and _2 at the end
of the output file names is now added ONLY if
-12 is used (i.e. when there are two output
files to produce). So
ccextractor -srt sopranos.mpg
now produces sopranos.srt instead of sopranos_1.srt.
If you use -12, i.e.
ccextractor -srt -12 sopranos.mpg
You get
sopranos_1.srt and
sopranos_2.srt
as usual.
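The naming rule can be summarized as: strip the input extension, add _1 or _2 only when -12 is used, then add the output extension. The helper below (make_output_name, a hypothetical name) sketches that rule in C; it ignores paths whose directory names contain dots, which a real implementation would have to handle.

```c
#include <stdio.h>
#include <string.h>

/* Builds the output file name for one caption field.
   two_outputs is nonzero when -12 is in effect; field is 1 or 2.
   Returns the snprintf result (the length the full name would have). */
int make_output_name(char *out, size_t size, const char *input,
                     const char *ext, int two_outputs, int field)
{
    const char *dot = strrchr(input, '.');
    int base_len = dot ? (int)(dot - input) : (int)strlen(input);

    if (two_outputs) /* only -12 adds the _1 / _2 suffix */
        return snprintf(out, size, "%.*s_%d.%s", base_len, input, field, ext);
    return snprintf(out, size, "%.*s.%s", base_len, input, ext);
}
```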
0.35 (unreleased)
-----------------
- Added --defaultcolor to the help screen. Code
was already in 0.34 but the documentation wasn't
updated.
- Buffer is larger now, since I've found a sample
where 256 Kb isn't enough for a PES (go figure).
- At the end of the process, a ratio between
video length and time to process is displayed.
0.34 (2007-06-03)
-----------------
- Added some basic letter case and capitalization
support. For captions that are broadcast in ALL
UPPERCASE (most of them), ccextractor can now
do the first part of the job.
--sentencecap or -sc will tell ccextractor to
follow the typical capitalization rules, such
as capitalize months, days of week, etc.
So from
YOU BETTER RESPECT
THIS ROBE, ALAN
You get
You better respect
this robe, alan.
--capfile or -caf also enables the case