Release v1.0.0 · ultravideo/kvazaar

It's been 9 months since the last release. Now that the encoder just got 10x faster (on veryslow), and quite a bit faster and better on every other preset as well, I think it's time for a major version bump.

Average BD bitrate (QP 17, 22, 27, 32) v1.0.0 vs v0.8.3. Columns are presets, from 0-uf (ultrafast) through 5-m (medium) to 8-vs (veryslow).

Class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
A	-16.4%	-26.9%	-27.5%	-31.0%	-11.2%	-11.9%	-11.3%	-6.7%	-4.8%
B	-16.2%	-33.7%	-31.7%	-37.6%	-11.6%	-14.8%	-15.7%	-9.1%	-6.3%
C	-7.0%	-17.6%	-28.0%	-31.2%	-8.3%	-9.0%	-11.3%	-7.1%	-8.1%
D	-3.7%	-12.3%	-29.2%	-30.3%	-5.4%	-5.9%	-11.5%	-8.3%	-9.9%
E	-28.4%	-42.6%	-33.5%	-39.4%	-22.6%	-28.5%	-20.3%	-7.0%	-0.7%
F	-6.1%	-11.3%	-12.8%	-16.5%	-10.1%	-2.1%	2.3%	10.8%	6.4%

All	-13.0%	-24.1%	-27.1%	-31.0%	-11.5%	-12.0%	-11.3%	-4.6%	-3.9%

Average speedup (QP 17, 22, 27, 32) v1.0.0 vs v0.8.3

Class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
A	1.61x	1.91x	1.89x	1.37x	2.69x	3.33x	4.79x	7.32x	11.06x
B	1.65x	1.98x	1.96x	1.46x	2.67x	3.36x	4.79x	8.15x	13.89x
C	1.76x	1.97x	1.98x	1.45x	2.52x	2.97x	4.87x	9.32x	15.77x
D	2.09x	1.87x	1.81x	1.32x	1.97x	2.36x	5.13x	8.78x	12.65x
E	1.91x	1.96x	1.75x	1.40x	3.00x	3.70x	4.87x	6.06x	7.56x
F	1.84x	1.83x	1.74x	1.41x	2.86x	2.98x	4.60x	8.18x	13.58x

All	1.81x	1.92x	1.86x	1.40x	2.62x	3.12x	4.84x	7.97x	12.42x
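
The "All" row in both tables appears to be the unweighted arithmetic mean of the six class rows. That is inferred from the numbers, not stated in the release notes. A minimal C sketch that checks this for two speedup columns, where the helper name `column_mean` and the array names are made up for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: unweighted mean of one table column. */
static double column_mean(const double *values, size_t count)
{
  double sum = 0.0;
  for (size_t i = 0; i < count; i++) {
    sum += values[i];
  }
  return count ? sum / (double)count : 0.0;
}

/* Speedups for classes A-F, copied from the 0-uf and 8-vs columns. */
static const double speedup_0_uf[6] = {1.61, 1.65, 1.76, 2.09, 1.91, 1.84};
static const double speedup_8_vs[6] = {11.06, 13.89, 15.77, 12.65, 7.56, 13.58};
```

`column_mean(speedup_0_uf, 6)` comes out at about 1.81 and `column_mean(speedup_8_vs, 6)` at about 12.42, which agrees with the "All" row to two decimals.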

Parameters: --threads=4 --owf=1 --wpp -p64

New Features

--version
--help
--loop-input
--mv-constraint to constrain motion vectors
--tiles=2x2 as an alternative syntax for uniform tiles
--hash=md5
Print information about what SIMD optimizations are in use
--mv=full8 --mv=full16 --mv=full32 --mv=full64
--cu-split-termination=zero/off
--crypto for selective encryption of bitstream (for OpenHEVC)
--me-early-termination=sensitive/on/off for early termination of motion vector search
Added 4x8 SMP and 4x12 AMP motion partitions
--subme=0/1/2/3/4 for control over complexity of fractional pixel motion prediction
--lossless for lossless coding
Monochrome coding
--input-format=420/400
--input-bitdepth=8/10
--tmpv for temporal motion vector predictor
--rdoq-skip to skip RDOQ in situations where it's unlikely to improve BDRate
Modified --gop=lp-g4d3r1t1 syntax to no longer take the reference frames as a parameter; it's now --gop=lp-g4d3t1.
Enable WPP and multithreading by default, with detection for number of cores
Updated all presets to rate-distortion-complexity optimized versions. These are based on a search of all (~ish) possible encoding parameters and bring a huge boost to both speed and BDRate when encoding with the presets (10x speed for veryslow, ~1.1x-4x for others, up to 30% improved BDRate for some presets).
Set default options to match medium with intra period of 64, QP 22 and --gop=lp-g4d3t1
--implicit-rdpcm RExt feature
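
The notes don't say how the number of cores is detected. As a rough sketch of one common approach on Linux and macOS, `sysconf(_SC_NPROCESSORS_ONLN)` reports the number of online processors. The helper name `detect_thread_count` and the fallback to one thread are assumptions for illustration, not Kvazaar's actual code.

```c
#include <unistd.h>

/* Hypothetical helper: pick a default thread count from the online CPU count. */
static int detect_thread_count(void)
{
  long cores = sysconf(_SC_NPROCESSORS_ONLN);  /* -1 if the query fails */
  return cores > 0 ? (int)cores : 1;           /* fall back to one thread */
}
```

A default chosen this way only matters when the user gives no explicit --threads value.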

Optimizations

AVX2 version for Sample Adaptive Offset (SAO)
Optimized memory copying
AVX2 versions of filters for fractional pixel motion estimation
AVX2 version for half pixel chroma sampling for SMP/AMP
AVX2 versions for calculating two or four SATD values at once for small blocks
Rewrote AVX2 version of fractional pixel motion compensation
Rewrote motion vector cost calculation. It only got slightly faster, but BDRate improved a bunch due to the new implementation being more correct.
Made AVX2 SAD use SSE4.1 for cases where there isn't an AVX2 implementation, speeding up SMP/AMP.
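
For reference, the quantity the SATD kernels compute (sum of absolute transformed differences) can be sketched in scalar C. This is a plain, unnormalized 4x4 version written for illustration, not Kvazaar's implementation. The name `satd_4x4` is hypothetical, both blocks are assumed to share one stride, and encoders usually scale the result.

```c
#include <stdint.h>
#include <stdlib.h>

/* Unnormalized 4x4 SATD: Hadamard-transform the residual, then sum magnitudes. */
static unsigned satd_4x4(const uint8_t *a, const uint8_t *b, int stride)
{
  int d[4][4], t[4][4];
  unsigned sum = 0;

  /* Residual between the two blocks. */
  for (int y = 0; y < 4; y++) {
    for (int x = 0; x < 4; x++) {
      d[y][x] = a[y * stride + x] - b[y * stride + x];
    }
  }

  /* Horizontal 4-point Hadamard butterflies on each row. */
  for (int y = 0; y < 4; y++) {
    int s0 = d[y][0] + d[y][2], s1 = d[y][1] + d[y][3];
    int d0 = d[y][0] - d[y][2], d1 = d[y][1] - d[y][3];
    t[y][0] = s0 + s1; t[y][1] = s0 - s1;
    t[y][2] = d0 + d1; t[y][3] = d0 - d1;
  }

  /* Vertical butterflies on each column, accumulating absolute values. */
  for (int x = 0; x < 4; x++) {
    int s0 = t[0][x] + t[2][x], s1 = t[1][x] + t[3][x];
    int d0 = t[0][x] - t[2][x], d1 = t[1][x] - t[3][x];
    sum += (unsigned)(abs(s0 + s1) + abs(s0 - s1) + abs(d0 + d1) + abs(d0 - d1));
  }
  return sum;
}
```

Each call handles one block, so small blocks leave most of a 256-bit register idle. Computing two or four blocks per call is one way to fill it.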

Bugfixes

Fixed a bug in rate control where an int overflowed after coding 2^31 bits (2Gb)
Fixed non-determinism in tiles
Fixed chroma reconstruction bug in tiles
Fixed a bug with calculating the number of bits used for intra mode on 4x4 CUs
Stopped checking zero motion vector multiple times in motion compensation
Fixed possible segfault in motion compensation
Fixed a race condition with OWF and SMP/AMP
Passed the timeout to pthread_cond_timedwait correctly, so that the main thread now sleeps instead of busy-looping when it has nothing to do
Fixed rate control with lp-gop
Fixed full search not taking temporal motion vector into account
Allow non-gop-length intra period for lp-gop
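
The notes don't spell out what was wrong with the time passed to pthread_cond_timedwait. A common pitfall is passing a relative interval where POSIX expects an absolute deadline: a deadline already in the past makes the call return at once, so a loop around it spins. Whether that was the exact bug here is an assumption. A minimal sketch of building the deadline, with a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: wait on cond for at most timeout_ms milliseconds.
 * The caller must hold mutex. Returns 0 on wakeup, ETIMEDOUT on timeout. */
static int wait_with_timeout(pthread_cond_t *cond, pthread_mutex_t *mutex,
                             long timeout_ms)
{
  struct timespec deadline;

  /* pthread_cond_timedwait takes an absolute deadline, measured on
   * CLOCK_REALTIME unless the condition variable was configured otherwise. */
  clock_gettime(CLOCK_REALTIME, &deadline);
  deadline.tv_sec  += timeout_ms / 1000;
  deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {  /* keep tv_nsec in [0, 1e9) */
    deadline.tv_sec  += 1;
    deadline.tv_nsec -= 1000000000L;
  }
  return pthread_cond_timedwait(cond, mutex, &deadline);
}
```

A return value of 0 can also be a spurious wakeup, so callers normally re-check their condition in a loop.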

Code / Building / Testing

Moved SAO to its own file
Removed a ton of unnecessary includes
Updated autotools ax_pthread
Added build test for OS X for Travis
Made tests check for bitstream correctness
Refactored some of the copypasta in motion vector search starting point selection
Refactored the cu_info_t datastructures to hold information at a 4x4 resolution needed for AMP and SMP
Changed cu_info_t to use bitfields to negate the effect of increasing the cu_info_t array by a factor of 4
Moved bitstream generation from encoderstate.c to encode_coding_tree.c
Renamed encoder_state_t.global to frame, which makes sense since it holds frame level data, not global data
Rewrote integer vector inter prediction, because it was so bad
Refactored init_lcu_t
Added more tests for inter SAD
Added speed tests for dual intra SAD functions
Added more realistic speed tests for inter SAD
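
The cu_info_t change can be illustrated with a small sketch: storing per-block info on a 4x4 grid instead of an 8x8 one quadruples the number of entries, and packing the fields into bitfields can shrink each entry enough to cancel that out. The struct names, field names and bit widths below are invented for illustration and are not the real cu_info_t layout. Bitfield packing is also implementation-defined, so the sizes are typical rather than guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block record with one byte per field (4 bytes total). */
typedef struct {
  uint8_t type;
  uint8_t depth;
  uint8_t part_size;
  uint8_t skipped;
} block_info_plain_t;

/* Same fields as bitfields; the widths are illustrative guesses. */
typedef struct {
  uint8_t type      : 2;
  uint8_t depth     : 2;
  uint8_t part_size : 3;
  uint8_t skipped   : 1;
} block_info_packed_t;

/* Bytes needed to cover a width x height area with one entry per grid x grid block. */
static size_t info_array_bytes(int width, int height, int grid, size_t entry_size)
{
  return (size_t)(width / grid) * (size_t)(height / grid) * entry_size;
}
```

With GCC or Clang the packed struct typically occupies a single byte. In that case `info_array_bytes(64, 64, 4, sizeof(block_info_packed_t))` equals `info_array_bytes(64, 64, 8, sizeof(block_info_plain_t))`: four times the entries at a quarter of the size.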

Other

Added a manpage
Added scripts for updating manpage and README based on --usage.
Added a Dockerfile. Just because.
Added commit date to --version
