Official PyTorch implementation of LSSVC: A Learned Spatially Scalable Video Coding Scheme
- We have added evaluation without real bitstream writing!
- Python 3.6
- CUDA, if you want to use a GPU for acceleration
- cudatoolkit=11.3
- pytorch==1.10.1
- torchvision==0.11.2
- pytorch-msssim==0.2.1
- bd-metric==0.9.0
We provide a test script with real bitstream writing. To use it, build the C++ code as shown in the commands that follow and set --write_stream 1 to perform actual coding. The real bitrates differ only slightly from the estimated bitrates.
sudo apt-get install cmake g++
cd src
mkdir build
cd build
conda activate $YOUR_PY36_ENV_NAME
cmake ../cpp -DCMAKE_BUILD_TYPE=Release
make -j
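As a rough way to check how close the real and estimated bitrates are, the size of a written bitstream can be converted to bits per pixel (bpp) and compared with the estimate. The sketch below is only illustrative: the function names and the toy numbers are hypothetical and are not part of this repository.

```python
import os


def bits_per_pixel(num_bytes, width, height, num_frames):
    """Average bits per pixel of a bitstream that covers num_frames frames."""
    return num_bytes * 8 / (width * height * num_frames)


def file_bpp(bitstream_path, width, height, num_frames):
    """Same as bits_per_pixel, but reads the size from a bitstream file on disk."""
    return bits_per_pixel(os.path.getsize(bitstream_path), width, height, num_frames)


def relative_gap(real_bpp, estimated_bpp):
    """Relative difference of the real bpp with respect to the estimated bpp."""
    return (real_bpp - estimated_bpp) / estimated_bpp


# Toy numbers for illustration only (not measured results).
toy_real = bits_per_pixel(num_bytes=1020, width=10, height=10, num_frames=8)
toy_gap = relative_gap(toy_real, estimated_bpp=10.0)
print(f"real bpp: {toy_real:.3f}, relative gap: {toy_gap:.2%}")
```

For a real run, file_bpp can be pointed at the bitstream produced with --write_stream 1 and compared against the estimated bpp reported for the same sequence.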
- RD curve with intra period 12
- RD curve with intra period 32
We provide our pretrained models:
- Scalable Intra models (IntraSS)
- Scalable Inter models (LSSVC)
IntraSS is a degraded version of LSSVC without interframe references.
We provide the encoder and decoder command lines for HM-18.0, VTM-21.2, and SHM-12.4. VTM-21.2 is used for both simulcast coding and two-layer scalable coding in our experiments.
Simulcast coding consists of two separate single-layer encoding and decoding passes, one per resolution. Here, we summarize the command lines used for single-layer coding.
- HM-18.0 Encoder
TAppEncoder
-c encoder_lowdelay_main.cfg
--IntraPeriod={intra period}
--FramesToBeEncoded=96
--InputFile={input file name}
--FrameRate={frame rate}
--SourceWidth={width}
--SourceHeight={height}
--InputBitDepth=8
--InputChromaFormat=420
--DecodingRefreshType=2
--ConformanceWindowMode=1
--QP={qp}
--BitstreamFile={bitstream file name}
- HM-18.0 Decoder
TAppDecoder
-b {bitstream file name}
-o {reconstructed file name}
- VTM-21.2 Encoder
EncoderApp
-c encoder_lowdelay_vtm.cfg
--IntraPeriod={intra period}
--FramesToBeEncoded=96
--InputFile={input file name}
--FrameRate={frame rate}
--SourceWidth={width}
--SourceHeight={height}
--InputBitDepth=8
--InputChromaFormat=420
--DecodingRefreshType=2
--OutputBitDepth=8
--QP={qp}
--BitstreamFile={bitstream file name}
- VTM-21.2 Decoder
DecoderApp
-b {bitstream file name}
-o {reconstructed file name}
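The single-layer commands above can be assembled programmatically, and simulcast then amounts to building one such command per resolution. The following is a minimal sketch under stated assumptions: the helper name, the file names, and the numeric values in the example are hypothetical, and the executables are assumed to be on the PATH. Only the flags themselves are taken from the command lines listed above.

```python
def single_layer_encoder_cmd(codec, input_file, width, height, frame_rate,
                             qp, intra_period, bitstream_file, frames=96):
    """Build the argument list for a single-layer HM-18.0 or VTM-21.2 encode."""
    if codec == "HM":
        exe, cfg = "TAppEncoder", "encoder_lowdelay_main.cfg"
        extra = ["--ConformanceWindowMode=1"]
    elif codec == "VTM":
        exe, cfg = "EncoderApp", "encoder_lowdelay_vtm.cfg"
        extra = ["--OutputBitDepth=8"]
    else:
        raise ValueError(f"unsupported codec: {codec}")
    return [
        exe, "-c", cfg,
        f"--IntraPeriod={intra_period}",
        f"--FramesToBeEncoded={frames}",
        f"--InputFile={input_file}",
        f"--FrameRate={frame_rate}",
        f"--SourceWidth={width}",
        f"--SourceHeight={height}",
        "--InputBitDepth=8",
        "--InputChromaFormat=420",
        "--DecodingRefreshType=2",
        *extra,
        f"--QP={qp}",
        f"--BitstreamFile={bitstream_file}",
    ]


# Simulcast: two independent single-layer encodes (hypothetical inputs).
simulcast_cmds = [
    single_layer_encoder_cmd("VTM", "video_960x540.yuv", 960, 540, 50,
                             qp=32, intra_period=32, bitstream_file="low.bin"),
    single_layer_encoder_cmd("VTM", "video_1920x1080.yuv", 1920, 1080, 50,
                             qp=32, intra_period=32, bitstream_file="high.bin"),
]
for cmd in simulcast_cmds:
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to launch the encoder. Building the command as a list rather than a single string avoids shell quoting issues with file names.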
- SHM-12.4 Encoder
TAppEncoder
-c encoder_lowdelay_scalable.cfg
-c layers.cfg
--IntraPeriod0={intra period of layer0}
--InputFile0={input file name of layer0}
--FrameRate0={frame rate of layer0}
--SourceWidth0={width of layer0}
--SourceHeight0={height of layer0}
--QP0={qp of layer0}
--IntraPeriod1={intra period of layer1}
--InputFile1={input file name of layer1}
--FrameRate1={frame rate of layer1}
--SourceWidth1={width of layer1}
--SourceHeight1={height of layer1}
--QP1={qp of layer1}
--FramesToBeEncoded=96
--InputBitDepth=8
--InputChromaFormat=420
--DecodingRefreshType=2
--BitstreamFile={bitstream file name}
- SHM-12.4 Decoder
TAppDecoder
-b {bitstream file name}
-o {reconstructed file name}
- VTM-21.2 Encoder (two-layer scalable coding)
EncoderApp
-c encoder_lowdelay_vtm.cfg
-c two-layers.cfg
-l0 --FrameRate={frame rate of layer0}
-l0 --FramesToBeEncoded=96
-l0 --IntraPeriod={intra period of layer0}
-l0 --SourceWidth={width of layer0}
-l0 --SourceHeight={height of layer0}
-l0 --OutputBitDepth=8
-l0 --InputFile={input file name}
-l0 --QP={qp of layer0}
-l1 --FrameRate={frame rate of layer1}
-l1 --FramesToBeEncoded=96
-l1 --IntraPeriod={intra period of layer1}
-l1 --SourceWidth={width of layer1}
-l1 --SourceHeight={height of layer1}
-l1 --OutputBitDepth=8
-l1 --InputFile={input file name}
-l1 --QP={qp of layer1}
--BitstreamFile={bitstream file name}
- VTM-21.2 Decoder
DecoderApp
-b {bitstream file name}
-o {reconstructed file name}
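For the two-layer VTM-21.2 encoder, every per-layer option is prefixed with -l0 or -l1. A small helper can generate both layers from the same template, as sketched below. The helper names, file names, and numeric values are hypothetical. The sketch also assumes that the layer prefix and the option are passed as separate arguments, as they appear in the command line above.

```python
def layer_options(layer, input_file, width, height, frame_rate, qp,
                  intra_period, frames=96):
    """Per-layer options, each preceded by its -l<layer> prefix."""
    options = [
        f"--FrameRate={frame_rate}",
        f"--FramesToBeEncoded={frames}",
        f"--IntraPeriod={intra_period}",
        f"--SourceWidth={width}",
        f"--SourceHeight={height}",
        "--OutputBitDepth=8",
        f"--InputFile={input_file}",
        f"--QP={qp}",
    ]
    args = []
    for option in options:
        args += [f"-l{layer}", option]
    return args


def two_layer_vtm_cmd(layer0, layer1, bitstream_file):
    """Argument list for a two-layer scalable VTM-21.2 encode."""
    return (["EncoderApp", "-c", "encoder_lowdelay_vtm.cfg", "-c", "two-layers.cfg"]
            + layer_options(0, **layer0)
            + layer_options(1, **layer1)
            + [f"--BitstreamFile={bitstream_file}"])


# Hypothetical example: base layer at half the resolution of the enhancement layer.
base = dict(input_file="video_960x540.yuv", width=960, height=540,
            frame_rate=50, qp=32, intra_period=32)
enhancement = dict(input_file="video_1920x1080.yuv", width=1920, height=1080,
                   frame_rate=50, qp=32, intra_period=32)
two_layer_cmd = two_layer_vtm_cmd(base, enhancement, "two_layer.bin")
print(" ".join(two_layer_cmd))
```

Keeping the per-layer settings in dictionaries makes it easy to sweep QP values for both layers when producing RD points.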
If you find our work useful for your research, please cite:
@ARTICLE{10521480,
author={Bian, Yifan and Sheng, Xihua and Li, Li and Liu, Dong},
journal={IEEE Transactions on Image Processing},
title={LSSVC: A Learned Spatially Scalable Video Coding Scheme},
year={2024},
volume={33},
number={},
pages={3314-3327},
keywords={Video coding;Encoding;Image coding;Standards;Scalability;Spatial resolution;Static VAr compensators;Learned video coding;spatial scalability;scalable video coding;contextual MV encoder-decoder;hybrid temporal-layer context mining;interlayer prior},
doi={10.1109/TIP.2024.3395025}}