Low-rank tensor completion using Truncated tensor Schatten p-Norm, LRTC-TSpN.
This is the code repository for paper 'Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns' which is published on Transportation Research Part C: Emerging Technologies
This project provides some examples about how to use LRTC-TSpN to achieve efficient and accurate missing data imputation for transportation time series data. We aim at performing off-line data imputation tasks, with several realistic structural missing patterns. Missing data imputation problem is modelled as a low-rank tensor completion problem (low-rank tensor learning). The objective is to obtain a fully recovered tensor by minimizing a predefined tensor rank function, given the observations. We define a new truncated tensor Schatten p-norm to substitute for the traditional tensor nuclear norm. We recommend ones to refer to Kolda and Bader’s review Tensor Decompositions and Applications for more basics about tensor algebra.
We organise the multiple input time series data into a third-order tensor structure of (time intervals × locations × days). Traditional methods resort to tensor nuclear norm (or sum of nuclear norm) to substitute for the tensor rank, however, these convex surrogates are not powerful in practice. Therefore, we use the newly emerging Schatten p-norm and its truncated version to approximate tensor rank in order to achieve more accurate traffic data imputation.
Schatten p-norm always serves as a better rank surrogate (closer to the true rank) than nuclaer norm, and we can use its nonconvex properity to better approximate tensor rank.
The objective function of Schatten p-norm minimization is formulated as:
This is a typical noncnovex optimization problem. Previous works aiming at solving tensor completion problem always conduct a singular value thresholding (SVT) algorithm. While existing SVT could not be applied to our problem directly. So the main challenge is to develop a new generalized SVT algorithm for this new definition of norm.
We solve this non-convex problem by using Alternating Direction Method of Multipliers (ADMM) and Generalized Soft Thresholding (GST).
Generalized soft-thresholding algorithm:
ADMM framework:
Despite of nonconvexity, ADMM framework still ensures the convergence of our model. With proper updating scheme, our algorithm can converge with fewer iterations. More algorithmic details can be found in our paper. The preprint version is available at arXiv, and the published version can be found at the Elsevier publisher.
Besides the element-wise random missing case, we define three structured fiber mode-n missing scenarios, which are generated through the two-by-two combinations of tensor mode-n fibers. This can be described as:
- ’Intervals’ mode fiber-like missing (FM-0), which illustrates a temporal missing pattern, is caused by adverse weather, breakdown of wireless connections or apparatus maintenance;
- ’Locations’ mode fiber-like missing (FM-1), which denotes a spatial missing pattern, can be explained by lack of electricity for successive sensors or malfunction of Internet Data Center;
- ’Days’ mode fiber-like missing (FM-2) illuminates a spatial-temporal mixture missing situation that they are offline (do not operate) at regular time intervals everyday for specific sensors.
In this repository, we have used two small-size traffic flow datasets to show how to implement our model, they are:
- Guangzhou-small: Speed data with the first 50 locations and the first 15 days. The size is (144 × 50 × 15).
- Portland-small: Volume data with the first 80 locations and the first 15 days. The size is (96 × 80 × 15).
We provide the two datasets in ../Datasets/. The original links for the complete data are given as following.
The Python implementation of LRTC-TSpN is given in ../Imputer/. The core of the algorithm is the GST and ADMM iteration module. We organize this implementation in a tensor-only way to make it more efficient. Some utils and basic tensor operation functions are provided in ../Helper/.
We give some examples written in Jupyter notebook ../Examples/.
Please cite our paper if this repo helps your research.
bibtex:
@article{nie2022truncated,
title={Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns},
author={Nie, Tong and Qin, Guoyang and Sun, Jian},
journal={Transportation Research Part C: Emerging Technologies},
volume={141},
pages={103737},
year={2022},
publisher={Elsevier}
}
-
Tong Nie, Guoyang Qin, Yunpeng Wang, and Jian Sun (2023). Towards better traffic volume estimation: Tackling both underdetermined and non-equilibrium problems via a correlation-adaptive graph convolution network. arXiv preprint arXiv:2303.05660. [Preprint] [Code]
-
Tong Nie, Guoyang Qin, Yunpeng Wang, and Jian Sun (2023). Correlating sparse sensing for large-scale traffic speed estimation: A Laplacian-enhanced low-rank tensor kriging approach. Transportation Research Part C: Emerging Technologies, 152, 104190, [Preprint] [DOI] [Code]
-
Tong Nie, Guoyang Qin, and Jian Sun (2022). Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns. Transportation research part C: emerging technologies, 141, 103737, [Preprint] [DOI] [Code]
This work is released under the MIT license.