Zhengxue Cheng

I am currently an assistant researcher at MediaLab, Shanghai Jiao Tong University, working with Prof. Wenjun Zhang and Prof. Li Song. Before that, I received the B.E. degree from Shanghai Jiao Tong University in 2014 and double M.E. degrees from Waseda University and Shanghai Jiao Tong University in 2015 and 2017, respectively. I received the Ph.D. degree from Waseda University in 2020 under the supervision of Prof. Jiro Katto. From 2018 to 2019, I was a visiting intern at EPFL, Switzerland, working with Prof. Touradj Ebrahimi. After that, I worked at Ant Group, Hangzhou, China, as an Algorithm Expert until April 2024.

My research interests include deep learning-based multimodal data compression, image and video enhancement, and lightweight AI algorithm design. I have received the JSPS DC2, the Okawa Foundation Research Grant 2024, the CVPR NTIRE 2025 Efficient Super-Resolution Winner award, the VCIP 2024 Best Student Paper Runner-Up, the VCIP 2025 Best Paper, and the PCS 2019 Silver Award for Grand Challenge.

For prospective students interested in AI or data coding, feel free to contact me via email!

news

Dec 05, 2025	Our paper AlignGS received the VCIP 2025 Best Paper award.
Oct 22, 2025	I will serve as an AE of IEEE TCSVT.

selected publications

TaCo: A Benchmark for Lossless and Lossy Codecs of Heterogeneous Tactile Data

Zhengxue Cheng^♯, Yan Zhao, Keyu Wang, Hengdi Zhang, and Li Song

In International Conference on Learning Representations (ICLR), 2026
Instance-Adaptive Spatial-Temporal Enhancement for Efficient Video Compression

Yan Zhao, Zhengxue Cheng^♯, Jiangchuan Li, Donghui Feng, Qunshan Gu, Qi Wang, Guo Lu, and Li Song^♯

IEEE Trans. on Image Processing, 2025

Linear Attention Modeling for Learned Image Compression

Donghui Feng^*, Zhengxue Cheng^*, Shen Wang, Ronghua Wu, Hongwei Hu, Guo Lu, and Li Song^♯

In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Jun 2025
L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression

Junxuan Zhang^*, Zhengxue Cheng^{*♯}, Yan Zhao, Shihao Wang, Dajiang Zhou, Guo Lu, and Li Song

In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, Jun 2025

OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution

Xinning Chai, Zhengxue Cheng^♯, Yuhong Zhang, Hengsheng Zhang, Yingsheng Qin, Yucai Yang, Rong Xie, and Li Song^♯

IEEE Transactions on Circuits and Systems for Video Technology, Jun 2025

Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration

Yuhong Zhang, Hengsheng Zhang, Xinning Chai, Zhengxue Cheng^♯, Rong Xie, Li Song^♯, and Wenjun Zhang

IEEE Transactions on Circuits and Systems for Video Technology, Jun 2025

SSP-IR: Semantic and Structure Priors for Diffusion-Based Realistic Image Restoration

Yuhong Zhang, Hengsheng Zhang, Zhengxue Cheng^♯, Rong Xie, Li Song^♯, and Wenjun Zhang

IEEE Transactions on Circuits and Systems for Video Technology, Jun 2025

OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing

Zhengxue Cheng, Yiqian Zhang, Wenkang Zhang, Haoyu Li, Keyu Wang, Li Song, and Hengdi Zhang

Jun 2025
Rate-aware Compression for NeRF-based Volumetric Video

Zhiyu Zhang^*, Guo Lu^*, Huanxiong Liang, Zhengxue Cheng^♯, Anni Tang, and Li Song^♯

In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne VIC, Australia, Jun 2024

Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules

Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto

Jun 2020


Image compression is a fundamental research field, and many well-known compression standards have been developed over many decades. Recently, learned compression methods have developed rapidly with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found that accurate entropy models for rate estimation largely affect the optimization of network parameters and thus affect the rate-distortion performance. Therefore, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which can achieve a more accurate and flexible entropy model. In addition, we take advantage of recent attention modules and incorporate them into the network architecture to enhance the performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge, our approach is the first work to achieve performance comparable to the latest compression standard, Versatile Video Coding (VVC), regarding PSNR. More importantly, our approach generates more visually pleasing results when optimized by MS-SSIM.
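
As a rough illustration of the entropy model named in the abstract, the sketch below evaluates a discretized Gaussian mixture likelihood for one integer-quantized latent value: each component contributes the Gaussian probability mass on the unit-width bin around the value, and the components are combined by their mixture weights. It is a minimal sketch and not the paper's implementation. The function names, the `floor` constant, and all parameter values are assumptions chosen for illustration; in a learned codec the weights, means, and scales would be predicted by a network.

```python
import math


def gaussian_cdf(x, mean, scale):
    """Cumulative distribution function of a Gaussian N(mean, scale^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def discretized_gmm_likelihood(y_hat, weights, means, scales, floor=1e-9):
    """Probability mass of a mixture on the bin [y_hat - 0.5, y_hat + 0.5].

    `floor` is a hypothetical lower bound that keeps the later log finite.
    """
    total = 0.0
    for w, m, s in zip(weights, means, scales):
        upper = gaussian_cdf(y_hat + 0.5, m, s)
        lower = gaussian_cdf(y_hat - 0.5, m, s)
        total += w * (upper - lower)
    return max(total, floor)


def estimated_bits(y_hat, weights, means, scales):
    """Estimated code length in bits for one quantized latent value."""
    return -math.log2(discretized_gmm_likelihood(y_hat, weights, means, scales))


# Hypothetical mixture parameters for a single latent element.
weights = [0.5, 0.3, 0.2]   # assumed to be non-negative and sum to 1
means = [-1.2, 0.3, 2.5]
scales = [0.5, 1.0, 2.0]

for y in (-1, 0, 3):
    p = discretized_gmm_likelihood(y, weights, means, scales)
    print(f"y_hat={y:+d}  p={p:.4f}  bits={estimated_bits(y, weights, means, scales):.3f}")
```

Because the bins tile the real line, the masses over all integers sum to approximately one when the weights do, so the negative log-likelihood can serve as a rate estimate during training.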