A PDE-Based Convolutional Neural Network with Variational Information Bottleneck: Experimental Evaluation and Generalization Analysis
DOI: https://doi.org/10.51408/1963-0138

Keywords: Information bottleneck, Partial differential equations, Deep learning, Convolutional neural networks, Generalization

Abstract
We present a hybrid convolutional architecture that combines trainable PDE-based preprocessing with a Variational Information Bottleneck (VIB) to improve generalization in image classification. The PDE stage applies a small number of discretized Laplacian steps with learnable step size and depthwise coupling, injecting physics-inspired inductive bias into early feature maps. A tensor-wise VIB module then parameterizes a Gaussian latent (μ, log σ²) via 1×1 convolutions and enforces information compression through a KL penalty to a unit prior, encouraging retention of task-relevant features while discarding nuisance variability. The compressed representation feeds a ResNet-18 backbone adapted for CIFAR-10 inputs. On CIFAR-10, systematic variation of the VIB weight β shows that moderate compression yields improved test performance and training stability relative to both a baseline CNN and a PDE-only variant. Qualitative analysis indicates smoother activations and reduced sensitivity to input noise, consistent with the information-theoretic objective. The results suggest that PDE priors and variational compression act complementarily, offering a principled path to robust and generalizable convolutional models.
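For concreteness, the sketch below illustrates the two components described in the abstract: a diffusion-style PDE stage (explicit Laplacian steps with a learnable step size and per-channel coupling) followed by a tensor-wise VIB layer (1×1 convolutions producing μ and log σ², with a KL penalty to a unit Gaussian prior). It is a minimal PyTorch approximation; the number of PDE steps, initial step size, channel counts, and the way the latent is fed into the ResNet-18 backbone are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PDEBlock(nn.Module):
    """A few explicit diffusion (Laplacian) steps applied depthwise.

    Assumed form: x_{t+1} = x_t + tau * c ⊙ Laplacian(x_t), with a learnable
    scalar step size tau and a learnable per-channel coupling vector c.
    """
    def __init__(self, channels: int, num_steps: int = 3):
        super().__init__()
        lap = torch.tensor([[0., 1., 0.],
                            [1., -4., 1.],
                            [0., 1., 0.]])
        # Fixed 3x3 Laplacian stencil, one depthwise copy per channel.
        self.register_buffer("kernel", lap.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.tau = nn.Parameter(torch.tensor(0.1))          # learnable step size (assumed init)
        self.coupling = nn.Parameter(torch.ones(channels))  # learnable depthwise coupling
        self.num_steps = num_steps
        self.channels = channels

    def forward(self, x):
        for _ in range(self.num_steps):
            lap = F.conv2d(x, self.kernel, padding=1, groups=self.channels)
            x = x + self.tau * self.coupling.view(1, -1, 1, 1) * lap  # explicit Euler step
        return x

class VIBLayer(nn.Module):
    """Tensor-wise VIB: 1x1 convolutions produce (mu, log sigma^2); returns a
    reparameterized sample and the KL divergence to a unit Gaussian prior."""
    def __init__(self, channels: int):
        super().__init__()
        self.mu = nn.Conv2d(channels, channels, kernel_size=1)
        self.logvar = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)        # reparameterization trick
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).mean()    # KL(q(z|x) || N(0, I))
        return z, kl

# Usage sketch: z feeds a ResNet-18 adapted for 32x32 CIFAR-10 inputs (not shown),
# and training minimizes  cross_entropy(logits, y) + beta * kl  for a chosen VIB weight beta.
```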
Copyright (c) 2025 Gor A. Gharagyozyan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.