Information Theory Tools and Techniques to Overcome Machine Learning Challenges

Authors

  • Mariam E. Haroutunian, Institute for Informatics and Automation Problems of NAS RA
  • Gor A. Gharagyozyan, Institute for Informatics and Automation Problems of NAS RA

Keywords:

Information bottleneck, Neural networks, Entropy-based regularization, Mutual information, Feature selection, KL-divergence

Abstract

In this survey, we explore the broad applications of information theory in machine learning, highlighting how core concepts such as entropy, mutual information, and KL-divergence are used to enhance learning algorithms. Since its inception by Claude Shannon, information theory has provided mathematical tools to quantify uncertainty, optimize decision-making, and manage the trade-off between model flexibility and generalization. These principles have been integrated across various subfields of machine learning, including neural networks, where the information bottleneck offers insights into data representation, and reinforcement learning, where entropy-based methods improve exploration strategies. Additionally, measures such as mutual information are critical in feature selection and unsupervised learning. This survey bridges foundational theory with its practical implementations in modern machine learning by providing both historical context and a review of contemporary research. We also discuss open challenges and future directions, such as scalability and interpretability, highlighting the growing importance of these techniques in next-generation models.
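The three quantities the abstract builds on (entropy, KL-divergence, and mutual information) have closed-form expressions for discrete distributions. The following is a minimal illustrative sketch in Python; it is not code from the survey itself, and all function names are our own:

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum p(x) log2 p(x), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(P || Q) = sum p(x) log2(p(x)/q(x)); assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X;Y) = KL( P(X,Y) || P(X)P(Y) ), with the joint distribution
    given as a 2-D list of probabilities summing to 1."""
    px = [sum(row) for row in joint]                  # marginal over X
    py = [sum(col) for col in zip(*joint)]            # marginal over Y
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# A fair coin carries one bit of entropy.
print(entropy([0.5, 0.5]))                               # 1.0

# Independent variables share no information: I(X;Y) = 0.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0

# Perfectly correlated binary variables: I(X;Y) = H(X) = 1 bit.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```

The same mutual-information estimate, applied between a candidate feature and the class label, is the basic ingredient of the feature-selection criteria surveyed in the paper.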

References

C. E. Shannon, “A mathematical theory of communication”, Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, 1948.

J. Rissanen, “Modeling by shortest data description”, Automatica, vol. 14, no. 5, pp. 465-471, 1978.

E. T. Jaynes, “Information theory and statistical mechanics”, Physical Review, vol. 106, no. 4, pp. 620-630, 1957.

N. Tishby and N. Zaslavsky, “Deep learning and the information bottleneck principle”, IEEE Information Theory Workshop (ITW), Jeju Island, Korea, pp. 1-5, 2015.

V. Mnih, K. Kavukcuoglu and D. Silver, “Human-level control through deep reinforcement learning”, Nature, vol. 518, pp. 529-533, 2015.

H. Peng, F. Long and C. Ding, “Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238, 2005.

D. P. Kingma and M. Welling, “Auto-encoding variational Bayes”, International Conference on Learning Representations (ICLR), 2014.

R. Shwartz-Ziv and N. Tishby, “Opening the black box of Deep Neural Networks via Information”, arXiv abs/1703.00810, 2017.

J. Biamonte, et al., “Quantum machine learning”, Nature, vol. 549, pp. 195-202, 2017.

E. Haroutunian, M. Haroutunian and A. Harutyunyan, “Reliability criteria in information theory and in statistical hypothesis testing”, Foundations and Trends in Communications and Information Theory, vol. 4, no. 2-3, pp. 97-263, 2007.

T. M. Cover and J. A. Thomas, Elements of Information Theory, Second Edition, Wiley, New York, 2006.

A. L. Berger, V. J. D. Pietra and S. A. D. Pietra, “A maximum entropy approach to natural language processing”, Computational Linguistics, vol. 22, no. 1, pp. 39-71, 1996.

F. Fleuret, “Fast binary feature selection with conditional mutual information”, Journal of Machine Learning Research, vol. 5, pp. 1531-1555, 2004.

H. H. Yang and J. Moody, “Data visualization and feature selection: new algorithms for nongaussian data”, Advances in Neural Information Processing Systems, vol. 12, 2000.

M. Vidal-Naquet and S. Ullman, “Object recognition with informative features and linear classification”, Ninth IEEE International Conference on Computer Vision (ICCV), pp. 281-288, 2003.

J. R. Quinlan, “Induction of decision trees”, Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.

T. M. Hong, S. T. Roche and B. Carlson, “Nanosecond anomaly detection with decision trees and real-time application to exotic Higgs decays”, Nature Communications, vol. 15, 2024.

L. Liu, R. Chen, X. Liu, J. Su and L. Qiao, “Towards practical privacy-preserving decision tree training and evaluation in the cloud”, IEEE Transactions on Information Forensics and Security, vol. 15, pp. 2914-2929, 2020.

M. Haroutunian, D. Asatryan and K. Mastoyan, “Analyzing the quality of distorted images by the normalized mutual information measure”, Mathematical Problems of Computer Science, vol. 61, pp. 7-14, 2024.

R. G. Congalton, et al., “Robust possibilistic fuzzy additive partition clustering motivated by deep local information”, Circuits, Systems, and Signal Processing, 2024.

I. S. Dhillon, S. Mallela and D. S. Modha, “Information-theoretic co-clustering”, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 89-98, 2003.

M. Wang and F. Sha, “Information theoretical clustering via semidefinite programming”, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 761-769, 2011.

C. Liang and Y. Leng, “Collaborative filtering based on information-theoretic co-clustering”, International Journal of Systems Science, vol. 45, no. 3, pp. 589-597, 2012.

C. Bloechl, R. A. Amjad and B. C. Geiger, “Co-clustering via information-theoretic Markov aggregation”, arXiv:1801.00584, 2018.

M. Haroutunian, K. Mkhitaryan and J. Mothe, “A new information-theoretical distance measure for evaluating community detection algorithms”, Journal of Universal Computer Science, vol. 25, no. 8, pp. 887-903, 2019.

R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler and Y. Bengio, “Learning deep representations by mutual information estimation and maximization”, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Massachusetts, USA, 2016.

J. V. Davis, B. Kulis, P. Jain, S. Sra and I. S. Dhillon, “Information-theoretic metric learning”, Proceedings of the 24th International Conference on Machine Learning (ICML), pp. 209–216, 2007.

E. Erdemir, P. L. Dragotti and D. Gunduz, “Privacy-aware time-series data sharing with deep reinforcement learning”, IEEE Transactions on Information Forensics and Security, vol. 16, pp. 389-401, 2021.

J. C. Principe and D. Xu, “An introduction to information theoretic learning”, Proceedings of the IJCNN'99 International Joint Conference on Neural Networks, vol. 3, pp. 1783-1787, 1999.

X. Wu, J. H. Manton, U. Aickelin and J. Zhu, “On the generalization for transfer learning: an information-theoretic analysis”, arXiv:2207.05377, 2022.

N. Tishby, F. C. Pereira and W. Bialek, “The information bottleneck method”, arXiv:physics/0004057, 1999.

P. Harremoes and N. Tishby, “The information bottleneck revisited or how to choose a good distortion measure”, 2007 IEEE International Symposium on Information Theory, Nice, France, pp. 566-570, 2007.

M. Vera, P. Piantanida and L. R. Vega, “The role of the information bottleneck in representation learning”, IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, pp. 1580-1584, 2018.

A. A. Alemi, I. Fischer, J. V. Dillon and K. Murphy, “Deep variational information bottleneck”, arXiv:1612.00410, 2019.

R. Shwartz-Ziv, “Information flow in deep neural networks”, arXiv:2202.06749, 2022.

J. Li and D. Liu, “Information bottleneck theory on convolutional neural networks”, Neural Processing Letters, vol. 53, no. 2, pp. 1385-1400, 2021.

S. Han, K. Nakamura and B. Hong, “Splitting of composite neural networks via proximal operator with information bottleneck”, IEEE Access, vol. 12, pp. 157-167, 2024.

A. Bardera, J. Rigau, I. Boada, M. Feixas and M. Sbert, “Image segmentation using information bottleneck method”, IEEE Transactions on Image Processing, vol. 18, no. 7, pp. 1601-1612, 2009.

Z. An, J. Zhang, Z. Sheng, X. Er and J. Lv, “RBDN: Residual bottleneck dense network for image super-resolution”, IEEE Access, vol. 9, pp. 103440-103451, 2021.

B. Lee, K. Ko, J. Hong, B. Ku and H. Ko, “Information bottleneck measurement for compressed sensing image reconstruction”, IEEE Signal Processing Letters, vol. 29, pp. 1943-1947, 2022.

M. Stark, L. Wang, G. Bauch and R. D. Wesel, “Decoding rate-compatible 5G-LDPC codes with coarse quantization using the information bottleneck method”, IEEE Open Journal of the Communications Society, vol. 1, pp. 646-660, 2020.

J. Wu, Y. Huang, M. Gao, Z. Gao, J. Zhao, J. Shi and A. Zhang, “Exponential information bottleneck theory against intra-attribute variations for pedestrian attribute recognition”, IEEE Transactions on Information Forensics and Security, vol. 18, pp. 5623-5635, 2023.

S. Wang, C. Li, Y. Li, Y. Yuan and G. Wang, “Self-supervised information bottleneck for deep multi-view subspace clustering”, IEEE Transactions on Image Processing, vol. 32, pp. 1555-1567, 2023.

X. Yan, Y. Ye, Y. Mao and H. Yu, “Shared-private information bottleneck method for cross-modal clustering”, IEEE Access, vol. 7, pp. 36045-36056, 2019.

Z. Liu, X. Wang, X. Huang, G. Li, K. Sun and Z. Chen, “Incomplete multi-view representation learning through anchor graph-based GCN and information bottleneck”, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7130-7134, 2024.

X. Yan, Y. Mao, Y. Ye and H. Yu, “Cross-modal clustering with deep correlated information bottleneck method”, IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 10, pp. 13508-13522, 2024.

K. Yang, W. Tai, Z. Li, T. Zhong, G. Yin, Y. Wang and F. Zhou, “Exploring self-explainable street-level IP geolocation with graph information bottleneck”, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, pp. 7270-7274, 2024.

S. Cui, J. Cao, X. Cong, J. Sheng, Q. Li, T. Liu and J. Shi, “Enhancing multimodal entity and relation extraction with variational information bottleneck”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1274-1285, 2024.

T. Gu, G. Xu and J. Luo, “Sentiment analysis via deep multichannel neural networks with variational information bottleneck”, IEEE Access, vol. 8, pp. 121014-121021, 2020.

Z. Wu and S. King, “Improving trajectory modelling for DNN-based speech synthesis by using stacked bottleneck features and minimum generation error training”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, 2016.

S. H. Lee, H. R. Noh, W. J. Nam and S.-W. Lee, “Duration controllable voice conversion via phoneme-based information bottleneck”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1173-1183, 2022.

C. Wang, S. Du, W. Sun and D. Fan, “Self-supervised learning for high-resolution remote sensing images change detection with variational information bottleneck”, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 16, pp. 5849-5866, 2023.

X. Liu, Y. l. Li and S. Wang, “Learning generalizable visual representations via self-supervised information bottleneck”, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, pp. 5385-5389, 2024.

L. Sun, C. Guo, M. Chen and Y. Yang, “Privacy-aware joint source-channel coding for image transmission based on disentangled information bottleneck”, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, pp. 9016-9020, 2024.

Z. Chen, Z. Yao, B. Jin, M. Lin and J. Ning, “FIBNet: Privacy-enhancing approach for face biometrics based on the information bottleneck principle”, IEEE Transactions on Information Forensics and Security, vol. 19, pp. 8786-8801, 2024.

A. Pensia, V. Jog and P. L. Loh, “Extracting robust and accurate features via a robust information bottleneck”, IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 1, pp. 131-144, 2020.

M. Vera, L. R. Vega and P. Piantanida, “Collaborative information bottleneck”, IEEE Transactions on Information Theory, vol. 65, no. 2, pp. 787-815, 2019.

M. Vera, L. R. Vega and P. Piantanida, “The two-way cooperative information bottleneck”, 2015 IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, pp. 2131-2135, 2015.

Z. Yan, G. Hanyu and X. Yugeng, “Modified bottleneck-based heuristic for large-scale job-shop scheduling problems with a single bottleneck”, Journal of Systems Engineering and Electronics, vol. 18, no. 3, pp. 556-565, 2007.

A. Gronowski, W. Paul, F. Alajaji, B. Gharesifard and P. Burlina, “Classification utility, fairness, and compactness via tunable information bottleneck and Rényi measures”, IEEE Transactions on Information Forensics and Security, vol. 19, pp. 1630-1645, 2024.

Z. Goldfeld and Y. Polyanskiy, “The information bottleneck problem and its applications in machine learning”, IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 1, pp. 19-38, 2020.

M. I. Belghazi, et al., “Mutual information neural estimation (MINE)”, Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, pp. 1391-1400, 2018.

D. McAllester and K. Stratos, “Formal limitations on the measurement of mutual information”, Proceedings of AISTATS 2020, pp. 875-884, 2020.

Published

2025-06-01

How to Cite

Haroutunian, M. E., & Gharagyozyan, G. A. (2025). Information Theory Tools and Techniques to Overcome Machine Learning Challenges. Mathematical Problems of Computer Science, 63, 25–41. Retrieved from https://mpcs.sci.am/index.php/mpcs/article/view/882
