Effective and Accurate Binary Clone Detection

Authors

  • Hayk K. Aslanyan Ivannikov Institute for System Programming of the RAS

Keywords:

Binary code clone, Program static analysis, Program dependence graph

Abstract

Software developers usually copy and paste a particular piece of code as they prefer to use a pre-written or a partial solution as a basis for solving their problem. However, it can lead to various errors, as well as increase the size of the source and binary code. Finding similar parts of code (clones) in binary code becomes more applicable when the source code is not available. Additionally, a compiler can copy some parts of code during various transformations and create code clones, which do not exist in the source code. Detection of binary code clones is used for malware analysis, finding semantic errors, detecting copyright violation, etc. This article discusses the existing methods of binary code clones detection and introduces a new method for binary clone detection. It consists of three main stages. The first stage is based on Binnavi platform [1] and generates program dependence graphs for each binary function. Graphs are generated based on REIL [2] (Reverse Engineering Intermediate Language) platform-independent language. REIL representation is supported for several architectures (x86, x86-64, ARM, MIPS, PPC), thus ensuring the independence of the tool from the target architecture. The second stage detects clones based on previously created graphs. A polynomial heuristic algorithm is suggested for finding the maximum common subgraph of two program dependence graphs. At the third stage, the obtained clones are visualized for manual analysis.

References

[Online]. Available: https://www.zynamics.com/binnavi.html

[Online]. Available: https://www.zynamics.com/binnavi/manual/html/reil_language.htm

S. Ducasse, M. Rieger and S. Demeyer, "A language independent approach for detecting duplicated code," in Proceedings of the 15th International Conference on Software Maintenance, pp. 109-118, 1999.

T. Kamiya, S. Kusumoto and K. Inoue, "CCFinder: A multilinguistic tokenbased code clone detection system for large scale source code," in IEEE Transactions on Software Engineering, vol 28, no. 7, pp. 654-670, Jul 2002.

I. Baxter, A. Yahin, L. Moura and M. Anna, "Clone detection using abstract syntax trees," in Proceedings of the 14th IEEE International Conference on Software Maintenance, pp. 368-377, 1998.

R. Tairas and J. Gray, "Phoenix-based clone detection using suffix trees," in Proceedings of the 44th Annual Southeast Regional Conference, pp. 679-684, 2006.

L. Jiang, G. Misherghi, Z. Su and S. Glondu, "DECKARD : Scalable and accurate treebased detection of code clones," in Proceedings of the 29th International Conference on Software Engineering, pp. 96-105, 2007.

R. Komondoor and S. Horwitz, "Using slicing to identify duplication in source code," in Proceedings of the 8th International Symposium on Static Analysis, pp. 40-56, 2001.

J. Krinke, "Identifying similar code with program dependence graphs," in Proceedings of the 8th Working Conference on Reverse Engineering, pp. 301-309, 2001.

M. Gabel, L. Jiang and Z. Su, "Scalable detection of semantic clones," in Proceedings of 30th International Conference on Software Engineering, pp. 321-330, 2008.

S. Sargsyan, S. Kurmangaleev, A. Baloian and H. Aslanyan, "Scalable and Accurate Clones Detection Based on Metrics for Dependence Graph," Mathematical Problems of Computer Science, vol. 42, pp. 54-62, 2014.

A. Avetisyan, S. Kurmangaleev, S. Sargsyan, M. Arutunian and A. Belevantsev, "LLVM Based code clone detection framework," in 10th International Conference on Computer Science and Information Technologies, pp. 100-104, 2015.

S. Sargsyan, S. Kurmangaleev, A. Belevantsev and A. Avetisyan, "Scalable and accurate code clone detection," Programming and Computer Software, vol. 6, pp. 9-17, 2015.

S. Sargsyan, S. Kurmangaleev, A. Belevantsev, H. Aslanyan and A. Baloian , "Scalable tool for code clone detection based on semantic analysis of program," Trudy. ISP RAS, vol. 1, pp. 39-50, 2015.

J. Jang and D. Brumley, "Bitshred: Fast, scalable code reuse detection in binary code," CyLab, p. 28, 2009.

B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, pp. 422-426, 1970.

A. Schulman, "Finding binary clones with opstrings function digests: Part III," Dr. Dobb’s Journal, pp. 56-61, 2005.

M.E. Karim, A. Walenstein, A. Lakhotia and L. Parida, "Malware phylogeny generation using permutations of code," Computer Virology, pp. 13-23, 2005.

D. Bruschi, L. Martignoni and M. Monga, "Code normalization for self-mutating malware," IEEE Security & Privacy, pp. 46-54, 2007.

A. Sæbjørnsen, J. Willcock, T. Panas, D. Quinlan and Z. Su, "Detecting code clones in binary executables," in Proceedings of the 18th International Symposium on Software Testing and Analysis, pp. 117-128, 2009.

A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," in 47th Annual IEEE Symposium on Foundations of Computer Science , pp. 459-468, 2006.

M. R. Farhadi, B. C. M. Fung, P. Charland and M. Debbabi, "BinClone: Detecting Code Clones in Malware," in 2014 Eighth International Conference on, San Francisco, CA, pp. 78-87, 2014.

T. Dullien, E. Carrera, S. Eppler and S. Porst, "Automated attacker correlation for malicious code," DTIC Document, 2010.

Y. David and E. Yahav, "Tracelet-based code search in executables," in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 349-360, 2014.

D. E. Krutz and E. Shihab, "CCCD: Concolic Code Clone Detection," WCRE, 2013. D. E. Krutz and E. Shihab, "CCCD: Concolic code clone detection," 2013 20th Working Conference on Reverse Engineering, Koblenz, 2013, pp. 489-490.

https://www.hex-rays.com/products/ida. 72 Effective and Accurate Binary Clone

H. K. Aslanyan, S. F. Kurmangaleev, V. G. Vardanyan, M. S. Arutunian and S. S. Sargsyan, "Platform-independent and scalable tool for binary code clone detection," Trudy ISPRAN/Proc. ISP RAS, vol. 1, no. 2, pp. 215-226, 2016.

Downloads

Published

2021-12-10

How to Cite

Aslanyan, H. K. (2021). Effective and Accurate Binary Clone Detection. Mathematical Problems of Computer Science, 48, 64–73. Retrieved from http://mpcs.sci.am/index.php/mpcs/article/view/123

Most read articles by the same author(s)