Evaluation of duplicated code detection tools in cross-project context

Johnatan A. de Oliveira, Eduardo M. Fernandes, Eduardo Figueiredo

Two or more code segments are considered duplicated when they are exactly the same or highly similar to one another. Several tools have been proposed to detect duplicated code within a single software project. However, few tools target cross-project detection, and there is little empirical knowledge about the efficacy of existing tools in detecting duplicated code across different projects. Therefore, our goal is to assess the efficacy of single-project duplicated code detection tools in a cross-project context. We conclude that the evaluated tools are not sufficiently effective at detecting some types of duplicated code beyond exact copy-paste. As a result, this work proposes guidelines for the future implementation of such tools.

