Design Roots

Using DV8, the user can detect a small group of related files involved in a set of target issues, such as bug issues or refactoring issues, specified in an issue ID list file. This feature lets the user examine the design relations among files with similar properties, such as error-proneness. We call the detected file groups roots. If the target issues are bug issues, we call them bug roots; if the target issues include all change activities, we call them change roots.

Our research [13] has shown that just five bug roots typically cover 50% to 90% of the most error-prone files in a system. This observation has been validated on dozens of industrial and open-source software systems. The implication is that the most error-prone files are usually connected in design: the more error-prone files are, the more likely they are to be related, and errors propagate through these connections.

Here, file error-proneness is determined by the number of times a file is involved in bug fixes: the more often a file is changed to fix bugs, the more error-prone it is. Using DV8, the user can specify a threshold for a file to be considered error-prone. The default threshold is 2; that is, files that were changed for bug fixes two or more times are considered error-prone.
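The threshold rule above can be illustrated with a minimal sketch. This is not DV8's implementation; the function name `error_prone_files`, the input shape (one list of changed files per bug-fix commit), and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def error_prone_files(bug_fix_commits, threshold=2):
    """Return {file: bug-fix count} for files at or above the threshold.

    bug_fix_commits: iterable of file lists, one list per bug-fix commit
    (a hypothetical input format, not DV8's).
    """
    counts = Counter()
    for files in bug_fix_commits:
        # Count a file at most once per commit, even if it is listed twice.
        counts.update(set(files))
    return {f: n for f, n in counts.items() if n >= threshold}


# Hypothetical bug-fix history: each inner list is one bug-fix commit.
commits = [
    ["p1.F1", "p1.F5"],
    ["p1.F1", "p2.F2"],
    ["p1.F5", "p1.F1"],
    ["p2.F1"],
]

print(error_prone_files(commits))               # {'p1.F1': 3, 'p1.F5': 2}
print(error_prone_files(commits, threshold=3))  # {'p1.F1': 3}
```

With the default threshold of 2, files touched by only one bug fix (here p2.F2 and p2.F1) are not counted as error-prone.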

(1) A sample root: capturing most change-prone files and their design flaws. The following figure presents an example of a detected change root in an industrial project [9]. In this figure, the "CF" column lists the change frequency of each file, and the "Top" column lists the percentile rank of each file in terms of change-proneness. For example, the file "p4.F3" in row 26 was changed 361 times, and it is ranked the most change-prone (top 0.1%) among all 2,403 changed files in Proj_SS. 84% of the files in this root rank within the top 10% most change-prone files, and six of the 31 files rank within the top 1%, which indicates that the root is a real maintenance hotspot. Files in this root are clustered into three design rule hierarchy (DRH) layers: L1 (rc1-rc27), L2 (rc28), and L3 (rc29-rc31). Files in each layer are recursively clustered into independent modules. For example, files 10 to 26 are grouped into 5 modules, and these modules are structurally separated from each other.

From each root, the user can detect design anti-patterns that may be responsible for the propagation of bugs. For example, in the figure below: 1) p1.F1, an unstable interface, is depended upon by most of the files, and most of these dependents have frequently changed together with it; 2) multiple dependency cycles are identified, such as p1.F5 ↔ p2.F2 and p2.F2 → p2.F1 → p1.F6 → p1.F5 → p2.F2; 3) p1.F1 depends on its child, which is an unhealthy inheritance; 4) many modularity violations are highlighted in red: structurally independent modules that have changed together frequently.


Figure: DRH-Clustered Architecture Root

d: depend; i: inherit; CF: Change Frequency; Top: percentile rank 
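To make the notion of a dependency cycle concrete, the following sketch groups files that depend on each other cyclically, using Tarjan's strongly connected components algorithm. It is an illustration only and not a description of how DV8 detects cycles. The edges encode the two cycles named above; the extra file "p3.F9" and the function names are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm over {file: [files it depends on]}."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop its members off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


def cyclic_groups(graph):
    """Keep only components that actually contain a cycle."""
    return [
        c for c in strongly_connected_components(graph)
        if len(c) > 1 or any(v in graph.get(v, ()) for v in c)
    ]


# Edges taken from the cycles named in the text, plus one hypothetical
# file (p3.F9) that depends on the cycle without being part of it.
deps = {
    "p1.F5": ["p2.F2"],
    "p2.F2": ["p1.F5", "p2.F1"],
    "p2.F1": ["p1.F6"],
    "p1.F6": ["p1.F5"],
    "p3.F9": ["p1.F5"],
}

for group in cyclic_groups(deps):
    print(sorted(group))  # ['p1.F5', 'p1.F6', 'p2.F1', 'p2.F2']
```

Both cycles share p1.F5 and p2.F2, so they collapse into a single group of four mutually dependent files, while p3.F9 is left out because nothing in the group depends on it.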

(2) Cumulative effects of roots: a few roots capture most bugs or changes. The advantage of root detection is that the user doesn't need to examine many files or instances to determine which design problems contribute most to error-proneness and/or change-proneness. Instead, the user only needs to explore a few file groups, usually fewer than five, to identify the most severe design problems.

A file may participate in more than one root; that is, roots overlap with each other. DV8 also calculates their cumulative data, as shown in the following table (from [9]). In this table, "Size" means the number of distinct files in the first n roots, where n = 1, 2, 3, 4. The "%Size" column presents this cumulative size as a percentage of the project's total number of files. For example, "222" in the second row means that root1 and root2 (the first 2 roots) contain 222 distinct files, which account for 14% of all files in the project. The "Coverage" column presents the cumulative coverage of change-prone or bug-prone files by these roots. The table's fourth row indicates that the four roots together contain only 24% of all the files in this project but cover 55% of all change-prone files and 65% of all bug-prone files. Files in each root are connected in design. Hence, change-proneness or bug-proneness may be propagated among these files.


Table: Cumulative Data of Architecture Roots
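The "Size", "%Size", and "Coverage" columns described above can be reproduced with a short sketch. It assumes each root is given as a set of file names and that the set of change-prone or bug-prone files is already known; the function name `cumulative_root_data` and the sample data are hypothetical and do not come from [9].

```python
def cumulative_root_data(roots, all_files, prone_files):
    """Return one (n, size, pct_size, coverage) row per prefix of roots.

    size      number of distinct files in the first n roots
    pct_size  size as a percentage of all files in the project
    coverage  percentage of prone files contained in the first n roots
    """
    rows = []
    seen = set()
    for n, root in enumerate(roots, start=1):
        seen |= set(root)  # roots may overlap, so take the union
        size = len(seen)
        pct_size = 100.0 * size / len(all_files)
        covered = len(seen & prone_files)
        coverage = 100.0 * covered / len(prone_files) if prone_files else 0.0
        rows.append((n, size, round(pct_size), round(coverage)))
    return rows


# Hypothetical project with 10 files, 4 of which are bug-prone.
all_files = {f"f{i}" for i in range(10)}
bug_prone = {"f0", "f1", "f2", "f3"}
roots = [
    {"f0", "f1", "f4"},
    {"f1", "f2", "f5"},  # overlaps root 1 on f1
]

for row in cumulative_root_data(roots, all_files, bug_prone):
    print(row)
# (1, 3, 30, 50)
# (2, 5, 50, 75)
```

Because f1 appears in both roots, the first two roots contain five distinct files rather than six, which is why the cumulative size is computed from a set union and not from the sum of root sizes.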