CSMR 2012
Technical Track
Software Clustering
Wednesday, March 2, 2011, from 14:00 to 15:30, A14 HS2
- Anna Corazza, Sergio Di Martino, Valerio Maggio and Giuseppe Scanniello - Investigating the use of Lexical Information for Software System Clustering
- Rashid Naseem, Onaiza Maqbool and Siraj Muhammad - Improved Similarity Measures For Software Clustering
- Gladston Aparecido, Humberto Marques and Marco Tulio Valente - On the Benefits of Planning and Grouping Software Maintenance Requests
Investigating the use of Lexical Information for Software System Clustering
Anna Corazza, Sergio Di Martino, Valerio Maggio and Giuseppe Scanniello
Developers have considerable freedom in writing comments and in choosing identifiers and method names. These choices are intentional in nature and provide information of differing relevance for understanding what a software system implements and, in particular, the role of each source file. In this paper we investigate the effectiveness of exploiting lexical information for software system clustering. In particular, we explore the contribution of the combined use of six different dictionaries, corresponding to the six parts of the source code where programmers introduce lexical information, namely: class, attribute, method, and parameter names, comments, and source code statements. The relevance of each dictionary has been weighted by means of a probabilistic model whose parameters have been estimated by the Expectation-Maximization (EM) algorithm. To group source files accordingly, we customized a well-known hierarchical clustering algorithm. The investigation has been conducted on a number of open source Java software systems.
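As an illustration only, the following Python sketch shows one way per-dictionary lexical similarities could be combined and fed to a hierarchical clustering algorithm. It assumes each source file has already been split into the six lexical zones as term-frequency vectors, uses cosine similarity per zone, and replaces the paper's EM-estimated probabilistic model with hypothetical hand-set weights. The clustering step is plain average-linkage agglomerative clustering, not the customized algorithm described in the paper. The names ZONES, file_similarity and average_linkage and the toy data are hypothetical.

```python
import math
from collections import Counter

# The six lexical dictionaries named in the abstract (zone labels are hypothetical).
ZONES = ("class", "attribute", "method", "parameter", "comment", "statement")


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def file_similarity(f1: dict, f2: dict, weights: dict) -> float:
    """Weighted sum of per-zone cosine similarities.

    `weights` is a hypothetical hand-set stand-in for the zone relevance that
    the paper estimates with a probabilistic model and the EM algorithm.
    """
    return sum(
        weights[z] * cosine(f1.get(z, Counter()), f2.get(z, Counter())) for z in ZONES
    )


def average_linkage(files: dict, weights: dict, k: int) -> list:
    """Plain average-linkage agglomerative clustering, stopped when k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    names = list(files)
    # Precompute pairwise file similarities, keyed by the name pair in sorted order.
    sim = {
        (a, b): file_similarity(files[a], files[b], weights)
        for a in names for b in names if a < b
    }

    def pair_sim(a: str, b: str) -> float:
        return sim[(a, b)] if a < b else sim[(b, a)]

    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None  # (average similarity, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(pair_sim(a, b) for a in clusters[i] for b in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        # Merge the most similar pair of clusters.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    uniform = {z: 1 / len(ZONES) for z in ZONES}
    toy_files = {
        "Parser.java": {"class": Counter(["parser"]), "method": Counter(["parse", "token"]),
                        "comment": Counter(["parse", "input"])},
        "Lexer.java": {"class": Counter(["lexer"]), "method": Counter(["token", "next"]),
                       "comment": Counter(["token", "input"])},
        "Invoice.java": {"class": Counter(["invoice"]), "method": Counter(["total", "tax"]),
                         "comment": Counter(["billing"])},
    }
    print(average_linkage(toy_files, uniform, k=2))
    # prints [['Parser.java', 'Lexer.java'], ['Invoice.java']]
```

With uniform weights, the two parsing-related toy files share method and comment terms and are merged first, while Invoice.java stays in its own cluster. Raising the weight of one zone (for example, comments) increases its influence on the merge order, which is loosely analogous to the role the estimated dictionary relevance plays in the paper.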
Improved Similarity Measures For Software Clustering
Rashid Naseem, Onaiza Maqbool and Siraj Muhammad
Software clustering is a useful technique for recovering the architecture of a software system. The results of clustering depend upon the choice of entities, features, similarity measures and clustering algorithms. Different similarity measures have been used to determine the similarity between entities during the clustering process. In the software architecture recovery domain, the Jaccard and the Unbiased Ellenberg measures have shown better results than other measures for binary and non-binary features, respectively. In this paper we analyze the Russell and Rao measure for binary features to show the conditions under which its performance is expected to be better than that of Jaccard. We also show why our proposed Jaccard-NM measure is suitable for software clustering, and we propose its counterpart for non-binary features, Unbiased Ellenberg-NM. Experimental results indicate that Jaccard-NM and the Russell and Rao measure perform better than Jaccard for binary features, while for non-binary features the proposed Unbiased Ellenberg-NM produces results that are closer to the decomposition prepared by experts.
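The two standard binary measures compared in the abstract can be computed from the usual 2x2 contingency counts between two entities. The Python sketch below uses the textbook definitions, Jaccard = a/(a+b+c) and Russell and Rao = a/(a+b+c+d). The paper's Jaccard-NM and Unbiased Ellenberg-NM measures are not reproduced here, and the hypothetical toy vectors only illustrate that the two measures can rank entity pairs differently, not the specific conditions the paper analyzes.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two binary feature vectors of equal length.

    a: features present in both entities, b: present only in x,
    c: present only in y, d: absent from both.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if q and not p)
    d = len(x) - a - b - c
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard: a / (a + b + c); ignores features absent from both (d).

    Returns 0.0 when neither entity has any feature (a convention chosen here).
    """
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if a + b + c else 0.0


def russell_rao(x: Sequence[int], y: Sequence[int]) -> float:
    """Russell and Rao: a / (a + b + c + d), i.e. shared features over all features."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return a / n if n else 0.0


if __name__ == "__main__":
    # Toy entities over 10 binary features (hypothetical data).
    x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    u = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    v = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(jaccard(x, y), russell_rao(x, y))  # 1.0 0.1
    print(jaccard(u, v), russell_rao(u, v))  # 0.6 0.3
```

Jaccard ranks the pair (x, y) above (u, v) because it ignores the nine features absent from both, whereas Russell and Rao ranks (u, v) higher because it rewards the larger number of shared features. Over a fixed feature set, Russell and Rao orders pairs purely by a, since the denominator a + b + c + d is the same for every pair.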
On the Benefits of Planning and Grouping Software Maintenance Requests
Gladston Aparecido, Humberto Marques and Marco Tulio Valente
Despite its unquestionable importance, software maintenance usually has a negative image among software developers and even project managers. As a result, maintenance requests are commonly treated as short-term tasks that should be implemented quickly to minimize the impact on the work of end users. To promote software maintenance to a first-class development activity, in this paper we first propose a process for handling maintenance requests as software projects. The proposed process, called PASM (Process for Arranging Software Maintenance Requests), supports a periodic maintenance policy with three main phases: Registration, Grouping, and Processing. Next, we evaluate the benefits achieved by the PASM process in a real software development organization. For this purpose, we rely on clustering analysis techniques to better understand and compare the requests handled before and after the adoption of the proposed process.