PHP Advent Calendar Day 3
03 Dec 2007
Today's entry is provided by Sebastian Bergmann.
- Name: Sebastian Bergmann
- Blog: sebastian-bergmann.de
- Biography: Sebastian Bergmann is a long-time contributor to various PHP projects, including PHP itself. He is the developer of PHPUnit and offers consulting, training, and coaching services to help enterprises improve the quality assurance process for their PHP-based software projects.
- Location: Siegburg, Germany
Where do most bugs hide in a software project? A small script written in PHP can help us answer this question by mining a version control repository for the relevant information. This assumes, of course, that you use version control software to manage your project, that you write consistent messages when you commit a bug fix, and that each such commit touches only the source code files relevant to the fix.
So, let us assume that we are using Subversion to manage our project's source code, and that we use messages such as "Fix #2204." when a bug fix is committed. We also assume that this script has filesystem access to the Subversion repository. We start with some configuration (repository location) and variable initialization:
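<?php
// Configure the repository location.
$repository = '/var/svn/phpunit';
// Maps each changed path to the bug-fix commits that touched it.
$paths = array();
// Canonicalize the path; realpath() returns false if it does not exist.
$repository = realpath($repository);
if ($repository === false) {
    exit("Repository not found.\n");
}
?>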
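Before mining the repository, the commit message convention assumed above can be tried out in isolation. The following is a minimal sketch; the helper name extractTicket() and the sample messages are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: returns the ticket number referenced by a bug fix
// commit message such as "Fix #2204.", or null if the message does not
// follow that convention.
function extractTicket($message)
{
    // Require at least one digit after "Fix #"; the match is case-insensitive.
    if (preg_match('/Fix #([0-9]+)/i', $message, $matches)) {
        return (int)$matches[1];
    }

    return null;
}

// Sample messages (invented for this example).
var_dump(extractTicket('Fix #2204.'));          // int(2204)
var_dump(extractTicket('Refactor the runner')); // NULL
```

Messages that do not follow the convention are simply ignored, which is why the approach depends on the team using the format consistently.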
The first step is to look for all commits made to the repository for which the commit message matches our bug fix format. The svn log command can help us here. It shows log messages from the repository and does so, optionally, in XML format. PHP's SimpleXML extension provides a very simple and easily usable toolset to parse XML.
In our script, we use shell_exec() to run the svn log --xml command on our repository. The generated XML is then loaded via simplexml_load_string() into an object that we can iterate.
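<?php
// Run "svn log --xml" against the repository and parse the XML output.
// escapeshellarg() protects against spaces or shell characters in the path.
$log = simplexml_load_string(
    shell_exec(
        sprintf('svn log --xml %s', escapeshellarg('file://' . $repository))
    )
);
?>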
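To see what the parsed log looks like without access to a repository, the sketch below feeds a hand-written XML string to simplexml_load_string(). The sample mimics the shape of svn log --xml output (a log element containing logentry elements, each with a revision attribute and author, date, and msg children); the revisions, author, and messages are invented.

```php
<?php
// Hand-written sample in the shape of "svn log --xml" output.
$xml = <<<XML
<log>
  <logentry revision="101">
    <author>alice</author>
    <date>2007-12-01T10:00:00.000000Z</date>
    <msg>Fix #2204.</msg>
  </logentry>
  <logentry revision="100">
    <author>alice</author>
    <date>2007-11-30T09:00:00.000000Z</date>
    <msg>Add documentation.</msg>
  </logentry>
</log>
XML;

$log = simplexml_load_string($xml);

// Collect revision => message pairs, casting SimpleXML nodes to scalars.
$messages = array();
foreach ($log->logentry as $logentry) {
    $revision = (int)$logentry['revision'];
    $messages[$revision] = (string)$logentry->msg;
}

print_r($messages);
```

The explicit (int) and (string) casts matter: without them the values stay SimpleXMLElement objects, which cannot be used as array keys and do not compare like plain strings.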
For each revision that matches our search criteria, we use the svnlook changed command to get the paths that were changed in that particular revision.
<?php
foreach ($log->logentry as $logentry) {
    $attributes = $logentry->attributes();
    $revision   = (int)$attributes['revision'];
    $message    = (string)$logentry->msg;

    // Only consider commits whose message follows the "Fix #<ticket>" format.
    // [0-9]+ requires at least one digit, so a bare "Fix #" is not counted.
    if (preg_match('/Fix #([0-9]+)/i', $message, $matches)) {
        $ticket = (int)$matches[1];

        // Ask svnlook which paths were changed in this revision.
        $changedPaths = explode(
            "\n",
            shell_exec(
                sprintf(
                    'svnlook changed -r %d %s',
                    $revision,
                    escapeshellarg($repository)
                )
            )
        );

        // The output ends with a newline, so drop the empty last element.
        unset($changedPaths[count($changedPaths) - 1]);

        foreach ($changedPaths as $changedPath) {
            // Strip the four-character status prefix (for example "U   ").
            $changedPath = substr($changedPath, 4);

            // Appending with [] creates the per-path list on first use.
            $paths[$changedPath][] = array(
                'revision' => $revision,
                'ticket'   => $ticket
            );
        }
    }
}
?>
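The path handling in the loop above can be exercised on its own. This sketch assumes that svnlook changed prints one line per path, each starting with a status field padded to four characters; parseChangedPaths() and the sample output are hypothetical and only serve as an illustration.

```php
<?php
// Hypothetical helper: turns the text printed by "svnlook changed"
// into a list of paths.
function parseChangedPaths($output)
{
    $paths = array();

    foreach (explode("\n", $output) as $line) {
        // Skip the empty string left behind by the trailing newline.
        if ($line === '') {
            continue;
        }

        // Drop the four-character status prefix, for example "U   ".
        $paths[] = substr($line, 4);
    }

    return $paths;
}

// Invented sample output: one updated file and one added file.
$sample = "U   trunk/src/Foo.php\n"
        . "A   trunk/src/Bar.php\n";

print_r(parseChangedPaths($sample));
```

Skipping empty lines instead of unsetting the last element is a small design choice: it also copes with output that has no trailing newline.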
For each source code file that is changed at least once as part of a bug fix, we maintain an array with the revision and ticket number of each such fix. In the end, we use uasort() to sort that array and print a list of the source code files that were involved in a bug fix, in descending order of the number of fixes that touched them.
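<?php
// Comparison callback: order entries by number of bug fixes, descending.
function cmp($a, $b)
{
    $a = count($a);
    $b = count($b);

    if ($a == $b) {
        return 0;
    }

    return ($a > $b) ? -1 : 1;
}

// uasort() sorts by the callback while preserving the path keys.
uasort($paths, 'cmp');

foreach ($paths as $path => $data) {
    printf("%4d: %s\n", count($data), $path);
}
?>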
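To check the ranking logic without a repository, the sketch below runs the same kind of comparison over a small hand-made $paths array. The file names, revisions, and ticket numbers are invented, and the callback is named byFixCountDesc() here only to keep the example separate from cmp() above.

```php
<?php
// Invented data in the same shape the mining loop produces.
$paths = array(
    'trunk/src/Foo.php' => array(
        array('revision' => 10, 'ticket' => 1),
    ),
    'trunk/src/Bar.php' => array(
        array('revision' => 11, 'ticket' => 2),
        array('revision' => 14, 'ticket' => 5),
        array('revision' => 20, 'ticket' => 9),
    ),
    'trunk/src/Baz.php' => array(
        array('revision' => 12, 'ticket' => 3),
        array('revision' => 15, 'ticket' => 6),
    ),
);

// Hypothetical callback name; sorts by number of fixes, most first.
function byFixCountDesc($a, $b)
{
    $a = count($a);
    $b = count($b);
    if ($a == $b) {
        return 0;
    }
    return ($a > $b) ? -1 : 1;
}

uasort($paths, 'byFixCountDesc');

foreach ($paths as $path => $data) {
    printf("%4d: %s\n", count($data), $path);
}
// Prints:
//    3: trunk/src/Bar.php
//    2: trunk/src/Baz.php
//    1: trunk/src/Foo.php
```

The file with the most recorded bug fixes comes first, which is the order in which you would probably want to review test coverage.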
This entry shows how easy it is to parse XML data with PHP in order to solve a problem that might look hard at first glance: mining a code repository for data that maps past bugs to source code files. The resulting ranking of the most bug-prone source code files is a good basis for deciding which parts of your code base need more tests.
If this got you interested in quality assurance for PHP projects, you might be interested in the PHPUnit and phpUnderControl projects.