One of the benefits of having a REPL inside your CI server is that you get programmatic access to the history of your repositories. There is an entire conference dedicated to mining this history. Whether you use Perforce, Subversion, Mercurial or, God forbid, ClearCase, you can use the same API to analyze your commits.
Let's use this API to implement Google's bug prediction algorithm.
History
First we need the commit history. We can get it from a build configuration in which the commits were detected, and a build configuration can be looked up by its id:
(def idea (.findBuildTypeByExternalId tc/project-manager "BugPrediction_Idea"))
Once we have a build configuration, we can get all its commits:
(def changes (.getAllModifications tc/vcs-history idea))
Each commit gives us its comment, the date it was made and the list of changed files. That is all we need to implement the algorithm.
bug-fix?
Let's define a predicate to distinguish bug-fixing commits from regular ones:
(defn bug-fix? [m]
  (-> m
      .getDescription
      .toLowerCase
      (.contains "fix")))
The predicate assumes that a commit fixes a bug if its commit message contains the string fix.
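To see how this heuristic behaves, here is a minimal sketch that applies the same substring check to plain strings instead of TeamCity modification objects. The helper name fix-message? and the sample messages are made up for illustration.

```clojure
(defn fix-message?
  "Hypothetical string-only variant of bug-fix?: true if the
  lower-cased message contains the substring \"fix\"."
  [^String message]
  (.contains (.toLowerCase message) "fix"))

(fix-message? "Fixed NPE in the parser")   ; true
(fix-message? "Add prefix to log entries") ; also true, a false positive
(fix-message? "Update copyright year")     ; false
```

The second call shows the main weakness of the check: any word that happens to contain "fix" counts as a bug fix.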
We will also need two helper functions:
(defn vcs-time
  "Returns the time the commit was made"
  [m]
  (.. m getVcsDate getTime))
(defn files
  "Returns names of the files changed by the commit"
  [m]
  (map (memfn getFileName) (.getChanges m)))
Score
Now we can define a score function for a bug-fixing commit:
(defn score-modification
  [m min-vcs-time max-vcs-time]
  (let [t (vcs-time m)
        normalized-time (/ (* (- t min-vcs-time) 1.0)
                           (- max-vcs-time min-vcs-time))]
    (/ 1 (+ 1 (Math/exp (+ (* -12 normalized-time) 12))))))
It takes a modification and the times of the earliest and the latest commits in the history, and returns a bug-fix score for that modification. The commit time is normalized to the range from 0 (earliest commit) to 1 (latest commit), so recent fixes score higher, up to 0.5 for the latest commit. The greater the score, the more likely the commit's files are to have bugs in the future.
With a score function for a single commit, we can write a function that takes a sequence of commits and calculates a total score for each file:
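To get a feel for how quickly the score decays with commit age, the formula can be evaluated on its own for a few normalized times. This is only a sketch: time-score is a hypothetical helper that takes an already normalized time between 0 and 1 and applies the same expression as score-modification.

```clojure
(defn time-score
  "Score for a normalized commit time t in [0, 1],
  using the same expression as score-modification."
  [t]
  (/ 1 (+ 1 (Math/exp (+ (* -12 t) 12)))))

(time-score 1.0) ; latest commit: 0.5
(time-score 0.5) ; middle of the history: about 0.0025
(time-score 0.0) ; earliest commit: about 0.000006
```

A fix from the middle of the history contributes roughly 200 times less than the most recent one, so old fixes barely affect the ranking.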
(defn score-files
  "Returns a sequence of pairs [file name, its bug-fix score] for a
  given sequence of modifications. Result is sorted in descending
  order and includes only files matched by pred, or all the files if
  pred is not specified."
  ([ms] (score-files ms (constantly true)))
  ([ms pred]
   (let [vcs-times (map vcs-time ms)
         min-vcs-time (apply min vcs-times)
         max-vcs-time (apply max vcs-times)
         ;; map of file name -> score for a single modification
         score-mod-files (fn [m]
                           (let [fs (filter pred (files m))
                                 score (score-modification m min-vcs-time max-vcs-time)]
                             (zipmap fs (repeat score))))]
     ;; sum the per-commit maps by file name, highest total first
     (sort-by val >
              (apply merge-with + (map score-mod-files (filter bug-fix? ms)))))))
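The aggregation step of score-files can be tried in isolation on plain maps. The sketch below uses made-up file names and scores in place of real commits: each map stands for the files touched by one bug-fixing commit, all carrying that commit's score.

```clojure
;; Hypothetical per-commit results: file name -> score of that commit
(def per-commit-scores
  [{"A.java" 0.5, "B.java" 0.5}
   {"A.java" 0.25}
   {"C.java" 0.125}])

;; Sum the scores per file, then sort by total score, highest first
(def ranked
  (sort-by val > (apply merge-with + per-commit-scores)))

ranked ; (["A.java" 0.75] ["B.java" 0.5] ["C.java" 0.125])
```

A file fixed in several commits accumulates the score of each of them, which is why A.java ends up on top.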
Now we can get the top 20 files with the highest bug scores, ignoring tests and non-Java files:
(take 20 (score-files changes #(and (.endsWith % ".java") (not (.contains % "/test/")))))
Cool, eh? Just 5 functions and you can predict bugs!
Further improvements
We can implement a more precise bug-fix? predicate using integration with issue trackers. It would also be nice to mark bug-prone files in the UI and to recalculate the score on a schedule or every time a new commit is detected.
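As a rough illustration of the first idea, a predicate could look for issue keys in the commit message and check them against the issues that the tracker classifies as bugs. Everything in this sketch is an assumption: the key format, the helper names and the hard-coded known-bugs set, which stands in for a real issue tracker lookup.

```clojure
(defn issue-keys
  "Returns issue keys such as \"IDEA-123\" mentioned in a commit message.
  The key format is an assumption; real trackers may differ."
  [message]
  (re-seq #"[A-Z]+-\d+" message))

;; Hypothetical data: the keys of issues the tracker classifies as bugs.
;; In a real setup this would come from the issue tracker itself.
(def known-bugs #{"IDEA-123" "IDEA-456"})

(defn fixes-known-bug?
  "True if the message mentions at least one issue from known-bugs."
  [message]
  (boolean (some known-bugs (issue-keys message))))

(fixes-known-bug? "IDEA-123 handle null descriptor") ; true
(fixes-known-bug? "IDEA-999 add prefix option")      ; false
```

Unlike the substring check, this variant does not react to words like "prefix", but it only works if commit messages reference issues consistently.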