One of the benefits of having a REPL inside your CI server is that you get programmatic access to the history of your repositories. There is an entire conference dedicated to mining this kind of history. Whether you use Perforce, SVN, Mercurial or, God forbid, ClearCase, you can use the same API to analyze your commits.
Let's use this API to implement Google's bug prediction algorithm. First, we find the build configuration whose history we want to analyze:
(def idea (.findBuildTypeByExternalId tc/project-manager "BugPrediction_Idea"))
Once we have a build configuration, we can get all its commits:
(def changes (.getAllModifications tc/vcs-history idea))
Let's define a predicate to distinguish bug-fixing commits from regular ones:
(defn bug-fix? [m] (-> m .getDescription .toLowerCase (.contains "fix")))
The predicate assumes that a commit fixes a bug if its commit message contains "fix" (case-insensitively).
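Matching a bare substring is deliberately simple, and it also fires on words like "prefix". Here's a minimal sketch of a slightly stricter check, assuming you only want "fix", "fixes", "fixed" or "fixing" as whole words. The name bug-fix-message? is hypothetical, and it takes the message string rather than a modification:

```clojure
;; Hypothetical stricter variant of bug-fix?: it works on a plain message
;; string and only accepts "fix", "fixes", "fixed" or "fixing" as whole words,
;; so "prefix" or "fixture" no longer count as bug fixes.
(defn bug-fix-message? [^String message]
  (boolean (re-find #"(?i)\bfix(es|ed|ing)?\b" message)))

;; The substring check accepts "prefix"; the word check does not.
(.contains (.toLowerCase "Add prefix support") "fix") ; => true
(bug-fix-message? "Add prefix support")                ; => false
(bug-fix-message? "Fixed NPE in parser")               ; => true
```

The trade-off is that a message like "hotfix for login" no longer counts, so which version works better depends on your team's commit conventions.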
We will also need two helper functions:
(defn vcs-time
  "Returns the time the commit was made"
  [m]
  (.. m getVcsDate getTime))
(defn files
  "Returns the names of the files changed by the commit"
  [m]
  (map (memfn getFileName) (.getChanges m)))
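Both helpers only call a handful of methods on a modification, so you can try them without a TeamCity server. The sketch below assumes two made-up interfaces, FileChange and Modification, that mimic just the getVcsDate, getChanges and getFileName calls used above; they are not TeamCity types, and stub-modification is a hypothetical helper:

```clojure
;; Hypothetical stand-ins (NOT TeamCity types) that expose only the methods
;; the helpers call, so vcs-time and files can run in a plain Clojure REPL.
(definterface FileChange
  (^String getFileName []))

(definterface Modification
  (^java.util.Date getVcsDate [])
  (^java.util.List getChanges []))

(defn stub-modification
  "Builds a fake modification made at `millis` that touched `file-names`."
  [millis file-names]
  (reify Modification
    (getVcsDate [_] (java.util.Date. (long millis)))
    (getChanges [_]
      (mapv (fn [fname] (reify FileChange (getFileName [_] fname)))
            file-names))))

;; The helpers, with their docstrings placed before the argument vector,
;; where defn expects them.
(defn vcs-time
  "Returns the time the commit was made, in milliseconds since the epoch."
  [m]
  (.. m getVcsDate getTime))

(defn files
  "Returns the names of the files changed by the commit."
  [m]
  (map (memfn getFileName) (.getChanges m)))

(def sample (stub-modification 1000 ["src/A.java" "src/B.java"]))
(vcs-time sample) ; => 1000
(files sample)    ; => ("src/A.java" "src/B.java")
```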
Now we can define a score function for a bug-fixing commit:
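(defn score-modification [m min-vcs-time max-vcs-time]
  (let [t (vcs-time m)
        ;; position of the commit in the history: 0.0 (earliest) to 1.0 (latest)
        normalized-time (/ (* (- t min-vcs-time) 1.0)
                           (- max-vcs-time min-vcs-time))]
    ;; logistic curve: recent fixes weigh much more than old ones
    (/ 1 (+ 1 (Math/exp (+ (* -12 normalized-time) 12))))))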
It takes a modification and the times of the earliest and latest commits in the history, and returns the bug-fixing score of the modification. The greater the score, the more likely it is that the commit's files will have bugs in the future.
With a score function for a single commit, we can write a function that takes a sequence of commits and calculates the total score of each file:
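To get a feel for the curve, here is a hypothetical score-time function that applies the same formula to plain numbers instead of a modification:

```clojure
;; Hypothetical pure-number version of score-modification: it takes the
;; commit time directly instead of a TeamCity modification.
(defn score-time
  "Bug-fix score for a commit made at time t, given the earliest (min-t)
  and latest (max-t) commit times in the history."
  [t min-t max-t]
  (let [normalized (/ (* (- t min-t) 1.0) (- max-t min-t))]
    ;; logistic curve 1 / (1 + e^(-12n + 12)) with n in [0, 1]
    (/ 1 (+ 1 (Math/exp (+ (* -12 normalized) 12))))))

;; Scores for commits at different points of a history spanning 0..100
(map #(score-time % 0 100) [0 50 90 100])
```

This yields roughly 0.000006, 0.0025, 0.23 and 0.5: a fix at the very start of the history contributes almost nothing, and even the most recent fix scores 0.5 rather than 1.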
(defn score-files
  "Returns a sequence of pairs [file name, its bug-fix score] for a given
  sequence of modifications. The result is sorted in descending order and
  includes only files matched by pred, or all the files if pred is not specified."
  ([ms] (score-files ms (constantly true)))
  ([ms pred]
   (let [vcs-times (map vcs-time ms)
         min-vcs-time (apply min vcs-times)
         max-vcs-time (apply max vcs-times)
         ;; {file score} for the files touched by a single commit
         score-mod-files (fn [m]
                           (let [fs (filter pred (files m))
                                 score (score-modification m min-vcs-time max-vcs-time)]
                             (zipmap fs (repeat score))))]
     (->> (filter bug-fix? ms)
          (map score-mod-files)
          ;; add up scores per file; unlike a bare reduce, works with no bug fixes
          (apply merge-with +)
          (sort-by val >)))))
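To see the aggregation at work without TeamCity, here is a minimal sketch of the same pipeline over plain maps. The map keys :description, :time and :files, and the names fix-commit?, score-time and score-file-maps, are assumptions made for the example. It sums scores with (apply merge-with +), which returns nil rather than throwing when the history has no bug-fixing commits:

```clojure
;; Hypothetical plain-data version of score-files: each commit is a map
;; {:description ..., :time ..., :files [...]} instead of a TeamCity object.
(defn fix-commit? [m]
  (.contains (.toLowerCase ^String (:description m)) "fix"))

(defn score-time [t min-t max-t]
  (let [n (/ (* (- t min-t) 1.0) (- max-t min-t))]
    (/ 1 (+ 1 (Math/exp (+ (* -12 n) 12))))))

(defn score-file-maps
  ([ms] (score-file-maps ms (constantly true)))
  ([ms pred]
   (let [times (map :time ms)
         min-t (apply min times)
         max-t (apply max times)]
     (->> ms
          (filter fix-commit?)
          ;; one {file score} map per bug-fixing commit
          (map (fn [m]
                 (zipmap (filter pred (:files m))
                         (repeat (score-time (:time m) min-t max-t)))))
          ;; sum the scores of files touched by several fixes
          (apply merge-with +)
          (sort-by val >)))))

(def history
  [{:description "Initial import" :time 0   :files ["src/A.java" "src/B.java"]}
   {:description "Fix NPE in A"   :time 50  :files ["src/A.java"]}
   {:description "Fix A and B"    :time 100 :files ["src/A.java" "src/B.java"]}])

(score-file-maps history)
```

Here src/A.java comes out on top: it was touched by both bug fixes, so its score is the sum of both, while src/B.java only gets the 0.5 of the latest fix.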
Now we can get the top 20 files with the highest bug scores, ignoring tests and non-Java files:
(take 20 (score-files changes #(and (.endsWith % ".java") (not (.contains % "/test/")))))
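The filter passed to score-files can be pulled out into a named function and tried on a few sample paths. java-non-test? is a hypothetical name, and the paths are made up:

```clojure
;; The same file filter as above, as a named (hypothetical) function.
(defn java-non-test? [^String path]
  (and (.endsWith path ".java")
       (not (.contains path "/test/"))))

(filter java-non-test?
        ["src/main/java/Foo.java"
         "src/test/java/FooTest.java"
         "build.gradle"
         "test/BarTest.java"])
;; => ("src/main/java/Foo.java" "test/BarTest.java")
```

Note that test/BarTest.java survives the filter: if file names are reported relative to the repository root (an assumption worth checking for your VCS), a top-level test directory has no leading slash, so "/test/" does not match it.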
Cool, eh? Just 5 functions and you can predict bugs!
We can implement a more precise bug-fix? predicate using an integration with an issue tracker. It would also be nice to mark bug-prone files in the UI and to recalculate scores on a schedule or every time a new commit is detected.
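As a starting point for the issue-tracker idea, here is a rough sketch of a predicate that looks for issue keys in a commit message and checks them against a set of keys known to be bugs. Both the key format (like "TW-205") and the way the set is obtained are assumptions; fetching it from a real tracker is left out:

```clojure
;; Hypothetical issue-tracker-based predicate. bug-ids is assumed to be a set
;; of issue keys the tracker classifies as bugs; keys are assumed to look
;; like "TW-205" (uppercase project prefix, dash, number).
(defn fixes-known-bug? [bug-ids ^String message]
  (boolean (some bug-ids (re-seq #"[A-Z][A-Z0-9]+-\d+" message))))

(def known-bugs #{"TW-101" "TW-205"})

(fixes-known-bug? known-bugs "TW-205 handle empty history") ; => true
(fixes-known-bug? known-bugs "TW-300 add fix-up for docs")  ; => false
```

The second message would count as a bug fix under the substring check, because it contains "fix-up".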