Introducing CommitQ
URL:http://commitq.com/blog/2013/04/01/hello/
01 Apr 2013
The Familiar Workflow
So you are a programmer working with a team on a project. One day a colleague sends you a pull request or a patch and asks you to review it. The patch is meant to fix a bug or implement a new feature. The change might be trivial, or large enough to take you a few hours to review. Either way, you are presented with a bunch of changed files, with removed and inserted lines of text colored red and green in your favorite diff tool.
But those are not just plain text files. After all, you are not writing a poem. They are source code files full of declarations of classes and methods, or objects and functions, with statements and expressions. And you are parsing them in your head, trying to reverse engineer your teammate's work to understand what has really changed and how it might affect other parts of the project. In other words, you are doing the job of a parser and a linker, all in your head. Among all those colored lines, you see that a new function foo() was created, and that its author was kind enough to document its purpose.
```diff
--- /dev/null
+++ b/foo.c
@@ -0,0 +1,4 @@
+/** Does blah. */
+void foo(char* str) {
+    printf("Hello, %s!", str);
+}
```
So far, so good. Then you see that foo() shares a code fragment with the previous version of bar(), and that this fragment is now gone from bar().
```diff
--- a/bar.c
+++ b/bar.c
@@ -2,4 +2,3 @@
 void bar(char* str) {
-    printf("Hello, %s!", str);
     printf("How are you doing today?");
 }
```
Aha, so the other programmer has split a large function into smaller ones. You like that kind of refactoring! Still, it took you some time and effort to spot it, because the author moved the code, so the related blocks are no longer next to each other. Worse yet, they are now in different files! Let's move on. You can see a couple of changed lines of code, but it's hard to say what the context of the change is.
```diff
--- a/blah.c
+++ b/blah.c
@@ -3,3 +3,3 @@
     for (int n = 0; n < length; n++) {
-        foo(i);
+        foo(n);
     }
```
If that is part of a function definition that was changed, then which function is it? And what is that function's own context: is it defined in the class Baz or somewhere else? You just don't remember that code exactly. The context of the change is much larger than the change itself, so you have to open the full version of the file to recover the scope of the change.
The Challenge
Reading code is hard, and reading changes in code is even harder. Yet peer code review is an essential practice that many teams employ for good reason. The tools we use for it have not evolved much since the 1970s: it is the same old diff, familiar to many and already a few decades old. Is that still the state of the art? The point of the project we are working on is to prove that it is not.
We want to retire the plain old generic-text diff and replace it with a programming-language-aware semantic diff tool. Such a tool makes code changes much more visual and comprehensible by doing the parser's job for you: it recovers the program features that have changed, such as modules, classes, methods, objects, functions, or variables, and compares them using various similarity heuristics and differencing algorithms.
It does so with a language-neutral source code model built from feature trees, a generic feature tree differencing algorithm, and a set of language-specific parsers that build the feature trees. The algorithm compares the feature trees of the old and new files to find features that were removed, created, modified, renamed, copied, split, or merged between two revisions in a repository. The result is a list of feature pairs, each with a similarity score between its two features. We then build various UI elements that convey the changes in the code using these feature pairs. We are still experimenting with different presentation modes.
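The pairing step can be illustrated with a simplified C sketch. It is not CommitQ's algorithm: it assumes a flat list of features per file instead of a full tree, uses the Dice coefficient over body lines as the similarity score, and picks an arbitrary 0.6 threshold for renames. The names `Feature`, `FeaturePair`, `similarity()`, and `diff_features()` are hypothetical, and only unchanged, modified, renamed, removed, and created features are reported; copy, split, and merge detection are left out.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINES 32          /* sketch-only limit on body lines per feature */
#define MAX_FEATURES 16       /* sketch-only limit on features per side */
#define RENAME_THRESHOLD 0.6  /* assumed cut-off, not a CommitQ value */

/* Hypothetical feature: a named unit (function, method, ...) whose body
   lines are assumed to be whitespace-normalized by a parser already. */
typedef struct {
    const char *name;
    const char *lines[MAX_LINES];
    int line_count;
} Feature;

typedef enum { UNCHANGED, MODIFIED, RENAMED, REMOVED, CREATED } Change;

typedef struct {
    int old_index, new_index; /* -1 when that side is absent */
    Change change;
    double score;             /* similarity in [0, 1] */
} FeaturePair;

/* Dice coefficient over body lines: 2 * common / (|a| + |b|).
   Each line of b can be matched at most once. */
double similarity(const Feature *a, const Feature *b)
{
    int total = a->line_count + b->line_count;
    if (total == 0)
        return 1.0;
    int taken[MAX_LINES] = {0};
    int common = 0;
    for (int i = 0; i < a->line_count; i++)
        for (int j = 0; j < b->line_count; j++)
            if (!taken[j] && strcmp(a->lines[i], b->lines[j]) == 0) {
                taken[j] = 1;
                common++;
                break;
            }
    return 2.0 * common / total;
}

/* Pairs old and new features. `out` must hold n_old + n_new entries.
   Returns the number of pairs written. */
int diff_features(const Feature *old_f, int n_old,
                  const Feature *new_f, int n_new, FeaturePair *out)
{
    assert(n_old <= MAX_FEATURES && n_new <= MAX_FEATURES);
    int old_used[MAX_FEATURES] = {0}, new_used[MAX_FEATURES] = {0};
    int n = 0;
    /* Pass 1: same name on both sides. Identical line multisets count as
       unchanged, anything else as modified. */
    for (int i = 0; i < n_old; i++)
        for (int j = 0; j < n_new; j++)
            if (!new_used[j] && strcmp(old_f[i].name, new_f[j].name) == 0) {
                double s = similarity(&old_f[i], &new_f[j]);
                out[n++] = (FeaturePair){i, j, s == 1.0 ? UNCHANGED : MODIFIED, s};
                old_used[i] = new_used[j] = 1;
                break;
            }
    /* Pass 2: different names but similar bodies count as renamed. */
    for (int i = 0; i < n_old; i++) {
        if (old_used[i])
            continue;
        int best = -1;
        double best_s = RENAME_THRESHOLD;
        for (int j = 0; j < n_new; j++) {
            if (new_used[j])
                continue;
            double s = similarity(&old_f[i], &new_f[j]);
            if (s >= best_s) {
                best = j;
                best_s = s;
            }
        }
        if (best >= 0) {
            out[n++] = (FeaturePair){i, best, RENAMED, best_s};
            old_used[i] = new_used[best] = 1;
        }
    }
    /* Pass 3: whatever is left over was removed or created. */
    for (int i = 0; i < n_old; i++)
        if (!old_used[i])
            out[n++] = (FeaturePair){i, -1, REMOVED, 0.0};
    for (int j = 0; j < n_new; j++)
        if (!new_used[j])
            out[n++] = (FeaturePair){-1, j, CREATED, 0.0};
    return n;
}
```

Fed the bar/foo example from the beginning of this post, the sketch reports bar as modified (score 2/3) and foo as created. Recognizing that combination as a split of bar is exactly the kind of extra similarity heuristic a real semantic diff has to add on top of simple pairing.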
We also augment the traditional textual diff interface to reveal the proper context of the changes. Please visit the tour page for more details.
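To make "proper context" concrete, here is a minimal C sketch of how a feature tree answers the question raised by the blah.c hunk: which function, inside which class, contains a changed line. It assumes a hypothetical `Feature` node that records a kind, a name, and an inclusive line range; `enclosing_path()` simply descends into the child whose range contains the line. The post does not describe CommitQ's actual model, so the type and function are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical feature-tree node: a named program feature (class,
   method, function, ...) spanning an inclusive range of source lines. */
typedef struct Feature {
    const char *kind;               /* e.g. "class", "method" */
    const char *name;               /* e.g. "Baz" */
    int first_line, last_line;
    const struct Feature *children; /* nested features, sorted by line */
    int child_count;
} Feature;

/* Writes the chain of features enclosing `line`, outermost first, into
   `buf` (e.g. "class Baz > method qux"). Returns the nesting depth, or 0
   if the line lies outside `root`. */
int enclosing_path(const Feature *root, int line, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    int depth = 0;
    const Feature *f = root;
    while (f != NULL && f->first_line <= line && line <= f->last_line) {
        /* Append this feature to the path, truncating if buf is full. */
        size_t used = strlen(buf);
        snprintf(buf + used, size - used, "%s%s %s",
                 depth > 0 ? " > " : "", f->kind, f->name);
        depth++;
        /* Descend into the child whose range contains the line, if any. */
        const Feature *next = NULL;
        for (int i = 0; i < f->child_count; i++) {
            const Feature *c = &f->children[i];
            if (c->first_line <= line && line <= c->last_line) {
                next = c;
                break;
            }
        }
        f = next;
    }
    return depth;
}
```

For a hypothetical class Baz spanning lines 1 to 20 with a method qux on lines 2 to 8, a change on line 5 yields the path "class Baz > method qux", which is the context a reviewer otherwise has to reconstruct by opening the whole file.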
Currently we have implemented support for Java, Python, and ECMAScript. Obviously, more languages are on the way!
Give Me The Big Picture!
But CommitQ is more than a single algorithm. It's a full-featured Git repository hosting solution with a focus on enterprise teams. All the functionality you would expect from it is already there: create and manage Git repositories, manage people, assign fine-grained permissions, discuss code, review changes, and moderate branches. Just the standard stuff.
Here are the key features, besides the semantic diff algorithm, that we want you to take note of:
- Graphical branch explorer. This graph shows the relationships between refs. It hides all commits except those labeled with branches or tags, which makes it the easiest way to see which branches are ahead of others, or to find their common ancestor.
- Branch merge monitor for proactive conflict detection. Every time a repository is updated, it tentatively merges the configured branches in the background to check for potential merge conflicts. If it finds any, it notifies the developers early, so that conflicts do not accumulate as branches diverge significantly.
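The questions the branch explorer answers come down to reachability in the commit graph. The C sketch below assumes a hypothetical in-memory array of commits, each with at most two parents and an optional label: `labeled_parents()` finds the labeled commits a given commit still connects to once unlabeled commits are hidden, `is_ancestor()` tells whether one ref is ahead of another, and `merge_base()` finds a best common ancestor. None of these are CommitQ functions; a real implementation would read the graph from Git, which already answers the last question with `git merge-base`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMMITS 64 /* sketch-only limit */
#define NO_PARENT (-1)

/* Hypothetical in-memory commit DAG node. */
typedef struct {
    int parents[2];    /* parent indices, NO_PARENT if absent */
    const char *label; /* branch or tag name, NULL if unlabeled */
} Commit;

/* Depth-first walk over parents, marking visited commits in `seen`.
   With stop_at_labels set, the walk does not continue past labeled
   commits (other than `start`), which is how unlabeled ones are hidden. */
static void walk(const Commit *g, int start, int stop_at_labels, int *seen)
{
    int stack[2 * MAX_COMMITS + 1], top = 0; /* each commit pushes <= 2 */
    stack[top++] = start;
    while (top > 0) {
        int id = stack[--top];
        if (seen[id])
            continue;
        seen[id] = 1;
        if (stop_at_labels && id != start && g[id].label != NULL)
            continue;
        for (int p = 0; p < 2; p++)
            if (g[id].parents[p] != NO_PARENT)
                stack[top++] = g[id].parents[p];
    }
}

/* 1 if commit a is an ancestor of (or equal to) commit b. */
int is_ancestor(const Commit *g, int n, int a, int b)
{
    int seen[MAX_COMMITS] = {0};
    assert(n <= MAX_COMMITS);
    walk(g, b, 0, seen);
    return seen[a];
}

/* Labeled ancestors of c reachable without passing through another
   labeled commit: the edges of the compressed graph. Returns the count. */
int labeled_parents(const Commit *g, int n, int c, int *out)
{
    int seen[MAX_COMMITS] = {0}, k = 0;
    assert(n <= MAX_COMMITS);
    walk(g, c, 1, seen);
    for (int i = 0; i < n; i++)
        if (i != c && seen[i] && g[i].label != NULL)
            out[k++] = i;
    return k;
}

/* A best common ancestor of a and b: a shared ancestor that is not an
   ancestor of any other shared ancestor. Returns -1 if there is none. */
int merge_base(const Commit *g, int n, int a, int b)
{
    int in_a[MAX_COMMITS] = {0}, in_b[MAX_COMMITS] = {0};
    assert(n <= MAX_COMMITS);
    walk(g, a, 0, in_a);
    walk(g, b, 0, in_b);
    for (int c = 0; c < n; c++) {
        if (!in_a[c] || !in_b[c])
            continue;
        int best = 1;
        for (int d = 0; d < n && best; d++)
            if (d != c && in_a[d] && in_b[d] && is_ancestor(g, n, c, d))
                best = 0;
        if (best)
            return c;
    }
    return -1;
}
```

The quadratic search in `merge_base()` keeps the sketch short; Git itself uses a much more efficient traversal for large histories.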
For more details please visit the tour page.
Can I Take a Look At It?
As someone has already said, “If you are not ashamed of your product when you launch it, you launched too late.” Right now the product is at the minimum viable product (MVP) stage. Its features are incomplete and it is full of bugs. But we are still excited to introduce it to the community, in the hope of finding early adopters, getting valuable feedback, and exchanging ideas. If you are not afraid to try it, please sign up for early access!