I just released an update for Code Connections, featuring a new mode which I’ve dubbed ‘Top Types’. Top Types mode tries to calculate the most important types in an entire solution, and lays them all out for you in the Code Connections dependency graph.
The idea for Top Types mode nucleated while I was developing v1 and trying to work out the best layout algorithm to use. I wanted to understand the typical averages and ranges for various graph properties for real .NET solutions, so I wrote some rough code to calculate those properties from the in-memory dependency graph and log them in Visual Studio.
One thing that struck me was that, when I extracted those statistics for codebases I was familiar with, the classes that scored highly on certain metrics were classes that were ‘important’ in some way. Two properties in particular were interesting. One was the number of dependencies a given class had: in graph terms, the number of outbound edges in the dependency graph. The other was the number of dependents it had, or the number of inbound edges.
Classes with a lot of dependencies by definition are referencing many other types. I found that the classes that had the most dependencies were the ‘manageresque’ classes that had the broadest set of responsibilities and the biggest share of the business logic.
Meanwhile, the classes with the most dependents seemed to be the key primitives, the most widely-used building blocks in the codebase.
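To make those two properties concrete, here is a minimal sketch of counting them on a toy graph. The type names and the dictionary-of-arrays representation are made up for illustration; this is not the data model Code Connections actually uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: each type name mapped to the types it references.
var dependencies = new Dictionary<string, string[]>
{
    ["GameManager"] = new[] { "Board", "Piece", "Vector2" },
    ["Board"] = new[] { "Piece", "Vector2" },
    ["Piece"] = new[] { "Vector2" },
    ["Vector2"] = Array.Empty<string>(),
};

// Out-degree: the number of distinct types each type depends on.
static Dictionary<string, int> CountDependencies(Dictionary<string, string[]> graph) =>
    graph.ToDictionary(pair => pair.Key, pair => pair.Value.Distinct().Count());

// In-degree: the number of distinct types that depend on each type.
static Dictionary<string, int> CountDependents(Dictionary<string, string[]> graph)
{
    var counts = graph.Keys.ToDictionary(name => name, name => 0);
    foreach (var targets in graph.Values)
    {
        foreach (var target in targets.Distinct())
        {
            counts.TryGetValue(target, out var seen);
            counts[target] = seen + 1;
        }
    }
    return counts;
}

var outDegree = CountDependencies(dependencies);
var inDegree = CountDependents(dependencies);

// The 'manageresque' class tops one list, the widely-used primitive the other.
Console.WriteLine($"Most dependencies: {outDegree.OrderByDescending(p => p.Value).First().Key}");
Console.WriteLine($"Most dependents: {inDegree.OrderByDescending(p => p.Value).First().Key}");
```

In this toy example, GameManager has the most dependencies and Vector2 has the most dependents.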
What if this kind of analysis could be used, I wondered, to pick out the most important classes in any codebase, in an unguided way? And with that could you build a kind of skeletal structure that would help to understand the code as a whole, to see at a glance what an application or library is doing, and what’s doing the doing?
This idea sat at the back of my mind, but it returned to the forefront when I was trying to quickly come to grips with an unfamiliar codebase, and trying to understand what was important and how different classes related. Code Connections was already useful, but I found myself wishing I had that ‘important classes’ feature. That’s a pretty good sign that something’s worth doing.
Building the thing
How does an algorithm identify the key classes in a codebase? I have a lot of ideas on this topic that I freely admit I haven’t had the time to implement. For Top Types v1, I’ve focused on building out the foundations of the feature based on the easiest metrics to calculate.
These include the number of dependents and the number of dependencies that I’ve already mentioned, since these quantities basically come for free with the dependency graph. Lines of code (LOC) was also low-hanging fruit. You can show top-ranked types by any of these metrics in isolation, and there’s also a combined option that tries to be smart about using them to guess at the overall most important types in the graph.
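One simple way to combine several metrics into a single ranking is sketched below. To be clear, the scoring rule here (scale each metric by its maximum, then sum) is an assumption chosen for illustration, not necessarily what the combined mode does, and the type names and numbers are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-type metrics: (dependencies, dependents, lines of code).
var metrics = new Dictionary<string, (int Dependencies, int Dependents, int Loc)>
{
    ["GameManager"] = (12, 1, 900),
    ["Board"] = (5, 6, 400),
    ["Vector2"] = (0, 20, 80),
    ["DebugOverlay"] = (2, 0, 60),
};

// Assumed scoring rule: scale each metric by its maximum over all types, then sum.
// Expects at least one type; Math.Max(1, ...) guards against dividing by zero.
static List<(string Type, double Score)> RankTypes(
    Dictionary<string, (int Dependencies, int Dependents, int Loc)> typeMetrics)
{
    double maxDependencies = Math.Max(1, typeMetrics.Values.Max(m => m.Dependencies));
    double maxDependents = Math.Max(1, typeMetrics.Values.Max(m => m.Dependents));
    double maxLoc = Math.Max(1, typeMetrics.Values.Max(m => m.Loc));

    return typeMetrics
        .Select(pair => (
            Type: pair.Key,
            Score: pair.Value.Dependencies / maxDependencies
                 + pair.Value.Dependents / maxDependents
                 + pair.Value.Loc / maxLoc))
        .OrderByDescending(entry => entry.Score)
        .ToList();
}

var ranking = RankTypes(metrics);
foreach (var (type, score) in ranking)
{
    Console.WriteLine($"{type}: {score:F2}");
}
```

Scaling by the maximum keeps a large raw quantity like LOC from drowning out the edge counts. A real implementation might well weight the metrics differently.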
How does it look? Here’s top types from the Entity Framework Core codebase, using the combined score mode:
Code Connections started out, as these things often do, as an attempt to address a specific, personal need.
I was working on a fun side-project, an early prototype of a puzzle game. I had been happily writing code, and had a whole pile of new or modified files. I was at the point where the logic was getting a bit fiddly, and I wanted to commit my work to source control before going further so that I had a restore point.
Now, I’m personally on the perfectionist end of the spectrum when it comes to Git. Partly by aesthetic preference, partly because having a well-structured source history to refer to really has saved me a ton of time, on more than one occasion. So I was looking at 100+ added and modified files, and wondering how I was going to wrestle them into a logical sequence of commits.
That’s when I had an idea. If I could just see the formal dependency relationship between all my classes, it’d be much easier to work out which changes logically depended on others. What if I made a Visual Studio extension that used Roslyn to show me a dependency graph of all my changes?
At the very least, it sounded like a fun project, so I decided to take a shot at it.
VSIX 101
One thing I’d learnt from previous brief forays into authoring Visual Studio extensions, or VSIX packages, is that it’s surprisingly easy, at least at the beginning.
Tick the box to add the right workload, create a new project, and you’re away. Want to add a new window? There’s a helpful tutorial, there’s a wizard that adds the various bits for you, it all just works.
Oh, you wanted your VSIX to actually do something useful? That part is harder. Outside the brightly-lit basics covered in the getting-started tutorials, Visual Studio’s extension API is intimidating: vast, tersely documented, riddled with COM-isms, and layered with decades’ worth of redundant interfaces, making it hard to tell at times if any given type is obsolete or still current.
Fortunately, others have trodden the path. Most of the time there’s a StackOverflow answer, or a blog post. And there’s plenty of published extensions up on GitHub to peruse. (In fact the GitHub VS extension itself happens to be a particularly rich seam.)
Building the graph
Before we can visualize anything, we need something to visualize. The first step then is to build a model of the dependency relationships we’re interested in, which will take the form of a graph, with each type as a vertex in the graph, and a dependency of TypeA on TypeB as an edge from the TypeA vertex to the TypeB vertex.
The graph-building phase evolved considerably from my initial prototype through to its current form.
To extract any information, we need a Roslyn workspace for the solution that’s currently open. This is very easy to get from Visual Studio:
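// Called from somewhere that exposes GetService (e.g. a Package); IComponentModel is VS's MEF entry point.
var componentModel = GetService(typeof(SComponentModel)) as IComponentModel;
// VisualStudioWorkspace is the Roslyn workspace backing the currently open solution.
var workspace = componentModel.GetService<VisualStudioWorkspace>();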
My first ‘simplest thing that worked’ approach was the following algorithm:
1. start at a type (TypeA)
2. do a depth-first or breadth-first search to add dependencies (eg, types referenced by TypeA, types referenced by those types, etc)
3. repeat for other types of interest
The second step there, finding the types referenced by TypeA, is simply a matter of traversing the syntax tree (or trees) corresponding to TypeA provided by Roslyn and checking what type symbols are referenced in it.
This worked fine for my initial narrow vision which only included a fixed set of types in the graph (namely, those that had been locally modified in source control). If TypeA and TypeB were both modified, and TypeA depended directly or indirectly on TypeB, the algorithm above would pick it up.
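The search step of that simple algorithm might look something like the following sketch. It is a breadth-first version, and the `directReferences` dictionary is a hypothetical stand-in for the Roslyn syntax-tree walk that finds the types a given type references.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Roslyn step: direct references per type.
var directReferences = new Dictionary<string, string[]>
{
    ["TypeA"] = new[] { "TypeB", "TypeC" },
    ["TypeB"] = new[] { "TypeD" },
    ["TypeC"] = new[] { "TypeA" },   // cycles are common in real code
    ["TypeD"] = Array.Empty<string>(),
    ["TypeE"] = new[] { "TypeA" },   // a dependent of TypeA
};

// Breadth-first search collecting every dependency edge reachable from the roots.
static List<(string From, string To)> CollectEdges(
    IEnumerable<string> roots, Dictionary<string, string[]> references)
{
    var edges = new List<(string From, string To)>();
    var visited = new HashSet<string>(roots);
    var queue = new Queue<string>(visited);

    while (queue.Count > 0)
    {
        var current = queue.Dequeue();
        if (!references.TryGetValue(current, out var targets)) continue;
        foreach (var target in targets)
        {
            edges.Add((current, target));
            // Only enqueue types we haven't seen, so cycles terminate.
            if (visited.Add(target)) queue.Enqueue(target);
        }
    }
    return edges;
}

var reachable = CollectEdges(new[] { "TypeA" }, directReferences);
Console.WriteLine(string.Join(", ", reachable));
```

Starting from TypeA, the search never reaches TypeE, even though TypeE references TypeA. A search that only follows outbound references can't find a type's dependents.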
But I quickly realised that the tool could be more broadly useful if I could say things like, ‘please visualize the connections of TypeA in both directions’, ie types referenced by TypeA and types that themselves reference TypeA. The only way to get that kind of information is to check every type in the solution. So that’s the approach I opted for. (The worst-case performance is the same as for the simple algorithm anyway, since it’s possible that one of your ‘root’ types may depend, directly or indirectly, on every other type in the solution.)
The current version of Code Connections in fact constructs two graphs. First, it builds a ‘model graph’ that contains every type in the solution (except some that may be filtered out, eg generated types, or types within manually-excluded projects) along with their dependencies. Second, it builds a (typically much smaller) ‘display graph’ containing only the types that will actually be visualised according to current settings.
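As a rough illustration of the model graph / display graph split, the sketch below filters a full edge list down to the edges touching a selected type, in both directions. The edge-list representation and the selection rule are assumptions made for the example, not the extension's real data structures or settings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model graph: every dependency edge in the solution.
var modelEdges = new List<(string From, string To)>
{
    ("TypeA", "TypeB"), ("TypeB", "TypeD"), ("TypeC", "TypeA"),
    ("TypeE", "TypeF"), ("TypeF", "TypeD"),
};

// Keep only the edges that touch a selected type, in either direction.
static List<(string From, string To)> BuildDisplayEdges(
    IEnumerable<(string From, string To)> allEdges, ISet<string> selected) =>
    allEdges
        .Where(edge => selected.Contains(edge.From) || selected.Contains(edge.To))
        .ToList();

var displayEdges = BuildDisplayEdges(modelEdges, new HashSet<string> { "TypeA" });
var displayTypes = displayEdges
    .SelectMany(edge => new[] { edge.From, edge.To })
    .Distinct()
    .OrderBy(name => name)
    .ToList();

Console.WriteLine(string.Join(", ", displayTypes)); // prints "TypeA, TypeB, TypeC"
```

Because the model graph already holds every edge, finding TypeA's dependents (TypeC here) is just a filter. No further code analysis is needed.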
Constructing the ‘model graph’ was fairly quick for my game prototype’s young codebase, but more time-consuming for large codebases. Initially, if the code changed anywhere, we would simply throw away that work and rebuild the whole graph from scratch.
Much better is to incrementally update the graph. If the code in a file is edited, then to a good approximation we can say that only the dependencies of the type or types defined in the file will change. So we only need to update the out-edges of the vertices for those types. I eventually succumbed to temptation and ended up implementing incremental graph updates, and it turned out really well.
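In outline, an incremental update of this kind could look like the sketch below. The `reanalyze` delegate is a hypothetical placeholder for re-reading a type's references from its updated syntax trees, and the dictionary of out-edge sets is an assumed representation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model graph stored as a set of out-edges per type.
var outEdges = new Dictionary<string, HashSet<string>>
{
    ["TypeA"] = new HashSet<string> { "TypeB" },
    ["TypeB"] = new HashSet<string> { "TypeC" },
    ["TypeC"] = new HashSet<string>(),
};

// Replace only the out-edges of the types defined in the edited file.
// Every other vertex keeps its existing edges untouched.
static void UpdateEditedTypes(
    Dictionary<string, HashSet<string>> graph,
    IEnumerable<string> editedTypes,
    Func<string, IEnumerable<string>> reanalyze)
{
    foreach (var editedType in editedTypes)
    {
        graph[editedType] = new HashSet<string>(reanalyze(editedType));
    }
}

// Simulate an edit to TypeA's file that swaps its reference from TypeB to TypeC.
UpdateEditedTypes(outEdges, new[] { "TypeA" }, _ => new[] { "TypeC" });

Console.WriteLine(string.Join(", ", outEdges["TypeA"])); // prints "TypeC"
```

The cost of an update then scales with the number of types in the edited file rather than with the size of the solution.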
(Why “to a good approximation”? There are a few cases where this isn’t true, the most significant I’m aware of being code that’s initially in error. That is, say I’m editing AlreadyExists.cs and I decide I’ll need a new class, DoesntExistYet. In my AlreadyExists code, I add a call to DoesntExistYet.BrandNewMethod(). Now, subsequently, I actually create the DoesntExistYet class, and give it the BrandNewMethod() method. By so doing, I’ve now created a valid dependency of AlreadyExists on DoesntExistYet, without actually editing AlreadyExists.cs.)
Working with Git
Visual Studio has solid Git integrations out of the box, and I thought maybe it would expose internal Git-related APIs, but as far as I can tell it doesn’t. After looking at other Git-related VSIX packages (particularly GitDiffMargin), I opted to use LibGit2Sharp, a .NET wrapper for the libgit2 library.
It took some reorienting from my user-level mental model of Git towards libgit2’s lower-level API, but in the end it was pretty easy to do what I wanted, which for v1 was just to get a list of modified and added files.
Visualizing the graph
For visualizing the graph, I didn’t know much to begin with, other than that it was possible. I had no idea if it would be easy or hard.
Automatically generating a visually appealing 2-dimensional mapping of a set of vertices and edges is a long-standing topic of interest for computer scientists (not least in the context of producing nice figures for academic articles).
One venerable stalwart here is GraphViz, a widely-used software package which provides both a number of routines for creating mappings in various styles (hierarchical, circular, all blobbed together, etc) and a standardized text format for defining graphs (both mapped and unmapped).
I was looking for a WPF library that I could easily incorporate into the Visual Studio extension. The first thing I found in my exploratory phase was a CodeProject project, Dot2WPF, which visualises GraphViz-formatted graphs within a WPF control. It supported mouse interaction with the elements, which was one thing I was looking for. I ran the sample, and indeed it seemed to do what I needed. I thought I was set.
When I got around to actually having output I wanted to graph, however, I found that Dot2WPF is somewhat limited. The problem was that the GraphViz text output format it supports is assumed to specify the Cartesian coordinates of the vertices. In other words, Dot2WPF assumes that the layouting problem has already been solved.
One option would be to use GraphViz itself for the layouting part, but the more I looked at that option the less I liked it. GraphViz didn’t seem to be available as a library, even a native library. I found one or two .NET wrappers, but they operated on the assumption that GraphViz was already installed by the user on their system. Installing GraphViz as a separate manual step might be acceptable for my own use, but it’d certainly limit adoption if I ever ended up with something I wanted to publicly release. I didn’t like the idea.
What then? GraphViz is open source; perhaps I could port one of its layouting routines from C? I didn’t relish the idea. My enthusiasm for the whole adventure was faltering.
I was too focused on one potential solution; it was time to pull back. I read the description of ‘NEATO’, one of the more useful GraphViz routines, which notes that it’s based on a 1989 paper by Kamada and Kawai. I searched for ‘Kamada and Kawai’; the sixth Google hit was a StackOverflow question, and the fourth answer was 1-line paydirt:
GraphSharp turned out to be everything I wanted and more. Not only did it implement Kamada & Kawai’s algorithm and a number of other graph layouting strategies, but beyond that it also provided a WPF control for visualizing the results. When I tried out GraphSharp’s UI tooling, it was notably more feature-rich and polished than Dot2WPF. GraphSharp takes a slightly different approach to creating individual graph elements: where Dot2WPF used lightweight Visuals for better performance, GraphSharp used full-fat WPF controls; but the performance didn’t seem noticeably worse for large numbers of elements, and having real controls would anyway be advantageous if I were to want to customize the appearance or interactivity of the graph elements.
For defining input graphs, GraphSharp depended on QuickGraph, a library I’d already come across as a widely-used standard for general-purpose graph and network algorithms in .NET.
With GraphSharp for visualization, along with Roslyn and LibGit2Sharp, my proof-of-concept fell into place, and it wasn’t long before I could finally see all those 100+ changes in Git arranged visually by dependency relationship. It was a glorious mess, but the visual graph really did help me make sense of it. With the help of proto-Code-Connections, I organised my sprawling pile of changes into a useful commit sequence.
Connecting it all together
Finally I had a tool that scratched my immediate itch. Did I have something more than that on my hands?
I found myself working on something else and wishing I had Code Connections to help me, which was a good sign. I added a simple button that would add the current open document to the graph, so I wasn’t just restricted to looking at Git-modified files.
What would a more general-purpose tool look like?
Visual Studio already has a feature to map dependencies; I’d tried it a long time before, at one point when I had access to the Enterprise version. I loved the idea, but my recollection was that it took a long time to generate the map, the map was enormous, and by the time I had it I wasn’t quite sure what I wanted to ask it. I tried it once and then forgot about it. If I had to guess what it was for, it seemed like it was for printing out and pinning up next to the whiteboard while you had a long architectural argument about inheritance hierarchies.
So if I created a new tool it should be completely different in spirit, and I wanted something completely different anyway. I wanted something that would help me, selfishly, make sense of my code in my day to day. I spend a significant fraction of my professional life just trying to understand how everything fits together, how one class or method relates to all the others. If it could help me, maybe it’d help other people as well.
Some requirements: the tool should be fast. If I pose a question to it, I want an answer in seconds; otherwise I’ll get tired of waiting.
The information it gave you had to be tractable. In terms of building the graph, that meant pulling in more information as you needed it – show me this type, now add its connections, now show me that type – rather than dropping a ton of information on you and making you drill down to what you were interested in. This obviously would rely upon it being fast.
Those top-level priorities I had formulated even before I had a proof-of-concept. But once I had a POC, I was able to get a better feel for what they meant, and also to get a sense of what the performance characteristics were, at least on an order-of-magnitude level, and what might be doable.
The UX of the POC was quite different from the way that the released version of the tool works. In the POC, some of the graph vertices were ‘roots’, and there was an adjustable ‘depth’ value to set how much graph to show beyond the roots: depth 1 would show the first-nearest neighbours of the roots, depth 2 would show first and second-nearest neighbours, and so on. It was an artifact of my initial forays into building the data model, as much as anything.
I scrapped it, in favour of the principle of simply giving the user various options for adding and removing types from the graph. Once I had implemented incremental updates and caching, modifying the displayed elements slightly was very quick.
It was so quick, in fact, that I decided to make the leap and default to including the current open document, and its connections, in the graph at all times. It turned out to work well. One nice thing about it is that the first ever time you open Code Connections, you generally see some content in the graph straight away, rather than getting a blank window. First impressions matter!
With the active document locked to the graph, the option to ‘pin’ additional types to the graph, and the Git mode, I felt like I had enough for a public release. And the rest is history, if polish and bugfixes count as history.
The unfinished product
That is how I got to Code Connections v1. I have all sorts of ideas for features I’d like the tool to have, from the fairly straightforward to the entirely not-straightforward. The guiding vision is to make it easier to make sense of your code while you’re writing the code.
Thanks for reading about the making of Code Connections! It’s free and open source – check out the code if you’re curious, and if it sounds useful, you can install it in Visual Studio and try it out right now.
Code Connections started out as a quick tool when I was frustrated trying to understand all the changes I was making as part of a side project. As happens sometimes, it quickly grabbed my full attention, rudely shouldered the first side project out of the way, and finally has reached the point that I’m comfortable making it publicly available.