Making Code Connections

Code Connections started out, as these things often do, as an attempt to address a specific, personal need.

I was working on a fun side-project, an early prototype of a puzzle game. I had been happily writing code, and had a whole pile of new or modified files. I was at the point where the logic was getting a bit fiddly, and I wanted to commit my work to source control before going further so that I had a restore point.

Now, I’m personally on the perfectionist end of the spectrum when it comes to Git. Partly by aesthetic preference, partly because having a well-structured source history to refer to really has saved me a ton of time, on more than one occasion. So I was looking at 100+ added and modified files, and wondering how I was going to wrestle them into a logical sequence of commits.

That’s when I had an idea. If I could just see the formal dependency relationship between all my classes, it’d be much easier to work out which changes logically depended on others. What if I made a Visual Studio extension that used Roslyn to show me a dependency graph of all my changes?

At the very least, it sounded like a fun project, so I decided to take a shot at it.

Populating a dependency graph in Code Connections.

VSIX 101

One thing I’d learnt from previous brief forays into authoring Visual Studio extensions, or VSIX packages, is that it’s surprisingly easy, at least at the beginning.

Tick the box to add the right workload, create a new project, and you’re away. Want to add a new window? There’s a helpful tutorial, there’s a wizard that adds the various bits for you, it all just works.

Oh, you wanted your VSIX to actually do something useful? That part is harder. Outside the brightly-lit basics covered in the getting-started tutorials, Visual Studio’s extension API is intimidating: vast, tersely documented, riddled with COM-isms, and layered with decades’ worth of redundant interfaces, making it hard to tell at times if any given type is obsolete or still current.

Fortunately, others have trodden the path. Most of the time there’s a StackOverflow answer, or a blog post. And there’s plenty of published extensions up on GitHub to peruse. (In fact the GitHub VS extension itself happens to be a particularly rich seam.)

Building the graph

Before we can visualize anything, we need something to visualize. The first step then is to build a model of the dependency relationships we’re interested in, which will take the form of a graph, with each type as a vertex in the graph, and a dependency of TypeA on TypeB as an edge from the TypeA vertex to the TypeB vertex.

The graph-building phase evolved considerably from my initial prototype through to its current form.

To extract any information, we need a Roslyn workspace[] for the solution that’s currently open. This is very easy to get from Visual Studio:

var componentModel = GetService(typeof(SComponentModel)) as IComponentModel;
var workspace = componentModel.GetService<VisualStudioWorkspace>();

My first ‘simplest thing that worked’ approach was the following algorithm:

  • start at a type (TypeA)
  • do a depth-first or breadth-first search to add dependencies (eg, types referenced by TypeA, types referenced by those types, etc)
  • repeat for other types of interest

The second step there, finding the types referenced by TypeA, is simply a matter of traversing the syntax tree (or trees) corresponding to TypeA provided by Roslyn and checking what type symbols are referenced in it.

This worked fine for my initial narrow vision which only included a fixed set of types in the graph (namely, those that had been locally modified in source control). If TypeA and TypeB were both modified, and TypeA depended directly or indirectly on TypeB, the algorithm above would pick it up.

But I quickly realised that the tool could be more broadly useful if I could say things like, ‘please visualize the connections of TypeA in both directions’, ie types referenced by TypeA and types that themselves reference TypeA. The only way to get that kind of information is to check every type in the solution. So that’s the approach I opted for. (The worst-case performance is anyways the same as for the simple algorithm, since it’s possible that one of your ‘root’ types may depend (directly or indirectly) on every other type in the solution.)

The current version of Code Connections in fact constructs two graphs. First, it builds a ‘model graph’ that contains every type in the solution (except some that may be filtered out, eg generated types, or types within manually-excluded projects) with their dependency. Second, it builds a (typically much smaller) ‘display graph’ containing only the types that will actually be visualised according to current settings.

Constructing the ‘model graph’ was fairly quick for my game prototype’s young codebase, but more time-consuming for large codebases. Initially, if the code changed anywhere, we would simply throw away that work and rebuild the whole graph from scratch.

Much better is to incrementally update the graph. If the code in a file is edited, then to a good approximation we can say that only the dependencies of the type or types defined in the file will change. So we only need to update the out-edges of the vertices for those types. I eventually succumbed to temptation and ended up implementing incremental graph updates, and it turned out really well.

(Why “to a good approximation”? There are a few cases where this isn’t true: the most significant I’m aware of being the case of code that’s initially in error. That is, say I’m editing AlreadyExists.cs and I decide I’ll need a new class, DoesntExistYet. In my AlreadyExists code, I add a call to DoesntExistYet.BrandNewMethod(). Now, subsequently, I actually create the DoesntExistYet class, and give it the BrandNewMethod() method. By so doing, I’ve now created a valid dependency relationship of AlreadyExists, without actually editing AlreadyExists.cs.)

Working with Git

Visual Studio has solid Git integrations out of the box, and I thought maybe it would expose internal Git-related APIs, but as far as I can tell it doesn’t. After looking at other Git-related VSIX packages (particularly GitDiffMargin), I opted to use LibGit2Sharp, a .NET wrapper for the libgit2 library.

It took some reorienting from my user-level mental model of Git towards libgit2’s lower-level API, but in the end it was pretty easy to do what I wanted, which for v1 was just to get a list of modified and added files.

Visualizing the graph

For visualizing the graph, I didn’t know much to begin with, other than that it was possible. I had no idea if it would be easy or hard.

Automatically generating a visually appealing 2-dimensional mapping of a set of vertices and edges is a long-standing topic of interest for computer scientists (not least in the context of producing nice figures for academic articles).

One venerable stalwart here is GraphViz, a  widely-used software package which provides both a number of routines for creating mappings, in various styles (hierarchal, circular, all blobbed together, etc), and also provides a standardized text format for defining graphs (both mapped and unmapped).

I was looking for a WPF library that I could easily incorporate into the Visual Studio extension. The first thing I found in my exploratory phase was a CodeProject project, Dot2WPF, which visualises GraphViz-formatted graphs within a WPF control. It supported mouse interaction with the elements, which was one thing I was looking for. I ran the sample, and indeed it seemed to do what I needed. I thought I was set.

When I got around to actually having output I wanted to graph, however, I found that Dot2WPF is somewhat limited. The problem was that the GraphViz text output format it supports is assumed to specify the Cartesian coordinates of the vertices. In other words, Dot2WPF assumes that the layouting problem has already been solved.

One option would be to use GraphViz itself for the layouting part, but the more I looked at that option the less I liked it. GraphViz didn’t seem to be available as a library, even a native library. I found one or two .NET wrappers, but they operated on the assumption that GraphViz was already installed by the user on their system. Installing GraphViz as a separate manual step might be acceptable for my own use, but it’d certainly limit adoption if I ever ended up with something I wanted to publicly release. I didn’t like the idea.

What then? GraphViz is open source; perhaps I could port one of its layouting routines from C? I didn’t relish the idea. My enthusiasm for the whole adventure was faltering.

I was too focused on one potential solution; it was time to pull back. I read the description of ‘NEATO’, one of the more useful GraphViz routines, which notes that it’s based on a 1989 paper by Kamada and Kawai. I searched for ‘Kamada and Kawai’; the sixth Google hit was a StackOverflow question, and the fourth answer was 1-line paydirt:

There is also http://graphsharp.codeplex.com which provides a number of layout algorithms for C#.

https://stackoverflow.com/a/22942336/1902058

GraphSharp turned out to be everything I wanted and more. Not only did it implement Kamada & Kawai’s algorithm and a number of other graph layouting strategies, but beyond that it also provided a WPF control for visualizing the results. When I tried out GraphSharp’s UI tooling, it was notably more feature-rich and polished than Dot2WPF. GraphSharp takes a slightly different approach to creating individual graph elements: where Dot2WPF used lightweight Visuals for better performance, GraphSharp used full-fat WPF controls; but the performance didn’t seem noticeably worse for large numbers of elements, and having real controls would anyway be advantageous if I were to want to customize the appearance or interactivity of the graph elements.

For defining input graphs, GraphSharp depended on QuickGraph, a library I’d already come across as a widely-used standard for general-purpose graph and network algorithms in .NET.

With GraphSharp for visualization, along with Roslyn and LibGit2Sharp, my proof-of-concept fell into place, and it wasn’t long before I could finally see all those 100+ changes in Git arranged visually by dependency relationship. It was a glorious mess, but the visual graph really did help me make sense of it. With the help of proto-Code-Connections, I organised my sprawling pile of changes into a useful commit sequence.

Connecting it all together

Finally I had a tool that scratched my immediate itch. Did I have something more than that on my hands?

I found myself working on something else and wishing I had Code Connections to help me, which was a good sign. I added a simple button that would add the current open document to the graph, so I wasn’t just restricted to looking at Git-modified files.

What would a more general-purpose tool look like?

Visual Studio already has a feature to map dependencies; I’d tried it a long time before, at one point when I had access to the Enterprise version. I loved the idea, but my recollection was that it took a long time to generate the map, the map was enormous, and by the time I had it I wasn’t quite sure what I wanted to ask it. I tried it once and then forgot about it. If I had to guess what it was for, it seemed like it was for printing out and pinning up next to the whiteboard while you had a long architectural argument about inheritance hierarchies.

So if I created a new tool it should be completely different in spirit, and I wanted something completely different anyway. I wanted something that would help me, selfishly, make sense of my code in my day to day. I spend a significant fraction of my professional life just trying to understand how everything fits together, how one class or method relates to all the others. If it could help me, maybe it’d help other people as well.

Some requirements: the tool should be fast. If I pose a question to it, I want an answer in seconds;  otherwise I’ll get tired of waiting.

The information it gave you had to be tractable. In terms of building the graph, that meant pulling in more information as you needed it – show me this type, now add its connections, now show me that type – rather than dropping a ton of information on you and making you drill down to what you were interested in. This obviously would rely upon it being fast.

Those top-level priorities I had formulated even before I had a proof-of-concept. But once I had a POC, I was able to get a better feel for what they meant, and also to get a sense of what the performance characteristics were, at least on an order-of-magnitude level, and what might be doable.

The UX of the POC was quite different from the way that the released version of the tool works. In the POC, some of the graph vertices were ‘roots’, and there was an adjustable ‘depth’ value to set how much graph to show beyond the roots: depth 1 would should the first-nearest neighbours of the roots, depth 2 would show first and second-nearest neighbours, and so on. It was an artifact of my initial forays into building the data model, as much as anything.

I scrapped it, in favour of the principle of simply giving the user various options for adding and removing types from the graph. Once I had implemented incremental updates and caching, modifying the displayed elements slightly was very quick.

It was so quick, in fact, that I decided to make the leap and default to including the current open document, and its connections, in the graph at all times. It turned out to work well. One nice thing about it is that the first ever time you open Code Connections, you generally see some content in the graph straight away, rather than getting a blank window. First impressions matter!

With the active document locked to the graph, the option to ‘pin’ additional types to the graph, and the Git mode, I felt like I had enough for a public release. And the rest is history, if polish and bugfixes count as history.

The unfinished product

That is how I got to Code Connections v1. I have all sorts of ideas for features I’d like the tool to have, from the fairly straightforward to the entirely not-straightforward. The guiding vision is to make it easier to make sense of your code while you’re writing the code.

Thanks for reading about the making of Code Connections! It’s free and open source – check out the code if you’re curious, and if it sounds useful, you can install it in Visual Studio and try it out right now.

View/model separation in Unity using events

In the last post I discussed the Model-View-Controller pattern and sketched a Unity-specific implementation for it. In this post I want to dive into the details. In particular I want to talk about the ‘observer’ pattern and the subtleties of using C# events. In a subsequent post I’ll cover how reactive extensions via the UniRx library can make your life easier.

Did you change yet? Did you change yet? Did you change yet?

One of the conceptual challenges of Model-View-Controller separation or any layer-based architecture, particularly for newcomers to programming, is how to propagate information from layer A to layer B without A’s code referring to B’s code.

It’s tempting, with Unity’s model of explicit update loops, to simply check every ‘tick’ in an Update() method if some property in the game model changes. But we want to avoid this – it’s performance unfriendly when you have large numbers of objects, and moreover it’s just ugly.

One answer is the Observer pattern. In essence, a subject type provides a contract allowing any interested observers to say ‘notify me whenever such-and-such happens.’ In our case, the game model is the subject and the controller layer is the observer.

C# implements the Observer pattern as a first-class language feature via ‘event’ declarations. For example:

public class Terrain
    {
        private int _terrainElevation;
        public int TerrainElevation
        {
            get { return _terrainElevation; }
            set
            {
                var elevationHasChanged = value != _terrainElevation;
                _terrainElevation = value;
                if (elevationHasChanged && ElevationChanged != null)
                {
                    ElevationChanged(value);
                }
            }
        }
        public event Action<int> ElevationChanged;
    }

    public class TerrainController : MonoBehaviour
    {
        private Terrain _terrain;

        public TerrainController(Terrain terrain)
        {
            _terrain = terrain;
            _terrain.ElevationChanged += OnElevationChanged;
        }

        private void OnElevationChanged(int newElevation)
        {
            UpdateGameObjectPositionForNewElevation(newElevation);
        }
    }

Now TerrainController will be notified whenever the TerrainElevation property changes, and it can adjust its view accordingly.

The beauty is that anybody can subscribe to the ‘ElevationChanged’ event as long as they have a Terrain object. This achieves the layer separation that we talked about: the Terrain object in the game model layer doesn’t ‘know about’ the TerrainController in the controller layer. This makes development much easier: when you change your game logic, you just change the code pertaining to game logic – you don’t have to make a bunch of changes to your display code.

Once in a lifetime

So we’re golden, right?

Not quite; there’s one essential consideration we’re neglecting, which is lifetime management.

When you subscribe to the event on Terrain, you effectively pass a reference to TerrainController, so that its callback can be called. Now, remembering that Terrain is a Component attached to a Unity GameObject, what happens if you unload the scene it’s in – perhaps because you’re navigating back to the menu screen, say? You’re expecting TerrainController to be destroyed, but Terrain (which is sitting in your model layer, not going anywhere) still has a reference to it. We’ve created a memory leak.

The good news: there’s an easy fix. We simply unsubscribe from the event:

    public class TerrainController : MonoBehaviour
    {
        …

        private void OnDestroy() {
            _terrain.ElevationChanged -= OnElevationChanged;
        }
    }

Voila, leak patched, crisis averted.

The bad news? Well, if you have more such event subscriptions, or if deciding when they should be unsubscribed is a bit more complicated, it rapidly becomes easy to accidentally introduce bugs. Even writing the example just now, I almost forgot to change the + to a – when I copy-pasted.

In the next post, I’ll discuss a nifty toolset for making lifetime management less error-prone.

Finding the energy

Why is the world’s wealth distributed as it is?

At the end of the last post, looking at Gapminder, we took note that the rich countries of the world could be divided up into roughly four groups. There are the states of western Europe; a few former English colonies, most notably the US; the oil-rich countries of the Persian Gulf; and finally the highly-industrialised countries of east Asia.

How did it come to be that way? Well, how long has it been that way? Continue reading “Finding the energy”

For richer or poorer

I want to write about a topic I don’t fully understand. In my defense, it seems like nobody else does, either. But the posts to follow will be as much about documenting my own learning process as about sharing what I already know.

Why are some countries rich, and other countries poor? That’s the question I’m interested in. How did that state of affairs come about? Why does it persist? Continue reading “For richer or poorer”