Top Types for Code Connections

I just released an update for Code Connections, featuring a new mode which I’ve dubbed ‘Top Types’. Top Types mode tries to calculate the most important types in an entire solution, and lays them all out for you in the Code Connections dependency graph.

The idea for Top Types mode nucleated while I was developing v1 and trying to work out the best layout algorithm to use. I wanted to understand the typical averages and ranges for various graph properties for real .NET solutions, so I wrote some rough code to calculate those properties from the in-memory dependency graph and log them in Visual Studio.

One thing that struck me was that, when I extracted those statistics for codebases I was familiar with, the classes that scored highly on certain metrics were classes that were ‘important’ in some way. Two properties in particular were interesting. One was the number of dependencies a given class had: in graph terms, the number of outbound edges in the dependency graph. The other was the number of dependents it had, or the number of inbound edges.

Classes with a lot of dependencies by definition are referencing many other types. I found that the classes that had the most dependencies were the ‘manageresque’ classes that had the broadest set of responsibilities and the biggest share of the business logic.

Meanwhile, the classes with the most dependents seemed to be the key primitives, the most widely-used building blocks in the codebase.
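To make the two metrics concrete, here is a minimal Python sketch of counting them from an edge list. It is only an illustration and not Code Connections' own code: the class names and the `degree_counts` helper are hypothetical, and the graph is assumed to be a set of (dependent, dependency) pairs.

```python
from collections import Counter

# Hypothetical toy dependency graph, invented for illustration.
# An edge (A, B) means "A references B".
edges = {
    ("OrderManager", "Order"),
    ("OrderManager", "Customer"),
    ("OrderManager", "Money"),
    ("Invoice", "Order"),
    ("Invoice", "Money"),
    ("Customer", "Money"),
}

def degree_counts(edges):
    """Return (dependencies, dependents) counters for a set of edges."""
    dependencies = Counter()  # outbound edges per type
    dependents = Counter()    # inbound edges per type
    for source, target in edges:
        dependencies[source] += 1
        dependents[target] += 1
    return dependencies, dependents

dependencies, dependents = degree_counts(edges)
print(dependencies.most_common(1))  # the most 'manageresque' type
print(dependents.most_common(1))    # the most widely used primitive
```

In this toy graph, `OrderManager` has the most outbound edges and `Money` the most inbound ones, which mirrors the manager versus primitive split described above.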

What if this kind of analysis could be used, I wondered, to pick out the most important classes in any codebase, in an unguided way? And with that could you build a kind of skeletal structure that would help to understand the code as a whole, to see at a glance what an application or library is doing, and what’s doing the doing?

This idea sat at the back of my mind, but it returned to the forefront when I was trying to quickly come to grips with an unfamiliar codebase, and trying to understand what was important and how different classes related. Code Connections was already useful, but I found myself wishing I had that ‘important classes’ feature. That’s a pretty good sign that something’s worth doing.

Building the thing

How does an algorithm identify the key classes in a codebase? I have a lot of ideas on this topic, which I freely admit I haven’t had the time to implement. For Top Types v1, I’ve focused on building out the foundations of the feature based on the easiest metrics to calculate.

These include the number of dependents and the number of dependencies that I’ve already mentioned, since these quantities basically come for free with the dependency graph. Lines of code (LOC) was also low-hanging fruit. You can show top-ranked types by any of these metrics in isolation, and there’s also a combined option that tries to be smart about using them to guess at the overall most important types in the graph.
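As a rough picture of what ranking by a single metric or by a combined score might look like, here is a small Python sketch. The combined formula is purely an assumption for illustration (each metric scaled by its maximum, then summed with equal weights); it is not the scoring Code Connections actually uses, and the sample numbers, `normalise`, `combined_scores`, and `top_types` are all hypothetical.

```python
def normalise(values):
    """Scale a {name: value} mapping to the range 0..1 by its maximum."""
    top = max(values.values(), default=0)
    return {name: (value / top if top else 0.0) for name, value in values.items()}

def combined_scores(dependents, dependencies, loc, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three scaled metrics (an assumed formula)."""
    names = set(dependents) | set(dependencies) | set(loc)
    scaled = [normalise(metric) for metric in (dependents, dependencies, loc)]
    return {
        name: sum(w * metric.get(name, 0.0) for w, metric in zip(weights, scaled))
        for name in names
    }

def top_types(scores, n):
    """Return the n highest-scoring names, breaking ties alphabetically."""
    return sorted(scores, key=lambda name: (-scores[name], name))[:n]

# Hypothetical metric values for a toy codebase.
dependents = {"Money": 3, "Order": 2, "Customer": 1}
dependencies = {"OrderManager": 3, "Invoice": 2, "Customer": 1}
loc = {"OrderManager": 400, "Invoice": 150, "Order": 120, "Customer": 80, "Money": 40}

print(top_types(loc, 2))  # ranking by a single metric in isolation
scores = combined_scores(dependents, dependencies, loc)
print(top_types(scores, 3))  # ranking by the assumed combined score
```

Scaling each metric before summing keeps LOC, which is typically a much larger number than an edge count, from dominating the result. A real implementation would likely need more careful weighting than this.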

How does it look? Here are the top types from the Entity Framework Core codebase, using the combined score mode:

You can try out Top Types mode in Code Connections with your own code right now. I’d love to hear your feedback!

Author: David Oliver

I’m a developer, ex-physicist, and occasional game designer. I’m interested in history, society, and the sciences of human behaviour, as well as technology and programming.