This article is in response to one posted a few days ago by Cindy Sridharan, in which she makes some really well-reasoned arguments about small functions. You should check it out, it’s a good read and does a good job laying out the frustrations she’s encountered working with code written by developers who value small functions.
That said, I want to take some time to respond to some of the concerns she raised because, well, I disagree with them and I think they can lead to some real problems in one’s code. And it’s not the first time I’ve heard most of them, so I think it may be useful to take some time and work through my own thoughts on the issue.
A note about the title: I don’t like the “x considered harmful” posts, since they imply a broad industry agreement where, usually, one doesn’t exist. Programmers are a highly opinionated bunch and it’s pretty rare to find an issue where there really is broad agreement. That said, on the issue of preferring small functions, I can point to many oft-cited works that are in agreement — long functions are a code smell.
So, let’s take a look at the concerns Cindy raised about a preference for small functions.
Addressing concerns about small functions
The author raises several points about small functions, and how she thinks they can be abused. These concerns are, I think, relatively common whenever the size and complexity of functions is debated, so let’s walk through them.
I do want to point out that these concerns are held by real people, who’ve worked in real code bases and experienced real frustration. I don’t want to minimize that, my hope here is that we can redirect that frustration in a way that results in higher quality code.
“Do one thing” is fuzzy and sometimes taken too far
What, exactly, is the one thing that a given function should do? That’s often not an easy question to answer. How far is too far when trying to reach that one thing? The author gives us an example that she seems to indicate would be best expressed as a single function:
For instance, in a web application, a CRUD operation like “create user” can be “one thing”. Typically, at the very least, creating a user entails creating a record in the database (and handling any concomitant errors). Additionally, creating a user might also require sending them a welcome email. Furthermore, one might also want to trigger a custom event to a message broker like Kafka to feed this event into various other systems.
This is actually a really good example of something that should be broken out into different functions. It’s doing several things, it’s operating at different levels of abstraction, and (perhaps worst of all) it has side-effects the caller is unlikely to anticipate. If you call something like
User.create would you expect that it would send an email? That’s a recipe for disaster.
Let’s walk through this example to see why this would be a problematic way to design our code.
class User < TotallyNotActiveRecord # ...Other functions on the User model... def create(data=nil) # Create the user record in the database super(data) # Send a welcome email WelcomeEmail.send(email) if email.present? # Send a message kafka = Kafka.new(...) producer = kafka.producer producer.produce(email, topic: "user-created") producer.deliver_messages end end
The funny thing is this method isn’t actually that long. It’s only six real lines of code. I probably wouldn’t really think of its length as a smell if I found it in a pull request. The smell isn’t just its size, it’s all the various things its doing and how misleading the message name is.
Now, imagine this method were long enough that it was smelly by size alone. Think about how much more it would be doing with ten lines or twenty lines. How about fifty or a hundred? The definition of “big” might vary depending on language, but the truth is nearly unavoidable: a method is highly unlikely to be both big and well-designed.
In only six lines we’re performing at least three seperate functions, we’re operating at different levels of abstraction, and OH the dependencies —three classes and six method signatures! In just six lines we’ve written something that can be broken by a multitude of unrelated code.
This is a problem with big functions: they are inherently fragile. Would you expect that by making a change in the
WelcomeEmail class you’d potentially break your
User model? I think we’ve all been there, and it’s not a place we typically like to visit.
Finally, this method is very challenging to test. In order to write a simple unit test we either have to mock all of these dependencies, write an integration test instead, or just give up out of frustration. Frustration when writing tests is a huge, glowing red flag: it’s telling you, “Hey! Stop! There’s something wrong with the code you’re trying to test.”
Instead, many developers will just get fed up with testing because it doesn’t accommodate their god methods or because they don’t feel like they have time to refactor code that, in all likelihood, they didn’t write. Then you end up with the worst of all worlds: an app full of huge, untested methods. Things break when you don’t expect them to and you don’t even know it until it’s in production because your test coverage is in the garbage.
You need to make a small change to support some trivial feature, but you open up the code and find it’s buried in a two hundred line behemoth. You now have to read, understand, and load the entire thing into your mental memory before you can make your trivial change. Even then you have no confidence that your change won’t break something unrelated.
so what many of them end up doing is not stop until they've made it fully DRY and modular – which never ends very well
— Cindy Sridharan (@copyconstruct) August 11, 2017
DRY and modular? The horror!
DRY-driven functions can lead to the wrong abstraction
This is a fair concern. In the relentless pursuit of not repeating anything, we pull code out into the wrong abstraction. That practice is worse than the original duplication.
We don’t really disagree here. You shouldn’t abstract code in the wrong way just to make a method shorter. But a wrong abstraction usually takes the form of a class, not just a method. Classes form the nouns, the ideas or concepts that we’re trying to represent. These concepts are what we’re abstracting. Your methods should be the verbs, the simple actions that those nouns can perform.
Creating the wrong class can be painful to refactor later on, but there’s very little danger in pulling out some code into a private method in the same class. In fact, refactoring by extracting methods out of large methods can reveal the need for additional refactorings — such as the data clump code smell which may indicate a new object would be useful.
Long names are hard to read and understand
What could’ve been, say, 4–5 lines of code is now stashed away in a function that bears an extremely long name. When I’m reading code, seeing such a verbose word pop up suddenly gives me pause as I try to process all the different syllables in this function’s name, try to fit it into the mental model I’ve been building thus far and then decide whether or not to investigate the function in greater detail by jumping to its definition and reading the implementation.
Naming things is hard. Mostly because coming up with a word or two to represent the concept that you’re looking at requires that we back up from the context we’ve been working in. If you abstract a function and it really has an absurdly long name there’s a good chance some additional refactoring is needed. Perhaps a new object should be abstracted to represent the concept, for instance.
I will say that I’ve heard this argument before, and it’s usually related to the “comments are/aren’t a code smell” debate. Opponents of the comments as code smells view will sometimes point to the edge case where a function name would become very long in absence of a comment. The problem is that it’s just that, an edge case. The main case is that code blocks can be pulled into methods with perfectly reasonable names.
Loss of locality
Is it better to shimmy new functionality into existing methods than writing new ones to handle these cases? It’s something of a trick question: I would submit that neither approach is really ideal. You should be able to change the behavior of existing code by extending it (with new objects) rather than modifying it with new functions that change the interface of an existing object (Open/Closed Principle).
The article rightly points out the flaw in adding new methods for every new use case, but (in my view) wrongly concludes that the solution to that problem is to just extend existing functions. The clean solution is likely to involve new classes, with their own interfaces developed in light of new requirements.
On this note, I’ve seen this objection raised before:
Small functions work best when we don’t have to jump across file or package boundaries to find the definition of a function.
All modern editors allow you to quickly jump to the definition of a function with a shortcut. Any reasonably complex code that’s well designed will require you to navigate multiple files. SOLID virtually ensures this. Don’t constrain the design of your system because a better design would mean more files.
Why small functions are awesome
Easy to understand, resilient to change
What’s easier to change: a tiny method that’s very simple to understand or a large method that requires several minutes of careful thought to grasp? Not only does the latter require vastly more mental effort to understand, it requires you to keep that knowledge loaded into your mind while you work in the domain of the problem at hand. Interruptions suddenly become much more expensive, forcing you to start over from the beginning. If the method is large enough it’s likely to become so daunting to fully understand that no one wants to touch it. It becomes the part of the app we just don’t touch anymore because every time we do it causes a cascade of unintended effects.
Small methods, on the other hand, offer little resistance to change. They can be quickly understood and changed without fear that such changes will have unintended consequences. They’re fast to load into our minds and require minimal effort to process. Interruptions, though still not without cost, are no longer expensive. Lastly, we’re unlikely to be afraid to change a function that’s only a couple lines long. We feel assured in our changes, confident we didn’t just do something we didn’t intend to do.
For these reasons, systems architected with small functions tend to be easier to change, which is a critical concern when building software.
Have you ever tried to write a unit test for a huge method? I think most of us have at some point. Sometimes we write ourselves into a corner thinking, “I’ll write a test at the end and refactor.” Only, by the time we’re at that point our method has grown in size so much that a simple test seems daunting.
Other times we go boldly into some existing mess, optimistically believing we can break up that massive function. But first, we rightly want to add test coverage so that we know we haven’t broken anything. Soon we realize that our quick unit tests are a mess of dependency mocks and an endless series of tests trying desperately to cover every branch of every conditional.
Both of these cases often end in us giving up, and accepting that our huge method is now a huge, untested method. It’s become a beast we can no longer tame.
If large methods become beasts, small methods are like faithful golden retrievers: happily doing the simple tasks we ask them to and never turning on us. We can easily test that one-liner, and a couple asserts to cover a five-line method is easy. Our method is now safe to change, assured in the knowledge that we can prove it works.
Ever reached out to a method that indicated one thing in its name, but did something totally different or had some far-reaching affect you couldn’t possibly have anticipated? We saw an example at the beginning when our
User.create method sent an email. Imagine you triggered a process to create a bunch of users and ended up sending thousands of emails.
Side-effects tend to congregate in large functions. After all, if your function is large you’re very likely doing more than one thing. Does the method name reflect that? Probably not, right? Cindy makes that the point that she doesn’t really like long names. She’s not alone, we want out names to be as concise as we can make them. But we also need them to reflect what the method is doing. An accurate name for a huge method becomes impossibly long, so we end up instead giving huge methods short names that completely fail to indicate what they actually do.
Small methods are the complete opposite. When a method is small, it can have a short name and still be accurate and descriptive.
Large functions are not SOLID
Programmers who consistently write large functions are writing code that is, by nature, not SOLID.
For those unfamiliar with SOLID, it is an acronym for a set of object-oriented best practices that, if followed, tend to make code easier to understand, change, and maintain. If you’re writing a big function, especially in a highly expressive language like Ruby, you’re almost certainly violating at least four of the principles.
Single Responsibility Principle. A unit of code should have one reason to change. In our simple example above, the
User.create method has at least three reasons to change. Each additional responsibility means the code is likely to change that much more often and is that much more likely to break when it does. It also carries the cognitive load of figuring out what irrelevant bits of logic around email and messaging are doing when all you care about is creating a database record.
Open/Closed Principle. Code should be open for extension but closed for modification. You should be able to make your code do new things without changing existing code. Whole design patterns exist that allow you to extend software simply by writing new classes (template method, strategy, bridge, etc.). Highly coupled, procedural code in a big function can’t comply with the O/CP since you have to muck around in its guts if you want it to do anything new. Often this will mean jumping into an enormous case statement and adding yet another branch to the conditional.
Interface Segregation Principle. Prefer many small, client-specific interfaces to one giant one. A function that’s very large is likely to be playing many roles, doing different things depending on the input it receives. I’ve found long methods tend to go hand-in-hand with another code smell: large classes. So, you end up with a single, large class interface instead of smaller, more specialized ones.
Dependency Inversion Principle. Higher level, abstract classes should not depend on lower level, concrete implementations. Large methods tend to simply not consider dependencies at all: they are often littered with inline references to far-flung objects at higher and lower levels of abstraction. Having a practice of preferring small functions builds habits such as isolating dependencies into their own private methods or wrapper classes, or using dependency injection to remove the design-time coupling altogether.
Can you write a big function without running afoul of some of these? Sure. You can write a huge method that carefully manages dependencies or a gigantic function that only has one reason to change. It’s just not very likely.
Remember, long methods are a code smell — that only means there may be a problem. But it is a big smell, and it’s proportional to the scale of the violation. A twenty line method might be able to skate by, but a two hundred line method has almost no chance of being SOLID.
Small functions are awesome! Yes, you can go overboard. Don’t make your function harder to read just to knock a line off. But, generally, we as developers struggle much more with writing methods that are far too large than we do at the other end of the spectrum.
Given the choice between a code base full of god objects with huge methods, and one full of small functions in small, singularly responsible classes, I’d take the latter ever time.
(Originally appeared on Medium.)