Saturday, May 23, 2020

Test Your Public API

In my experience as a software developer, the thing people are least likely to do that they should do is write good tests.

I've worked on several code bases that had no tests when I started working in them, and the ones that had tests usually had bad tests. Bad tests differ from good tests in many ways, but the one that I want to focus on today is the layer of abstraction being tested.

Bad tests test implementation details. Good tests test your API.

The implementation details should be expected to change and evolve constantly throughout the lifecycle of a project. By the time a project is delivered, the API should be stable. Good unit tests assert that stable things remain stable. Bad unit tests assert that nothing is changing.

Your public API is the part of your code that you expect users to use, invoked the way you expect users to invoke it. If your code is a script, your API is the shell command that runs that script, and you test it by running the script. If your code is a web service, your API is the collection of endpoints that your service provides, and you test them by making HTTP requests against them. You spin up the service in the setup of your unit tests and shut it down in the teardown.
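Here's a minimal sketch of that setup/teardown pattern in Python; the myservice module, its WSGI app, and the /status endpoint are all hypothetical stand-ins:

    import threading
    import unittest
    import urllib.request
    from wsgiref.simple_server import make_server

    # Hypothetical WSGI application representing the service under test.
    from myservice import app


    class TestServiceAPI(unittest.TestCase):
        def setUp(self):
            # Spin up the service on an ephemeral port before each test.
            self.server = make_server("localhost", 0, app)
            self.port = self.server.server_port
            self.thread = threading.Thread(target=self.server.serve_forever)
            self.thread.start()

        def tearDown(self):
            # Shut the service down after each test.
            self.server.shutdown()
            self.thread.join()
            self.server.server_close()

        def test_status_endpoint(self):
            # Exercise the API the way a user would: over HTTP.
            url = f"http://localhost:{self.port}/status"
            with urllib.request.urlopen(url) as response:
                self.assertEqual(response.status, 200)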

I do most of my coding in Python, and when I write a web service, I use it for both the service and the clients. But in principle there is no reason this can't be done for everything. A few languages are much better suited to one piece of this than another, but you can still test them the same way. As discussed above, you can shell out as part of running unit tests, so there's no reason you can't write a service in one language and test it in a different one.
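As a sketch, shelling out from a unit test looks like this; myscript and its expected output are made up, and note that nothing in the test cares what language the script itself is written in:

    import subprocess
    import unittest


    class TestScriptAPI(unittest.TestCase):
        def test_script_runs_cleanly(self):
            # The public API of a script is the shell command that runs
            # it, so the test invokes it exactly as a user would.
            result = subprocess.run(
                ["myscript", "--input", "example.txt"],
                capture_output=True,
                text=True,
            )
            self.assertEqual(result.returncode, 0)
            self.assertIn("processed example.txt", result.stdout)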

Isn't this integration testing?

No, it's not. Integration testing is testing that all of your code works together the way it's supposed to and that it properly integrates with third-party APIs. What makes this unit testing is that you mock out everything that isn't specifically what the API under test is doing, to ensure that each individual component of your API works as intended in isolation.

But, but, (someone says), you can't mock out the things you need to mock out and properly isolate your tests if you shell out to it as a script.

Yes, you can.

It's not hard to let your scripts mock out what they need to mock out. They can accept an argument called --mock-context that takes two additional args: one to specify the module from which the mock context is imported, and a second to specify the name of the context manager in that module that you are using to isolate this test.
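A sketch of the wiring; the flag and the main() stub are illustrative, not a standard argparse feature:

    import argparse
    import importlib


    def main(args):
        ...  # the script's real work goes here


    def run(argv=None):
        parser = argparse.ArgumentParser()
        parser.add_argument(
            "--mock-context",
            nargs=2,
            metavar=("MODULE", "NAME"),
            help="module and context manager that isolate this run",
        )
        args = parser.parse_args(argv)
        if args.mock_context:
            module_name, manager_name = args.mock_context
            mock_module = importlib.import_module(module_name)
            # Enter the named context manager so main() runs under
            # whatever mocks the test module defines.
            with getattr(mock_module, manager_name)():
                return main(args)
        return main(args)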

Mocking is beautifully supported in Python. I know other languages don't support it quite as well, but the concept exists everywhere; it's just harder to use in some languages than in others. Unit testing is testing where you've isolated each individual thing you are delivering and are testing it in isolation. In Python, the primary things that separate unit testing from integration testing are that you use mock extensively in unit tests, and that unit tests run quickly. Integration tests run something end-to-end and make sure that your code continues to integrate with the rest of your stack correctly, and they take however long they take, which is often a long time.
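For example, the context manager named by --mock-context can be as simple as a single mock.patch call; the myscript.backend.send target here is hypothetical:

    from unittest import mock


    def isolated_backend():
        # Replace the (hypothetical) real backend call with a no-op
        # mock so the script can run fully isolated. mock.patch
        # returns a context manager, which is exactly what the
        # --mock-context machinery above expects.
        return mock.patch("myscript.backend.send", return_value=None)

A test would then invoke the script with something like --mock-context tests.mocks isolated_backend.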

But, but, (someone says), it's bad practice to insert things into your public API that only exist for testing.

I disagree, but that's a topic for another day. Regardless, you don't have to make these command line arguments part of your public API.

You can define the script as a class and subclass it so that the subclass accepts the one additional argument it needs, then point your tests explicitly at that subclassed script. Unit tests are still supposed to be testing individual features in isolation, so a unit test that checks that scripts get installed properly should just be testing installation. As long as you aren't implicitly testing installation as part of the script, you are free to subclass it to take whatever additional testing-specific arguments it needs.
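A sketch of what I mean, assuming your scripts are classes with an add_arguments() hook (the class and hook names are made up):

    from myscript import MyScript  # hypothetical script class


    class TestableMyScript(MyScript):
        def add_arguments(self, parser):
            super().add_arguments(parser)
            # Only this test-only subclass knows about --mock-context;
            # the public script's interface is unchanged.
            parser.add_argument(
                "--mock-context", nargs=2, metavar=("MODULE", "NAME")
            )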

(Again this is easy to do in Python. It's harder in a lot of other languages, but it's still doable in most languages I've used.)

I don't consider this ideal, but it is good enough. (I like having the ability to mock on the fly. I think it makes it easy to test all sorts of things. The --mock-context you pass in, as I described it, doesn't have to be part of the given library, so it can be used to sub out one backend for another pretty easily. That's not a good long-term solution, but mock is also useful for testing proofs of concept, not just for unit testing. Suddenly your script is a lot more flexible than it was originally intended to be. Is v2 supposed to be backwards compatible? Don't update the client code just yet. Just spin up v2 and run the script that runs the client code, mocked out to hit the v2 API instead of v1.)

By the way, a unit test of a script's installation ensures that the script can output its version when asked to do so, and that the version it outputs is the one that was just installed. A side effect of good unit testing is that you end up writing scripts correctly.
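Such a test can be as simple as this sketch, where myscript and mypackage are made-up names for the installed command and its distribution:

    import subprocess
    import unittest
    from importlib.metadata import version  # Python 3.8+


    class TestInstallation(unittest.TestCase):
        def test_installed_script_reports_installed_version(self):
            result = subprocess.run(
                ["myscript", "--version"], capture_output=True, text=True
            )
            self.assertEqual(result.returncode, 0)
            # The version the script reports should be the version
            # of the package that was just installed.
            self.assertIn(version("mypackage"), result.stdout)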

The help message that your script outputs should almost certainly be considered part of its public API.

That said, you should only be testing that the stable portion of the API remains stable. Since you write your scripts as well-factored classes, you often add features through mixins that the scripts you actually use inherit from. You don't want to have to update the help-message test of every script that inherits from a mixin every time you add a new feature to that mixin.

Your unit tests should not ensure that no new features are added, only that no existing features are broken or deleted. Testing your help message should ensure that all of the lines you expect to see are actually printed, and, wherever order matters, that they are printed in the appropriate order.
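A sketch of such a test; the script name and the expected fragments are made up:

    import subprocess
    import unittest


    class TestHelpMessage(unittest.TestCase):
        def test_expected_lines_appear_in_order(self):
            result = subprocess.run(
                ["myscript", "--help"], capture_output=True, text=True
            )
            expected = ["--input", "--output", "--dry"]
            # Assert each expected fragment appears, in order, without
            # forbidding new options that mixins may have added since.
            position = 0
            for fragment in expected:
                index = result.stdout.find(fragment, position)
                self.assertNotEqual(
                    index, -1, f"{fragment} missing or out of order"
                )
                position = index + len(fragment)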

Since all good code is self-documenting, your web service auto-generates its own documentation just like your scripts generate their own help text, and you test this too.

If your public API is just a class, then the self-documentation is the code itself because the only people who need to be able to read the documentation already know how to program in your programming language, so they don't need the behavior translated. Where necessary, you still have comments in the code to help other programmers understand how to use these classes. The comments typically point them towards your unit tests where you are using them to assert that they behave as expected.

I've mostly been focused on testing the public API as a form of ensuring that the stable part of your code remains stable, but there are other advantages.

It expands your coverage to actually test everything that's important. Bugs anywhere can be catastrophic, and bugs in an argument parser are no exception. I've seen a lot of projects where the difference between a script having major side effects and the script just printing output without doing anything was entirely determined by whether or not someone passed in an argument named --dry, or something similar. I've seen these scripts used to test scenarios that were not real and that under no circumstances should have been treated as real. What happens if everything in the code works exactly as intended except there's a typo in the argument parser? Do you know the difference between how Python's argparse treats action="store_true" and action="store-true"? I don't know if this is still the case, but "store-true" used to be silently ignored; it didn't raise an error telling you it was an invalid action. You can do a lot of damage very quickly if the only bug in your code is that hyphen. I've also written bugs where I got confused and made an argument "store_true" when I really wanted "store_false", or vice versa. Any of this can be catastrophic, and it's not necessarily the case that these things would get caught. The existence of a script to expose functionality is sometimes an afterthought; sometimes the code is thoroughly tested in its integration with something else, and the script is just pieced onto it.

(This might sound like I've been burned and I'm trying to make excuses. The opposite is actually the case. I've never caused a catastrophic failure, but I have from time to time said, "Oh, thank goodness I wrote that test," or "Thank goodness I decided that I needed to check one more thing before pushing this commit," because I was working in a codebase with woefully inadequate test coverage.)
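A test at the right abstraction layer catches that hyphen. Here's a sketch, assuming the script exposes its parser through a hypothetical build_parser() function:

    import unittest

    from myscript import build_parser  # hypothetical parser factory


    class TestArgumentWiring(unittest.TestCase):
        def test_dry_flag_actually_sets_dry(self):
            parser = build_parser()
            # If action="store_true" were mistyped or inverted, one of
            # these assertions would fail before the script ships.
            self.assertTrue(parser.parse_args(["--dry"]).dry)
            self.assertFalse(parser.parse_args([]).dry)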

There are still more benefits to good testing. If you test at the right abstraction layer, you are ensuring that the code you deliver is usable. If you write automated tests to hit each part of your API and test the functionality of each thing in isolation, you are eating your own dog food. It's one of the easiest ways to eat your own dog food, and one of the ways that gives you the most immediate returns. If you find the tests frustrating to write, your users will find the API frustrating to use.

I could give several more reasons, but I've covered the major ones, and I think they are enough to say that your unit tests should be testing your public API.
