Testing is an important part of the software development process. Companies spend huge sums of money to find bugs before feature releases. Then they spend more responding to the bugs that do get released. There are entire career tracks devoted to testing.
But it is rarely part of a software engineer's formal education. Sure, you did ad-hoc testing on your assignments. Maybe you took a course with a lecture on unit testing and wrote some JUnit tests in your projects. Maybe you took a graduate-level Semantics course and created formal proofs of correctness. These are all types of testing, but they are only small parts of a much larger picture.
Let's consider the field of software testing in a few different ways. Your projects will require varied approaches and a broad overview is a great place to start.
Goal of Testing
I'll phrase the goal of testing this way:
Find as many flaws as efficiently as possible
Let's break this down word by word.
Find
Testing is about discovery. What is the issue? How do we get into that state? How bad is it for users? There are issues that we will never fix, often due to low priority or high complexity. And that is OK! The role of testing is to find.
as many
In a complex system, you cannot find all the flaws. It is impossible to test all the state combinations your users might encounter. Different devices, browsers, operating systems, network configurations, accessibility settings, traffic shapes, languages, data flows, and more! And even if you did have infinite time to test, you would be unable to prove that you have found all the flaws. There is always something else.
flaws
I use the word "flaws" instead of "bugs" quite intentionally. The feature might work exactly as spec'd, but still be flawed. Design issues, accessibility issues, scaling difficulties, and other flaws might emerge due to either unexpected conditions or incorrect assumptions. We'd like our testing to discover those flaws also.
as efficiently as possible
There is a saying: "The perfect is the enemy of the good." Your team needs to release to its customers. You don't want to release a broken, terrible experience. You also don't want to delay for years while you polish every little corner. There are tradeoffs to consider here. It is preferable to leave a flaw unfound rather than spend an exorbitant amount of time testing.
How can we manage this tradeoff between finding the flaws and using our time well?
Equivalence Classes
One of the most important approaches for improving testing efficiency is to form equivalence classes. In this approach you split your potential test cases into classes (categories) that are functionally equivalent. You then assume that if a test passes for one case, then it will pass for all cases in the same class. This transforms an unbounded set of test cases into a small finite set.
Consider this REST API endpoint:
/users/find?q={query}
Given infinite time, we could test by passing every potential string to the query parameter. This is very wasteful of our time since many strings are essentially the same when testing the endpoint.
/users/find?q=ThisUserDoesNotExist
/users/find?q=TheUserDoesNotExist
/users/find?q=TehUserDoesNotExist
There is no reason to test each of the above. Instead we generate an equivalence class to lump them all together.
Case 1: A string that does not match any users
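To sketch what this looks like in practice, here is a Jest-style suite with one representative test per class. The findUsers helper wrapping the endpoint, and the extra classes beyond Case 1, are illustrative assumptions rather than part of the original design.

// One test per equivalence class, using a hypothetical findUsers wrapper around /users/find.
import { findUsers } from "./usersApi";

describe("GET /users/find", () => {
  test("a query that matches no users returns an empty list", async () => {
    expect(await findUsers("ThisUserDoesNotExist")).toEqual([]);
  });

  test("a query that matches exactly one user returns that user", async () => {
    expect(await findUsers("alice")).toHaveLength(1);
  });

  test("an empty query is rejected", async () => {
    await expect(findUsers("")).rejects.toThrow();
  });
});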
Determining how to organize the space of potential test cases into equivalence classes is a skill that develops over time. Too many, overly narrow classes and you waste testing time. Too few, overly broad classes and you won't be fully exercising the feature.
Front-loaded Testing
Another valuable approach is to front-load your testing as early into the development process as possible. Flaws found earlier in the process are cheaper and easier to resolve. Cheaper because you don't have to redo the latter parts of the process. Easier because the code is fresher in the developer's mind.
Consider an application's "share via email" functionality. The application incorrectly reports that some email addresses are invalid. This is a regression (which means it used to work, but now it doesn't).
The worst way to discover this is for your customers to tell you. Your customer success team fields many calls about it. They reproduce the behavior and file a bug. Product management triages the bug as High Priority. The developer drops everything and context switches back into the code change they made days or even weeks ago. They find the root cause, create a fix, and shepherd the fix out to production.
This is expensive. It negatively impacted customers. It scrambled several teams. It delayed the feature the developer planned to build today.
A better way to discover this bug is for a manual tester to find it shortly before release. No negative customer impact. No customer success team involvement.
A better way is for automated integration tests to find it the evening after the code is merged. No repeating the pre-release process. A lighter context switch for the developer.
A better way is for a unit test to find it before the code is even merged. Nothing delayed. The developer is the only one involved. The developer barely needs any context switch.
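To make the front-loading concrete, here is a minimal sketch of the kind of unit test that would have caught the regression before merge. The validateEmail function and the sample addresses are hypothetical.

// Hypothetical validator behind the "share via email" feature.
import { validateEmail } from "./emailValidator";

test("accepts common valid addresses", () => {
  expect(validateEmail("pat@example.com")).toBe(true);
  expect(validateEmail("pat+friends@example.co.uk")).toBe(true); // the kind of address that regressed
});

test("rejects malformed addresses", () => {
  expect(validateEmail("pat@@example.com")).toBe(false);
});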
Now not everything can be a unit test. If the bug was the result of poor interaction between the API and the client, then integration testing is the earliest you are going to be able to find it. But you should move your testing as early in the process as possible.
Some folks go further than that and advocate for Test-Driven Development (TDD). In a nutshell, the TDD approach is when you create the test cases before you develop the feature. As you write the feature, your test cases start passing. Supporters argue that this approach leads to improved code quality with a faster feedback loop.
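For illustration only, the TDD loop looks roughly like this; the slugify function is made up for the example.

// Written first, before slugify exists; the test fails until the feature is built.
import { slugify } from "./slugify";

test("converts a title into a URL slug", () => {
  expect(slugify("Hello, World!")).toBe("hello-world");
});

// Then you implement slugify until the test passes, refactor, and repeat.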
I disagree. The attempts at TDD I've seen end up being time-consuming and wasteful. Sometimes the tester wasn't able to make good equivalence classes for code that didn't exist. Sometimes they wrote many tests for functionality that was later cut or adjusted as the feature took shape. Sometimes both.
If you are a TDD fan, drop a comment and let us know why you think it is great.
So equivalence classes and front-loading can help us test efficiently, but we still need to consider how to test.
Static vs Dynamic Testing
It can be helpful to divide testing into two categories based on whether the code is being executed.
Static testing examines the code without executing it. The core question it asks is "Will this code perform the intended function?"
Static testing is primarily done by tools:
Compilers
Linters, like ESLint for JavaScript, Pylint for Python, or RuboCop for Ruby
Style checkers, like Prettier for JavaScript (and others!), Black for Python, or Stylelint for CSS
Complexity analyzers, like SonarQube or similar tools from other vendors
Using these tools can go a long way toward catching issues before you ever push your code up to your branch.
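As a small, contrived illustration, the TypeScript compiler (with strict null checks enabled) flags the flaw below without ever running the code: it reports that name is possibly undefined.

// Static analysis catches this before any test runs.
function greet(name?: string): string {
  return "Hello, " + name.toUpperCase(); // compiler error: 'name' is possibly 'undefined'
}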
These tools cannot find everything, however, so we still do a manual form of static testing: Code Reviews! It is best practice for another engineer to perform static testing by reading and reasoning about your code. How to do this well is a topic for another time.
Dynamic testing examines the code while executing it. The core question it asks is "Does this system perform the intended function?"
This is what we normally think of as "testing" and we can break it down further by how much of the system is being examined:
Unit tests for functions or components
Integration tests for interactions between components
End-to-end tests (or System tests) for the fully functioning system
But for dynamic testing, how do we approach generating the initial set of potential test cases?
Black Box vs White Box Testing
It can be helpful to divide dynamic testing into two categories based on whether the tester has knowledge of the internals of the product. We use the terms black box and white box to describe these states.
A black box is completely opaque. The tester can interact with the interface but has no insight into the internal functionality.
In this approach, the tester relies on the inputs and outputs to determine if the system functions as designed. They generate test cases from the design and by predicting how users might try to operate the system.
Benefits of the black box approach:
Small learning curve. You only need to know how to interact with the product.
Unaffected by the bias of the developer. If you don't know how the internals function, you will use the system in unexpected ways. Like real customers.
Downsides of the black box approach:
If it isn't obvious from the design, it may not get tested. This is especially true for error handling or other uncommon states.
Prone to overtesting due to the difficulty in forming good equivalence classes.
A special case of black box testing is Exploratory Testing. This describes using the product without any idea how you are supposed to use it or what it is supposed to do. I've seen very interesting bugs found with this approach.
A white box is completely transparent. The tester can investigate the internals of any part of the system.
In this approach, the tester can examine the intent of the code to verify if the result matches the intent. They generate test cases based on code flows and can debug, inject faults, and otherwise make direct internal edits.
Benefits of the white box approach:
It is provable which code paths you have tested. We refer to this as Code Coverage.
You can generate the minimum necessary equivalence classes.
Downsides of the white box approach:
High learning curve for the tester if they are not also the developer.
Tends to focus on the code as written, not the feature as intended.
A special case of white box testing is Unit Testing. The developer writes automated tests for a "unit" of code (usually a function). These tests are often very quick to run and can be run frequently both locally and as part of the CI/CD pipeline.
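As a sketch, a white-box unit test reads the implementation and makes sure every branch is exercised. The discountedPrice function here is a made-up example.

// Hypothetical unit under test: two branches, so two tests for full coverage.
export function discountedPrice(price: number, isMember: boolean): number {
  return isMember ? price * 0.9 : price;
}

test("members get a 10% discount", () => {
  expect(discountedPrice(100, true)).toBeCloseTo(90);
});

test("non-members pay full price", () => {
  expect(discountedPrice(100, false)).toBe(100);
});

// Running with a coverage reporter (for example, jest --coverage) confirms both branches ran.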
From Theory to Practice
This installment examines testing from a very high level. In the next installment we will stick with the testing topic, but zoom in further on how to generate test cases and on some specific focus areas.
If you have any examples of code or features that were difficult to test, let us know in a comment or email reply. Real examples can be much more fun to work with than contrived ones.