Testing Python code with pytest
In this section, we are going to be using pytest to run automated tests on some code. The code we are going to be using is in the `arrays` folder within this repository. The functions that will be tested are in `arrays.py`, and the test code will go in `test_arrays.py`. Testing your code is extremely important, and it should be done WHILE you are writing it, rather than AFTER.
The usual way to test code is to do some manual checks, such as running it over particular input files or variables and checking the results. This has limitations: it might fail to exercise some parts of the code, or it might miss errors that are not immediately obvious. In either case, it is also difficult to pinpoint exactly where the errors are.
To avoid those pitfalls, we should write a set of tests that use known inputs and compare the results against a set of expected outputs. We will write each test to run over as little of the code as possible, so that we can easily identify which parts of the code are failing. This type of test is called a “unit test”, in contrast to “integration tests” that exercise multiple parts of the code at once. Tip: break long, complex functions with multiple control structures (e.g. if/else, for, while) into smaller functions that do one thing each to make your code easier to read, test, and maintain.
Let’s start by looking into `arrays.py` and checking the `add_arrays()` function. It’s a pretty simple function; it takes two arrays and adds them up element-wise. Now, let’s look at `test_add_arrays()` in `test_arrays.py`. How is testing being done here? Can you break it? What happens when you run `python arrays/test_arrays.py`? (Is your virtual environment active? If not, activate it first!)
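If you don’t have the repository open, the function and its initial test might look something like this (a minimal sketch; the exact code in `arrays.py` and `test_arrays.py` may differ):

```python
# arrays.py (sketch)
def add_arrays(x, y):
    """Add two lists element-wise."""
    return [a + b for a, b in zip(x, y)]


# test_arrays.py (sketch)
from arrays import add_arrays

def test_add_arrays():
    a = [1, 2, 3]
    b = [4, 5, 6]
    expect = [5, 7, 9]
    output = add_arrays(a, b)
    if output == expect:
        print("OK")
    else:
        print("BROKEN")

test_add_arrays()
```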
The output of this test is not particularly useful. Imagine if you had five different functions with five different tests; would an output like `OK BROKEN OK BROKEN BROKEN` help much?
Instead of that structure, we are going to use `assert` statements; `assert` is always followed by something boolean (i.e. something that will be either true or false). Empty lists, the number 0 and `None` are all falsy. If the boolean is true, nothing happens when the `assert` is run; if it is false, an exception is raised. You can try running `assert 5 == 5` and `assert 5 == 6` in a Python shell to see what happens.

Now we are going to replace the if/else block in `test_add_arrays()` with an assert that looks like `assert output == expect`. What happens when you run `python arrays/test_arrays.py`? What if the test fails?
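The assert-based version of the test might look like this (a sketch):

```python
def test_add_arrays():
    a = [1, 2, 3]
    b = [4, 5, 6]
    expect = [5, 7, 9]
    output = add_arrays(a, b)
    # Passes silently if the values match, raises AssertionError otherwise.
    assert output == expect
```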
You see that now we at least get a specific line when the test fails; that’s a good start! However, if we had multiple tests, code execution would stop at the exception thrown by `assert`. Also, we still need to explicitly call `test_add_arrays()` in that test file, which would be easy to forget, especially if we had a bunch of tests. That’s where we are going to be using `pytest`!

Our first step is removing the call to `test_add_arrays()` from the end of `test_arrays.py`; `pytest` will take care of that for us. Now, in your terminal, just run `pytest`. What happened? What if the test fails? `pytest` will find all files named `test_*.py` and `*_test.py`, along with all functions whose names start with `test` inside these files, and it will run those, one at a time, reporting the results of each.

Exercise: It’s your turn to write a test! Write a `test_subtract_arrays()` function in `test_arrays.py` that tests the `subtract_arrays()` function in `arrays.py`! What happens when you run `pytest` now?
Exercise: Let’s do the opposite: write a `test_multiply_arrays()` function with the behavior you would expect to see from a `multiply_arrays()` function. Then, write `multiply_arrays()` to fulfill the test requirements.

This is a process called Test-driven development (TDD): you start by writing your code requirements as tests and then write code that passes those tests. It is a popular approach in certain kinds of software development.
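For example, one possible outcome of the exercise above might look like this (a sketch, assuming `multiply_arrays()` should mirror `add_arrays()`; it is not the only valid solution):

```python
# Written first: the test encodes the behaviour we want.
def test_multiply_arrays():
    a = [1, 2, 3]
    b = [4, 5, 6]
    expect = [4, 10, 18]
    assert multiply_arrays(a, b) == expect


# Written second: just enough code to make the test pass.
def multiply_arrays(x, y):
    """Multiply two lists element-wise."""
    return [a * b for a, b in zip(x, y)]
```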
Now imagine you want to test multiple cases in `test_add_arrays()`: positive results, negative results, zero results, etc. You could change the code in that test function to create the `a`, `b` and `expect` arrays multiple times, and do one assertion per case.
However, `pytest` allows for a simpler possibility: parameterizing the inputs to your test. You can do this using the decorator `@pytest.mark.parametrize()` before your test function. It takes two arguments: a tuple or string with the names of the parameters you want to pass to the function, and a list containing tuples of the parameter values you want to pass. So for a single case, it would look like `@pytest.mark.parametrize(("a", "b", "expect"), [([1, 2, 3], [4, 5, 6], [5, 7, 9])])`, and you could add extra tuples for additional test cases. Then all you need to do is add `a`, `b` and `expect` as arguments to your test function.
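Putting it together, a parameterized version of `test_add_arrays()` could look like this (the exact cases and import path are up to your setup):

```python
import pytest

from arrays import add_arrays


@pytest.mark.parametrize(
    ("a", "b", "expect"),
    [
        ([1, 2, 3], [4, 5, 6], [5, 7, 9]),   # positive numbers
        ([-1, -2], [-3, -4], [-4, -6]),      # negative numbers
        ([1, -1], [-1, 1], [0, 0]),          # zero results
    ],
)
def test_add_arrays(a, b, expect):
    assert add_arrays(a, b) == expect
```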
Exercise: Try using `@pytest.mark.parametrize()` for your test functions. What happens when you run `pytest` now? Write a test for `divide_arrays()` using this approach. Can you find the bugs in the `divide_arrays()` function with some clever testing?

So far, we have assumed that everything passed to our array functions is correct; that is rarely an assumption you can make in real life.
add_arrays("this is a string", 1)? What aboutadd_arrays([1, 2], [1, 2, 3])?It’s time to add explicit exception handling to our functions. You will probably want to do
raise ValueError("array size mismatch")for the case where arrays sizes are different, andraise TypeError("arguments should be lists")for when the arguments are not lists.Now, we can add new test functions named
Now, we can add new test functions named `test_add_arrays_error()` and so on, where we check that errors are being raised correctly. That is done by wrapping our function call with `with pytest.raises(ValueError)`, for example. What happens when you run `pytest` now? Add checks for both possible errors we came up with. Better yet, use the `match` keyword argument to check the message and ensure the right `ValueError` was raised! What other cases can happen in e.g. `divide_arrays()`?
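Such a test might look like this (a sketch using `match` to check the error message):

```python
import pytest

from arrays import add_arrays


def test_add_arrays_error():
    # The with-block passes only if the expected exception is raised inside it.
    with pytest.raises(ValueError, match="array size mismatch"):
        add_arrays([1, 2], [1, 2, 3])

    with pytest.raises(TypeError, match="arguments should be lists"):
        add_arrays("this is a string", 1)
```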
We have successfully found a way to separate the data to be tested from the code to be tested. `pytest` has an even better way to do that for more complex cases, for example when you want multiple test functions using the same data. These are called fixtures.
A fixture is defined as a function that returns something we want to use repeatedly in our tests. `pytest` provides some fixtures out of the box, like the very useful `tmp_path` fixture, which gives you a unique temporary directory. But you can also create your own: fixtures can be used to set up test data, create mock objects, or perform any other setup tasks that your tests need. To define a fixture, you use the decorator `@pytest.fixture` before a function. After defining a fixture, you can pass the name of that function as an argument to your test function, and that argument will take the value returned by the fixture.
Try creating a `pair_of_lists()` fixture and passing it to a test function. What happens when you run `pytest`? Is `pair_of_lists()` run?
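One way the fixture could look (a sketch; your version may differ):

```python
import pytest

from arrays import add_arrays


@pytest.fixture
def pair_of_lists():
    # Runs once per test that requests it, not at import time.
    return [1, 2, 3], [4, 5, 6]


def test_add_arrays_with_fixture(pair_of_lists):
    a, b = pair_of_lists
    assert add_arrays(a, b) == [5, 7, 9]
```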
Some final tips: append `-v` for more verbose output, or `-s` to see `print()` output. You can also run specific tests by passing the file name and function name to `pytest`, e.g. `pytest arrays/test_arrays.py::test_add_arrays`.
Test coverage
When writing tests, it’s important to know how much of your code is actually being tested by your test suite. This is called code coverage. Code coverage is a metric that tells you what percentage of your code is run (“covered”) when your tests are executed. High coverage means most of your code is tested, while low coverage means there are many untested parts, where bugs could hide.
The most common tool for measuring code coverage in Python is `coverage`. You can run it from the command line to see how much of your code is covered by your tests.

To use `coverage` with `pytest`, run `coverage run -m pytest`. This will run your tests and collect coverage data.
To see a summary in your terminal, run `coverage report`. To generate a detailed HTML report you can view in your browser, run `coverage html`, then open the file `htmlcov/index.html` in your browser to explore which lines of code are covered and which are not.

Reviewing your coverage report can help you identify untested code and improve your test suite.
More advanced testing
`pytest` offers extensive features to make testing your code easier, so check out the `pytest` documentation.

The `unittest.mock` module is also very useful for testing more complex code. It allows you to create mock objects needed by your code and then make assertions about how they have been used. You can even “patch” objects with your mocks. When unit testing, the idea is to test only the code that you own. Mocks are particularly useful for testing code that interacts with external systems, such as databases or web services.
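A minimal illustration of a mock (the `fetch` callable here is hypothetical, standing in for some external service your code would normally call):

```python
from unittest.mock import Mock

# Stand-in for e.g. a database client or a web API wrapper.
fetch = Mock(return_value={"status": "ok"})

result = fetch("https://example.com/data")

assert result == {"status": "ok"}
# Mocks record how they were used, so we can assert on the calls afterwards.
fetch.assert_called_once_with("https://example.com/data")
```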
There is also the `pytest` “monkeypatching” fixture, which allows you to safely set or delete an attribute, dictionary item or environment variable, or even modify `sys.path` for importing.

`pytest` has an extensive ecosystem of plugins that can help you with specific testing needs. For example, there are plugins for testing web applications, databases, and more. You can find a list of available plugins on the pytest website.

Finally, you can consider using hypothesis, a property-based testing library that will generate random input data based on specified properties. This can help you find edge cases and unexpected behavior in your code that you might not have thought of when writing traditional unit tests.
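For instance, a property-based test for `add_arrays()` might assert that adding a list of zeros leaves the input unchanged (a sketch, assuming hypothesis is installed):

```python
from hypothesis import given
from hypothesis import strategies as st

from arrays import add_arrays


@given(st.lists(st.integers(), min_size=1, max_size=10))
def test_adding_zeros_is_identity(a):
    # hypothesis generates many random lists of integers for `a`.
    zeros = [0] * len(a)
    assert add_arrays(a, zeros) == a
```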
Automated testing
If you’re using GitHub for your code repositories, you can set up automated testing so that your tests are run automatically whenever you push new code or open a pull request. This is done using GitHub Actions, which allows you to define workflows that run on specific events.
Let’s look at the file at `.github/workflows/run_tests.yml`. This is a GitHub Actions workflow file: it specifies actions that will happen on GitHub when you do certain things in your repository. What is happening here? When is it triggered? Try using a file like this in one of your repositories (if you have one) or in a fork of this repository, pushing a new commit to GitHub and checking the “Actions” tab on your repository’s page.

Finally, try adding code coverage to your GitHub Action. Did it work when a new commit was pushed? What was produced?