Mocks are a necessary evil in Python testing. If overused, they essentially become "change detectors", meaning that every time you refactor the implementation of your code, you must update the mocks too. Ideally, your tests should verify what your code does, not how it does it.

What do I mean by "mocks"?

To some, "mocks" is a catch-all term for all test doubles. In this article, I'm referring to the test doubles Martin Fowler calls mocks and spies. unittest.mock gives us mock/spy functionality by recording calls and allowing us to make assertions based on these calls. These features are what can get us into trouble.
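
As a minimal sketch of that recording behaviour (the notify function and its send argument are hypothetical names used only for illustration, not part of the scraper):

```python
from unittest import mock


def notify(send, user):
    # Hypothetical function under test: it delegates delivery to `send`.
    send(f"Hello, {user}!")


# A Mock records every call made to it...
send = mock.Mock()
notify(send, "Ada")

# ...so a test can assert on exactly how the collaborator was called.
send.assert_called_once_with("Hello, Ada!")
recorded_calls = send.call_args_list
```

An assertion like this pins down how notify talks to its collaborator. If notify were refactored to pass the message as a keyword argument, the assertion would fail even though the observable behaviour is the same, which is the "change detector" problem in miniature.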

I believe the only legitimate function of mocks is to enable us to write fast unit tests for code that reaches out to external systems, through network calls or system calls for example.

Through the lens of functional core/imperative shell, mocks should only be used to test the imperative shell of a program.

In this article, I'll be using the simple example of part of a web scraper, where the external calls are

requests.get to send HTTP requests
and time.sleep to pause the execution of the program.

The code

The code is split into two sections:

that which handles external calls and requires mocking
and business logic with no side effects that can be tested much more simply.

External calls

The generator function below returns an iterator of property IDs and property page HTML strings to be parsed. It maintains an HTTP client session with a plausible browser User-Agent header. It makes HTTP requests and sleeps between each request. Beyond this, the only logic it contains is basic handling for HTTP status codes.

iter_listings yields immutable primitive types, meaning that the code that parses the resulting HTML only needs to worry about transforming HTML data into something useful. The external calls have been isolated to one part of the program.

import random
import time
from typing import Iterator

import requests

# USER_AGENTS, MIN_WAIT_SECS and MAX_WAIT_SECS are module-level constants


def iter_listings(first_page_url: str) -> Iterator[tuple[int, str]]:
    session = requests.Session()
    session.headers["User-Agent"] = random.choice(USER_AGENTS)

    for base_url, page_url in iter_page_urls(first_page_url):
        page_resp = session.get(page_url)
        if page_resp.status_code != 200:
            # Stop paging at the first unsuccessful results page
            break

        time.sleep(random.randrange(MIN_WAIT_SECS, MAX_WAIT_SECS))

        for listing_id, listing_url in iter_listing_urls(page_resp.content,
                                                         base_url):
            listing_resp = session.get(listing_url)
            if listing_resp.status_code != 200:
                # Give up on the remaining listings from this page
                break

            yield listing_id, listing_resp.content

            time.sleep(random.randrange(MIN_WAIT_SECS, MAX_WAIT_SECS))

The business logic is encapsulated in the iter_page_urls and iter_listing_urls generators which are covered further down.

In the test below, in addition to requests.Session and time.sleep, iter_page_urls is mocked. This isn't necessary for the test to pass. However, because the iter_listing_urls generator uses regex and parses HTML, the test gets slow when it has to work through multiple pages of results, so iter_page_urls is patched to yield a single page.

def test_iter_listings(mocker, to_rent_url, results_page):
    mock_iter_page_urls = mocker.patch("scraping.iter_page_urls")
    first_page_url = f"{to_rent_url}&index=0"
    mock_iter_page_urls.return_value = [
        ("https://www.rm.co.uk", first_page_url)
    ]
    mock_sleep = mocker.patch("scraping.time.sleep")
    mock_session_cls = mocker.patch("scraping.requests.Session")
    mock_session = mock_session_cls.return_value
    mock_session.headers = {}
    dummy_listing = (
        "<html><head><title>foo</title></head><body><p>bar</p></body></html>"
    )

    def get_stub(url):
        if url == first_page_url:
            return mocker.Mock(content=results_page, status_code=200)
        elif url.startswith("https://www.rm.co.uk/properties/"):
            return mocker.Mock(content=dummy_listing, status_code=200)

    mock_session.get.side_effect = get_stub

    for _, listing in scraping.iter_listings(to_rent_url):
        assert listing == dummy_listing

    assert mock_session.headers["User-Agent"] in scraping.USER_AGENTS
    assert mock_session.get.call_count == 26
    assert mock_sleep.call_count == 26

The test is longer and more complex than an ideal unit test. It has to do the dirty work of monkey-patching code that calls to external systems. Nevertheless, it provides some confidence that the code under test works as expected.

To get more confidence we'd need an integration test. This could entail setting up a testing HTTP server and dealing with the slowness resulting from the scraper thread sleeping between requests.
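
As a rough sketch of what the testing HTTP server part could look like, the example below starts a throwaway server from the standard library on a free local port and fetches a canned page from it. Everything here is an assumption made for illustration: DummyHandler, DUMMY_PAGE and fetch_from_test_server are hypothetical names, and urllib stands in for requests to keep the example dependency-free. A real integration test would point iter_listings at the server's URL and would still have to deal with the sleeps.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DUMMY_PAGE = b"<html><body><p>bar</p></body></html>"


class DummyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the same canned page for every path.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(DUMMY_PAGE)))
        self.end_headers()
        self.wfile.write(DUMMY_PAGE)

    def log_message(self, format, *args):
        # Keep the test output quiet.
        pass


def fetch_from_test_server(path: str) -> tuple[int, bytes]:
    # Port 0 asks the operating system for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), DummyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # An empty ProxyHandler bypasses any proxy configured in the
        # environment so the request reaches the local server directly.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}{path}", timeout=5) as resp:
            return resp.status, resp.read()
    finally:
        server.shutdown()
        server.server_close()
```

In a pytest suite, the setup and teardown would more naturally live in a fixture that yields the server's base URL.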

Business logic

The first of our generators takes a URL of search results and yields the URLs of all pages of results.

from urllib import parse


def iter_page_urls(first_page_url: str) -> Iterator[tuple[str, str]]:
    url = parse.urlparse(first_page_url)
    base_url = f"{url.scheme}://{url.hostname}"
    query_map = parse.parse_qs(url.query)

    # Scrape up to the last page of 24 results (page 42)
    for index in range(0, 1008, 24):
        page_query = parse.urlencode({**query_map, "index": index}, doseq=True)
        yield base_url, f"{base_url}{url.path}?{page_query}"

The function handles the complexity of parsing and rebuilding the URL without being responsible for accessing it over the network.
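
To see why the function passes doseq=True, here is a small worked example of the same parse and rebuild round trip. The example.com URL and its query parameters are made up for illustration.

```python
from urllib import parse

url = parse.urlparse(
    "https://www.example.com/find.html?maxPrice=2000&types=flat%2Cdetached"
)

# parse_qs decodes percent-escapes and wraps every value in a list.
query_map = parse.parse_qs(url.query)

# doseq=True emits each list element as its own key=value pair
# instead of encoding the string representation of the list.
page_query = parse.urlencode({**query_map, "index": 24}, doseq=True)
rebuilt = f"{url.scheme}://{url.hostname}{url.path}?{page_query}"
```

Here query_map is {"maxPrice": ["2000"], "types": ["flat,detached"]}, and rebuilt ends with ?maxPrice=2000&types=flat%2Cdetached&index=24, so the original parameters survive the round trip with the index added on the end.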

The test below is my favourite kind of unit test. It passes some input and verifies the expected output. There are no external calls to mock nor side effects to think about.

@pytest.fixture
def to_rent_url():
    return (
        "https://www.rm.co.uk/property-to-rent/find.html?"
        "locationIdentifier=REGION%5E87490&"
        "maxBedrooms=2&"
        "minBedrooms=1&"
        "maxPrice=2000&"
        "propertyTypes=bungalow%2Cdetached%2Cflat&"
        "includeLetAgreed=false&"
        "dontShow=houseShare%2Cretirement%2Cstudent&"
        "furnishTypes=unfurnished"
    )

# ...

def test_iter_page_urls(to_rent_url):
    urls = list(scraping.iter_page_urls(to_rent_url))

    # range(0, 1008, 24) produces 42 pages
    assert len(urls) == 42

    for page, (base_url, page_url) in enumerate(urls):
        url, query = page_url.split("?")
        query_map = dict(pair.split("=") for pair in query.split("&"))

        # Checked for every page, not just the last one
        assert base_url == "https://www.rm.co.uk"
        assert url == "https://www.rm.co.uk/property-to-rent/find.html"
        assert len(query_map) == 9
        expected_index = str(page * 24)
        assert query_map["index"] == expected_index

iter_listing_urls is another generator function. Its job is to take some HTML containing search results and yield URLs for all the listings.

import re

import bs4

LISTING_PATH_ID_REGEX = r"^\/properties\/([0-9]+)#/"

# ...

def iter_listing_urls(result_page: str,
                      base_url: str) -> Iterator[tuple[int, str]]:

    soup = bs4.BeautifulSoup(result_page, "html.parser")
    search = soup.find("div", {"id": "propertySearch"})
    for a in search.find_all("a", attrs={"data-test": "property-details"}):
        relative_link = a.get("href")
        identifier = re.search(LISTING_PATH_ID_REGEX, relative_link).group(1)
        yield (int(identifier), base_url + relative_link)
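
To make the regex concrete, here is a small sketch of what LISTING_PATH_ID_REGEX captures. The sample href has the same shape as the listing URLs asserted in the test further down, and the non-listing path is only an illustrative input.

```python
import re

LISTING_PATH_ID_REGEX = r"^\/properties\/([0-9]+)#/"

href = "/properties/151268708#/?channel=RES_LET"
match = re.search(LISTING_PATH_ID_REGEX, href)
listing_id = int(match.group(1))

# A link that is not a listing does not match, so re.search returns None.
no_match = re.search(LISTING_PATH_ID_REGEX, "/property-to-rent/find.html")
```

Because re.search returns None when nothing matches, calling .group(1) on the result would raise an AttributeError for an anchor whose href has a different shape.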

Why generators

Like the previous function, iter_listing_urls could just as well be called get_listing_urls and return a list. My preference for generators is partly stylistic and partly because they're memory-efficient and not mutable in the same way as lists. (Yes, they're stateful, and sometimes this can cause bugs when code tries to iterate multiple times, but they can't be changed from outside.)
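
The statefulness caveat is easy to demonstrate. In this sketch, iter_squares is a hypothetical generator used only for illustration:

```python
from typing import Iterator


def iter_squares(limit: int) -> Iterator[int]:
    # Hypothetical generator used only for this illustration.
    for n in range(limit):
        yield n * n


squares = iter_squares(4)
first_pass = list(squares)   # consumes the generator
second_pass = list(squares)  # already exhausted, so this is empty
```

The first pass gives [0, 1, 4, 9] and the second gives an empty list, which is why code that needs to iterate more than once should materialise the generator with list() first.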

Again, the test is very simple. It passes some data into the function and makes sure the result is what's expected.


@pytest.fixture(scope="module")
def results_page():
    test_dir = path.dirname(__file__)
    fixture_path = path.join(test_dir, "fixtures", "results-page.html")
    with open(fixture_path) as file_obj:
        return file_obj.read()

# ...

def test_iter_listing_urls(results_page):
    base_url = "https://rm.co.uk"

    results = list(scraping.iter_listing_urls(results_page, base_url))

    assert len(results) == 25  # 24 normal results + 1 featured
    assert results == [
        (151268708, "https://rm.co.uk/properties/151268708#/?channel=RES_LET"),
        (151626689, "https://rm.co.uk/properties/151626689#/?channel=RES_LET"),
        (151626230, "https://rm.co.uk/properties/151626230#/?channel=RES_LET"),
        (151625894, "https://rm.co.uk/properties/151625894#/?channel=RES_LET"),
        (151625804, "https://rm.co.uk/properties/151625804#/?channel=RES_LET"),
        (120819272, "https://rm.co.uk/properties/120819272#/?channel=RES_LET"),
        (151625567, "https://rm.co.uk/properties/151625567#/?channel=RES_LET"),
        (151625369, "https://rm.co.uk/properties/151625369#/?channel=RES_LET"),
        (151625381, "https://rm.co.uk/properties/151625381#/?channel=RES_LET"),
        (151329554, "https://rm.co.uk/properties/151329554#/?channel=RES_LET"),
        (151625171, "https://rm.co.uk/properties/151625171#/?channel=RES_LET"),
        (150066716, "https://rm.co.uk/properties/150066716#/?channel=RES_LET"),
        (151625024, "https://rm.co.uk/properties/151625024#/?channel=RES_LET"),
        (151624967, "https://rm.co.uk/properties/151624967#/?channel=RES_LET"),
        (151624928, "https://rm.co.uk/properties/151624928#/?channel=RES_LET"),
        (151624880, "https://rm.co.uk/properties/151624880#/?channel=RES_LET"),
        (151624841, "https://rm.co.uk/properties/151624841#/?channel=RES_LET"),
        (151624832, "https://rm.co.uk/properties/151624832#/?channel=RES_LET"),
        (151624775, "https://rm.co.uk/properties/151624775#/?channel=RES_LET"),
        (151624742, "https://rm.co.uk/properties/151624742#/?channel=RES_LET"),
        (151624670, "https://rm.co.uk/properties/151624670#/?channel=RES_LET"),
        (58973337, "https://rm.co.uk/properties/58973337#/?channel=RES_LET"),
        (151624205, "https://rm.co.uk/properties/151624205#/?channel=RES_LET"),
        (151624097, "https://rm.co.uk/properties/151624097#/?channel=RES_LET"),
        (151624082, "https://rm.co.uk/properties/151624082#/?channel=RES_LET"),
    ]

These kinds of tests are the bedrock of my unit testing practice. They're fast, simple and most importantly they test what the code does, not how it does it.

Putting the code in context

In my view, anything that accesses an external system, be it a file system, database or HTTP service, should be called or instantiated in or as close as possible to the entry point of a program.

In the case of this web scraper, the entry point is the main function, but it could be an HTTP request handler, a CLI command handler, an event listener or something similar.

The example below is pseudo-code in that I haven't finished or integrated it yet. Nonetheless, it should give some idea of where the iter_listings generator fits into the program.

# service.py
import typing

import data
import protocols


def ingest_listings(ingested: protocols.SetStore,
                    listings: typing.Iterable[tuple[int, str]],
                    runner_callback: typing.Callable):

    for identifier, listing in listings:
        # Skip listings that have already been ingested
        if identifier in ingested:
            continue

        profile = data.make_profile_from_listing_html(listing)

        runner_callback(identifier, profile)
        ingested.add(identifier)

This service function would be called in the main function. The main function would be responsible for

calling iter_listings to get the generator
providing something that satisfies the SetStore protocol (which will wrap a key-value store to track which IDs have been ingested)
providing a callback function that will send the profile data to the runner service for processing

Note that listings is passed in as an Iterable. This leverages Python's static duck typing and means that when testing ingest_listings, a list or any other iterable can be used rather than a generator. Similarly, SetStore could be satisfied by a set.
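
A minimal sketch of how that could look in a test follows. It rests on several assumptions: the real SetStore protocol isn't shown in this post, so the two methods below are a guess at its shape, and make_profile is a hypothetical injected stand-in for data.make_profile_from_listing_html so that the example is self-contained.

```python
import typing


class SetStore(typing.Protocol):
    # Assumed shape of the protocol: a membership test plus add.
    def __contains__(self, item: object) -> bool: ...
    def add(self, item: int) -> None: ...


def ingest_listings(ingested: SetStore,
                    listings: typing.Iterable[tuple[int, str]],
                    make_profile: typing.Callable[[str], dict],
                    runner_callback: typing.Callable[[int, dict], None]) -> None:
    for identifier, listing in listings:
        if identifier in ingested:
            continue
        runner_callback(identifier, make_profile(listing))
        ingested.add(identifier)


def run_example() -> tuple[list, set]:
    sent = []
    ingested = {1}  # a plain set satisfies the protocol structurally
    listings = [(1, "<p>old</p>"), (2, "<p>new</p>")]  # a list, not a generator
    ingest_listings(ingested, listings,
                    make_profile=lambda html: {"length": len(html)},
                    runner_callback=lambda i, p: sent.append((i, p)))
    return sent, ingested
```

No mocks are needed here: the already-ingested listing is skipped, the new one reaches the callback, and both outcomes can be checked by looking at plain Python values.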

Conclusion

This has been the first of several posts about how I structure my Python code with testing in mind. The main focus has been on when to use mocking, and how to limit mocking to small parts of your program that are close to its main function.

I hope this has been useful. I'd like to hear about how you solve similar testing problems in the comments.

In a future post, I'll expand on the ingest_listings service function and how I test it.

Python Testing Part 1: isolating external calls and minimising mocks