Adding test cases to GitHub Echo
Introduction
In our open-source course at Seneca this week, we focused on adding tests to the projects we've been building. If you've read my previous blogs, you might know that I've been working on GitHub Echo for the past couple of weeks (check it out here: GitHub Echo). Until now, I thought the project was flawless—but that's easy to assume when you haven’t written any tests, so there’s nothing to fail! This week, my goal was to integrate testing with pytest (pytest documentation) and incorporate it into my CI pipeline. Now, whenever there's a new change, I can quickly verify that everything works smoothly without breaking my code.
In this blog, I will walk you through the steps and my thought process while doing this.
Getting started
The first step in the course was to choose a testing framework, and I selected Pytest, as it’s known to be one of the most popular Python testing frameworks. Since I was new to Pytest, I began exploring and learning more about it to ensure I could apply it effectively in my project. I primarily learned the essentials through a Codecademy video available here and from Real Python’s comprehensive guide on testing in Python.
As I manage my project dependencies with Poetry, I added Pytest to the project with the command poetry add pytest. Additionally, I included requests-mock to mock API requests, following the requests-mock documentation for setup and usage.
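requests-mock ships a pytest fixture called requests_mock that intercepts HTTP calls made with the requests library. As a quick illustration (this is not code from the project), a test can stub a GitHub API response like this:

import requests


def test_github_api_is_mocked(requests_mock):
    # requests_mock is the fixture provided by the requests-mock plugin
    requests_mock.get(
        'https://api.github.com/repos/user/example-repo',
        json={'full_name': 'user/example-repo'},
    )

    response = requests.get('https://api.github.com/repos/user/example-repo')
    assert response.json()['full_name'] == 'user/example-repo'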
The goal of this week’s lab was not to cover the entire project but to focus on a few utility functions, particularly the core LLM functionality. I started by testing some parser utility functions in my project’s codebase, which allowed me to get comfortable with the process of writing and structuring tests. You can view the specific utility functions I worked with here in the parser module. This initial setup helped me get a good feel for Pytest and laid the foundation for more extensive testing as the project progresses.
Here are some important things that I learnt while doing this:
Using pytest.fixture to Reuse Data Across Tests
In my tests, I use pytest.fixture to set up reusable data. Fixtures are great for situations where I need a consistent setup across multiple tests, such as when I need the same configuration data for validating my load_toml_config function.
Here's how it works:
- I created config_file_content, a fixture that provides a sample configuration in TOML format. Any test that needs this fixture can just accept it as a parameter. This makes my tests cleaner since I don't need to redefine the configuration in each test function.
- I also have an expected_config fixture that holds the expected dictionary output. With pytest, I can keep both config_file_content and expected_config reusable and separate from the tests, which makes my setup more flexible.
For example, here's how I use config_file_content in a test:
def test_load_valid_config(self, config_file_content, expected_config):
    # use config_file_content as the input and expected_config to verify the output
Using fixtures this way keeps my tests more readable and avoids duplicate setup code.
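For context, the two fixtures could be defined along these lines; this is a minimal sketch, and the actual TOML keys in the project will differ:

import pytest


@pytest.fixture
def config_file_content():
    # Sample configuration in TOML format (keys are illustrative)
    return 'api_key = "test-key"\nmodel_temperature = 0.7\n'


@pytest.fixture
def expected_config():
    # The dictionary we expect load_toml_config to produce from the TOML above
    return {'api_key': 'test-key', 'model_temperature': 0.7}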
Mocking with patch to Simulate File and Path Operations
In my tests, I need to interact with the filesystem to load configuration files. But I don't want to rely on real files because they might not exist or could be difficult to set up. That's where unittest.mock.patch comes in handy. I can use patch to replace these file and path operations with mock objects that simulate their behavior.
For instance, I mock Path.exists to simulate whether a file exists. If I want to test a missing config file, I can patch Path.exists to return False, like this:
with patch('pathlib.Path.exists', return_value=False):
    # Now, load_toml_config will think the file doesn't exist
In cases where I need to mock the file read itself, I patch open. This way, I can simulate different file contents without creating any actual files. For example, here's how I use open to provide specific content:
with patch('builtins.open') as mock_file:
    mock_file.return_value.__enter__.return_value.read.return_value = config_file_content
    # Now, when load_toml_config reads the file, it gets config_file_content
Using patch like this makes my tests more reliable and faster, and I have full control over the input data.
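Putting the two together, a happy-path test could look roughly like the sketch below. It assumes load_toml_config checks Path.exists and reads the file with open(); in the project the test lives inside a test class, so it also takes self:

from unittest.mock import patch

# load_toml_config is imported from the project's parser module (import path omitted here)


def test_load_valid_config(config_file_content, expected_config):
    # Pretend the config file exists, then feed the fixture content through a mocked open()
    with patch('pathlib.Path.exists', return_value=True), \
            patch('builtins.open') as mock_file:
        mock_file.return_value.__enter__.return_value.read.return_value = (
            config_file_content
        )
        result = load_toml_config('.github-echo-config.toml')

    assert result == expected_config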
Testing Error Handling with pytest.raises
Error handling is a crucial part of any function, and pytest makes it easy to verify errors with pytest.raises. In my code, I use pytest.raises to check that certain inputs or file errors raise the expected exceptions. For example, when I want to ensure a PermissionError is raised if there's a file permission issue, I can use pytest.raises like this:
with pytest.raises(PermissionError):
    load_toml_config('.github-echo-config.toml')
This is clean and expressive: I can specify the error type and, if needed, inspect the error message. pytest.raises helps me confirm that my code correctly handles various exceptional cases.
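pytest.raises can also capture the exception object for closer inspection; for example (the message check below is illustrative):

with pytest.raises(PermissionError) as excinfo:
    load_toml_config('.github-echo-config.toml')

# Illustrative check; the actual message depends on how the error is raised
assert 'permission' in str(excinfo.value).lower()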
Asserting Outputs and Structures in JSON to Markdown Conversion
In my json_to_markdown function tests, I use straightforward assertions to confirm the output structure. I don't need to mock anything here, as the function works on JSON data directly. I assert that the Markdown conversion matches the expected text.
For example:
assert result == expected_result
Using assertions like this verifies the function's logic without extra setup. If the function were more complex, I could add more assertions for intermediate states, but for now, a simple check against expected_result does the job.
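As a rough illustration, such a test might look like the sketch below; the input keys and the exact Markdown layout are hypothetical rather than the project's real schema:

def test_json_to_markdown_basic():
    # Hypothetical input/output pair; the real shape follows the LLM response schema
    data = {'Summary': 'A CLI tool that summarizes GitHub repositories.'}
    expected_result = '## Summary\nA CLI tool that summarizes GitHub repositories.\n'

    result = json_to_markdown(data)
    assert result == expected_result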
Testing the core LLM functionality
Writing tests for two different Large Language Model (LLM) backends, Google Gemini and Groq, was a complex task. Testing Google Gemini was relatively straightforward, but testing Groq required more work because mocking its responses wasn't feasible without significant setup. Here's how I approached testing each model, the challenges involved, and some example code to illustrate the process.
Why Test LLM Integration?
Testing LLMs, especially when integrating multiple models, is crucial to ensure consistent outputs and to handle unexpected errors gracefully. In this case, my product leverages both Google Gemini and Groq LLMs to generate summaries of GitHub repositories. Each model has unique capabilities and limitations, so testing them individually was essential.
Mocking Google Gemini
For Google Gemini, I was able to directly mock the API responses using unittest.mock.patch. This approach was simple because the response structure of the model was predictable and didn't require the complex setup Groq did. Here's a test that checks if the summary generation function (get_gemini_summary) works as expected:
# Test function to ensure successful summary generation using Gemini
@patch('google.generativeai.GenerativeModel.generate_content')
def test_get_gemini_summary(
    self, mock_generate_content, mock_gemini_response, mock_usage_metadata
):
    # Set up a mock response to simulate the API output
    mock_response = MagicMock()
    mock_response.text = json.dumps(mock_gemini_response)
    mock_response.usage_metadata = mock_usage_metadata
    mock_generate_content.return_value = mock_response

    # Call the function to generate a summary with mocked data
    github_data = {'repo_name': 'example-repo', 'owner': 'user'}
    model_temperature = 0.7
    result = get_gemini_summary(github_data, model_temperature)

    # Verify if the formatted response matches the expected markdown format
    expected_formatted_response = json_to_markdown(mock_gemini_response)
    assert result['formatted_response'] == expected_formatted_response
    assert result['usage'] == mock_usage_metadata
In this example, I:
- Mocked the Gemini API response using mock_generate_content to simulate the output.
- Compared the function output to the expected response, verifying that the formatted_response field in the result matches the expected markdown format.
The simplicity of mocking here allowed me to avoid setting up an entire client, keeping the test lightweight.
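For reference, the mock_gemini_response and mock_usage_metadata fixtures the test receives could be defined along these lines; the keys shown are illustrative rather than the project's exact schema:

@pytest.fixture
def mock_gemini_response():
    # Illustrative summary payload; the real keys follow the project's prompt/response schema
    return {'Repository Summary': 'example-repo is a sample project owned by user.'}


@pytest.fixture
def mock_usage_metadata():
    # Illustrative usage object, in whatever shape result['usage'] is compared against
    return {'prompt_token_count': 456, 'candidates_token_count': 123, 'total_token_count': 579}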
Testing Groq with a Mock Client
For Groq, I ran into challenges when trying to mock specific responses. Unlike Google Gemini, Groq’s response structure and API setup were harder to simulate. Instead of mocking just the response, I had to mock the entire Groq client and simulate its behavior, which took time and required additional setup.
Here’s how I tested Groq using a mock client:
# Mock Groq client setup
@pytest.fixture
def mock_groq_client(self, mock_claude_response):
    mock_client = MagicMock()
    mock_choice = MagicMock()
    mock_choice.message.content = json.dumps(mock_claude_response)

    mock_response = MagicMock()
    mock_response.choices = [mock_choice]
    mock_response.usage = {
        'completion_tokens': 123,
        'prompt_tokens': 456,
        'total_tokens': 579,
    }

    mock_client.chat.completions.create.return_value = mock_response
    return mock_client


# Test function to ensure successful summary generation using Groq
def test_get_groq_summary(self, mock_groq_client):
    with patch('application.core.models.groq_model.client', mock_groq_client):
        repo_data = {'some_key': 'some_value'}
        temperature = 0.5
        result = get_groq_summary(repo_data, temperature)

        # Check if the Groq client was called with correct parameters
        mock_groq_client.chat.completions.create.assert_called_once()
        call_args = mock_groq_client.chat.completions.create.call_args[1]
        assert call_args['model'] == 'mixtral-8x7b-32768'

        assert 'formatted_response' in result
        assert 'usage' in result

        formatted_response = result['formatted_response']
        assert '## Branch Protection' in formatted_response
        assert 'No branch protection' in formatted_response
Here’s a breakdown of the steps and reasoning:
- Mock the Groq client: The mock_groq_client fixture sets up a complete mock of the client, including a simulated completions.create response built from a mock_claude_response fixture (sketched after this list). This level of detail was necessary because mocking only the response itself led to inconsistencies.
- Use patch to replace the actual client: I patched the client in the get_groq_summary function to use the mock client during testing.
- Validate the client call: After calling get_groq_summary, I confirmed the client was called with the right parameters (model name, response format, and temperature).
- Check the output structure: I verified the presence of key fields like formatted_response and ensured it contained expected elements, such as "Branch Protection."
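The mock_claude_response fixture that the mock client serializes isn't shown above; it could be defined roughly like this, with keys chosen to line up with the assertions in the test (illustrative only):

@pytest.fixture
def mock_claude_response(self):
    # Illustrative payload; json_to_markdown turns keys like this into '## Branch Protection' sections
    return {
        'Branch Protection': 'No branch protection rules are configured for this repository.',
    }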
Setting Up Test Coverage with pytest-cov and Custom Testing Commands
In this section of my project, I focused on improving the testing infrastructure and making it more accessible to contributors. One of the main steps I took was integrating pytest-cov into my testing workflow. pytest-cov is a plugin for pytest that enables automatic generation of code coverage reports, helping developers identify areas of their codebase that need more test coverage. To install pytest-cov, I used the following command:
poetry add pytest-cov
This command installs pytest-cov as a dependency in my project. Once installed, I learned how to use it to generate coverage reports and documented the process in the project's documentation.
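For example, a typical invocation looks like this; the application package name is borrowed from the patch target in the Groq test and may not be the exact coverage source used in the project:

poetry run pytest --cov=application --cov-report=term-missing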
Running Tests in Various Scenarios
To make it easier for contributors to run tests, I set up several custom scripts in the pyproject.toml file. These scripts allow users to run tests in different scenarios, whether they want to test only a specific file or class, or run tests continuously as the code changes.
The following scripts were added under the [tool.poetry.scripts] section in pyproject.toml:
[tool.poetry.scripts]
lint = "_scripts:lint"
format = "_scripts:format_code"
lint-and-format = "_scripts:lint_and_format"
run-tests = "_scripts:run_tests"
run-tests-on-files = "_scripts:run_tests_on_files"
run-tests-on-classes = "_scripts:run_tests_on_classes"
run-coverage = "_scripts:run_coverage"
run-coverage-report = "_scripts:run_coverage_report"
run-coverage-html = "_scripts:run_coverage_html"
watch-tests = "_scripts:watch_tests"
watch-tests-coverage = "_scripts:watch_tests_with_coverage"
These scripts provide functionality for various testing and coverage needs:
- lint: Runs the code linter (using tools like Ruff) to ensure the code adheres to style guidelines.
- format: Automatically formats the code to adhere to style conventions.
- lint-and-format: Runs both linting and formatting.
- run-tests: Runs all tests in the project.
- run-tests-on-files: Runs tests on specific files.
- run-tests-on-classes: Runs tests on specific classes.
- run-coverage: Runs tests and generates a code coverage report.
- run-coverage-report: Generates a coverage report in the terminal.
- run-coverage-html: Generates a detailed HTML report of code coverage.
- watch-tests: Runs tests continuously as files change, useful during development.
- watch-tests-coverage: Similar to watch-tests, but with coverage reporting.
These scripts are documented in the Contributing Guide to ensure that other developers can easily run tests, check coverage, and keep their code in good shape.
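Under the hood, each entry points at a function in a _scripts module. Here is a rough sketch of what a couple of those entry points might look like; the repository's real implementation may differ:

# _scripts.py (illustrative sketch; the project's actual implementation may differ)
import subprocess
import sys


def run_tests():
    """Run the full test suite with pytest."""
    sys.exit(subprocess.call(['pytest', '-v']))


def run_coverage():
    """Run the suite with coverage measurement via pytest-cov."""
    sys.exit(subprocess.call(['pytest', '--cov=application', '--cov-report=term-missing']))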
Example: Running Tests on Specific Files or Classes
If you want to run tests on a specific file or class, you can use the following commands:
To run tests on specific files:
poetry run run-tests-on-files path/to/file.py
To run tests on specific classes:
poetry run run-tests-on-classes test_module.TestClass
These commands are especially helpful when you're working on a specific part of the codebase and want to quickly verify that your changes don't break anything.
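These wrappers most likely just forward their arguments to pytest, which supports the same selection syntax natively (the paths shown are examples):

poetry run pytest path/to/file.py
poetry run pytest path/to/file.py::TestClass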
Automatically Running Tests with --watch
Another useful feature I set up is the ability to run tests automatically when code changes. This is done using the watch-tests script, which watches for changes in your Python files and reruns the tests as soon as those files are saved. This is extremely helpful for continuous testing during development. You can use it as follows:
poetry run watch-tests
If you also want to track test coverage in real-time while running the tests, you can use:
poetry run watch-tests-coverage
This will run your tests continuously and generate coverage reports on the fly.
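One way to implement this kind of watcher is to delegate to a file-watching test runner. If the script used pytest-watch, which is an assumption on my part rather than something the project necessarily does, the entry point could look like this:

# Continuation of the illustrative _scripts.py sketch; assumes pytest-watch (ptw) is installed
import subprocess
import sys


def watch_tests():
    """Re-run the test suite whenever a watched file changes, via pytest-watch's ptw."""
    sys.exit(subprocess.call(['ptw', '--', '-v']))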
Setting Up Continuous Integration (CI) with GitHub Actions
To ensure that tests run automatically in a CI environment, I set up a GitHub Actions pipeline. This pipeline runs whenever changes are pushed to the main branch or when a pull request is created. Here's the full configuration for the pipeline in the .github/workflows/ci.yml file:
name: CI Pipeline

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  workflow_dispatch:

jobs:
  code-lint:
    name: Lint with Ruff
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ruff

      - name: Run Ruff
        run: ruff check .

  test:
    name: Run Tests
    runs-on: ubuntu-latest
    needs: ["code-lint"]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install poetry
          poetry install

      - name: Run tests
        run: |
          poetry run run-tests
Explanation:
- code-lint job: This job runs the Ruff linter to ensure that the code is well-formed and adheres to style guidelines.
- test job: This job installs dependencies using Poetry and runs the tests using the poetry run run-tests command.
The CI pipeline ensures that every change to the main branch is linted and tested automatically, providing confidence that the code is always in a good state.
If you'd like to learn more about testing with pytest, you can check out the official documentation here. To learn more about integrating code coverage, refer to pytest-cov's documentation.
Conclusion
From this process, I’ve learned a lot about the value of testing and how it improves the reliability of code. While I’ve done testing before, specifically with Jest for JavaScript projects, this experience with pytest was a bit more challenging, but also more rewarding in the end. I’ve realized that testing isn’t just about catching bugs but also about ensuring that my code behaves as expected, even as it evolves.
Before diving into testing with pytest, I had primarily worked with Jest, which is known for its simplicity, especially in front-end applications. Jest’s setup is minimal, and it often handles mocking and assertions in a more automated way, which made it easier for me to get started with. In contrast, pytest requires a bit more effort in terms of manual setup, such as managing fixtures and organizing test cases. While the learning curve for pytest was steeper, I found it much more powerful once I became familiar with its features, particularly when it comes to managing complex testing scenarios and providing more granular control over test execution.
I believe testing is essential, and I definitely plan to incorporate it into all my future projects. Whether using Jest for JavaScript or pytest for Python, testing ensures that my code remains robust and maintainable. Additionally, with tools like pytest, I now appreciate the importance of writing tests early on and integrating them into the development process rather than retrofitting them later. So yes, I’ll absolutely be doing more testing in the future, and I’ll continue to learn and refine my testing skills along the way.