Streamlining API Interactions: Building a Pythonic Wrapper for EgyTech API
Background
First of all, why use a wrapper? Wrappers are incredibly useful high-level tools that let you use an API without worrying about constructing correct API calls. The user of such a wrapper can focus on their code logic and on the data they want from the API, rather than on request efficiency, authentication logic, and the structure of each call.
The EgyTech API is a simple RESTful API with good documentation. It has two publicly available endpoints and no authentication flow. It's part of a pioneering open-source initiative started by software engineers Abdelrahman El-Adawy, Ahmed ElAdawy, and Mahmoud Salem. The API is built with Node.js and deployed on Cloudflare. You can also check out their website here, API source code here, and frontend source code here.
Disclaimer
This is by no means a professional recipe, or a structured learning resource. There are many great learning resources out there. This is just a walkthrough, if you will, addressing how I tried to solve the following problem:
The EgyTech API gives access to anonymized survey data about tech salaries in Egypt. Python is a popular choice for data analysts and data scientists alike, used for data manipulation and visualization as well as for creating, evaluating, and using machine learning models. This wrapper lets such users access the data in a simple, performant way: it handles query validation and data deserialization for them, and it lowers the minimum technical requirements for interacting with the API, leaving room for more creative use cases.
This wrapper also doubles as a learn-by-example resource: you can re-apply the same principles to create your own wrapper for another API, or implement just a subset of the ideas outlined in this article.
Why I Created A Wrapper
This wrapper is sort of an overkill for how simple its underlying API is. However, it serves as a good example of using pydantic for data serialization and deserialization while minimizing possible user input errors. Pydantic is a Python library that lets you validate data against schemas, or "models" in pydantic terms. You can check out the pydantic docs for more information. Needless to say, we will be using models quite a bit in the upcoming sections.
This wrapper also leverages the performance boost of connection pooling when executing multiple outgoing API requests, and the further boost of making those requests asynchronous. This is, again, a bit of an overkill for this API, especially given the amount of data available through it. However, it serves as a great example of a modular implementation of a performant API wrapper that you can reuse later for your own wrapper, as this project is licensed under the MIT license.
So why did I build this wrapper? I wanted to combine the modest knowledge and experience I have with APIs in Python to offer my own take on an ideal API wrapper: one that doesn't just wrap the API-calling boilerplate in a friendly way, but also lets the user leverage more advanced concepts such as connection pooling and async API calls from a high-level perspective. I also wanted to start a Python community in the EgyTech ecosystem. Finally, I wanted to document my routine for using pydantic in my Python projects; I believe pydantic is essential for most reusable Python code. This project provides a usage example that could be regarded as the reverse of FastAPI (which uses pydantic models, or schemas, to validate incoming API calls), making for a niche but very interesting use case for pydantic. This is also my first somewhat complete Python project, and I wanted to build an actually reusable package that doubles as an enriching learning experience. That said, let's dive right in!
This article covers the basic concepts. We will first inspect the API and understand how it works. Next, we will implement simple client classes based on pydantic models; these receive the user input that configures the API call parameters, validate it, and, if it is valid, make an API call with the specified parameters. Finally, they let the user export the retrieved data as a pandas DataFrame or as a file.
In the next article, we will address more advanced concepts such as connection pooling and asynchronous API calls, and we will let our high-level users leverage these features easily.
I. Planning Phase
Studying The API
Accessing the EgyTech API SwaggerUI, we learn that the API is a RESTful API with only two endpoints, each of which supports only `GET` requests.
The first endpoint is `/participants`, which allows you to retrieve a list of individual survey responses that you can compile however you want.
The second endpoint is `/stats`, which allows you to access aggregated statistical data about subsets of the survey dataset. It essentially runs an SQL command on the participants table to bucket the total compensation into preset buckets. You can view the neat SQL command in the API source code here.
It is worth noting that this can be done with pandas by using `pandas.cut` on the compensation column of a `pandas.DataFrame` of individual participants. This would allow you to create custom bins.
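As a quick illustration, here's a minimal sketch of that approach. The column name `net_compensation` and the values are assumptions for demonstration, so check the actual response fields before reusing this:

```python
import pandas as pd

# Hypothetical compensation values; the column name is an assumption
df = pd.DataFrame({"net_compensation": [8_000, 15_000, 32_000, 60_000, 120_000]})

# Custom bins, unlike the preset buckets used by /stats
bins = [0, 10_000, 30_000, 60_000, float("inf")]
labels = ["<10k", "10k-30k", "30k-60k", "60k+"]
df["bucket"] = pd.cut(df["net_compensation"], bins=bins, labels=labels)

print(df["bucket"].value_counts().sort_index())
```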
/participants
Diving deeper into this endpoint, we see that it receives a `GET` request with the following headers:
```python
headers = {
    "accept": "application/json"
}
```
It also accepts some additional query parameters, all of which are optional and have particular enums or limits. The parameters are:
- `title`: job title of retrieved participants.
- `level`: career level of retrieved participants.
- `yoe_from_included`: minimum years of experience of retrieved participants. With some tinkering on the API, or by reviewing the SwaggerUI scheme from the source code, we find that it has a minimum of 0 and a maximum of 20.
- `yoe_to_excluded`: maximum years of experience of retrieved participants. Using either of the aforementioned methods, we find that it has a minimum of 1 and a maximum of 26.
- `gender`: gender of retrieved participants.
- `cs_degree`: whether the retrieved participants have a computer science degree.
- `business_market`: the market scope of the retrieved participants' companies.
- `business_size`: the size of the retrieved participants' companies.
- `business_focus`: the primary focus of the retrieved participants' companies.
- `business_line`: the retrieved participants' companies' line of business.
- `include_relocated`: whether to include participants who have relocated.
- `include_remote_abroad`: whether to include participants who work remotely from Egypt for companies abroad.
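To see what a raw call with a couple of these parameters looks like (this is exactly the kind of boilerplate the wrapper will hide), here's a minimal sketch:

```python
import httpx

# Raw GET request against /participants with two optional parameters
response = httpx.get(
    "https://api.egytech.fyi/participants",
    headers={"accept": "application/json"},
    params={"title": "backend", "yoe_from_included": 2},
)
payload = response.json()
print(payload["success"], len(payload["results"]))
```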
A successful request returns a JSON response with the following keys:
- `success`: whether the request was successful.
- `meta`: response metadata.
- `results`: a list of individual survey responses as dictionaries.
Since this is a high-level wrapper, we'll be more concerned with the `results` key. Of course, we will use the `success` key to identify server-side errors, but for now our main concern is `results`.
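As a rough sketch, continuing from the previous snippet's `response` (we'll keep the clients in this article minimal, so treat this as an optional addition):

```python
payload = response.json()

# Surface server-side failures early instead of silently
# working with incomplete data
if not payload["success"]:
    raise RuntimeError("EgyTech API call failed")

results = payload["results"]
```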
/stats
While this endpoint is less relevant for Python users who are also data analysts or data scientists, we will include it in the wrapper to provide our users with the full API experience.
Similarly, this endpoint requires one header and has the exact same optional parameters available for `/participants`, with the addition of one parameter:

- `programming_languages`: the programming language of retrieved participants.
A `GET` request on the `/stats` endpoint will result in a JSON response with the following keys:
- `stats`: some useful statistics about the selected subset of data. It is a dictionary that includes the total count of the queried subset of survey responses, and the median compensation of the subset as well as the 20th, 75th, and 90th percentiles of the subset's salaries.
- `buckets`: the returned buckets or bins. It consists of a list of dictionaries, each of which has two keys:
  - `bucket`: specifies the compensation bin or bucket.
  - `count`: the number of survey participants whose compensation falls into this bucket.
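Put together, a `/stats` response has roughly the following shape. All values below are made up, and the exact key names inside `stats` may differ, so treat this purely as an illustration:

```python
# Illustrative only; the real key names inside "stats" may differ
example_stats_response = {
    "success": True,
    "stats": {
        "totalCount": 250,
        "median": 22000,
        "p20": 12000,
        "p75": 35000,
        "p90": 50000,
    },
    "buckets": [
        {"bucket": "10K -> 15K", "count": 42},
        {"bucket": "15K -> 20K", "count": 57},
    ],
}
```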
Planning The Implementation
First, let's start with our essential dependencies. We will be using pydantic for input validation, serialization, and deserialization. For making our API calls, we will be using httpx. Do check out their docs aside from this project; they're both very useful Python packages. We will also rely on `typing.Annotated`, which has been part of the standard library since Python 3.9; if you're working with an older version, make sure `typing_extensions` is installed in your environment. Finally, we will be using pandas for outputting the retrieved data in the form of a pandas `DataFrame`. We will also use it to write quick functions that export the data to a `.csv` or `.xlsx` file locally.
I would recommend using a virtual environment for installing the dependencies, but that concept is beyond the scope of this tutorial, so we'll just install our dependencies with pip. There are many options for virtual environments; I like poetry as well as pipenv, and either is a worthwhile addition to your Python toolkit.
pip install pydantic httpx pandas
Since we want to validate user input before constructing an API call, we can simply create pydantic `BaseModel`s that accept these inputs. Let's call them `ParticipantsQueryParams` and `StatsQueryParams`. Since both endpoints receive almost the same parameters, we'll construct our code from scratch for the `/participants` endpoint, and make our `/stats` parameters inherit from it with the addition of `programming_languages`.
```python
from pydantic import BaseModel


class ParticipantsQueryParams(BaseModel):
    # TODO: define possible input parameters
    pass


class StatsQueryParams(ParticipantsQueryParams):
    # TODO: define programming_languages over the inherited parameters
    pass
```
II. Coding Phase
Let's get to writing code!
Enumerations (enums)
Since we know that each parameter has specific enumerations (choices, if you will), we will declare those so that our pydantic model can identify and validate them. There are also the boolean-type parameters and the integers that have limits. We will start with the string parameters that have enums.
The Python standard library provides support for enumerations through the enum module, which pydantic can validate natively and which requires no extra dependencies.
```python
from enum import Enum


# Create an enum for possible title values
class TitleEnum(str, Enum):
    ai_automation = "ai_automation"
    backend = "backend"
    crm = "crm"
    data_analytics = "data_analytics"
    data_engineer = "data_engineer"
    data_scientist = "data_scientist"
    devops_sre_platform = "devops_sre_platform"
    embedded = "embedded"
    engineering_manager = "engineering_manager"
    executive = "executive"
    frontend = "frontend"
    fullstack = "fullstack"
    hardware = "hardware"
    mobile = "mobile"
    product_manager = "product_manager"
    product_owner = "product_owner"
    research = "research"
    scrum = "scrum"
    security = "security"
    system_arch = "system_arch"
    technical_support = "technical_support"
    testing = "testing"
    ui_ux = "ui_ux"
```
We will not be addressing all string parameters that have enums as that would be repetitive. You can check out the code for them here.
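As a quick sanity check of how such an enum behaves on its own (pydantic turns the `ValueError` below into a `ValidationError` when the enum is used in a model field):

```python
# Member lookup by value succeeds for valid inputs
print(TitleEnum("backend"))  # TitleEnum.backend

# Invalid values raise a ValueError
try:
    TitleEnum("astronaut")
except ValueError as err:
    print(err)  # 'astronaut' is not a valid TitleEnum
```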
Special Serializers
Now, onto our second challenge. The parameter `cs_degree` has two possible values, "yes" or "no", so we can treat it as a boolean. We will implement some special logic that converts it from Python's `True` or `False` to the strings "yes" or "no". Thankfully, pydantic natively provides this functionality through a special serializer attached to a model field. For the purposes of this tutorial, serialization is the process of converting a group of parameters into a dictionary that's compatible with an API call.
```python
from pydantic import PlainSerializer
from typing_extensions import Annotated

# Create a special type for cs_degree: a boolean that
# serializes to "yes" or "no" instead of regular
# boolean values
DegreeType = Annotated[
    bool, PlainSerializer(lambda x: "yes" if x else "no")
]
```
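A quick demonstration with a throwaway model (not part of the wrapper itself):

```python
from pydantic import BaseModel


class DegreeDemo(BaseModel):
    cs_degree: DegreeType


demo = DegreeDemo(cs_degree=True)
print(demo.cs_degree)     # True  (kept as a regular bool internally)
print(demo.model_dump())  # {'cs_degree': 'yes'}  (serialized for the API)
```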
You might also notice that `include_relocated` and `include_remote_abroad` accept either "true" or "false" in all-lowercase. While this shouldn't matter in a URL, we will implement a special serializer for these as well, so that they are stored as `bool` until we actually build the query, which needs them as strings. Always remember, "Explicit is better than implicit", as The Zen of Python states.
```python
from pydantic import PlainSerializer
from typing_extensions import Annotated

# Create a special type for include_relocated and include_remote_abroad:
# a boolean that serializes to "true" or "false"
IncludeType = Annotated[
    bool, PlainSerializer(lambda x: "true" if x else "false")
]
```
Constrained Types
Another piece of user input validation that needs handling is the input values for `yoe_from_included` and `yoe_to_excluded`, both of which have specific ranges for their integer inputs. For this, we will be using pydantic's special field type `conint`:
```python
from pydantic import conint

# Create a custom integer type that only accepts integers
# between 0 and 20, and another that only accepts integers
# between 1 and 26
min_yoe = conint(strict=True, ge=0, le=20)
max_yoe = conint(strict=True, ge=1, le=26)
```
While the pydantic documentation discourages the usage of `conint` in favor of `Field` restrictions, we will be using `conint` to showcase constrained types, a feature offered by pydantic.
Type Annotations
When writing code, a linter checks your code and suggests improvements. You can definitely code without one, but we will help our users' linters with static type checking. Our custom types are basically special variants of standard Python types. For example, our `min_yoe` is simply an integer that can only be in a specific range. However, a linter can't understand this unless we explicitly specify that it should treat this type as an integer, but with extra steps :)
This can be achieved using type annotations on custom types as follows:
```python
from typing import Annotated

from pydantic import conint

# Constrained integers with type annotations
min_yoe = Annotated[int, conint(strict=True, ge=0, le=20)]
max_yoe = Annotated[int, conint(strict=True, ge=1, le=26)]
```
You can replace `typing` with `typing_extensions` in Python versions older than 3.9, where `Annotated` wasn't yet part of the standard library.
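Here's what the constrained type buys us at validation time, using `conint` directly as the annotation in a throwaway model:

```python
from pydantic import BaseModel, ValidationError, conint


class YoeDemo(BaseModel):
    min_yoe: conint(strict=True, ge=0, le=20)


print(YoeDemo(min_yoe=5))  # min_yoe=5

try:
    YoeDemo(min_yoe=42)    # out of range
except ValidationError as err:
    print(err)             # input should be less than or equal to 20
```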
Optional Attributes
Our API's parameters are all optional! Making an API call with no parameters simply retrieves all the data offered by the API, and any individual call may use only a subset of the available parameters. We must tell code linters that these parameters are optional, as follows:
```python
from typing import Annotated, Optional

from pydantic import conint

# Optional constrained integers with type annotations
min_yoe = Optional[Annotated[int, conint(strict=True, ge=0, le=20)]]
max_yoe = Optional[Annotated[int, conint(strict=True, ge=1, le=26)]]
```
In pydantic, a model field is optional if it has a default value. We will be handling this shortly.
Putting It Together
Having set up our custom types for validation with pydantic, we can now actually create the model that our users will instantiate to construct their API call parameters. This model validates their input, provides linter and auto-completion support, and even helps with the API call building process by producing a dictionary that's ready for an API call.
In this compiled model code, we will add some information:

- Giving all fields a default value of `None`, making them optional for pydantic.
- Preventing extra inputs that don't correspond to a valid parameter, since an extra input could simply be a typo that wouldn't translate to an actual parameter, and the user would only find out by reviewing the data and concluding that the filter didn't go through. This can be done through the model configuration.
- Creating an alias for some fields to make them more concise and user-friendly. The user can still use the original parameter name provided by the API.
```python
from typing import Annotated, Optional

from pydantic import BaseModel, ConfigDict, Field, conint

# TitleEnum, LevelEnum, GenderEnum, BusinessMarketEnum, BusinessSizeEnum,
# BusinessFocusEnum, BusinessLineEnum, ProgrammingLanguageEnum, DegreeType,
# and IncludeType are the types we defined above (enums.py in the repo)


class ParticipantsQueryParams(BaseModel):
    # forbid typos; populate_by_name lets users pass either the field
    # name (min_yoe) or the API's original name (yoe_from_included)
    model_config = ConfigDict(extra="forbid", populate_by_name=True)

    title: Optional[TitleEnum] = None
    level: Optional[LevelEnum] = None
    min_yoe: Optional[Annotated[int, conint(strict=True, ge=0, le=20)]] = Field(
        default=None, alias="yoe_from_included"
    )
    max_yoe: Optional[Annotated[int, conint(strict=True, ge=1, le=26)]] = Field(
        default=None, alias="yoe_to_excluded"
    )
    gender: Optional[GenderEnum] = None
    cs_degree: Optional[DegreeType] = None
    business_market: Optional[BusinessMarketEnum] = None
    business_size: Optional[BusinessSizeEnum] = None
    business_focus: Optional[BusinessFocusEnum] = None
    business_line: Optional[BusinessLineEnum] = None
    include_relocated: Optional[IncludeType] = None
    include_remote_abroad: Optional[IncludeType] = None


class StatsQueryParams(ParticipantsQueryParams):
    model_config = ConfigDict(extra="forbid", populate_by_name=True)

    programming_language: Optional[ProgrammingLanguageEnum] = None
```
Note that we don't repeat the parameters for `StatsQueryParams`; rather, we make it inherit from `ParticipantsQueryParams` and add the `programming_languages` field.
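Before wiring in the API calls, a quick check that the model behaves as intended:

```python
params = ParticipantsQueryParams(title="backend", min_yoe=2)

# by_alias=True maps min_yoe back to the API's yoe_from_included
print(params.model_dump(mode="json", exclude_none=True, by_alias=True))
# {'title': 'backend', 'yoe_from_included': 2}

# Typos are rejected instead of silently ignored, thanks to extra="forbid"
# ParticipantsQueryParams(titel="backend")  # raises ValidationError
```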
Until now, we've created placeholders for the outgoing query parameters. We haven't yet implemented actual API calling functionality.
Implementing API Calling Functionality
In this section, we will create a client class built on top of the query parameters models we created before. Let's create a client for each endpoint, since each endpoint returns data in its own format and lives at its own URL.
Let's start with the participants client. We will build it on top of the `ParticipantsQueryParams` model. Why? Because it receives the same input parameters as our query parameter placeholder. However, it has additional functionality: it makes the API call for you, formats the data as a pandas DataFrame, and even lets you export it quickly.
Let's start with the API calling part. Here, we will disallow extra fields as before. We will also configure the model to store enum inputs as their values (in our case, strings) instead of as instances of the enum. For example, passing "backend" for the `title` field would store it as the string "backend" instead of a `TitleEnum.backend` instance. This is more fitting for our use case, which will eventually output this field as a string value under the key `title` in a dictionary.
We will also use the `BaseModel.model_dump()` function to produce the parameters dictionary needed for the API call. We will configure it to serialize to JSON-compatible values rather than Python-specific values, as this dictionary will be used in an API call, which is language-agnostic. We will also exclude fields that have a `None` value from the dictionary, as they are of no use to us.
We will also define a field that acts as a placeholder for the created pandas DataFrame. This will be a private field that can only be accessed through a method, not as a regular attribute. In order to define fields with custom types (types that pydantic doesn't support natively, such as `pandas.DataFrame`), we need to enable `arbitrary_types_allowed`.
Finally, we will convert the retrieved list of participants to a `pandas.DataFrame` by constructing it with the `pandas.DataFrame.from_records()` function, then saving it to our `_participants` field.
```python
from typing import Optional

import httpx
import pandas as pd
from pydantic import ConfigDict


class Participants(ParticipantsQueryParams):
    model_config = ConfigDict(
        arbitrary_types_allowed=True, use_enum_values=True, extra="forbid"
    )

    _participants: Optional[pd.DataFrame] = None

    def make_api_call(self):
        url = "https://api.egytech.fyi/participants"
        headers = {"accept": "application/json"}
        response = httpx.get(
            url,
            headers=headers,
            # by_alias=True emits the API's parameter names
            # (e.g. yoe_from_included instead of min_yoe)
            params=self.model_dump(mode="json", exclude_none=True, by_alias=True),
        )
        participants_list = response.json()["results"]
        self._participants = pd.DataFrame.from_records(participants_list)
```
Great work! We defined a function that makes the API call, converts the retrieved data to a pandas DataFrame, and saves it. Unfortunately, the user has to call this function explicitly on their `Participants` instance for the API call to be made. Pydantic provides a neat method, `model_post_init()`, that you can override to add extra initialization steps to your model. In our case, we want the API call to be part of the initialization process of our model. This can be done as follows:
```python
from typing import Any, Optional

import httpx
import pandas as pd
from pydantic import ConfigDict


class Participants(ParticipantsQueryParams):
    model_config = ConfigDict(
        arbitrary_types_allowed=True, use_enum_values=True, extra="forbid"
    )

    _participants: Optional[pd.DataFrame] = None

    def model_post_init(self, __context: Any):
        # Runs automatically after validation, so instantiating
        # the model triggers the API call
        self.make_api_call()

    def make_api_call(self):
        url = "https://api.egytech.fyi/participants"
        headers = {"accept": "application/json"}
        response = httpx.get(
            url,
            headers=headers,
            params=self.model_dump(mode="json", exclude_none=True, by_alias=True),
        )
        participants_list = response.json()["results"]
        self._participants = pd.DataFrame.from_records(participants_list)
```
Now our client takes the user input, validates it, and, if it's valid, automatically makes the API call, converts the retrieved data to a `pandas.DataFrame`, and saves it. However, we still need to give our users a way to access the created DataFrame. Let's add a `get_dataframe()` method.
```python
from typing import Any, Optional

import httpx
import pandas as pd
from pydantic import ConfigDict


class Participants(ParticipantsQueryParams):
    model_config = ConfigDict(
        arbitrary_types_allowed=True, use_enum_values=True, extra="forbid"
    )

    _participants: Optional[pd.DataFrame] = None

    def model_post_init(self, __context: Any):
        self.make_api_call()

    def make_api_call(self):
        url = "https://api.egytech.fyi/participants"
        headers = {"accept": "application/json"}
        response = httpx.get(
            url,
            headers=headers,
            params=self.model_dump(mode="json", exclude_none=True, by_alias=True),
        )
        participants_list = response.json()["results"]
        self._participants = pd.DataFrame.from_records(participants_list)

    def get_dataframe(self):
        return self._participants
```
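With that in place, basic usage looks like this:

```python
# Fetch all backend participants with at least two years of experience
client = Participants(title="backend", min_yoe=2)

df = client.get_dataframe()
print(df.head())
```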
Very nice! Now let's duplicate this for the `/stats` endpoint. We just need to rename the `_participants` attribute to the more suitable `_buckets`, and create one more field, `_stats`, for the compiled statistics dictionary. We will also create a method that returns this dictionary, as we did before.
```python
from typing import Any, Dict, Optional

import httpx
import pandas as pd
from pydantic import ConfigDict


class Stats(StatsQueryParams):
    model_config = ConfigDict(
        arbitrary_types_allowed=True, use_enum_values=True, extra="forbid"
    )

    _buckets: Optional[pd.DataFrame] = None
    _stats: Optional[Dict[str, str]] = None

    def model_post_init(self, __context: Any):
        self.make_api_call()

    def make_api_call(self):
        url = "https://api.egytech.fyi/stats"
        headers = {"accept": "application/json"}
        response = httpx.get(
            url,
            headers=headers,
            params=self.model_dump(mode="json", exclude_none=True, by_alias=True),
        )
        data = response.json()
        self._buckets = pd.DataFrame.from_records(data["buckets"])
        # Don't forget to store the compiled statistics as well
        self._stats = data["stats"]

    def get_dataframe(self):
        return self._buckets

    def get_stats(self):
        return self._stats
```
Now, onto the final piece of our API wrapper. We want functions that can quickly export the data to `.csv` or `.xlsx`. This can be done with `pandas.DataFrame.to_csv()` and `pandas.DataFrame.to_excel()`, respectively.
```python
from typing import Any, Optional

import httpx
import pandas as pd
from pydantic import ConfigDict


class Participants(ParticipantsQueryParams):
    model_config = ConfigDict(
        arbitrary_types_allowed=True, use_enum_values=True, extra="forbid"
    )

    _participants: Optional[pd.DataFrame] = None

    def model_post_init(self, __context: Any):
        self.make_api_call()

    def make_api_call(self):
        url = "https://api.egytech.fyi/participants"
        headers = {"accept": "application/json"}
        response = httpx.get(
            url,
            headers=headers,
            params=self.model_dump(mode="json", exclude_none=True, by_alias=True),
        )
        participants_list = response.json()["results"]
        self._participants = pd.DataFrame.from_records(participants_list)

    def get_dataframe(self):
        return self._participants

    def save_csv(self, path: str = "participants.csv"):
        # to_csv needs a target path; without one it just
        # returns the CSV as a string
        self._participants.to_csv(path, index=False)

    def save_excel(self, path: str = "participants.xlsx"):
        # to_excel requires a target file (and an Excel engine
        # such as openpyxl installed)
        self._participants.to_excel(path, index=False)
```
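And here's an end-to-end run of both clients; the file name is just an example:

```python
# Export all data scientist responses to a local CSV file
participants = Participants(title="data_scientist")
participants.save_csv("data_scientists.csv")

# Pull aggregated stats for frontend engineers
stats = Stats(title="frontend")
print(stats.get_stats())
print(stats.get_dataframe().head())
```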
Clone the past step into the `Stats` client and there you go! We have created a simple, pythonic API wrapper for the EgyTech API. It's extremely simple, yet a very powerful showcase of how you can use pydantic to build a modular implementation of your favorite API. If you're interested, there are some extra features covered in the next article. They are a bit of an overkill for this particular API, but the concepts they're built on are very interesting to know and implement.
You can check the full source code for this article here. This code is licensed under the MIT license, which is very permissive. You can reuse it however you want.
Note: I've put the code in 3 separate files for readability. The files in the repo are:

- enums.py: contains the string enums for the fields created above.
- models.py: contains the regular client models we created in this article.
- models_advanced.py: contains the `PoolingClient` & `AsyncPoolingClient`, which are covered in the next article in the series.