Introducing Table Faker: Simplifying Synthetic Data Generation for Developers

Necati Arslan

Realistic test data is a recurring need in software development, and writing it by hand is slow and error-prone. Table Faker is a Python package that generates synthetic but realistic-looking table data from a simple schema definition. This article walks through its main features and shows how to use it to create test data for your projects.

Key Features

  • Schema Definition: Tables are defined in a plain YAML file, where you specify each table's schema, its column names, and the code that generates each column's data. Relationships between tables can be described in the same file.

  • Faker and Randomization: Table Faker builds on the Faker library to produce values that resemble real-world data, such as names and dates of birth. Values are randomized, so the generated rows are varied rather than repetitive.

  • Multiple Output Formats: Generated data can be produced as a Pandas DataFrame, SQL insert scripts, CSV, Parquet, JSON, or Excel, depending on what your project needs.

Installation

Install Table Faker with pip:

pip install tablefaker

Sample YAML File

version: 1
config:
  locale: en_US
tables:
  - table_name: person
    row_count: 10
    columns:
      - column_name: id
        data: row_id
      - column_name: first_name
        data: fake.first_name()
      - column_name: last_name
        data: fake.last_name()
      - column_name: age
        data: fake.random_int(18, 90)
      - column_name: dob
        data: fake.date_of_birth()
        null_percentage: 0.20
      - column_name: salary
        data: None                # NULL
      - column_name: height
        data: "\"170 cm\""        # string
      - column_name: weight
        data: 150                 # number

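To make the `data` and `null_percentage` fields concrete, here is a minimal sketch of how a generator could turn one column definition into values. It is not Table Faker's actual implementation: `generate_column` and `StubFake` are hypothetical names, the stub stands in for a real Faker instance so the example runs with the standard library only, and evaluating the expression with `eval` (with `row_id` starting at 1) is an assumption about one plausible approach.

```python
import random


class StubFake:
    """Hypothetical stand-in for a Faker instance (standard library only)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def random_int(self, min=0, max=9999):
        # Parameter names mirror Faker's random_int; both ends are inclusive.
        return self._rng.randint(min, max)


def generate_column(expression, row_count, null_percentage=0.0, seed=None):
    """Evaluate `expression` once per row and null out roughly null_percentage of rows."""
    rng = random.Random(seed)
    fake = StubFake(seed)
    values = []
    for row_id in range(1, row_count + 1):
        if rng.random() < null_percentage:
            values.append(None)
            continue
        # The expression can see `fake` and the current `row_id`.
        values.append(eval(expression, {"fake": fake, "row_id": row_id}))
    return values


ids = generate_column("row_id", row_count=5)
ages = generate_column(
    "fake.random_int(18, 90)", row_count=1000, null_percentage=0.20, seed=1
)
print(ids)
print(sum(value is None for value in ages) / len(ages))  # close to 0.20, not exact
```

Under this reading, `data: None` always yields NULL, while `null_percentage: 0.20` makes roughly one row in five NULL regardless of what the expression would have produced.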
Sample Code

from tablefaker import tablefaker

# Export to CSV
tablefaker.to_csv("test_table.yaml")

# Export to SQL insert scripts
tablefaker.to_sql("test_table.yaml")

# Export to JSON
tablefaker.to_json("test_table.yaml", "./target_folder")

# Export to Parquet
tablefaker.to_parquet("test_table.yaml", "./target_folder")

# Export to Excel
tablefaker.to_excel("test_table.yaml", "./target_folder/target_file.xlsx")
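Once a file has been exported, a quick check with the standard library can confirm that it has the shape the YAML asked for. The sketch below assumes the CSV has a header row and writes NULLs as empty fields; both are assumptions about the export format, and `summarize_csv` is a hypothetical helper, not part of Table Faker. A small hand-written file stands in for a real export so the example runs on its own.

```python
import csv
import os
import tempfile


def summarize_csv(path):
    """Return (row_count, {column: share of empty cells}) for a CSV with a header row."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        return 0, {}
    empty_share = {
        column: sum(1 for row in rows if row[column] == "") / len(rows)
        for column in rows[0]
    }
    return len(rows), empty_share


# Hand-written stand-in for an exported file (not real Table Faker output).
sample_path = os.path.join(tempfile.mkdtemp(), "person.csv")
with open(sample_path, "w", newline="") as handle:
    handle.write("id,first_name,dob\n1,Ada,1990-01-01\n2,Alan,\n")

row_count, empty_share = summarize_csv(sample_path)
print(row_count, empty_share)
```

Pointing `summarize_csv` at a real export lets you compare the row count with `row_count` in the YAML and the empty share of a column with its `null_percentage`.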

Custom Functions

Table Faker lets you define custom functions for generating column data. This gives you full control over the generated values and lets you plug in logic that retrieves data from databases, APIs, or other sources.

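from tablefaker import tablefaker
from faker import Faker

fake = Faker()


def get_level():
    # Returns a string such as "level 3"
    return f"level {fake.random_int(1, 5)}"


# Pass the function to the export call. This assumes a column in
# test_table.yaml calls it by name, e.g. data: get_level()
tablefaker.to_csv("test_table.yaml", "./target_folder", custom_function=get_level)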
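Because a custom function is ordinary Python, it can read from any source. As a sketch of the database case, the function below picks a random value from a SQLite table. The in-memory database, the `department` table, and the `get_department` name are all made up for illustration, and the final call is left commented out because it assumes a YAML column defined as `data: get_department()`, which the sample file in this article does not contain.

```python
import random
import sqlite3

# Hypothetical lookup table; in practice this would be an existing database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE department (name TEXT)")
connection.executemany(
    "INSERT INTO department (name) VALUES (?)",
    [("Engineering",), ("Finance",), ("Support",)],
)

# Load the values once so generating each row does not hit the database.
DEPARTMENTS = [row[0] for row in connection.execute("SELECT name FROM department")]


def get_department():
    """Return one department name chosen at random."""
    return random.choice(DEPARTMENTS)


print(get_department())

# Assumed usage, mirroring the get_level example:
# tablefaker.to_csv("test_table.yaml", "./target_folder", custom_function=get_department)
```

Loading the lookup values up front keeps the function cheap to call, which matters when it runs once per generated row.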

Conclusion

Table Faker combines a simple YAML schema definition, Faker-based data generation, and support for multiple output formats in one tool for producing test data. Whether you're working on software testing, data analysis, or application prototyping, it lets you create synthetic data that fits your project's needs.

For bug reports, feature requests, and further updates, check out the [Table Faker GitHub repository](https://github.com/necatiarslan/table-faker).

Happy coding!

Necati ARSLAN

necatia@gmail.com

Written by

Necati Arslan

I'm a Senior Data Engineer with a proven record of transforming data into actionable insights. I excel in data engineering, working with top companies like Capital One, Facebook, and Verizon. My expertise spans AWS, Python, Airflow, Spark, and more. I thrive on complex challenges and actively contribute to open-source projects. Let's connect and explore new opportunities! https://github.com/necatiarslan