Developers routinely need realistic test data, and crafting it by hand is time-consuming and error-prone. Table Faker is a Python package that generates synthetic yet authentic-looking table data from a simple schema file. In this article, we'll walk through its features, installation, and basic usage so you can create test data for your own projects.

Key Features

Schema Definition: Tables are described in a plain YAML file that specifies each table's name, row count, columns, and the Python expression that generates each column's data. Relationships between tables can be expressed as well.
Faker and Randomization: Table Faker builds on the Faker library, so columns can call Faker methods such as fake.first_name() or fake.random_int(18, 90) to produce realistic, varied values.
Multiple Output Formats: Generated data can be exported as a Pandas DataFrame, SQL insert scripts, CSV, Parquet, JSON, or Excel.

Installation

Install Table Faker with pip:

pip install tablefaker

Sample YAML File

version: 1
config:
  locale: en_US
tables:
  - table_name: person
    row_count: 10
    columns:
      - column_name: id
        data: row_id
      - column_name: first_name
        data: fake.first_name()
      - column_name: last_name
        data: fake.last_name()
      - column_name: age
        data: fake.random_int(18, 90)
      - column_name: dob
        data: fake.date_of_birth()
        null_percentage: 0.20
      - column_name: salary
        data: None                # NULL
      - column_name: height
        data: "\"170 cm\""        # string
      - column_name: weight
        data: 150                 # number
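
To make the schema semantics concrete, here is a minimal sketch of how the `data` expressions and `null_percentage` in the sample could be interpreted. It is an illustration under stated assumptions, not Table Faker's actual implementation: the `generate_rows` helper is hypothetical, `row_id` is assumed to start at 1, `null_percentage` is assumed to be a per-row probability, and a standard-library random generator (`rng`) stands in for `fake` so the example runs without Faker installed.

```python
import random

def generate_rows(columns, row_count, seed=None):
    """Build rows by evaluating each column's `data` expression once per row.

    Illustrative only: this is NOT Table Faker's implementation.
    """
    rng = random.Random(seed)
    rows = []
    for row_id in range(1, row_count + 1):  # assumption: row_id starts at 1
        # Names an expression may reference (the real package exposes `fake`).
        namespace = {"row_id": row_id, "rng": rng}
        row = {}
        for col in columns:
            # Assumption: null_percentage is the chance that a cell is NULL.
            if rng.random() < col.get("null_percentage", 0.0):
                row[col["column_name"]] = None
            else:
                # eval() is tolerable here only because the schema is trusted input.
                row[col["column_name"]] = eval(str(col["data"]), {}, namespace)
        rows.append(row)
    return rows

# Columns mirroring the sample schema, as they would look after YAML parsing.
columns = [
    {"column_name": "id", "data": "row_id"},
    {"column_name": "age", "data": "rng.randint(18, 90)", "null_percentage": 0.20},
    {"column_name": "salary", "data": "None"},      # evaluates to None (NULL)
    {"column_name": "height", "data": '"170 cm"'},  # evaluates to a string
    {"column_name": "weight", "data": 150},         # evaluates to a number
]

rows = generate_rows(columns, row_count=10, seed=42)
for row in rows[:3]:
    print(row)
```

The point of the sketch is that every `data` value is treated as a Python expression: `None` yields a NULL, a quoted string yields text, and a bare number yields a numeric value, which is why the string in the sample schema needs its extra escaped quotes.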

Sample Code

from tablefaker import tablefaker

# Export to CSV
tablefaker.to_csv("test_table.yaml")

# Export to SQL insert scripts
tablefaker.to_sql("test_table.yaml")

# Export to JSON
tablefaker.to_json("test_table.yaml", "./target_folder")

# Export to Parquet
tablefaker.to_parquet("test_table.yaml", "./target_folder")

# Export to Excel
tablefaker.to_excel("test_table.yaml", "./target_folder/target_file.xlsx")
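
After an export, it can be worth sanity-checking the result before wiring it into tests. The following is a small sketch using only the standard library; `summarize_csv` is a hypothetical helper, the file path is a placeholder (the name Table Faker gives its output file is not covered here), and it assumes NULL values appear as empty cells in the CSV.

```python
import csv

def summarize_csv(path):
    """Return (column names, row count, empty-cell count per column) for a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        empty_counts = {name: 0 for name in reader.fieldnames or []}
        row_count = 0
        for row in reader:
            row_count += 1
            for name in empty_counts:
                # Assumption: a NULL is written as an empty cell.
                if not row.get(name):
                    empty_counts[name] += 1
    return list(empty_counts), row_count, empty_counts

# Example usage with a placeholder path:
# columns, row_count, empties = summarize_csv("./target_folder/person.csv")
# print(columns, row_count, empties)
```

Comparing the row count against `row_count` in the schema, and the empty-cell counts against any `null_percentage` settings, gives a quick signal that the schema was read as intended.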

Custom Functions

Table Faker also lets you supply a custom Python function for generating column data. Because it is ordinary Python, the function can contain any logic you need, including retrieving values from databases, APIs, or other sources.

from tablefaker import tablefaker
from faker import Faker

fake = Faker()
def get_level():
    return f"level {fake.random_int(1, 5)}"

tablefaker.to_csv("test_table.yaml", "./target_folder", custom_function=get_level)
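
As an illustration of a custom function backed by a database, here is a hedged sketch that draws values from an in-memory SQLite table. The `department` table, its contents, and the `get_department` name are all hypothetical; only the `custom_function` argument comes from the example above. A column would presumably reference the function from the schema as `data: get_department()`, although the sample schema in this article does not show that.

```python
import random
import sqlite3

# Hypothetical lookup table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (name TEXT)")
conn.executemany(
    "INSERT INTO department (name) VALUES (?)",
    [("Engineering",), ("Finance",), ("Support",)],
)

# Read the values once so row generation does not query the database per row.
DEPARTMENTS = [
    name for (name,) in conn.execute("SELECT name FROM department ORDER BY name")
]

def get_department():
    """Return one department name chosen at random from the lookup table."""
    return random.choice(DEPARTMENTS)

print(get_department())

# Passed the same way as get_level in the example above (commented out
# because it needs tablefaker installed and a schema file on disk):
# tablefaker.to_csv("test_table.yaml", "./target_folder", custom_function=get_department)
```

Loading the lookup values up front keeps the function cheap to call, which matters if it runs once for every generated row.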

Conclusion

With a simple YAML schema, Faker-backed data generation, and support for multiple output formats, Table Faker is a practical way to streamline test data generation. Whether you're working on software testing, data analysis, or application prototyping, it lets you create synthetic data that fits your project's needs.

For bug reports, feature requests, and further updates, check out the [Table Faker GitHub repository](https://github.com/necatiarslan/table-faker).

Happy coding!

Necati ARSLAN

necatia@gmail.com

Introducing Table Faker: Simplifying Synthetic Data Generation for Developers