Introducing Table Faker: Simplifying Synthetic Data Generation for Developers

Developers regularly need realistic test data, and crafting it by hand is time-consuming and error-prone. Enter Table Faker, a Python package that generates synthetic yet authentic-looking table data from a schema you describe in YAML. In this article, we'll walk through its features and show how to use it to create test data for your projects.
Key Features
Schema Definition: With Table Faker, defining the structure of your tables is a breeze. Utilizing a straightforward YAML format, you can specify table schemas, column names, data generation code, and even relationships between tables.
Faker and Randomization: Leveraging the powerful Faker library, Table Faker enables the creation of fake data that closely resembles real-world scenarios. Randomization ensures diversity and authenticity in the generated data.
Multiple Output Formats: Table Faker offers flexibility in output formats, catering to diverse project requirements. Whether you need data in a Pandas DataFrame, SQL insert scripts, CSV, Parquet, JSON, or Excel format, Table Faker has you covered.
Installation
Getting started with Table Faker is as simple as running:
pip install tablefaker
Sample YAML File
version: 1
config:
  locale: en_US
tables:
  - table_name: person
    row_count: 10
    columns:
      - column_name: id
        data: row_id
      - column_name: first_name
        data: fake.first_name()
      - column_name: last_name
        data: fake.last_name()
      - column_name: age
        data: fake.random_int(18, 90)
      - column_name: dob
        data: fake.date_of_birth()
        null_percentage: 0.20
      - column_name: salary
        data: None # NULL
      - column_name: height
        data: "\"170 cm\"" # string
      - column_name: weight
        data: 150 # number
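To make the `data` values in the sample schema easier to reason about, here is a minimal sketch of one way such columns could be turned into rows. It assumes each `data` value is treated as a Python expression with `fake` and `row_id` in scope, that `row_id` starts at 1, and that `null_percentage` is the probability of replacing a value with NULL. These are assumptions for illustration, not a description of Table Faker's internals. `StubFake`, `COLUMNS`, and `generate_row` are hypothetical names, and `StubFake` stands in for a real Faker instance so the sketch runs with the standard library only.

```python
import datetime
import random


class StubFake:
    """Hypothetical stand-in for a Faker instance (standard library only)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def first_name(self):
        return self._rng.choice(["Ada", "Grace", "Alan", "Linus"])

    def last_name(self):
        return self._rng.choice(["Lovelace", "Hopper", "Turing", "Torvalds"])

    def random_int(self, low, high):
        return self._rng.randint(low, high)

    def date_of_birth(self):
        offset = datetime.timedelta(days=self._rng.randint(0, 20000))
        return datetime.date(1950, 1, 1) + offset


# The sample columns as they would look once the YAML has been parsed.
# The YAML value "\"170 cm\"" parses to the text "170 cm" including the
# inner quotes, which is why it evaluates to a Python string.
COLUMNS = [
    {"column_name": "id", "data": "row_id"},
    {"column_name": "first_name", "data": "fake.first_name()"},
    {"column_name": "last_name", "data": "fake.last_name()"},
    {"column_name": "age", "data": "fake.random_int(18, 90)"},
    {"column_name": "dob", "data": "fake.date_of_birth()", "null_percentage": 0.20},
    {"column_name": "salary", "data": "None"},
    {"column_name": "height", "data": '"170 cm"'},
    {"column_name": "weight", "data": 150},
]


def generate_row(row_id, columns, fake, rng):
    """Evaluate every column expression for a single row."""
    scope = {"fake": fake, "row_id": row_id}
    row = {}
    for column in columns:
        # Assumed meaning of null_percentage: chance that the value is NULL.
        if rng.random() < column.get("null_percentage", 0):
            row[column["column_name"]] = None
            continue
        # eval is tolerable here only because the schema is trusted input.
        row[column["column_name"]] = eval(str(column["data"]), {}, scope)
    return row


rng = random.Random(42)
fake = StubFake(seed=42)
rows = [generate_row(row_id, COLUMNS, fake, rng) for row_id in range(1, 11)]
for row in rows:
    print(row)
```

Read this way, `data: None` yields a NULL, `data: 150` yields a number, and the doubly quoted `"\"170 cm\""` yields a string, which matches the comments in the sample file.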
Sample Code
import tablefaker
# Export to CSV
tablefaker.to_csv("test_table.yaml")
# Export to SQL insert scripts
tablefaker.to_sql("test_table.yaml")
# Export to JSON
tablefaker.to_json("test_table.yaml", "./target_folder")
# Export to Parquet
tablefaker.to_parquet("test_table.yaml", "./target_folder")
# Export to Excel
tablefaker.to_excel("test_table.yaml", "./target_folder/target_file.xlsx")
Custom Functions
Table Faker provides developers with the ability to define custom functions for generating column data. This advanced feature offers flexibility and control, allowing developers to incorporate logic that retrieves data from databases, APIs, or other sources.
from tablefaker import tablefaker
from faker import Faker
fake = Faker()
def get_level():
    return f"level {fake.random_int(1, 5)}"
tablefaker.to_csv("test_table.yaml", "./target_folder", custom_function=get_level)
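As a rough illustration of a custom function that pulls its values from a database, the sketch below uses an in-memory SQLite database from the standard library. The `departments` table, its contents, and the `get_department` name are hypothetical. The commented-out call mirrors the `custom_function` argument used with `get_level`, and it assumes the YAML column would call the function by name in its `data` field (for example `data: get_department()`).

```python
import random
import sqlite3

# Hypothetical lookup table; in practice this would be an existing database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE departments (name TEXT NOT NULL)")
connection.executemany(
    "INSERT INTO departments (name) VALUES (?)",
    [("Engineering",), ("Finance",), ("Marketing",)],
)
connection.commit()

# Load the candidate values once so that generating a row does not query the database.
DEPARTMENTS = [
    name
    for (name,) in connection.execute("SELECT name FROM departments ORDER BY name")
]


def get_department():
    """Return a random department name taken from the lookup table."""
    return random.choice(DEPARTMENTS)


print(get_department())

# Assumed usage, mirroring the get_level example (requires tablefaker to be installed):
# tablefaker.to_csv("test_table.yaml", "./target_folder", custom_function=get_department)
```

Caching the lookup values up front is a design choice for this sketch: the function may be called once per generated row, so a query per call could become slow for large row counts.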
Conclusion
With its intuitive schema definition, integration with Faker for realistic data generation, and support for multiple output formats, Table Faker emerges as a valuable tool for developers seeking to streamline test data generation. Whether you're working on software testing, data analysis, or application prototyping, Table Faker empowers you to efficiently create synthetic data that meets your project's needs.
For bug reports, feature requests, and further updates, check out the [Table Faker GitHub repository](https://github.com/necatiarslan/table-faker).
Happy coding!
Necati ARSLAN