How to Extract Data Models from an Existing Website or System

As a software engineer, you’ll often find yourself in situations where documentation is missing and legacy code is cryptic, yet you're still expected to rebuild, revamp, or integrate a system. In such cases, reverse-engineering data models from the existing interface or site becomes not just helpful, but essential.

This post walks you through why, when, and how to extract data models from a site or application, and what to watch out for along the way.

For context, I am currently extracting data models for a digital farmers system, so some of the examples come from that domain.

Why Extract Data Models?

Whether you're rebuilding a system, creating APIs, planning a data migration, or just organizing a chaotic legacy backend, understanding the underlying data structure is critical. Data models tell the story of how the business thinks. They:

  • Define the core entities of the system

  • Guide your database schema design

  • Inform API contracts and integrations

  • Clarify feature boundaries and relationships

  • Help identify redundancies, gaps, and scalability bottlenecks

When Might You Need This?

  • Revamping a legacy system with poor documentation

  • Building a mobile or API layer for an old web app

  • Migrating from spreadsheets to a relational database

  • Building dashboards or analytics tools

  • Integrating third-party systems where only the UI is available


How to Approach It

1. Start with the UI – What Do Users See?

Most pages or views are powered by a data model behind the scenes. Break the UI into logical modules like:

  • Dashboard

  • Profiles

  • Transactions

  • Reports

  • Forms (especially valuable!)

Tip: Pages that allow editing or adding data (forms) are goldmines for identifying fields and relationships.
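If you can view the page source, a small script can list the form fields for you. The sketch below is only an illustration: the HTML snippet and its field names are made up, and it assumes the form is present in the static markup (forms rendered by JavaScript would need a different approach).

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the name and type of every input, select, and textarea."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                # Fall back to the tag name when there is no type attribute.
                self.fields.append((name, attributes.get("type", tag)))


# Hypothetical form markup, standing in for a real page.
html = """
<form>
  <input type="date" name="test_date">
  <input type="text" name="location">
  <select name="crop_type"><option>Maize</option></select>
</form>
"""

collector = FormFieldCollector()
collector.feed(html)
for name, kind in collector.fields:
    print(name, kind)
```

Each (name, type) pair is a candidate field for the entity behind that form.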

2. List Out the Entities

From each page, start identifying distinct entities. For example, from a page like “Soil Test Results,” you might derive a model like:


  {
    "soil_test": {
      "test_id": "",
      "test_date": "",
      "location": "",
      "results": {
        "pH": "",
        "nitrogen_ppm": "",
        "phosphorus_ppm": "",
        "potassium_ppm": "",
        "organic_matter_percent": ""
      }
    }
  }
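Once the fields are listed, it can help to write the entity down as typed code. Here is a rough Python sketch of the soil test model. The types are my guesses from what such a page would show, and the sample values (ST-001, Plot A, the readings) are invented for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class SoilTestResults:
    # Field names mirror the JSON model; float types are an assumption.
    pH: Optional[float] = None
    nitrogen_ppm: Optional[float] = None
    phosphorus_ppm: Optional[float] = None
    potassium_ppm: Optional[float] = None
    organic_matter_percent: Optional[float] = None


@dataclass
class SoilTest:
    test_id: str
    test_date: date
    location: str
    results: SoilTestResults


sample = SoilTest(
    test_id="ST-001",  # made-up identifier
    test_date=date(2025, 5, 29),
    location="Plot A",  # made-up value
    results=SoilTestResults(pH=6.5, nitrogen_ppm=20.0),
)

# asdict() converts the nested dataclasses back into plain dictionaries.
record = asdict(sample)
print(record["results"])
```

Readings that were not captured stay as None, which makes gaps in the source data visible early.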

3. Capture Relationships

Does one user have multiple profiles? Are crops tied to a farm? These relationships matter. Represent them as nested data or separate tables depending on your approach.
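To make the two options concrete, here is a small sketch of a farm-to-crops relationship. The Farm and Crop classes, their fields, and the sample records are all hypothetical. Each crop carries a farm_id (the separate-tables style), and a helper groups them by farm (the nested style).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Crop:
    crop_id: str
    name: str
    farm_id: str  # reference to the owning farm, like a foreign key


@dataclass
class Farm:
    farm_id: str
    name: str
    crops: List[Crop] = field(default_factory=list)  # nested alternative


def crops_by_farm(crops: List[Crop]) -> Dict[str, List[str]]:
    """Group crop names by farm_id, making the one-to-many link explicit."""
    grouped: Dict[str, List[str]] = {}
    for crop in crops:
        grouped.setdefault(crop.farm_id, []).append(crop.name)
    return grouped


# Made-up records for illustration.
flat_crops = [
    Crop("C1", "Maize", "F1"),
    Crop("C2", "Beans", "F1"),
    Crop("C3", "Coffee", "F2"),
]
grouped = crops_by_farm(flat_crops)
print(grouped)
```

Writing the reference down as farm_id forces you to confirm that a crop really belongs to exactly one farm.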

4. Think in JSON First

Model data in JSON before deciding on a database. JSON forces you to think structurally and flexibly. Later, you can map it to SQL, NoSQL, or GraphQL types as needed.
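As a rough illustration of that later mapping step, the sketch below turns one flat JSON sample into a SQL table definition. The type mapping is a simplifying assumption of mine (anything unrecognized becomes TEXT), and a real schema would still need keys, constraints, and a review of each type.

```python
import json

# Simplified, assumed mapping from Python/JSON value types to SQL types.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL", bool: "INTEGER"}


def create_table_sql(table: str, sample: dict) -> str:
    """Build a CREATE TABLE statement from one flat JSON sample record."""
    columns = []
    for name, value in sample.items():
        sql_type = SQL_TYPES.get(type(value), "TEXT")  # fall back to TEXT
        columns.append(f"{name} {sql_type}")
    return f"CREATE TABLE {table} ({', '.join(columns)});"


# Made-up sample values for a flattened soil test record.
sample = json.loads('{"test_id": "ST-001", "pH": 6.5, "nitrogen_ppm": 20}')
ddl = create_table_sql("soil_test", sample)
print(ddl)
```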

5. Identify Optional, Derived, and Calculated Fields

Look out for:

  • Fields that only appear under certain conditions (e.g., "Harvest Quality" only if yield > 0)

  • Computed fields like "Growing Period"

  • Hidden metadata like timestamps, creator ID, etc.
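Here is one way these three kinds of fields might look in code. This is a sketch under assumptions: the HarvestRecord class and its field names are hypothetical, and I am assuming "Growing Period" is the number of days between planting and harvest, which you would need to confirm against the real system.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class HarvestRecord:
    planting_date: date
    harvest_date: date
    yield_kg: float
    harvest_quality: Optional[str] = None  # conditional: only when yield > 0
    created_at: Optional[datetime] = None  # hidden metadata
    created_by: Optional[str] = None       # hidden metadata

    @property
    def growing_period_days(self) -> int:
        """Derived field: computed from the two dates, never stored."""
        return (self.harvest_date - self.planting_date).days

    def validate(self) -> None:
        """Enforce the conditional rule observed in the UI."""
        if self.yield_kg <= 0 and self.harvest_quality is not None:
            raise ValueError("harvest_quality requires yield_kg > 0")


# Made-up values for illustration.
record = HarvestRecord(date(2025, 3, 1), date(2025, 6, 29), 1200.0, "Grade A")
record.validate()
print(record.growing_period_days)
```

Keeping the derived value as a property means it cannot drift out of sync with the dates it depends on.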

6. Ask: What Actions Happen Here?

For each page, identify the verbs:

  • Is the user submitting something?

  • Editing records?

  • Viewing analytics?

This helps you figure out which fields are editable, which are read-only, and which are derived from logic.
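One lightweight way to record the outcome is to tag every field with how it can be touched. The field names and their classification below are hypothetical, so treat this as a sketch of the idea rather than a finished design.

```python
from enum import Enum


class Access(Enum):
    EDITABLE = "editable"
    READ_ONLY = "read_only"
    DERIVED = "derived"


# Hypothetical classification for a harvest form.
FIELD_ACCESS = {
    "harvest_date": Access.EDITABLE,
    "yield_kg": Access.EDITABLE,
    "created_at": Access.READ_ONLY,
    "growing_period_days": Access.DERIVED,
}


def writable_payload(submitted: dict) -> dict:
    """Keep only the fields a user is allowed to edit."""
    return {
        name: value
        for name, value in submitted.items()
        if FIELD_ACCESS.get(name) is Access.EDITABLE
    }


clean = writable_payload(
    {"yield_kg": 950, "created_at": "2025-01-01", "growing_period_days": 3}
)
print(clean)
```

The same mapping can later drive which fields an API accepts on create and update requests.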


⚠️ Things to Watch Out For

Pitfall | What to Do
Implicit relationships | Clarify them explicitly in your model
Overlapping fields across pages | Normalize where appropriate
Magic numbers or units | Always annotate fields (e.g., “UGX/kg”)
Static UI labels vs. actual data | Ignore headings; focus on actual content
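For the units pitfall, one option is to store the unit next to the value instead of relying on memory. A minimal sketch, where the Quantity class and the price are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str  # e.g. "UGX/kg"; never leave this implicit

    def __str__(self) -> str:
        # The "g" format drops a trailing ".0" from whole numbers.
        return f"{self.value:g} {self.unit}"


price = Quantity(1500, "UGX/kg")  # made-up price
label = str(price)
print(label)
```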

🧩 Real-World Example: From Interface to Model

Imagine a “Market Forecast” page that shows:

Crop Type: Maize
Forecast Date: May 29th, 2025
Forecast Details: "Maize prices expected to rise due to demand surge..."

You could extract this:


  {
    "market_forecast": {
      "crop_type": "Maize",
      "forecast_date": "2025-05-29",
      "forecast_details": "Prices expected to rise...",
      "created_at": "2025-05-29",
      "updated_at": "2025-05-29"
    }
  }

By repeating this across all modules, you’ll soon build a full system schema even if the original developers are long gone.

Tip: Metadata such as created_at and updated_at timestamps is almost always worth capturing, even when the page does not display it.
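Notice that the page shows the date as "May 29th, 2025" while the model stores "2025-05-29". The sketch below shows one way to do that normalization when mapping UI labels to model fields. The function names are my own, and the parser assumes English month names in exactly this display format.

```python
import re
from datetime import datetime


def ui_date_to_iso(text: str) -> str:
    """Convert a display date such as 'May 29th, 2025' to 'YYYY-MM-DD'."""
    # Drop ordinal suffixes (st, nd, rd, th) that strptime cannot parse.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text)
    return datetime.strptime(cleaned, "%B %d, %Y").date().isoformat()


def extract_forecast(page: dict) -> dict:
    """Map UI labels to snake_case model fields."""
    return {
        "crop_type": page["Crop Type"],
        "forecast_date": ui_date_to_iso(page["Forecast Date"]),
        "forecast_details": page["Forecast Details"],
    }


# Values as they appear on the "Market Forecast" page.
page = {
    "Crop Type": "Maize",
    "Forecast Date": "May 29th, 2025",
    "Forecast Details": "Maize prices expected to rise due to demand surge...",
}
forecast = extract_forecast(page)
print(forecast["forecast_date"])
```

Storing dates in ISO format keeps them sortable and unambiguous, whatever format the UI chooses to display.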


✅ Benefits of Fleshing Out Your Data Models

  • Improved clarity for backend and frontend teams

  • Faster API development

  • Better test coverage and edge case handling

  • Enables data validation, form generation, and documentation automation

  • Supports migration planning and third-party integrations


📌 Final Thoughts

Extracting data models isn't just an academic exercise; it’s foundational. Whether you're working in a startup, upgrading a cooperative's farming platform, or launching a SaaS tool, data is the backbone of your system.

By carefully reverse-engineering models from existing interfaces, you gain deep control over how your platform functions, scales, and evolves.


Next Steps:
Try this exercise on any admin dashboard or legacy site you're currently working on. Start building the models page by page, and soon you'll have a blueprint that your whole team can build on with confidence.

Written by

Mayimuna Kizza Lugonvu

Hi, I’m Mayimuna, but you can call me Muna. I am a Software Engineer from Uganda with a passion for solving real-world problems through code, creativity, and storytelling. I've started my writing journey, and I hope to write about systems design, development workflows, what it’s like building tech in emerging markets, and everything software-related. Currently, I’m exploring cloud-native technologies, digital empowerment in agriculture, and AI. My other interests include digital marketing and language learning (안녕하세요 — I’m learning Korean!).