How to Extract Data Models from an Existing Website or System


As a software engineer, you’ll often find yourself in situations where documentation is missing and legacy code is cryptic, yet you're expected to rebuild, revamp, or integrate a system. In such cases, reverse-engineering data models from the existing interface or site is not just helpful but essential.
This post will walk you through why, when, and how to extract data models from a site or application, and what to watch out for.
For context, I am currently extracting data models for a digital farmers system, so some examples may be related to that.
Why Extract Data Models?
Whether you're rebuilding a system, creating APIs, planning a data migration, or just organizing a chaotic legacy backend, understanding the underlying data structure is critical. Data models tell the story of how the business thinks. They:
Define the core entities of the system
Guide your database schema design
Inform API contracts and integrations
Clarify feature boundaries and relationships
Help identify redundancies, gaps, and scalability bottlenecks
When Might You Need This?
Revamping a legacy system with poor documentation
Building a mobile or API layer for an old web app
Migrating from spreadsheets to a relational database
Building dashboards or analytics tools
Integrating third-party systems where only the UI is available
How to Approach It
1. Start with the UI – What Do Users See?
Most pages or views are powered by a data model behind the scenes. Break the UI into logical modules like:
Dashboard
Profiles
Transactions
Reports
Forms (especially valuable!)
Tip: Pages that allow editing or adding data (forms) are goldmines for identifying fields and relationships.
2. List Out the Entities
From each page, start identifying distinct entities. For example, from a page like “Soil Test Results,” you might derive a model like:
"soil_test": {
  "test_id": "",
  "test_date": "",
  "location": "",
  "results": {
    "pH": "",
    "nitrogen_ppm": "",
    "phosphorus_ppm": "",
    "potassium_ppm": "",
    "organic_matter_percent": ""
  }
}
3. Capture Relationships
Does one user have multiple profiles? Are crops tied to a farm? These relationships matter. Represent them as nested data or separate tables depending on your approach.
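To make the "nested data or separate tables" choice concrete, here is a minimal sketch using the farm-and-crops question above. The class names, field names, and sample values (Farm, Crop, farm_id, "Hilltop Farm") are hypothetical and only for illustration; your own system will have different ones.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Crop:
    crop_id: str
    name: str
    farm_id: str  # hypothetical foreign key back to the owning farm


@dataclass
class Farm:
    farm_id: str
    name: str
    crops: list[Crop] = field(default_factory=list)  # one farm, many crops


def flatten(farm: Farm) -> dict[str, list[dict]]:
    """Split a nested farm into table-like rows linked by farm_id."""
    farm_row = {"farm_id": farm.farm_id, "name": farm.name}
    crop_rows = [asdict(crop) for crop in farm.crops]
    return {"farms": [farm_row], "crops": crop_rows}


farm = Farm("F1", "Hilltop Farm", [Crop("C1", "Maize", "F1"), Crop("C2", "Beans", "F1")])
nested = asdict(farm)   # nested view: crops live inside the farm document
tables = flatten(farm)  # relational view: crops point at the farm via farm_id
```

The same relationship is captured both ways; the nested form tends to suit document stores and API responses, while the flattened form maps more directly onto relational tables.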
4. Think in JSON First
Model data in JSON before deciding on a database. JSON forces you to think structurally and flexibly. Later, you can map it to SQL, NoSQL, or GraphQL types as needed.
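As a rough illustration of mapping a JSON draft to SQL later on, the sketch below turns one sample record into a CREATE TABLE statement. The type mapping is deliberately naive, the function name json_to_create_table and the sample values are assumptions of mine, and a real schema would still need keys, constraints, and proper date types added by hand.

```python
import json

# Naive mapping from Python types (as parsed from JSON) to generic SQL types.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL", bool: "BOOLEAN"}


def json_to_create_table(table: str, sample_json: str) -> str:
    """Draft a CREATE TABLE statement from a single flat JSON sample."""
    record = json.loads(sample_json)
    columns = []
    for name, value in record.items():
        if isinstance(value, (dict, list)):
            # Nested data: keep as text for now; it may deserve its own table.
            sql_type = "TEXT"
        else:
            sql_type = SQL_TYPES.get(type(value), "TEXT")
        columns.append(f"  {name} {sql_type}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


# Hypothetical sample values for a flattened soil test record.
sample = '{"test_id": "ST-001", "test_date": "2025-05-29", "ph": 6.5, "nitrogen_ppm": 12}'
ddl = json_to_create_table("soil_test", sample)
print(ddl)
```

Treat the output as a starting draft to review, not a finished schema.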
5. Identify Optional, Derived, and Calculated Fields
Look out for:
Fields that only appear under certain conditions (e.g., "Harvest Quality" only if yield > 0)
Computed fields like "Growing Period"
Hidden metadata like timestamps, creator ID, etc.
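The three kinds of fields above could be sketched like this. The HarvestRecord class and its field names are hypothetical, and I am assuming that "Growing Period" is the number of days between planting and harvest; confirm the real formula with whoever owns the system.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class HarvestRecord:
    planting_date: date
    harvest_date: date
    yield_kg: float
    # Optional: the UI only shows this when yield > 0.
    harvest_quality: Optional[str] = None
    # Hidden metadata: rarely shown in the UI, but usually stored.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.yield_kg <= 0 and self.harvest_quality is not None:
            raise ValueError("harvest_quality only applies when yield_kg > 0")

    @property
    def growing_period_days(self) -> int:
        # Derived: computed from other fields, so it is not stored.
        return (self.harvest_date - self.planting_date).days


record = HarvestRecord(date(2025, 1, 10), date(2025, 5, 10), 1200.0, "Grade A")
print(record.growing_period_days)
```

Keeping the derived value as a property rather than a stored field avoids it drifting out of sync with the dates it depends on.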
6. Ask: What Actions Happen Here?
For each page, identify the verbs:
Is the user submitting something?
Editing records?
Viewing analytics?
This helps you figure out which fields are editable, which are read-only, and which are derived from logic.
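One way to record what you learn from those verbs is a small field specification per entity. The sketch below is only an assumption about how you might organize it: FieldKind, SOIL_TEST_FIELDS, and accepted_updates are hypothetical names, and the classification of each soil test field is a guess you would verify against the actual forms.

```python
from enum import Enum


class FieldKind(Enum):
    EDITABLE = "editable"    # the user can change it through a form
    READ_ONLY = "read_only"  # shown, but set by the system
    DERIVED = "derived"      # computed from other fields


# Hypothetical classification for the soil test entity.
SOIL_TEST_FIELDS = {
    "test_id": FieldKind.READ_ONLY,
    "test_date": FieldKind.EDITABLE,
    "location": FieldKind.EDITABLE,
    "created_at": FieldKind.READ_ONLY,
}


def accepted_updates(payload: dict, spec: dict) -> dict:
    """Keep only the fields a user is allowed to edit; drop everything else."""
    return {k: v for k, v in payload.items() if spec.get(k) is FieldKind.EDITABLE}


update = accepted_updates(
    {"test_id": "ST-999", "location": "Plot 4", "unknown": 1}, SOIL_TEST_FIELDS
)
print(update)
```

A spec like this can later inform API validation, since read-only and derived fields should not be accepted from clients.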
⚠️ Things to Watch Out For
| Pitfall | What to Do |
| --- | --- |
| Implicit relationships | Clarify them explicitly in your model |
| Overlapping fields across pages | Normalize where appropriate |
| Magic numbers or units | Always annotate fields (e.g., “UGX/kg”) |
| Static UI labels vs. actual data | Ignore headings; focus on actual content |
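For the units pitfall in particular, one possible approach is to attach the unit to the field definition itself so it travels with the model. This sketch uses the metadata argument of Python's dataclasses.field; the MarketPrice class and its fields are hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MarketPrice:
    crop_type: str
    price: float = field(metadata={"unit": "UGX/kg"})  # unit annotated, not implied
    quantity: float = field(metadata={"unit": "kg"})


def units_of(model) -> dict:
    """Collect the declared unit for every annotated field of a dataclass."""
    return {f.name: f.metadata["unit"] for f in fields(model) if "unit" in f.metadata}


print(units_of(MarketPrice))
```

The same dictionary could feed generated documentation or form labels, so the unit is written down once instead of being guessed from the UI.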
🧩 Real-World Example: From Interface to Model
Imagine a “Market Forecast” page that shows:
Crop Type: Maize
Forecast Date: May 29th, 2025
Forecast Details: "Maize prices expected to rise due to demand surge..."
You could extract this:
"market_forecast": {
  "crop_type": "Maize",
  "forecast_date": "2025-05-29",
  "forecast_details": "Maize prices expected to rise due to demand surge...",
  "created_at": "2025-05-29",
  "updated_at": "2025-05-29"
}
By repeating this across all modules, you’ll soon build a full system schema, even if the original developers are long gone.
Tip: Always capture metadata such as created_at and updated_at timestamps, even when the UI does not display them.
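Going back to the Market Forecast page, the step from what the UI displays to what the model stores might look like the sketch below. The function names are hypothetical, the input dictionary stands in for values you have read off the page, and the date parsing assumes an English locale and the "May 29th, 2025" display format.

```python
import re
from datetime import datetime


def ui_date_to_iso(text: str) -> str:
    """Convert a display date like 'May 29th, 2025' to ISO format."""
    # Strip ordinal suffixes (1st, 2nd, 3rd, 29th) that strptime cannot parse.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text)
    return datetime.strptime(cleaned, "%B %d, %Y").date().isoformat()


def extract_market_forecast(page: dict) -> dict:
    """Map UI labels to snake_case model fields, normalizing the date."""
    return {
        "market_forecast": {
            "crop_type": page["Crop Type"],
            "forecast_date": ui_date_to_iso(page["Forecast Date"]),
            "forecast_details": page["Forecast Details"],
        }
    }


page = {
    "Crop Type": "Maize",
    "Forecast Date": "May 29th, 2025",
    "Forecast Details": "Maize prices expected to rise due to demand surge...",
}
model = extract_market_forecast(page)
print(model)
```

Note that the labels ("Crop Type") become field names, while the values are normalized into storage-friendly formats.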
✅ Benefits of Fleshing Out Your Data Models
Improved clarity for backend and frontend teams
Faster API development
Better test coverage and edge case handling
Enables data validation, form generation, and documentation automation
Supports migration planning and third-party integrations
📌 Final Thoughts
Extracting data models isn't just an academic exercise; it’s foundational. Whether you're working in a startup, upgrading a cooperative's farming platform, or launching a SaaS tool, data is the backbone of your system.
By carefully reverse-engineering models from existing interfaces, you gain deep control over how your platform functions, scales, and evolves.
Next Steps:
Try this exercise on any admin dashboard or legacy site you're currently working on. Start building the models page by page, and soon you'll have a blueprint that your whole team can build on with confidence.
Written by

Mayimuna Kizza Lugonvu
Hi, I’m Mayimuna, but you can call me Muna. I am a Software Engineer from Uganda with a passion for solving real-world problems through code, creativity, and storytelling. I've started my writing journey, and I hope to write about systems design, development workflows, what it’s like building tech in emerging markets, and everything software-related. Currently, I’m exploring cloud-native technologies, digital empowerment in agriculture, and AI. My other interests include digital marketing and language learning (안녕하세요 — I’m learning Korean!).