Semi-Automatic Georeferencing: My Approach


Georeferencing is an essential step in transforming scanned maps into usable geospatial data. The Survey of India’s 1:50,000 toposheets remain a valuable source of geographical information. However, most of these sheets are available as scanned raster images without embedded geographic metadata.
In this article, I will explain the logic and methodology behind a semi-automatic georeferencing tool I developed. While I will not be sharing the full code, I will walk through the core ideas and workflow that make the process efficient, accurate, and reproducible.
The Problem
Working with historical or scanned maps poses several challenges:
No embedded geospatial reference: The images are simply raster scans, with no connection to real-world coordinates.
Manual georeferencing is tedious: Clicking numerous ground control points (GCPs) manually in GIS software is time-consuming and prone to human error.
A solution was required that balances automation with human input, allowing quick and accurate georeferencing with minimal manual effort.
The Core Idea
The approach I developed can be summarized in one sentence:
Allow the user to select the four corners or the bounding box of the toposheet scan, match it to known geospatial extents from a pre-existing index, and automatically generate a georeferenced output.
This semi-automatic design minimizes the tedious aspects of georeferencing while ensuring the user remains in control of key decisions.
The Workflow
Let us walk through the logic of the tool step-by-step.
1. Load and Display the Image
The tool opens a scanned toposheet image and displays it on a simple canvas. The user can zoom, pan, and navigate the image comfortably. This allows for precise corner selection.
2. User Clicks on Four Corners
The user clicks on the four corner points of the actual map within the scanned image. These corner points serve as reference locations in the pixel space of the image.
Note: Clicking on four corners simplifies the process because most Survey of India toposheets have a well-defined rectangular boundary, making corner-based referencing practical and highly accurate.
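As a rough illustration of this step, the sketch below sorts four clicked points into a consistent order and derives the enclosing crop box. The names `order_corners` and `crop_box` are hypothetical and not taken from the tool, and the ordering logic assumes the scan is roughly upright (a heavily rotated scan would need a different approach).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels, y grows downward


def order_corners(points: List[Point]) -> List[Point]:
    """Return clicks as top-left, top-right, bottom-right, bottom-left."""
    if len(points) != 4:
        raise ValueError("expected exactly four corner clicks")
    # The two points with the smallest y form the top edge of the sheet.
    by_row = sorted(points, key=lambda p: p[1])
    top = sorted(by_row[:2], key=lambda p: p[0])
    bottom = sorted(by_row[2:], key=lambda p: p[0])
    return [top[0], top[1], bottom[1], bottom[0]]


def crop_box(points: List[Point]) -> Tuple[int, int, int, int]:
    """Smallest integer (left, upper, right, lower) box containing all clicks."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


# Clicks may arrive in any order; these pixel values are made up.
clicks = [(900, 40), (35, 30), (30, 700), (905, 710)]
ordered = order_corners(clicks)
box = crop_box(clicks)
```

The box uses the (left, upper, right, lower) order that PIL's `Image.crop` expects, so it can be passed to the cropping step directly.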
3. Input the Toposheet ID
After selecting the corners, the user is prompted to enter the unique ID of the toposheet (for example, 47B/13). This ID uniquely identifies the map sheet in the Survey of India indexing system for the 1:50,000 scale maps.
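Because the ID is typed by hand, it helps to normalize it before the lookup. The sketch below is one possible way to do that, not the tool's actual code. It assumes IDs follow the pattern seen in the example above (a sheet number, a letter from A to P, a slash, and a number from 1 to 16), and `normalize_sheet_id` is a hypothetical name.

```python
import re

# Assumed pattern: digits, one letter A-P, "/", then a number 1-16.
_SHEET_ID = re.compile(r"^\s*(\d{1,3})\s*([A-Pa-p])\s*/\s*(\d{1,2})\s*$")


def normalize_sheet_id(raw: str) -> str:
    """Return the ID in canonical form, e.g. ' 47b / 13 ' -> '47B/13'."""
    match = _SHEET_ID.match(raw)
    if match is None:
        raise ValueError(f"not a recognizable toposheet ID: {raw!r}")
    block = int(match.group(1))
    letter = match.group(2).upper()
    number = int(match.group(3))
    if not 1 <= number <= 16:
        raise ValueError(f"sheet number must be 1-16, got {number}")
    return f"{block}{letter}/{number}"


sheet_id = normalize_sheet_id(" 47b / 13 ")
```

Whether the index file stores IDs in exactly this form is an assumption; the canonical form should be adjusted to match whatever the index uses.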
4. Lookup Geospatial Extent
The tool uses a pre-existing geospatial index file (stored in formats such as GeoJSON or shapefile) that contains the boundaries for each toposheet ID. Upon entering the ID, the tool searches for the ID in its database and fetches the geographic extent (bounding box) of that particular sheet.
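The lookup can be sketched with nothing but a parsed GeoJSON structure. This is a minimal illustration rather than the tool's implementation: the property name `id`, the function name `sheet_extent`, and the sample feature (a placeholder ID with made-up coordinates, not a real sheet) are all assumptions. With Geopandas, the same idea would be a filter on the ID column followed by reading `total_bounds`.

```python
from typing import Any, Dict, Tuple

Extent = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def sheet_extent(index: Dict[str, Any], sheet_id: str,
                 id_field: str = "id") -> Extent:
    """Find a sheet in a GeoJSON FeatureCollection and return its bounding box."""
    for feature in index["features"]:
        if feature["properties"].get(id_field) == sheet_id:
            # Assumes a Polygon geometry; the first ring is the exterior ring.
            ring = feature["geometry"]["coordinates"][0]
            lons = [pt[0] for pt in ring]
            lats = [pt[1] for pt in ring]
            return (min(lons), min(lats), max(lons), max(lats))
    raise KeyError(f"sheet {sheet_id!r} not found in index")


# Placeholder index with invented values, standing in for json.load(file).
DEMO_INDEX = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"id": "DEMO/1"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[80.0, 20.0], [80.25, 20.0], [80.25, 20.25],
                             [80.0, 20.25], [80.0, 20.0]]],
        },
    }],
}

extent = sheet_extent(DEMO_INDEX, "DEMO/1")
```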
5. Compute the Transformation
At this point the tool has both sets of coordinates it needs:
The pixel coordinates (from the user clicks)
The real-world coordinates (from the index file)
It first crops the image to the user-defined bounding box (the clicked corners), eliminating another tedious and repetitive task faced by GIS students and professionals.
It then uses GDAL to compute a transformation that maps the pixel space of the cropped image to geographic space.
Applying this transformation aligns the cropped image with the sheet's real-world extent, converting it into a properly georeferenced raster.
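For the simplest case, the mapping from pixels to coordinates can be written as a six-value geotransform in the layout GDAL uses. The sketch below is an assumption-laden illustration, not the tool's algorithm: it assumes the cropped image is north-up with no rotation or skew, that its edges coincide with the sheet extent, and the function names and sample numbers are invented for the example.

```python
from typing import Tuple

Extent = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
GeoTransform = Tuple[float, float, float, float, float, float]


def north_up_geotransform(width_px: int, height_px: int,
                          extent: Extent) -> GeoTransform:
    """Geotransform for an unrotated image whose edges match the extent."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("image dimensions must be positive")
    min_lon, min_lat, max_lon, max_lat = extent
    pixel_width = (max_lon - min_lon) / width_px
    pixel_height = (max_lat - min_lat) / height_px
    # Origin is the top-left corner; rows run southward, hence the minus sign.
    return (min_lon, pixel_width, 0.0, max_lat, 0.0, -pixel_height)


def pixel_to_geo(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Apply the affine model: pixel (col, row) to (x, y) in map units."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return (x, y)


# Made-up image size and extent, for illustration only.
gt = north_up_geotransform(1000, 500, (80.0, 20.0, 80.25, 20.25))
top_left = pixel_to_geo(gt, 0, 0)
bottom_right = pixel_to_geo(gt, 1000, 500)
```

If the scan is noticeably rotated or warped, the two zero terms would no longer hold, and the four clicked corners would have to be used as ground control points instead.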
6. Export as GeoTIFF
Finally, the tool saves the georeferenced output as a GeoTIFF file. This file can now be directly imported into any GIS software for further analysis or overlay with other spatial data.
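One way to reproduce the crop-and-export step outside the tool is the `gdal_translate` command-line utility. The sketch below only builds the argument list and is not how the tool itself calls GDAL. The helper name, the file names, and the sample values are hypothetical, and EPSG:4326 is an assumed CRS that should be checked against the index file.

```python
from typing import List, Tuple


def gdal_translate_args(src: str, dst: str,
                        crop: Tuple[int, int, int, int],
                        extent: Tuple[float, float, float, float],
                        srs: str = "EPSG:4326") -> List[str]:
    """Build a gdal_translate call that crops and assigns georeferencing.

    crop is (left, upper, right, lower) in pixels;
    extent is (min_lon, min_lat, max_lon, max_lat).
    """
    left, upper, right, lower = crop
    min_lon, min_lat, max_lon, max_lat = extent
    return [
        "gdal_translate", "-of", "GTiff",
        # -srcwin takes xoff yoff xsize ysize
        "-srcwin", str(left), str(upper), str(right - left), str(lower - upper),
        # -a_ullr takes upper-left x/y, then lower-right x/y
        "-a_ullr", str(min_lon), str(max_lat), str(max_lon), str(min_lat),
        "-a_srs", srs,
        src, dst,
    ]


# Made-up values for illustration only.
args = gdal_translate_args("scan.tif", "georeferenced.tif",
                           (30, 30, 905, 710), (80.0, 20.0, 80.25, 20.25))
# To run it (requires GDAL installed): subprocess.run(args, check=True)
```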
Technologies Utilized
While the focus here is on logic rather than implementation, it is worth mentioning the key technologies that power this tool:
| Technology | Purpose |
| --- | --- |
| PyQt5 | User interface, image display, and interaction |
| GDAL | Processing georeferenced raster files |
| Geopandas | Reading the toposheet index file |
| NumPy & PIL | Image manipulation |
A big thank you to https://deeppradhan.heliohost.org/ for the 50K OSM Map Index.
Advantages of This Approach
Efficient: Georeferencing takes only a few clicks and a toposheet ID entry.
User-Controlled: The user visually selects corners, reducing automation errors.
Potential Extensions
While the current system balances automation and control, there are several possible future enhancements (some of which I have already started work on):
Automated corner detection using computer vision, which would make bulk processing of toposheets much faster (I describe my approach to this in a separate article).
Enhanced distortion correction beyond simple affine transformations.
User-defined CRS options for the output raster.
Conclusion
Through this semi-automatic method, I have significantly streamlined the process of georeferencing toposheets. By combining simple user interaction with automated transformation calculations, it becomes possible to efficiently digitize large collections of historical maps while maintaining high accuracy.
This approach serves as a practical bridge between fully manual georeferencing and complex automated systems, making it highly suitable for academic, research, and professional GIS applications.
Written by

Jason D'souza
As a student geologist with a passion for tech, I love building with code and turning ideas into real-world projects.