Semi-Automatic Georeferencing

Georeferencing is an essential step in transforming scanned maps into usable geospatial data. The Survey of India’s 1:50,000 toposheets remain a valuable source of geographical information. However, most of these sheets are available as scanned raster images without embedded geographic metadata.

In this article, I will explain the logic and methodology behind a semi-automatic georeferencing tool I developed. While I will not be sharing the full code, I will walk through the core ideas and workflow that make the process efficient, accurate, and reproducible.

The Problem

Working with historical or scanned maps poses several challenges:

No embedded geospatial reference: The images are simply raster scans, with no connection to real-world coordinates.
Manual georeferencing is tedious: Clicking numerous ground control points (GCPs) manually in GIS software is time-consuming and prone to human error.

A solution was required that balances automation with human input, allowing quick and accurate georeferencing with minimal manual effort.

The Core Idea

The approach I developed can be summarized in one sentence:

Allow the user to select the four corners or the bounding box of the toposheet scan, match it to known geospatial extents from a pre-existing index, and automatically generate a georeferenced output.

This semi-automatic design minimizes the tedious aspects of georeferencing while ensuring the user remains in control of key decisions.

The Workflow

Let us walk through the logic of the tool step by step.

1. Load and Display the Image

The tool opens a scanned toposheet image and displays it on a simple canvas, where the user can zoom and pan freely. This makes precise corner selection possible.

2. User Clicks on Four Corners

The user clicks on the four corner points of the actual map within the scanned image. These corner points serve as reference locations in the pixel space of the image.

Note: Clicking on four corners simplifies the process because most Survey of India toposheets have a well-defined rectangular boundary, making corner-based referencing practical and highly accurate.
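The clicks can arrive in any order, so it helps to sort them into a fixed order before pairing them with map coordinates. Below is a minimal sketch of one common way to do that. It is not the tool's own code. The function name `order_corners` is hypothetical, and the sketch assumes the scan is roughly upright (rotated well under 45 degrees).

```python
def order_corners(points):
    """Return four (x, y) pixel points ordered top-left, top-right,
    bottom-right, bottom-left, regardless of click order.

    In image coordinates (y grows downwards) the top-left corner has the
    smallest x + y and the bottom-right the largest x + y. The top-right has
    the smallest y - x and the bottom-left the largest y - x.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four corner clicks")
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    ordered = [top_left, top_right, bottom_right, bottom_left]
    # A heavily rotated or degenerate selection can pick one click twice
    if len(set(map(tuple, ordered))) != 4:
        raise ValueError("could not resolve four distinct corners")
    return ordered


clicks = [(4980, 120), (110, 95), (130, 6020), (5000, 6050)]
print(order_corners(clicks))
# [(110, 95), (4980, 120), (5000, 6050), (130, 6020)]
```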

3. Input the Toposheet ID

After selecting the corners, the user is prompted to enter the toposheet ID (for example, 47B/13), which uniquely identifies the sheet within the Survey of India indexing system for 1:50,000 maps.
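Hand-typed IDs vary in spacing and case, so the lookup is more reliable if the input is normalized first. The sketch below makes two assumptions. First, it assumes IDs follow the form of the 47B/13 example: a degree-sheet number, a letter from A to P, a slash, then a sheet number from 1 to 16. Second, it assumes the index stores IDs in that same canonical form. `normalize_sheet_id` is a hypothetical helper.

```python
import re

# Assumed pattern, modelled on IDs such as "47B/13":
# degree-sheet number, letter A-P, slash, sheet number 1-16.
SHEET_ID_PATTERN = re.compile(r"^\s*(\d{1,2})\s*([A-Pa-p])\s*/\s*(\d{1,2})\s*$")


def normalize_sheet_id(raw):
    """Turn user input such as ' 47 b / 13 ' into the canonical '47B/13'."""
    match = SHEET_ID_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"not a recognised toposheet ID: {raw!r}")
    block, letter, sheet = match.groups()
    if not 1 <= int(sheet) <= 16:
        raise ValueError(f"sheet number must be 1-16, got {sheet}")
    # int() drops any leading zeros the user typed
    return f"{int(block)}{letter.upper()}/{int(sheet)}"


print(normalize_sheet_id(" 47 b / 13 "))  # 47B/13
```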

4. Lookup Geospatial Extent

The tool uses a pre-existing geospatial index file (for example, a GeoJSON file or shapefile) that stores the boundary of each toposheet. Once the ID is entered, the tool looks it up in the index and retrieves the geographic extent (bounding box) of that sheet.
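Here is a standard-library sketch of the lookup. It reads a GeoJSON index and returns the bounding box of the matching sheet. The property name `sheet_id` and the helper `lookup_extent` are assumptions for illustration. The real index may use a different field name or ID format.

```python
import json


def lookup_extent(index_path, sheet_id, id_field="sheet_id"):
    """Return (min_lon, min_lat, max_lon, max_lat) of the feature whose
    id_field property equals sheet_id in a GeoJSON FeatureCollection."""
    with open(index_path, encoding="utf-8") as f:
        collection = json.load(f)

    for feature in collection["features"]:
        if (feature.get("properties") or {}).get(id_field) != sheet_id:
            continue
        geometry = feature["geometry"]
        # Collect every ring so Polygon and MultiPolygon sheets are handled alike
        if geometry["type"] == "Polygon":
            rings = geometry["coordinates"]
        elif geometry["type"] == "MultiPolygon":
            rings = [ring for polygon in geometry["coordinates"] for ring in polygon]
        else:
            raise ValueError(f"unexpected geometry type {geometry['type']!r}")
        # GeoJSON positions are [longitude, latitude] (RFC 7946)
        xs = [position[0] for ring in rings for position in ring]
        ys = [position[1] for ring in rings for position in ring]
        return min(xs), min(ys), max(xs), max(ys)

    raise KeyError(f"sheet {sheet_id!r} not found in {index_path}")
```

GeoPandas, which the tool itself uses, offers a comparable lookup. Read the file with `geopandas.read_file(index_path)`, filter the rows on the ID column, and take `total_bounds`, which returns `[minx, miny, maxx, maxy]`.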

5. Compute the Transformation

With both the pixel coordinates (from the user's clicks) and the real-world coordinates (from the index file) available, the tool first crops the image to the bounding box defined by the clicked corners. This eliminates another tedious and repetitive task faced by GIS students and professionals.
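Cropping comes down to slicing the image array to the bounding box of the clicked corners. The corner coordinates must then be shifted into the cropped image's pixel space. Below is a minimal NumPy sketch. The helper name `crop_to_corners` is hypothetical.

```python
import numpy as np


def crop_to_corners(image, corners):
    """Crop an image array to the axis-aligned bounding box of the clicked corners.

    image   -- NumPy array of shape (rows, cols) or (rows, cols, bands)
    corners -- four (x, y) pixel coordinates (x = column, y = row)
    Returns the cropped array and the corners shifted into its pixel space.
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    # Round outwards so every clicked corner stays inside the crop, then clamp to the image
    x0 = max(int(np.floor(min(xs))), 0)
    y0 = max(int(np.floor(min(ys))), 0)
    x1 = min(int(np.ceil(max(xs))) + 1, image.shape[1])
    y1 = min(int(np.ceil(max(ys))) + 1, image.shape[0])
    cropped = image[y0:y1, x0:x1]
    shifted = [(x - x0, y - y0) for x, y in corners]
    return cropped, shifted
```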

The tool then uses GDAL to compute a transformation that maps pixel space to geographic space, so that the cropped image aligns with the sheet's real-world extent.

Applying this transformation to the cropped image produces a properly georeferenced raster.
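One way to obtain an affine geotransform in GDAL's six-coefficient form is a least-squares fit from the four pixel corners to the four corners of the sheet's extent. This sketch rests on three assumptions. The corners are ordered top-left, top-right, bottom-right, bottom-left. The extent is given as (min_x, min_y, max_x, max_y) in the index's CRS. And the name `fit_geotransform` is hypothetical. The function also returns the maximum residual, which shows how well a pure affine model fits the clicks.

```python
import numpy as np


def fit_geotransform(pixel_corners, extent):
    """Least-squares affine fit from pixel space to map space.

    Returns (geotransform, max_residual), where geotransform follows GDAL's order:
    X = GT0 + col * GT1 + row * GT2,  Y = GT3 + col * GT4 + row * GT5
    """
    min_x, min_y, max_x, max_y = extent
    # Map-space corners in the same TL, TR, BR, BL order as the pixel corners
    geo = np.array([(min_x, max_y), (max_x, max_y), (max_x, min_y), (min_x, min_y)], dtype=float)
    px = np.asarray(pixel_corners, dtype=float)

    # Each row of the design matrix is [1, column, row]
    design = np.column_stack([np.ones(len(px)), px[:, 0], px[:, 1]])
    coef_x, *_ = np.linalg.lstsq(design, geo[:, 0], rcond=None)
    coef_y, *_ = np.linalg.lstsq(design, geo[:, 1], rcond=None)

    geotransform = (coef_x[0], coef_x[1], coef_x[2], coef_y[0], coef_y[1], coef_y[2])
    residuals = design @ np.column_stack([coef_x, coef_y]) - geo
    return tuple(float(v) for v in geotransform), float(np.abs(residuals).max())
```

Four corner pairs give eight equations for six unknowns, so a slightly skewed scan still gets the best affine compromise. GDAL also provides `gdal.GCPsToGeoTransform`, which performs a similar fit from a list of `gdal.GCP` objects.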

6. Export as GeoTIFF

Finally, the tool saves the georeferenced output as a GeoTIFF file. This file can now be directly imported into any GIS software for further analysis or overlay with other spatial data.

GeoTIFF file displayed against a background of satellite imagery
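For the export step, GDAL's GTiff driver can write the cropped array together with a geotransform. Below is a minimal sketch. It assumes 8-bit scans and an index in WGS 84 longitude/latitude (EPSG:4326). If the index uses another CRS, pass its EPSG code instead. `write_geotiff` is a hypothetical helper.

```python
import numpy as np


def write_geotiff(path, array, geotransform, epsg=4326):
    """Write a uint8 image array to a GeoTIFF with the given GDAL geotransform.

    array        -- (rows, cols) or (rows, cols, bands) uint8 NumPy array
    geotransform -- six-element GDAL geotransform (GT0 ... GT5)
    epsg         -- CRS of the geotransform; 4326 (WGS 84 lon/lat) is an assumption
    """
    if array.dtype != np.uint8:
        raise ValueError("this sketch only writes 8-bit scans (dtype uint8)")
    # Imported lazily so the function can be defined even where GDAL is absent
    from osgeo import gdal, osr

    if array.ndim == 2:
        array = array[:, :, np.newaxis]  # treat greyscale as a single band
    rows, cols, bands = array.shape

    driver = gdal.GetDriverByName("GTiff")
    dataset = driver.Create(path, cols, rows, bands, gdal.GDT_Byte,
                            options=["COMPRESS=DEFLATE"])
    dataset.SetGeoTransform(geotransform)

    srs = osr.SpatialReference()
    srs.ImportFromEPSG(epsg)
    dataset.SetProjection(srs.ExportToWkt())

    for band_index in range(bands):
        # GDAL band numbers are 1-based
        dataset.GetRasterBand(band_index + 1).WriteArray(array[:, :, band_index])

    dataset.FlushCache()
    dataset = None  # dropping the last reference closes the file
```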

Technologies Utilized

While the focus here is on logic rather than implementation, it is worth mentioning the key technologies that power this tool:

Technology	Purpose
PyQt5	User interface, image display, and interaction
GDAL	Processing georeferenced raster files
GeoPandas	Reading the toposheet index file
NumPy & PIL	Image manipulation

A big thank you to https://deeppradhan.heliohost.org/ for the 50K OSM Map Index.

Advantages of This Approach

Efficient: Georeferencing takes only a few clicks and a toposheet ID entry.
User-Controlled: The user visually selects corners, reducing automation errors.

Potential Extensions

While the current system balances automation and control, there are several possible future enhancements (some of which I have already started work on):

Automated corner detection using computer vision, which would make bulk processing of toposheets much faster (I describe my approach to this separately)
Enhanced distortion correction beyond simple affine transformations
User-defined CRS options

Conclusion

Through this semi-automatic method, I have significantly streamlined the process of georeferencing toposheets. By combining simple user interaction with automated transformation calculations, it becomes possible to efficiently digitize large collections of historical maps while maintaining high accuracy.

This approach serves as a practical bridge between fully manual georeferencing and complex automated systems, making it highly suitable for academic, research, and professional GIS applications.
