What's the aim of this article? First, to understand the commonly used reference coordinate frames. Second, to understand the conversion from the 3D world coordinate frame to image pixels. Let’s break it down in layman's terms.

Understanding the World Coordinate Frame

We see an object (e.g., a Red Volkswagen Polo GT) in reality. Now we want to know its position vector (from the origin to the Polo's center of mass). What is the origin (reference point) in our case? We need to decide that. We see a man sitting on a bench across the road. Let’s assume that he is the origin, or reference point, for our 3D world coordinate system. Yes, we call it a 3D world coordinate system!

Let’s go into more detail about the origin. Where are its X, Y and Z axes? More assumptions: the direction to his left is the X-axis, the direction he is facing (front) is the Y-axis, and the direction perpendicular to the ground is the Z-axis. Why? To follow the right-hand rule. Note that with X to the left and Y to the front, the right-hand rule makes Z point down into the ground. Now we can technically find the position vector of the Red Polo GT with respect to the sitting man (the origin of the world frame).
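To make the right-hand rule concrete, here is a minimal sketch using NumPy. The helper basis (right, front, up) and the Polo's coordinates are assumptions made up for illustration; they are not measurements from the scene.

```python
import numpy as np

# Helper basis used only for this check (an assumption for illustration):
# directions written as (right, front, up) as seen by the sitting man.
left = np.array([-1.0, 0.0, 0.0])
front = np.array([0.0, 1.0, 0.0])
up = np.array([0.0, 0.0, 1.0])

# World frame axes as chosen in the text.
x_axis = left
y_axis = front

# Right-hand rule: the third axis is the cross product of the first two.
z_axis = np.cross(x_axis, y_axis)

# Does the resulting Z-axis point up or down?
points_up = bool(np.dot(z_axis, up) > 0)
print("Z axis in helper basis:", z_axis, "| points up:", points_up)

# Hypothetical position vector of the Polo in the world frame, in meters:
# 4 m to the man's left, 6 m in front of him, 0.5 m above the origin
# (negative because Z points down).
polo_world = np.array([4.0, 6.0, -0.5])
distance = np.linalg.norm(polo_world)
print("Distance from the sitting man:", distance)
```

The cross product shows which way the Z-axis must point once X and Y are fixed, which is all the right-hand rule says.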

Understanding the Camera Coordinate Frame

Now where am I in this whole thing? Technically we will find that, but first, we'll make another assumption, i.e. I have a camera and I can see the world through it. I also take the liberty to assume that I know my position vector with respect to the sitting man/origin.

Now I am going to define another coordinate frame: the camera coordinate frame.

By the way, what does it mean to define a coordinate system? For easy understanding: it means that we define a new origin and its corresponding X, Y and Z axes (orientations), and then measure everything with respect to that origin. Do you know the exact difference between a coordinate system and a coordinate frame?

Now coming back to defining the camera coordinate system: its origin is the optical center of the camera, i.e. the center of the lens. (The principal point is a different thing: it is where the principal axis meets the image plane.) The direction to the right of the camera is the X-axis, the direction down towards the ground is the Y-axis (assuming that I am standing almost perpendicular to the ground) and the direction along the principal axis, pointing out of the camera towards the scene, is the Z-axis, as shown in the picture.
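Once both frames are defined, moving a point between them is a rotation plus a shift of origin. The sketch below assumes a made-up setup: I stand 10 m in front of the sitting man, facing him, with the camera 1.5 m above the world origin, and the world Z-axis points into the ground. All numbers and function names are hypothetical.

```python
import numpy as np

# Rows are the camera axes written in world coordinates
# (world X = man's left, world Y = man's front, world Z = down).
# Assumed pose: the camera faces the man, so its forward axis is -Y_world.
R_world_to_cam = np.array([
    [1.0, 0.0, 0.0],    # camera X (right) = man's left
    [0.0, 0.0, 1.0],    # camera Y (down)  = world Z
    [0.0, -1.0, 0.0],   # camera Z (forward, along the principal axis)
])

# Hypothetical camera position in the world frame, in meters.
cam_pos_world = np.array([0.0, 10.0, -1.5])


def world_to_camera(p_world):
    """Express a world-frame point in the camera frame."""
    return R_world_to_cam @ (p_world - cam_pos_world)


def camera_to_world(p_cam):
    """Inverse transform: express a camera-frame point in the world frame."""
    return R_world_to_cam.T @ p_cam + cam_pos_world


# Hypothetical position of the Polo in the world frame.
polo_world = np.array([4.0, 6.0, -0.5])
polo_cam = world_to_camera(polo_world)
print("Polo in camera frame (right, down, forward):", polo_cam)
print("Back in world frame:", camera_to_world(polo_cam))
```

Because the rotation matrix is orthogonal, its transpose undoes it, which is why the inverse transform needs no matrix inversion.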

Understanding the Image Coordinate Frame

You may be wondering: the images we get are made of 2D pixels, but so far we have only talked about 3D coordinate frames (the world coordinate frame and the camera coordinate frame). If that question has not crossed your mind, I suggest you pause on it and be more curious!

In our example, what do I see in the camera? I see an image of a Red Volkswagen Polo GT on a road, because I looked towards it and took a picture with my camera. As we all know, this image is 2D, i.e., we get a matrix of pixels and their corresponding intensities (0-255 for grayscale images). This 2D image is a projection of the real 3D world onto an (M x N) grid of pixels. Let’s say we have an image of size 100 x 250 pixels (100 in height and 250 in width).

Now we can define one more, final frame: the image coordinate frame.

The top-left point of the image is defined as the origin. The direction to the right, along the width of the image, is the X-axis, and the direction towards the bottom of the image, along its height, is the Y-axis.
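As a small illustration, here is how the 100 x 250 grayscale image from the example could be represented with NumPy. The pixel location used below is arbitrary. Note that array indexing is (row, column), i.e. (y, x), so the Y coordinate comes first when reading or writing a pixel.

```python
import numpy as np

# A grayscale image of height 100 and width 250, all pixels black (0).
height, width = 100, 250
image = np.zeros((height, width), dtype=np.uint8)

# The origin (x=0, y=0) is the top-left pixel.
# X grows to the right (columns), Y grows downward (rows).
x, y = 200, 30          # arbitrary pixel chosen for illustration
image[y, x] = 255       # set that pixel to white; note the (y, x) order

print("Image shape (rows, cols):", image.shape)
print("Intensity at origin:", image[0, 0])
print("Intensity at (x=200, y=30):", image[y, x])
```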

What is the goal of assuming and understanding these things?

Now can you tell me where this Polo GT is with respect to the sitting man? Yeah, that’s the goal⚽ or rather I’d say that’s a huge six🏏 as the whole match of computer vision has just started…

A trick to visualize what a coordinate frame is

For a moment, try to imagine yourself as Nani from the movie Eega, that is, in the body of a fly. To visualize the world coordinate frame defined earlier (the man sitting on the bench), become that fly and fly towards the sitting man. Carefully sit on his nose, facing away from his eyes. What do you see? Where is the Red Polo GT from here? Many computer vision applications need exactly this type of answer.

What does it mean to define a coordinate system?

A coordinate system is a method for identifying the location of a point on a plane or in space. It uses two or more numbers, called coordinates, to specify the position of the point relative to a fixed reference point, called the origin. In a 2D Cartesian system, for example, the first number is the signed distance from the origin along the horizontal axis, while the second number is the signed distance from the origin along the vertical axis. There are many different types of coordinate systems, each with its own set of rules for assigning coordinates to points. Some common examples are the Cartesian, polar and spherical coordinate systems. A complete definition of a coordinate system requires specifying: (i) the kind of system and the orientation of its axes, (ii) the location of the origin, and (iii) the units that are used to measure distances from the origin.
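To see that the same point gets different numbers under different coordinate systems, here is a minimal sketch that converts Cartesian coordinates to polar and spherical ones. The function names are made up for this example, and the spherical convention (polar angle measured from the +Z axis, azimuth measured in the X-Y plane) is one common choice among several.

```python
import math


def cartesian_to_polar(x, y):
    """Return (r, theta): distance from the origin and angle from the +X axis."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    return r, theta


def cartesian_to_spherical(x, y, z):
    """Return (r, theta, phi) with theta measured from +Z and phi in the X-Y plane.

    Assumes the point is not the origin (r must be non-zero).
    """
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r)
    phi = math.atan2(y, x)
    return r, theta, phi


# The same 2D point, described in two systems.
r2, theta2 = cartesian_to_polar(3.0, 4.0)
print("Cartesian (3, 4) -> polar:", r2, theta2)

# The same 3D point, described in two systems.
r3, theta3, phi3 = cartesian_to_spherical(1.0, 1.0, math.sqrt(2.0))
print("Cartesian (1, 1, sqrt(2)) -> spherical:", r3, theta3, phi3)
```

The point itself never moves; only the rule for turning it into numbers changes.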

Do you know the exact difference between a coordinate system and a coordinate frame?

In mathematics, a coordinate system is a set of rules used to assign numerical values to points in space. It is a mathematical concept: it describes how to represent points using numbers. A coordinate frame is an origin together with a set of basis vectors, usually attached to a physical object such as the sitting man or the camera, and it is used to describe the position and orientation of that object in space. In other words, a coordinate system is the mathematical tool that turns points into numbers, while a coordinate frame tells you where that system sits and how it is oriented.

