How to digitally scan a line plot (with pictures)

Greetings everyone! My name is Mario and this is my first blog post for this website. This blog was created for my Applied Physics 186 course under Dr. Maricor Soriano, and since I haven’t blogged for a while (not since 2011), I’m not really sure if I’m more excited or worried. I suppose I’ll find out as I write.

Our second activity (first for the blog) for the AP 186 course is to digitally scan a line plot. This means that we convert a scan of a line plot, which is composed of pixels, into meaningful physical variables. In other words, we are attempting to retrieve the actual data from a picture of the graph of the data.

I. Finding a hand drawn graph

Since this is only the second activity, we started with something simple. The graph should contain only one plot, to avoid confusion between multiple curves. Also, the plot must come from an old thesis or dissertation, old enough that it was drawn by hand or with an xy-plotter, so that the actual data is unlikely to be easily accessible.

After a quick search in my research laboratory’s library, I found a suitable graph from Ms. Linda Posada’s thesis from 1986, entitled “Modulation Characteristics of Bundle-Integrated-Guide Dynamic Single Mode (BIG-DSM) Semiconductor Lasers.”

Scan of the hand drawn plot as well as the cover of Dr. Linda Posadas’ thesis that it was found in.

II. Conversion Factor

So the problem is – how do we extract or retrieve the actual values of the graph using this scan composed of pixels? In other words, how do we convert the pixel coordinates of a point on the graph into its meaningful physical values? Lucky for us, this graph has equally spaced tick marks; this means that the location of a point measured from the origin of the graph is directly related to the physical value of the point. This would not be the case if the tick marks were in log scale, as the location would be non-linearly related. This key property is depicted below.

Importance of equally spaced tick marks

The physical values of points A and B are easy to read since they are aligned with tick marks. But what about point C, which lies between tick marks? For equally spaced tick marks, since C is 2.5 times as far from the origin as A, C is located at roughly 2.5*1, or simply 2.5. For a log scale, however, the same reasoning would give 2.5*10 = 25 if we use A as the reference, or 0.25*100 = 25 if we use B. Either way we obtain 25, which is obviously wrong since the value should be greater than 100.
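To make the difference concrete, here is a minimal Python sketch of reading a value between two tick marks on each kind of axis. The function names and the pixel positions are made up for illustration; they are not taken from the figure.

```python
import math

def interpolate_linear(p, p1, v1, p2, v2):
    """Value at pixel position p on an axis with equally spaced (linear) ticks.

    p1, p2 are the pixel positions of two ticks with known values v1, v2.
    """
    frac = (p - p1) / (p2 - p1)
    return v1 + frac * (v2 - v1)

def interpolate_log(p, p1, v1, p2, v2):
    """Value at pixel position p on a log-scaled axis.

    Distance along the axis is proportional to the exponent, so the
    exponents are interpolated rather than the values themselves.
    """
    frac = (p - p1) / (p2 - p1)
    exponent = math.log10(v1) + frac * (math.log10(v2) - math.log10(v1))
    return 10 ** exponent

# Hypothetical layout: ticks at 100 px and 200 px, point halfway at 150 px.
print(interpolate_linear(150, 100, 1, 200, 2))    # halfway between 1 and 2
print(interpolate_log(150, 100, 10, 200, 100))    # 10**1.5, about 31.6 (not 55)
```

On the log axis the halfway point is 10 raised to the halfway exponent, which is why simple ratio and proportion on the values gives the wrong answer there.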

The method used above is called Ratio and Proportion, and is an excellent way to estimate the physical values of points that lie between tick marks. The method works by deriving a conversion factor from points with known values. In our case, we derive a conversion factor from pixel coordinates to physical values using the origin and tick marks as reference. We count the number of pixels between each pair of adjacent tick marks and average the counts to get the conversion factor. This is done separately for the horizontal and vertical tick marks. The respective conversion factors I obtained are roughly 39.5 pixels per horizontal unit and 125.4 pixels per vertical unit. (The decimals are due to the averaging process.)
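The averaging step can be sketched in a few lines of Python. The tick positions below are invented so that the arithmetic is easy to follow; they are not the pixel values I actually measured, and `conversion_factor` and `units_per_tick` are hypothetical names.

```python
def conversion_factor(tick_pixels, units_per_tick=1.0):
    """Average pixel spacing between adjacent tick marks, in pixels per unit.

    tick_pixels: pixel positions of consecutive tick marks along one axis.
    units_per_tick: physical distance between adjacent ticks (assumed 1 here).
    """
    gaps = [b - a for a, b in zip(tick_pixels, tick_pixels[1:])]
    return sum(gaps) / len(gaps) / units_per_tick

# Made-up horizontal tick positions, in pixels.
x_ticks = [102, 141, 181, 220, 260]
cf_x = conversion_factor(x_ticks)
print(cf_x)  # gaps are 39, 40, 39, 40, so the average is 39.5
```

Averaging the adjacent gaps is the same as dividing the distance between the first and last tick by the number of gaps, so small errors in locating the interior ticks cancel out.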

III. Plot Reconstruction

Equations to convert pixel coordinates to meaningful physical values were obtained from the conversion factors (CF). These equations are:

x,y are meaningful physical values
subscript ‘pix’ indicates pixel coordinates
subscript ‘o’ indicates origin of the graph
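Written out with the symbols defined above, the conversion takes a form like the following. This is a sketch under two assumptions: the graph origin corresponds to the physical value (0, 0), and pixel rows are counted downward from the top of the image (as MS Paint reports them), which is why the vertical difference is flipped. The names CF_x and CF_y for the horizontal and vertical conversion factors are introduced here for illustration.

```latex
x = \frac{x_{pix} - x_{o}}{CF_x}, \qquad
y = \frac{y_{o} - y_{pix}}{CF_y}
```

If the axes do not start at zero, the physical value at the origin would be added to each right-hand side.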

Thus, to reconstruct the graph, I manually determined the pixel coordinates of various points along the plot using MS Paint. I also determined the pixel coordinates of the origin of the graph. Using this data, I used the equations above to determine the meaningful physical values.
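The same conversion could be done in a short script instead of a spreadsheet. Below is a minimal sketch, assuming the graph origin maps to the physical value (0, 0) and that pixel rows increase downward as in MS Paint. The conversion factors are the ones quoted earlier; the origin and the sample points are hypothetical, and `pixel_to_physical` is a name made up for this example.

```python
def pixel_to_physical(x_pix, y_pix, x_o, y_o, cf_x, cf_y):
    """Convert image pixel coordinates to physical values.

    Assumes the graph origin (x_o, y_o) maps to physical (0, 0) and that
    pixel rows increase downward, so the vertical difference is flipped.
    cf_x and cf_y are in pixels per physical unit.
    """
    x = (x_pix - x_o) / cf_x
    y = (y_o - y_pix) / cf_y
    return x, y

# Conversion factors from the post; origin and points are hypothetical.
CF_X, CF_Y = 39.5, 125.4
origin = (100, 600)
points_pix = [(100, 600), (179, 474.6), (258, 349.2)]

for xp, yp in points_pix:
    print(pixel_to_physical(xp, yp, origin[0], origin[1], CF_X, CF_Y))
```

With these made-up numbers the three points land at approximately (0, 0), (2, 1) and (4, 2) in physical units.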

To check the validity of the results, I plotted the physical values I calculated and connected the points smoothly using MS Excel. I then used the original image as the chart background and adjusted the size. A nice fit between the reconstructed graph and the original image means that the physical values obtained are accurate. Below is the picture of the overlaid graphs.

Calculated physical values overlaid the image.

A good fit is observed, showing that the reconstruction is valid.

IV. Limitations of the Ratio and Proportion technique

As I performed the activity, I noticed some limitations. Firstly, it is very important that the axes of the graph are as horizontal and vertical as possible (i.e. aligned with the edges of the image), so as to simplify the conversion equations. A tilted scan can still be compensated for, however, by transforming to a rotated pixel coordinate system.
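I did not need this for my scan, but the rotated coordinate system could be sketched as follows. The idea is to measure the tilt from the origin and one other point on the drawn x-axis, then project every pixel offset onto the tilted axes before applying the conversion factors. The function names and pixel values are hypothetical.

```python
import math

def axis_tilt(origin, point_on_x_axis):
    """Angle (radians) of the drawn x-axis relative to the image's horizontal."""
    dx = point_on_x_axis[0] - origin[0]
    dy = point_on_x_axis[1] - origin[1]
    return math.atan2(dy, dx)

def to_axis_frame(point, origin, theta):
    """Pixel offsets of `point` from `origin`, measured along the tilted axes.

    The second component still follows the image convention (positive means
    downward), so it should be negated before dividing by the vertical
    conversion factor.
    """
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    along_x = dx * math.cos(theta) + dy * math.sin(theta)
    along_y = -dx * math.sin(theta) + dy * math.cos(theta)
    return along_x, along_y

# Hypothetical scan whose x-axis rises by 10 px over 400 px.
origin = (100, 600)
x_axis_end = (500, 590)
theta = axis_tilt(origin, x_axis_end)
print(math.degrees(theta))
print(to_axis_frame(x_axis_end, origin, theta))  # second component is ~0
```

A point that lies on the drawn x-axis comes out with a vertical offset of essentially zero, which is a quick sanity check on the measured tilt.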

Secondly, it is very important that the tick marks are equally spaced. This reduces the conversion to a simple scaling problem, which is easily handled in Excel.

Lastly, a better fit is obtained if the pixel coordinates obtained from the curve are not equally spaced. That is, the sharper the curve, or the more rapidly the curve changes, the more points must be taken. Taking equally spaced points would result in the following reconstruction:

Reconstruction from equally spaced points overlaid on the image.

It can be observed that the reconstruction is somewhat lacking for the sharp curves.
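The effect can be illustrated numerically. The sketch below compares two sets of six sample points on a stand-in curve with a sharp bend near the start (a decaying exponential, not the thesis data). Straight-line interpolation is used here in place of Excel's smoothed lines, so the numbers are only indicative.

```python
import math

def interp(x, xs, ys):
    """Piecewise-linear interpolation through the sampled points (xs sorted)."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside sampled range")

def max_error(f, xs, n_check=1001):
    """Largest gap between f and the straight-line reconstruction from xs."""
    ys = [f(x) for x in xs]
    lo, hi = xs[0], xs[-1]
    grid = [lo + (hi - lo) * i / (n_check - 1) for i in range(n_check)]
    return max(abs(f(x) - interp(x, xs, ys)) for x in grid)

def curve(x):
    """Stand-in curve that changes rapidly near x = 0 and flattens out."""
    return math.exp(-10 * x)

uniform = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]       # equally spaced
clustered = [0.0, 0.05, 0.1, 0.2, 0.4, 1.0]    # denser where the curve bends

err_uniform = max_error(curve, uniform)
err_clustered = max_error(curve, clustered)
print(err_uniform, err_clustered)  # the clustered set gives the smaller error
```

With the same number of points, placing more of them where the curve changes rapidly reduces the worst-case mismatch considerably for this curve.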

Thus, to get a good reconstruction, great care must be taken to make the axes square with the image, and many points must be taken, especially where the curve changes rapidly. All in all, the work is quite tedious, but I still had fun doing this activity. It would be nice to semi-automate the process though.
