Extracting data from an image of plots

Himanshu Mishra

Feb 25, 2016, 11:30:07 AM
to scikit-image
Hello everyone,

We have a PDF page containing one or more figures, each a two-dimensional plot of experimental results. The figures may or may not be embedded in text. Each plot has x and y axes, with their labels and units marked. Each figure contains one or more curves, each drawn in a different color.

How can we convert the plot into a table of corresponding x and y values (say, for 100 points)?

Looking forward to a solution,

Thank you,
Himanshu Mishra

François Boulogne

Feb 25, 2016, 11:35:49 AM
to scikit...@googlegroups.com

> How can we convert the plot into a table of corresponding x and y
> values (say for 100 points) ?
>

http://arohatgi.info/WebPlotDigitizer/

It's released under GPLv3. There is also an instance hosted by the developer.

Best,

--
François Boulogne.
http://www.sciunto.org
GPG: 32D5F22F

Himanshu Mishra

Feb 26, 2016, 11:12:07 AM
to scikit...@googlegroups.com
WebPlotDigitizer works very well for an image with a single plot. But when I try it on a page from a research paper, with text on it and more than one plot, it performs very poorly. I don't know whether such a tool is available online, but I would love to hear suggestions on how to make one.

Thank you,
Himanshu Mishra

Josh Warner

Feb 26, 2016, 6:15:06 PM
to scikit-image
You may want to look into the work of Matt Terry (@mrterry) from SciPy 2013 and earlier. He created a number of tools that use what may now be outdated matplotlib interfaces, but which could still save you time. One of them is yoink (https://github.com/mrterry/yoink), which is particularly good with rasterized data behind a color map.

Using tools like Inkscape and/or the GIMP you should be able to crop any arbitrary figure out of a scanned paper/image and transform it so it's at least relatively rectilinear. Save that out as an image and these tools start to be useful - though there may be some customization left to go depending on what type of figure you're digitizing.
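
Once a figure has been cropped and straightened like this, pulling one colored curve out of it might look roughly like the NumPy sketch below. This is only an illustration, not how WebPlotDigitizer or yoink work internally. The function name `extract_curve`, its parameters, and the color tolerance are all made up for this example, and it assumes linear axes, a single y value per x, and two known calibration points on each axis.

```python
import numpy as np


def extract_curve(image, curve_rgb, x_cal, y_cal, tol=40.0, n_points=100):
    """Digitize one colored curve from a cropped, axis-aligned plot image.

    All names here are hypothetical, chosen for this sketch.
    image     : (H, W, 3) uint8 RGB array.
    curve_rgb : approximate RGB color of the curve to pick out.
    x_cal     : ((col0, col1), (x0, x1)), two pixel columns and their x values.
    y_cal     : ((row0, row1), (y0, y1)), two pixel rows and their y values.
    """
    # Select pixels whose RGB distance to the target color is within tol.
    diff = image.astype(float) - np.asarray(curve_rgb, dtype=float)
    mask = np.sqrt((diff ** 2).sum(axis=-1)) <= tol

    # Keep the columns that contain curve pixels; use the mean row in each.
    cols = np.flatnonzero(mask.any(axis=0))
    if cols.size == 0:
        raise ValueError("no pixels match curve_rgb within tol")
    rows = np.arange(image.shape[0])
    mean_rows = np.array([rows[mask[:, c]].mean() for c in cols])

    def to_data(pix, cal):
        # Linear pixel-to-data mapping (assumes a linear, not log, axis).
        (p0, p1), (v0, v1) = cal
        return v0 + (pix - p0) * (v1 - v0) / (p1 - p0)

    x = to_data(cols.astype(float), x_cal)
    y = to_data(mean_rows, y_cal)

    # np.interp needs increasing x, so sort before resampling.
    order = np.argsort(x)
    xi = np.linspace(x.min(), x.max(), n_points)
    return xi, np.interp(xi, x[order], y[order])


# Synthetic image: white background, a blue distractor line, and a red
# curve following y = x / 2 (x runs from 0 to 10 over 201 pixel columns).
img = np.full((101, 201, 3), 255, dtype=np.uint8)
img[50, :] = (0, 0, 255)
for c in range(201):
    img[100 - c // 2, c] = (255, 0, 0)

# Calibration: column 0 is x=0, column 200 is x=10; row 100 is y=0, row 0 is y=5.
xi, yi = extract_curve(img, (255, 0, 0), ((0, 200), (0.0, 10.0)), ((100, 0), (0.0, 5.0)))
for x_val, y_val in list(zip(xi, yi))[:3]:
    print(f"{x_val:.3f}\t{y_val:.3f}")
```

On a real scan the color threshold is the fragile part: anti-aliasing, JPEG artifacts, and curves that cross each other will all need extra handling, which is the "customization left to go" mentioned above.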

Matt gave a lightning talk at SciPy 2013 about yoink: https://youtu.be/ywHqIEv3xXg?t=1890

Josh

Stéfan van der Walt

Feb 26, 2016, 7:00:26 PM
to scikit-image
On 26 February 2016 at 15:15, Josh Warner <silvertr...@gmail.com> wrote:
> Using tools like Inkscape and/or the GIMP you should be able to crop any arbitrary figure out of a scanned paper/image and transform it so it's at least relatively rectilinear. Save that out as an image and these tools start to be useful - though there may be some customization left to go depending on what type of figure you're digitizing.

If you have a PDF that wasn't scanned (i.e., straight out of LaTeX), you can export the figures directly.
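
As an illustration of why that helps: if the exported figure is vector graphics saved as SVG (Inkscape can open a PDF and save it as SVG), the curve vertices are stored as plain numbers and no image processing is needed. The sketch below is an assumption-laden example, not a general SVG parser. The names `path_vertices` and `svg_polylines` are made up, only the straight-line path commands M/m/L/l are handled, and `transform` attributes are ignored.

```python
import re
import xml.etree.ElementTree as ET

SVG_PATH_TAG = "{http://www.w3.org/2000/svg}path"
TOKEN = re.compile(r"[MmLl]|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")
UNSUPPORTED = re.compile(r"[^MmLl0-9\s,.\-+eE]")


def path_vertices(d):
    """Vertices of an SVG path 'd' string that uses only M/m/L/l commands.

    Hypothetical helper for this sketch; curves, arcs, H/V and Z are rejected,
    and multiple subpaths are returned as one flat list.
    """
    if UNSUPPORTED.search(d):
        raise ValueError("path uses commands other than M/m/L/l")
    points, cmd, x, y = [], None, 0.0, 0.0
    tokens = TOKEN.findall(d)
    i = 0
    while i < len(tokens):
        if tokens[i] in ("M", "m", "L", "l"):
            cmd = tokens[i]
            i += 1
            continue
        if cmd is None or i + 1 >= len(tokens):
            raise ValueError("malformed path data")
        px, py = float(tokens[i]), float(tokens[i + 1])
        i += 2
        # Lowercase commands are relative to the current point.
        if cmd.islower():
            x, y = x + px, y + py
        else:
            x, y = px, py
        points.append((x, y))
        # Extra coordinate pairs after a moveto are implicit linetos.
        if cmd in ("M", "m"):
            cmd = "L" if cmd == "M" else "l"
    return points


def svg_polylines(svg_text):
    """Return one vertex list per straight-line <path> in the SVG text."""
    curves = []
    for path in ET.fromstring(svg_text).iter(SVG_PATH_TAG):
        try:
            curves.append(path_vertices(path.get("d", "")))
        except ValueError:
            continue  # skip paths this sketch cannot interpret
    return curves


demo_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path d="M 10,90 L 20,80 30,60 l 10,-20"/>'
    '<path d="M 0,0 C 1,1 2,2 3,3"/>'
    "</svg>"
)
print(svg_polylines(demo_svg))
```

The vertices come out in SVG page coordinates, with y pointing down, so they still have to be mapped to data values using two known tick positions on each axis.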

Stéfan