How the data loading works in the MNIST example: tfjs-examples/mnist/data.js


Vignes

Apr 30, 2018, 6:06:42 AM
to TensorFlow.js Discussion
Hi,

I'm trying to understand data.js in the MNIST example (https://github.com/tensorflow/tfjs-examples/blob/master/mnist/data.js). I know it loads a single PNG file that contains 65K images in the form of a 2D array. I need to load normal images from different folders using a JSON input file, which has the URL of each file and folder.

The problem is that I'm a bit lost as to how data.js works from line 72 onward. It would be great if anyone could add more comments to this file explaining how it works step by step.

Thanks
Regards,
Vignes

emmanuel chappat

Apr 30, 2018, 6:15:18 AM
to TensorFlow.js Discussion
The code on lines 72 to 76 does two things. First, it keeps only the values from the red channel (since the samples are black and white). Second, it normalizes the values so they lie between 0 and 1 (instead of 0 to 255). The result is then stored in a large TypedArray.
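
As a rough illustration of those two steps, here is a minimal sketch. It assumes the pixels arrive as RGBA bytes (four values per pixel), as they do from a canvas's getImageData. The helper name redChannelToFloats and the sample pixels are made up for this example and do not appear in data.js.

```javascript
// Hypothetical helper (not a name from data.js): copies the red channel of an
// RGBA byte buffer into a Float32Array, scaling each value from 0..255 to 0..1.
function redChannelToFloats(rgba, out, outOffset = 0) {
  const pixelCount = rgba.length / 4; // 4 bytes per pixel: R, G, B, A
  for (let j = 0; j < pixelCount; j++) {
    // For a grayscale image R, G and B are equal, so reading R is enough.
    out[outOffset + j] = rgba[j * 4] / 255;
  }
  return out;
}

// Two made-up pixels: pure white and pure black, both fully opaque.
const rgba = new Uint8ClampedArray([255, 255, 255, 255, 0, 0, 0, 255]);
const floats = redChannelToFloats(rgba, new Float32Array(2));
console.log(Array.from(floats)); // [ 1, 0 ]
```

The outOffset parameter is there so that several chunks of an image can be written into one large output array, one after the other.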

The rest of the file deals with generating randomised mini-batches from that TypedArray (and from the labels, which are also stored in a typed array).
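
The batching idea can be sketched without any TensorFlow.js calls. This is an assumption-laden illustration, not the code from data.js: the function names shuffledIndices and nextBatch, the tiny IMAGE_SIZE and NUM_CLASSES values, and the sample data are all invented here.

```javascript
const IMAGE_SIZE = 4;  // values per image; kept tiny here (28 * 28 = 784 for MNIST)
const NUM_CLASSES = 2; // one-hot label width; 10 for MNIST

// Fisher-Yates shuffle of the indices 0..n-1.
function shuffledIndices(n) {
  const indices = Uint32Array.from({length: n}, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  return indices;
}

// Copies batchSize images and their labels out of the flat arrays, following
// the shuffled `order` starting at position `start` (wrapping around at the end).
function nextBatch(images, labels, order, start, batchSize) {
  const batchImages = new Float32Array(batchSize * IMAGE_SIZE);
  const batchLabels = new Uint8Array(batchSize * NUM_CLASSES);
  for (let i = 0; i < batchSize; i++) {
    const idx = order[(start + i) % order.length];
    // subarray() is a view on the same buffer; set() does the actual copy.
    batchImages.set(
        images.subarray(idx * IMAGE_SIZE, (idx + 1) * IMAGE_SIZE), i * IMAGE_SIZE);
    batchLabels.set(
        labels.subarray(idx * NUM_CLASSES, (idx + 1) * NUM_CLASSES), i * NUM_CLASSES);
  }
  return {batchImages, batchLabels};
}

// Three made-up "images" of four values each, with one-hot labels.
const images = Float32Array.from([0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 1, 1, 1, 1]);
const labels = Uint8Array.from([1, 0, 0, 1, 1, 0]);
const order = shuffledIndices(3);
const batch = nextBatch(images, labels, order, 0, 2);
console.log(batch.batchImages.length, batch.batchLabels.length); // 8 4
```

The flat batch arrays would then typically be wrapped in tensors before training, for example with tf.tensor2d and a [batchSize, IMAGE_SIZE] shape.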

Nikhil Thorat

Apr 30, 2018, 9:14:53 AM
to emmanuel chappat, TensorFlow.js Discussion
What Emmanuel said is absolutely correct!

We also expose a fromPixels method, which you can use to load an image into a Tensor: https://js.tensorflow.org/api/0.10.0/#fromPixels

We don't use it in this demo because the entire dataset is too large to fit into memory.

We're also working on a data API now which will streamline this process so you don't have to manually do this type of parsing.

--
You received this message because you are subscribed to the Google Groups "TensorFlow.js Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tfjs+unsubscribe@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/tfjs/.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/tfjs/d9e5d337-a7ff-4d3d-8642-8afb182c5909%40tensorflow.org.

Kevin Scott

May 29, 2018, 7:22:30 PM
to TensorFlow.js Discussion, emmanuel...@gmail.com
I had a similar question to Vignes's, in particular around how typed arrays work in JavaScript.

I was confused because it looks like we set "datasetBytesView" on line 63 and then seem to discard that variable.

It turns out (apologies to those who already know this stuff!) that the view writes into the underlying buffer, "datasetBytesBuffer", which is set on line 55. The data therefore survives in the buffer even after the view variable is gone.
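
A small standalone sketch of that behaviour, using only standard typed-array calls. The constant names and sizes are made up for illustration and are not the ones in data.js.

```javascript
const VALUES_PER_CHUNK = 3; // made-up size, for illustration only
const NUM_CHUNKS = 2;
const BYTES_PER_FLOAT = Float32Array.BYTES_PER_ELEMENT; // 4

// One buffer holds all of the data.
const buffer = new ArrayBuffer(NUM_CHUNKS * VALUES_PER_CHUNK * BYTES_PER_FLOAT);

for (let chunk = 0; chunk < NUM_CHUNKS; chunk++) {
  // A view onto one chunk: (buffer, offset in bytes, length in elements).
  const view = new Float32Array(
      buffer, chunk * VALUES_PER_CHUNK * BYTES_PER_FLOAT, VALUES_PER_CHUNK);
  view.fill(chunk + 1); // writes go straight into `buffer`
  // `view` goes out of scope here, but the bytes it wrote stay in the buffer.
}

// A second view over the whole buffer sees everything written above.
const all = new Float32Array(buffer);
console.log(Array.from(all)); // [ 1, 1, 1, 2, 2, 2 ]
```

Note that the offset is given in bytes while the length is given in elements, which is why the offset is multiplied by 4 for a Float32Array.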

Armed with this knowledge, I was able to figure out how to write a custom parsing function to go directly from URL to array buffer, bypassing the DOM:

const imgRequest = fetch(MNIST_IMAGES_SPRITE_PATH)
  .then(resp => resp.arrayBuffer())
  .then(buffer => {
    return new Promise(resolve => {
      const reader = new PNGReader(buffer);
      return reader.parse((err, png) => {
        const pixels = Float32Array.from(png.pixels).map(pixel => {
          return pixel / 255;
        });
        this.datasetImages = pixels;
        resolve();
      });
    });
  });


Would love any insights into whether this approach is a good one or not. I'm looking forward to using the updated data APIs in the future!

- Kevin




Nikhil Thorat

May 29, 2018, 7:24:37 PM
to thekev...@gmail.com, TensorFlow.js Discussion, emmanuel chappat
Yes, that approach should work!

Just want you to know we are in the process of designing / building a data API that will make this type of streaming into a model much much saner and simpler.

Kevin Scott

May 29, 2018, 7:32:10 PM
to TensorFlow.js Discussion, thekev...@gmail.com, emmanuel...@gmail.com
Very cool, looking forward to that API - thanks for all the hard work on this awesome library!