Hi, all - going to try to bump this to plots-dev again -- sounds like you've made a lot of progress already :-)

You can also hit the JSON/XML API on a per-tag basis: http://spectralworkbench.org/tag/cfl

The main reason to develop tools in JavaScript is so that other people can easily use them on the site without downloading new software or anything.

Jeff

On Fri, Jul 25, 2014 at 1:16 PM, Ben Gamari <bga...@gmail.com> wrote:

Bryan <btbo...@gmail.com> writes:

> Btw I'm using "tarball" figuratively. Ben, I assumed you meant a literal
> tarball of data files. We can provide lumpsums of data in efficient ways
> without resorting to tarballs of data files which require preprocessing.
>
I would be interested to hear more about what you are thinking of here.
Please expand when you have a chance.

> So if you're arguing for lumpsum data, yeah, totally we should continue to
> support that (apparently it's already there).
>
This is excellent, although I would argue that this isn't bulk _data_,
it's bulk _metadata_. If I want the actual meat of the data (that is,
the images), I need to write a script to parse the metadata, figure out
which images the corpus contains, and crawl them, taking care to
rate-limit my requests, handle errors, etc. This isn't by any means
_difficult_ in any language worth its salt, but it is superfluous work
and poses another small barrier to entry for those seeking to work with
the corpus. After all, in the end all I need for my analysis is a
directory full of images and their associated metadata.

> But a zip of images? yuck.
>
I'm not sure I understand your objection to the suggestion of a tarball.
This is a common technique for distributing datasets, as I pointed out
earlier. The reason for this is simple: if I'm trying to work with a
data set, the interface to access it is the last thing I want to worry
about. The sooner I can get a directory of files, the sooner I can move
on to the actual problems I want to work on. `curl $URL | tar -jx` is
the quickest way I know of to make this happen. Yes, the data then needs
preprocessing, but this would have been necessary regardless, and there
is no shortage of tools for munging text, indexing JSON, and the like.
I would be happy to contribute a script to generate these dumps if
others agree that this is a useful exercise.
Cheers,
- Ben
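For concreteness, a dump/crawl script along the lines Ben describes could be quite small. This is a sketch, not a tested implementation: the `/tag/<tag>.json` endpoint suffix, the shape of the records it returns, and the `/raw/<id>.<ext>` image URL pattern are assumptions pieced together from URLs mentioned elsewhere in this thread.

```python
import json
import time
import urllib.request

BASE = "http://spectralworkbench.org"

def image_urls(records):
    """Map tag-API metadata records to candidate raw-image URLs.

    Assumes each record carries the spectrum id and photo_file_name,
    as in the per-spectrum JSON the site returns. Records may or may
    not be wrapped in a {"spectrum": ...} envelope.
    """
    urls = []
    for rec in records:
        spec = rec.get("spectrum", rec)
        # Keep the original extension: usually png, sometimes jpg
        ext = spec["photo_file_name"].rsplit(".", 1)[-1]
        urls.append((spec["id"], "%s/raw/%d.%s" % (BASE, spec["id"], ext)))
    return urls

def crawl(tag, delay=1.0):
    """Fetch all images for a tag, politely rate-limited."""
    with urllib.request.urlopen("%s/tag/%s.json" % (BASE, tag)) as resp:
        records = json.load(resp)
    for spec_id, url in image_urls(records):
        try:
            urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
        except OSError as err:
            print("skipping %d: %s" % (spec_id, err))
        time.sleep(delay)  # rate-limit so we don't hammer the server

if __name__ == "__main__":
    crawl("cfl")
```

The output of a run like this, tarred up with the metadata alongside it, is exactly the "directory of images plus metadata" Ben asks for.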
--
Post to this group at plots-sp...@googlegroups.com
Public Lab mailing lists (http://publiclab.org/lists) are great for discussion, but to get attribution, open source your work, and make it easy for others to find and cite your contributions, please publish your work at http://publiclab.org
---
You received this message because you are subscribed to the Google Groups "plots-spectrometry" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plots-spectrome...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
{"spectrum":{"photo_position":"false","version":null,"title":"d3-rw-s0","id":30872,"updated_at":"2014-07-16T21:41:57Z","photo_file_name":"capture.png","parent_id":null,"control_points":null,"baseline_content_type":null,"notes":" -- (Cloned calibration from <a href='/spectra/show/26204'>test2</a>)","sample_row":1,"photo_content_type":"image/png","client_code":"","baseline_file_name":null,"user_id":3081,"slice_data_url":null,"reversed":true,"data":"{\"lines\":[{\"average\":22,\"b\":32,\"g\":13,\"r\":23,\"wavelength\":914.047383561644},{\"average\":21,\"b\":31,\"g\":12,\"r\":22,\"wavelength\":912.936793578767},
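One wrinkle for anyone scripting against a record like the one above: the pixel data is itself a JSON-encoded string nested inside the "data" field of the JSON record, so it has to be decoded in two passes. A minimal Python sketch (field names taken from the record above; `parse_spectrum` is just an illustrative helper, not part of any API):

```python
import json

def parse_spectrum(record_json):
    """Decode a spectrum record and return (wavelengths, averages)
    from the nested "data" field."""
    record = json.loads(record_json)["spectrum"]
    # "data" is a JSON string *inside* the JSON record, hence a second loads()
    lines = json.loads(record["data"])["lines"]
    wavelengths = [pt["wavelength"] for pt in lines]
    averages = [pt["average"] for pt in lines]
    return wavelengths, averages

# Trimmed-down version of the record above
sample = json.dumps({
    "spectrum": {
        "id": 30872,
        "data": json.dumps({"lines": [
            {"average": 22, "b": 32, "g": 13, "r": 23, "wavelength": 914.047383561644},
            {"average": 21, "b": 31, "g": 12, "r": 22, "wavelength": 912.936793578767},
        ]}),
    }
})

wl, avg = parse_spectrum(sample)
print(avg)  # [22, 21]
```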
How about:
http://spectralworkbench.org/raw/<spectrum_id>.png

although sometimes they are jpgs.

BTW, I'm in support of any and all APIs people want -- the main limitation on the API is developers actually building it out, or using it, or whatever... not a lot of developers on this codebase, unfortunately, although the GSoC program is changing that this summer, since Sreyanth is working on some exciting stuff. But the API is definitely neglected, so I would be very happy to accept pull requests that improve or expand it!

The API (both server-side and client-side) is the future of the platform - any time someone develops a new feature, a new way to clean, process, compare, analyze, etc., my belief is that the best way to enable others to use it is to build the tools into the SW platform. The JavaScript API is a low-barrier way to do that, but if there are things that really can't be done there, let's do them server-side!

Jeff
Yagiz makes an important point regarding calibration.
By chance, just this morning in a coffee-house conversation I was reminded of a (naming no names) very well funded government program for automated recognition of very important things. It worked very well at the stage of training the classifier but performed miserably in the field; if I remember correctly, analysis showed that the classifier had learned to distinguish light from shadow very well, and there were many more shadows in the field than in the training set.
My take-home thoughts are: 1) be careful -- it's possible to accidentally build a classifier that says more about the calibration than about the environmental sample; and 2) to aid interpretation of spectra from Spectral Workbench, it would be useful if the spectra could be linked to comprehensive metadata. Yagiz's studies of olive oil and red wine adulteration are good examples of this.
On Friday, July 25, 2014 7:31:53 AM UTC-7, ygzstc wrote:

Hi Daniela,
I was thinking a classifier application (an SVM, for example) would be nice for classifying different spectral data. Or maybe before that, PCA or PLS regression would be nice to have as well. But the main problem is that data collected by different users have very different calibration, intensity, and related issues, which I guess makes designing a classifier difficult.
On the other hand, once you have access to the data from Public Lab's website, you can also download it and play with it on your own PC/laptop.
Cheers,
Yagiz
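One common way to blunt the calibration problem Yagiz raises is to resample every spectrum onto a shared wavelength grid and normalize intensity before any classifier sees it. A pure-Python sketch (the helper names and the 400-700 nm grid are just illustrative choices, not anything Spectral Workbench provides):

```python
def resample(wavelengths, intensities, grid):
    """Linearly interpolate a spectrum onto a common wavelength grid.
    Grid points outside the measured range are clamped to edge values."""
    pairs = sorted(zip(wavelengths, intensities))
    out = []
    for w in grid:
        if w <= pairs[0][0]:
            out.append(pairs[0][1])
        elif w >= pairs[-1][0]:
            out.append(pairs[-1][1])
        else:
            # Find the bracketing pair and interpolate between them
            for (w0, i0), (w1, i1) in zip(pairs, pairs[1:]):
                if w0 <= w <= w1:
                    t = (w - w0) / (w1 - w0)
                    out.append(i0 + t * (i1 - i0))
                    break
    return out

def normalize(spectrum):
    """Scale intensities to peak 1.0 so exposure differences cancel out."""
    peak = max(spectrum) or 1.0
    return [v / peak for v in spectrum]

# Two spectra with different calibrations, mapped onto one 400-700 nm grid
grid = list(range(400, 701, 10))
a = normalize(resample([395.0, 550.0, 705.0], [10, 40, 20], grid))
b = normalize(resample([400.0, 500.0, 700.0], [5, 50, 25], grid))
```

After this step, `a` and `b` are equal-length feature vectors on the same axis and can be fed to PCA, PLS, or an SVM; it won't fix bad calibration, but at least spectra become comparable point-for-point.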
On 7/25/2014 7:54 AM, Daniela Antonova wrote:
Hi all :)
I am looking to use my machine learning expertise to contribute some tools for automated analysis of data, probably as part of the workbench, and I was hoping to get some opinions on what might be useful.
In particular, how could such tools be integrated with the workbench so that people get the most out of them?
Looking forward to hearing your views!
Daniela
---
You received this message because you are subscribed to the Google Groups "The Public Laboratory for Open Technology and Science" group.
To unsubscribe from this group and stop receiving emails from it, send an email to publiclaborato...@googlegroups.com.