existing open datasets for foods


Chacha Sikes

Feb 18, 2012, 7:43:15 PM
to Anthony Nicalo, Niles Brooks, open...@googlegroups.com

Langual
The USDA uses this food thesaurus framework for referring to foods.
http://www.langual.org/download/LanguaL2010/LANGUAL2010NUMERCALINDEX.TXT

and there's this from the USDA
"USDA National Nutrient Database for Standard Reference"


Library of Congress Subject Headers File
And this is the Library of Congress subject headers file, which does have some food stuff but is not specifically organized around food.


It will be interesting to compare the lists of foods to see what kinds of differences exist between government-funded and commercially produced lists and crowd-contributed lists. And then to see what the gardeners and seedsavers care about.
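
A rough sketch of how such a comparison might start, assuming each list has already been reduced to plain food names. The sample names and the `normalize` helper are hypothetical and are not taken from any of the datasets mentioned here; real lists would need much more careful matching than lowercasing.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def compare_lists(a, b):
    """Return (only_in_a, shared, only_in_b) as sorted lists of normalized names."""
    set_a = {normalize(n) for n in a}
    set_b = {normalize(n) for n in b}
    return sorted(set_a - set_b), sorted(set_a & set_b), sorted(set_b - set_a)


# Hypothetical sample data, for illustration only.
government_list = ["Apple", "Kale", "Sweet  Potato"]
crowd_list = ["apple", "sweet potato", "Ground Cherry"]

only_gov, shared, only_crowd = compare_lists(government_list, crowd_list)
print("only in government list:", only_gov)
print("in both:", shared)
print("only in crowd list:", only_crowd)
```

The three-way split is probably the useful part: the "only in" buckets show what each kind of source cares about that the other does not.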

Other existing datasets that I have been looking at
  • Foodista
  • Freebase Foods, and Local Foods
  • Food Genome
And there is a reference to the PLUs, which Anthony cleaned up from a Produce and Marking set

I suppose that I am sort of building crosswalks between these items -- pointing to references of a food in the different datasets.
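
One possible shape for such a crosswalk, as a minimal sketch: a table keyed by a normalized food name, where each entry points at that food's identifier in each dataset. The dataset labels and every identifier below are made-up placeholders, not real LanguaL, USDA, or PLU codes, and matching on the name alone is a simplifying assumption.

```python
from collections import defaultdict

# Each record: (dataset label, identifier within that dataset, food name).
# All identifiers are placeholders for illustration, not real codes.
records = [
    ("langual", "X-0001", "Apple"),
    ("usda_sr", "00001", "apple"),
    ("plu", "0000", "Apple"),
    ("usda_sr", "00002", "Kale"),
]


def build_crosswalk(records):
    """Map a normalized food name to {dataset label: identifier}."""
    crosswalk = defaultdict(dict)
    for dataset, identifier, name in records:
        key = " ".join(name.lower().split())  # crude name normalization
        crosswalk[key][dataset] = identifier
    return dict(crosswalk)


crosswalk = build_crosswalk(records)
print(crosswalk["apple"])  # references to "apple" in each dataset
print(crosswalk["kale"])   # a food found in only one dataset
```

Keeping the per-dataset identifiers side by side means a food missing from one source simply has no entry for that dataset, which makes gaps easy to spot.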

Paolo Castagna

Mar 18, 2012, 12:26:14 AM
to Open Food
Hi,
I work for Kasabi and, together with colleagues from the data
engineering team, I curated the data we have in the Kasabi food
dataset:

- http://kasabi.com/dataset/food/
- http://data.kasabi.com/dataset/food

You can find ~10 million statements there with data coming from:
Foodista, BBC Food, Cookbook, Cookipedia, Eat The Seasons, Get Me
Cooking, Jamie Oliver, Pong Cheese, River Cottage, Riverford, Tesco,
UKTV, ...

I've used this 'schema'/vocabulary: http://linkedrecipes.org/schema

I was working on something about cheese, you can find an initial
dataset (and code to generate that) here:
https://github.com/castagna/cheese/blob/master/data/cheeses-0.1.ttl
I am sorry ;-) but it's a lot of Java and RDF (but it can be quickly
converted/exported in other formats if needed).

Have a look.

Paolo

On Feb 19, 12:43 am, Chacha Sikes <chachasi...@gmail.com> wrote:
> *Langual*
> The USDA uses this food thesaurus framework for referring to foods.
> http://www.langual.org/download/LanguaL2010/LANGUAL2010NUMERCALINDEX.TXT
> http://www.langual.org/download/LanguaL2010/LanguaL%202010%20Multilin...
>
> and there's this from the USDA
> "USDA National Nutrient Database for Standard Reference"
> http://www.ars.usda.gov/Services/docs.htm?docid=8964
>
> http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/asc/FOO...
> http://www.ars.usda.gov/Services/docs.htm?docid=22113
>
> *Library of Congress Subject Headers File*
> and this is the library of congress subject headers -- which does have some
> food stuff, but not specifically organized around food.
> http://id.loc.gov/download/
>
> It will be interesting to compare the lists of foods to see what kinds of
> differences exist between government funded & commercially produced lists
> and crowd-contributed lists. And then to see what the gardeners and
> seedsavers care about.
>
> Other existing datasets that I have been looking at
>
>    - Foodista
>    - Freebase Foods, and Local Foods
>    - Food Genome
>
> And there is a reference to the PLU's which Anthony cleaned up from a
> Produce and Marking set
> http://buzzdata.com/foodtree/produce-plus#!/overview