Advice for Open Sourcing Scraped Data


Murray Cox

Jan 1, 2015, 12:50:02 PM
to betan...@googlegroups.com
Happy New Year everyone!

I'm working on a project where I've scraped data from a company's website, and I think it would be in the civic interest to make that data widely available. I'm also working on some visualisation tools that will provide an initial way of looking at the data (stay tuned for later in January).

Can anyone offer advice on the following:-

- What would be the best Open Source license for the data, and do I need one?

- I was just going to post the data in a GitHub repository with a data dictionary, and link to the repository. Is there a better place for it?

- Are there any legal disclaimers that I should include with the data, to protect myself and any users of the data? 

I'm not looking for legal advice (yet!), but note that the scraped data is all publicly available on the site without logging in. However, the company that runs the site might point to its Terms of Use and try to restrict access to the data. I am not seeking financial or competitive benefit from the data, and am operating under a "Fair Use" assumption.


Thank you in advance...



--  
Murray Cox
Documentary Photographer

Noel Hidalgo

Jan 2, 2015, 3:10:41 PM
to betan...@googlegroups.com
Responses:

2. For now, GitHub. We're working on a community run data portal and that should be up by the end of Jan.
3. The CC-0 takes care of that for you.

What data are you working on?

n

--
This is the BetaNYC Developers list. You can find projects at < projects.betanyc.us > or ideas at < betaNYC.ideascale.com >.
 
BetaNYC is committed to hosting safe and open spaces for all. By participating in this space you are committing yourself to BetaNYC's Code of Conduct and Anti-Harassment Policy. < bit.ly/betanyc-coc >

Murray Cox

Jan 2, 2015, 8:36:45 PM
to betan...@googlegroups.com, mur...@murraycox.com
Hi Noel! (I'm posting against my original thread because I wasn't subscribed properly to the group; fixed now).

Thanks for the links to the Creative Commons license and the advice about GitHub (can anyone else add to this?).

Re the data set I'm working on: I spoke to you about it a few weeks ago at a BetaNYC event :-). I'd prefer not to discuss it publicly yet, so that I can ensure the data gets released with the relevant context (but I'm happy to talk about it privately if you are interested).

