Creative Commons genealogy data


Robert Hoare

Dec 28, 2012, 5:20:14 PM
to root...@googlegroups.com
Is anybody aware of any substantial genealogy data sets for SOURCE records that are licensed under Creative Commons (CC-BY-SA), or public domain?  I'm looking for test data for a genealogy application.

So births, marriages, censuses (for any country) would all be good, in text, or any database format, or on crawlable web pages (not images).  Directories or electoral lists are not so useful (no relationships between people). Death records (like the SSDI, which I have) are only moderately useful unless there's also some other data for the same area or they mention another person.

Conclusion data sets like family trees or gedcoms are not needed for these tests (which involve linking up sources, rather than looking at previous links).  Ideally also not with a "NC" (no commercial use) limitation on the Creative Commons license as that would limit who I could share the results with.  

Best I've been able to come up with so far are the births, marriages and deaths of soap opera characters on pages like:
http://en.wikipedia.org/wiki/List_of_births,_marriages,_and_deaths_in_Coronation_Street .  Although using fictitious characters does avoid privacy problems, real data would be better. :-)

Rob

Wayne Pearson

Dec 28, 2012, 6:25:00 PM
to root...@googlegroups.com
Because the site is so old, there's no typical license information here:


You might try contacting him and seeing what licensing he provides - it's a good-sized dataset, for sure.

--
  Wayne



Robert Hoare

Dec 28, 2012, 7:53:59 PM
to root...@googlegroups.com
Thanks Wayne, but just to clarify, I'm not looking for family tree data (people linked into trees).

I'm searching for test datasets of unlinked source records.  In other words, lists of events such as births, with a date, name of the child, and one or more parents.  Or marriage records.  Ideally more than one type of data for the same area and era.

There's plenty of this data around of course, but I'm trying to locate something with a clear license. The South Dakota births have "The State has made certain information available to the public. Anyone may view, copy or distribute information found at such locations for personal or informational use without owing an obligation to the State." which is sort-of OK, since most uses are "informational".  They only have births though, and it's tricky to download.

Maine has some nice downloadable marriage and death data (if I can convert from the ancient MS Access format) but no license other than a generic site copyright (I'll email them to find what the exact rights are).  I'll work through the various states (and Canada, Australia etc) and see if I can find anything with a modern license.

Rob

Dallan Quass

Dec 28, 2012, 9:02:15 PM
to root...@googlegroups.com
If you're trying to generate family trees by matching relatives in
records, you may be interested in the Norwegian Historical Population
Register, which is a pilot project for generating Norwegian family trees
by matching people across censuses. Here are a couple of links:

http://www.nr.no/nb/projects/norwegian-historical-population-register-hpr

http://www.rhd.uit.no/nhdc/hpr.html

They don't have any data available in a CC license that I know of, but
they may be willing to share data with you.

-dallan
http://www.werelate.org/wiki/User:Dallan

Robert Hoare

Dec 29, 2012, 12:00:06 AM
to root...@googlegroups.com
Thanks Dallan but that Norway project is a few years away from releasing anything yet (the first chunk of 14 years will be ready in May 2014).  Still, good to know they're planning to use an open license for it (although that could mean "personal or academic use only").

I'm not generating trees, but I am looking at algorithms for suggesting matches between the source records on the record linking part of my site. It would be nice to have a substantial bit of test data that could be included on the site as example data during beta testing (I have plenty of data, finding something which could be made visible to the public is the problem).
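
As a rough illustration of what suggesting matches between unlinked source records can look like, here is a minimal sketch. Everything in it is an assumption made for the example (the `SourceRecord` fields, the 16-60 marriage-age window, the 0.85 threshold, the use of Python's `difflib` for name similarity); it is not the algorithm used on the site.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class SourceRecord:
    """One unlinked source record (hypothetical schema, for illustration only)."""
    kind: str      # e.g. "birth" or "marriage"
    year: int
    place: str
    given: str
    surname: str


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(birth: SourceRecord, marriage: SourceRecord) -> float:
    """Score how plausibly a marriage record refers to the person in a birth record.

    Returns 0.0 when the place differs or the implied age at marriage falls
    outside an assumed 16-60 range; otherwise averages the given-name and
    surname similarities.
    """
    age = marriage.year - birth.year
    if birth.place.lower() != marriage.place.lower() or not 16 <= age <= 60:
        return 0.0
    return (name_similarity(birth.given, marriage.given)
            + name_similarity(birth.surname, marriage.surname)) / 2


def suggest_matches(birth, marriages, threshold=0.85):
    """Return (score, record) pairs at or above the threshold, best first."""
    scored = [(match_score(birth, m), m) for m in marriages]
    return sorted((s for s in scored if s[0] >= threshold),
                  key=lambda s: s[0], reverse=True)


# Invented example data, not taken from any real register.
birth = SourceRecord("birth", 1850, "Exampleton", "Mary", "Hoare")
marriages = [
    SourceRecord("marriage", 1872, "Exampleton", "Mary", "Hoar"),   # spelling variant
    SourceRecord("marriage", 1872, "Exampleton", "John", "Smith"),  # different person
    SourceRecord("marriage", 1855, "Exampleton", "Mary", "Hoare"),  # implausible age
]
suggestions = suggest_matches(birth, marriages)
```

A real matcher would also need to handle parents' names, date uncertainty, and place hierarchies, which is why several record types for the same area and era make better test data.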

So far, I've been unable to find any suitable data which clearly allows reuse, including on the various "open data" portals.  Even most (all?) US states and counties don't consider vital records "public data" (to make it public domain).  All the crowdsourcing transcription projects that I'm aware of don't permit any reuse of what their volunteers have produced.

Rob

Colin Spencer

Dec 29, 2012, 2:05:30 AM
to root...@googlegroups.com
Rob,

Do you want to crawl an on-line dataset or download it?

If the latter then I can point you in the direction of some UK PR transcriptions in Excel format that may be suitable.

Colin

Robert Hoare

Dec 29, 2012, 2:25:00 AM
to root...@googlegroups.com
Hi Colin,

I don't have any problem with crawling data, but downloading is usually preferable: it doesn't hit the other server so hard and is generally easier to parse.

I'm certainly interested in some UK parish record transcriptions, if the licence they can be used under is clear.  Thanks!

Rob

Enno Borgsteede

Dec 29, 2012, 6:44:22 AM
to root...@googlegroups.com
Hi Robert,
> Is anybody aware of any substantial genealogy data sets for SOURCE
> records that are licensed under Creative Commons (CC-BY-SA), or public
> domain? I'm looking for test data for a genealogy application.
>
> So births, marriages, censuses (for any country) would all be good, in
> text, or any database format, or on crawlable web pages (not images).
> Directories or electoral lists are not so useful (no relationships
> between people). Death records (like the SSDI, which I have) are only
> moderately useful unless there's also some other data for the same
> area or they mention another person.
Van Papier naar Digitaal (from paper to digital) is a Dutch project run
by genealogists. It contains pictures of lots of sources, mostly
available as PDF downloads, and is partly indexed. There is no license
mentioned on site, and it is Dutch only, but it looks like public domain.

http://vpnd.nl/

If you need help getting around, please let me know.

cheers,

Enno

Colin Spencer

Dec 29, 2012, 8:00:12 AM
to root...@googlegroups.com
Robert

In that case go onto Yahoo Groups and select Somerset Family; you will have to join the group, but it is free. Go to the files section and read the licence agreement; I think it will be OK for what you need, and you can then download the transcriptions you require. There are many other Somerset groups that have similar PR transcriptions available under the same licence.

Hope it helps

Colin

Robert Hoare

Dec 29, 2012, 3:59:21 PM
to root...@googlegroups.com
Hi Colin,

I've just had discussions with the people running that group and unfortunately there's a "no commercial use" clause imposed by their data provider (SHC).  I don't know if that's made clear on the files themselves as I didn't get as far as becoming a member.  I can't be sure my site would always be fully non-commercial (might need to accept donations or run a few affiliate ads to cover costs eventually), so I can't use that data.

Many thanks anyway for the suggestion, I do think it's one of the smaller groups that is likely to be moving towards more flexible licences, where their data providers permit.  Just need to find them!

Rob

Colin Spencer

Dec 29, 2012, 4:22:53 PM
to root...@googlegroups.com
Rob

I hadn't realised that you wanted to charge for access to the data. The only other source that I can suggest is http://sortedbyname.com/ but this will be a scrape job and again as it is public domain I am not sure they will let you charge for it.

Colin

Robert Hoare

Dec 29, 2012, 5:04:16 PM
to root...@googlegroups.com
As I said, any site that the data appears on may eventually have some method to keep it afloat, or the data may be used to beta test the software for a site that does that. Highly unlikely to be any form of subscription, but maybe some ads or wikipedia-like "donations" (but all access always free).  That's commercial use.  

Using "non-commercial use only" data to write an academic paper that then gets the author a better position at another university, or a speaking tour, I would also consider "commercial use", but the academic world doesn't notice the hypocrisy there. :-) 

Sortedbyname is scraped data itself, and skirts around the law to put it politely (ignores EU database directive).  That's exactly the sort of grey area I'm trying to avoid by looking for data that actually permits reuse by having a relevant clearly stated licence.

Just to clarify your final point, if data is (really) public domain, then that means that nobody has a copyright and nobody can control what is done with the data.  This is why licenses like Creative Commons were established, to allow use of data while still retaining copyright and control over the conditions under which it is used.

It's actually a key issue for genealogy software, since so much of what genealogy software does is process data that has come from a source that has rights attached.  It's something that needs to be considered right at the design (or even feasibility) stage.

Rob

Ben Brumfield

Dec 29, 2012, 9:47:55 PM
to root...@googlegroups.com
I can only speak to the technical aspects of the project, but I've
gathered that FreeUKGen is working on an open database license for
some of their efforts. You might ping the other Ben about the status
of that effort to see whether it's ready for you to test your
algorithm against.

My own observation from watching pieces of that effort is that coming
up with an open license for a database is a bit different from picking
an open license for software or a CC license for a creative work,
particularly if you want to be scrupulous about not claiming copyright
over un-copyrightable things and are outside of the US legal regime.

Ben

Ben Laurie

Dec 30, 2012, 7:57:01 AM
to root...@googlegroups.com
On Sun, Dec 30, 2012 at 2:47 AM, Ben Brumfield <benw...@gmail.com> wrote:
> I can only speak to the technical aspects of the project, but I've
> gathered that FreeUKGen is working on an open database license for
> some of their efforts. You might ping the other Ben about the status
> of that effort to see whether it's ready for you to test your
> algorithm against.

The problem is we need to get the transcribers to agree to the new
licence before we can release data under it, which is why I have not
suggested our data.

The plan is to do that as part of the process of moving to the new sites.

Robert Hoare

Dec 30, 2012, 4:00:41 PM
to root...@googlegroups.com
I do understand the difficulties you'll have with that, since the transcribers are also bound by conditions that were imposed on them by the image provider.  In the Somerset case, SHC let them have the images provided there was "no commercial use".

What licence are you hoping to use, eventually?

Rob

Ben Laurie

Dec 31, 2012, 9:58:39 AM
to root...@googlegroups.com
On Sun, Dec 30, 2012 at 9:00 PM, Robert Hoare <robert...@gmail.com> wrote:
> I do understand the difficulties you'll have with that, since the
> transcribers are also bound by conditions that were imposed on them by the
> image provider. In the Somerset case, SHC let them have the images provided
> there was "no commercial use".
>
> What licence are you hoping to use, eventually?

This is still somewhat under discussion, but my hope is for an open
data licence permitting commercial use.

Brooke Ganz

Jan 15, 2013, 7:15:12 PM
to root...@googlegroups.com
Jumping into this discussion a bit late...

Does anyone know of any formal or organized efforts to get genealogical and archival organizations to release their raw data under Creative Commons licenses?

If not, would anyone like to start one with me?  :-)

(I am thinking that BY-NC-SA -- which is Attribution-NonCommercial-ShareAlike -- would be the best choice for most purposes.  I do not think most record providers would agree to a license that allows broad commercial use and would only want to grant that right on a case-by-case basis.)

Is a movement to push for Creative Commons licensing of genealogical data something that Roots-Dev as a group would be interested in pursuing?  If not, what's a good name for a group like this?  Google has already claimed "Data Liberation Front" as one of their internal groups.  :-)


- Brooke Schreier Ganz

GeneJ

Jan 15, 2013, 7:27:28 PM
to root...@googlegroups.com
I'm interested, but I suspect the standard cookie-cutter options might need a bit of work for this to be feasible for genealogy. "Non-commercial" is probably too broad a marker. ShareAlike will take a little work too.


Ben Brumfield

Jan 15, 2013, 8:44:32 PM
to root...@googlegroups.com
Thanks for asking such a great question, Brooke. I wish I had better answers.

In the UK, there is the Open Genealogy Alliance, which is affiliated
with the Open Rights Group (kind of a UK version of our EFF), FreeBMD,
and the Open Knowledge Foundation.
http://www.opengenalliance.org/

There's also the Open Digitization Project, which is a bit more
focused on cultural heritage digitization but has some obvious
overlaps:
http://opendigitisation.org/

I'm not aware of US-based equivalents, but then again our need is a
bit smaller. Here in the US, most of our records are Public Domain
either by virtue of age or by being government works. Better, the
scans and transcripts of those records are (usually) governed by
Bridgeman v. Corel, so are also Public Domain. Such fruitful
conditions do not/may not apply in Europe or the UK. I can produce a
spittle-flecked, red-faced rant at great length if you get me a beer
or a cup of coffee and prompt me, but I wouldn't recommend it.

Let me join the bandwagon against NonCommercial licenses.
Non-commercial use is quite hard to pin down. As an example, imagine
that you were a non-profit dedicated to providing free access to
census records on your website, and that you were funded by donations
and the occasional banner ad. Can you use material scanned by/hosted
on the Internet Archive? Sorry -- according to
http://archive.org/post/111591/what-is-non-commercial-use those banner
ads turn your non-profit, free-access website into 'commercial use'.

You can find an alternative explanation of the perils of CC-NC by my
friends at the Open Siddur Project here:
http://opensiddur.org/2011/03/why-to-choose-a-free-creative-commons-license-or-say-no-to-nc/

Ben

Robert Hoare

Jan 15, 2013, 10:47:49 PM
to root...@googlegroups.com
On Tue, Jan 15, 2013, at 05:44 PM, Ben Brumfield wrote:
> You can find an alternative explanation of the perils of CC-NC by my
> friends at the Open Siddur Project here:

Thanks Ben for the pointer to that excellent argument for why Non-Commercial licenses are to be avoided. And for pointing out that even voluntary "non-profits" will still often have commercial use.  Saved me having to write something similar!

Interesting to see FreeBMD are/were involved in that (dormant?) Open Genealogy Alliance, as their data set is one of the most useful sets in England and Wales to create derivatives from, such as: name analysis (frequency, geographic, changes over time); identifying which of the two marriage partners; marriage witnesses, full maiden names, informants etc entered by users; linking of death records to births where possible ... and a lot of other uses that I haven't thought of. Ironic that FreeBMD currently have a more restrictive license than Ancestry though.

Most of those new uses are not possible if the license has a "no commercial use" restriction (since most projects will have some overheads that need to be covered, even for a "non profit").  With a share-alike license those derivative projects can feed data back to FreeBMD (if they want it) to enhance their data.  That's where the value comes in sharing.

Unfortunately, I haven't yet found a single country (or city, state, province, library, genealogy society) that has put any significant vital records or census information online with an open data license (without an NC restriction), anywhere in the world.  Not one.

In the US in particular it'll be tough to get the idea accepted, because the trend has been to reduce access ("in case of identity theft"), plus you're dealing with thousands of governments (from towns upwards) rather than a national registry office.  So the US doesn't have it easier than other countries, quite the reverse really.

Rob

Brooke Ganz

Jan 16, 2013, 12:13:11 AM
to root...@googlegroups.com

On Tuesday, January 15, 2013 7:47:49 PM UTC-8, Robert Hoare wrote:

> Unfortunately, I haven't yet found a single country (or city, state, province, library, genealogy society) that has put any significant vital records or census information online with an open data license (without an NC restriction), anywhere in the world.  Not one.

Here's the thing: I don't think asking US cities/counties/states to put their data under Creative Commons licenses is the way to go -- because, as Ben pointed out, in the US those vital records are already in the public domain by virtue of having been created by the US government with taxpayer money.  Asking them to declare their records to be bound by a CC license would therefore seem to be both unnecessary and counter-productive.  In those cases, energy would be better spent getting those cities/counties/states to release their data to the public in the first place, not in dealing with licensing of that data.  Obviously, this endeavor would be pushing back against the recent awful trend of closing off public access to public data, and would not be easy to do.

A better and probably more fruitful initial focus would be identifying the important *non*-governmental parts of the archival world -- like genealogy societies, historical societies, libraries, for-profit websites (like Ancestry.com or Find-A-Grave), not-for-profit websites (like FamilySearch.org, JewishGen.org), etc. -- that have already done the laborious work of locating paper-copy (or headstone) records, scanning or photographing the records, indexing the records, and making databases out of the indices -- and asking *them* to please release their completed datasets, and release them under a CC license at that.


- Brooke Schreier Ganz

Robert Hoare

Jan 16, 2013, 12:57:13 AM
to root...@googlegroups.com

> Here's the thing: I don't think asking US cities/counties/states to put their data under Creative Commons licenses is the way to go -- because, as Ben pointed out, in the US those vital records are already in the public domain by virtue of having been created by the US government with taxpayer money.

Most states do not consider vital records to be public records in the public domain (otherwise they wouldn't be able to restrict access as they are increasingly doing).  Here's an example of the law for Arizona:

Illinois: http://www.idph.state.il.us/vitalrecords/deathinfo.htm "death records are not public records"

New Jersey: http://www.state.nj.us/health/vital/ "New Jersey law protects and restricts the release of vital records, as such, vital records are not public records".

So the US does have perhaps the biggest problem in getting this data available, since it's spread around so many governments.  It's not just a case of asking them to make "public domain" data available (since it isn't), there are quite valid privacy laws involved.
 
Having said that, the state archives (etc) that do have historical data already online probably just automatically put a copyright notice on it (or just use the website template).  Asking whether the terms for existing online data can be relaxed or restated might be worthwhile (and, as a second step, whether the raw data can be made available for download).  I've asked one State Archive tonight, one that has an open data initiative already, I'll see how that goes.

> A better and probably more fruitful initial focus would be identifying the important *non*-governmental parts of the archival world -- like genealogy societies, historical societies, libraries, for-profit websites (like Ancestry.com or Find-A-Grave), not-for-profit websites (like FamilySearch.org, JewishGen.org), etc. -- that have already done the laborious work of locating paper-copy (or headstone) records, scanning or photographing the records, indexing the records, and making databases out of the indices -- and asking *them* to please release their completed datasets, and release them under a CC license at that.

It would be nice, but I don't see the commercial organisations having any interest at all!  It's their business and their investment; in the short term it makes no commercial sense, and their shareholders would sue them.  In the long term they'll get bypassed if they don't adapt, but public companies can't think long term.
 
Familysearch.org are actually no different, putting copyright conditions on data that is often freely provided to them.  However, they, like many organisations, are often only given access to the data subject to restrictions (such as the stifling "no commercial use"), and rather than separate out which datasets have restrictions and which don't, they just put the same terms and conditions on everything.

But I do agree a good place to start would be volunteer-produced data.  Some of them are spending money to host their data, they would perhaps welcome others sharing the burden, and producing new and better ways to access the data (and sharing it back).  The main barrier to overcome is the feeling that others will be "making money" off all their hard work.

I'm certainly interested in working with others (like you, Brooke) on methodically searching out and trying to improve the access terms for existing data from public, commercial, and non-profit sources (and helping process, improve and host such data).   Maybe a project on something like Github/Bitbucket (or an open project manager) would be a way to co-operate, by using issue tracking for who's approaching who, example emails, wiki for legal issues, example data, data handling techniques, etc.  It's all very relevant to the Open Genealogy software I'm working on.

Rob

Dallan Quass

Jan 16, 2013, 11:05:48 AM
to root...@googlegroups.com, Ben Brumfield
> I'm not aware of US-based equivalents, but then again our need is a
> bit smaller. Here in the US, most of our records are Public Domain
> either by virtue of age or by being government works.

I believe this only applies to Federal government works, not state/local
government works. So federal census records are public domain, but not
state census or vital records. See

http://en.wikipedia.org/wiki/Copyright_status_of_work_by_the_U.S._government

> Better, the
> scans and transcripts of those records are (usually) governed by
> Bridgeman v. Corel, so are also Public Domain.

This ruling was made by a district court, not the US Supreme Court, so I
believe it applies only to the Southern New York district. It has been
influential in other cases, but it's not yet a matter of law. I believe
that if you were to download and start publishing Ancestry's census
images for example, and they were to sue you to get you to stop (which
they probably would), you would have to hire a lawyer and defend
yourself. You'd probably win in the end, but you'd need to have enough
money to cover your legal costs in the meantime. See

http://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel_Corp.#Subsequent_jurisprudence

> Let me join the bandwagon against NonCommercial licenses.
> Non-commercial use is quite hard to pin down. As an example, imagine
> that you were a non-profit dedicated to providing free access to
> census records on your website, and that you were funded by donations
> and the occasional banner ad. Can you use material scanned by/hosted
> on the Internet Archive? Sorry -- according to
> http://archive.org/post/111591/what-is-non-commercial-use those banner
> ads turn your non-profit, free-access website into 'commercial use'.

I agree that the line between commercial and non-commercial use is
fuzzy, but this statement is Brewster Kahle's opinion, not a statement
from a lawyer trying to interpret what the courts are ruling is
commercial vs non-commercial use.

In my opinion, the real benefit of saying non-commercial is that it
gives genealogy organizations, who are often *loath* to think that
anyone else would profit from their work, a better feeling about making
their data available to others. It's a marketing/positioning issue for
them to say: our intent is that this content not be used commercially.
Then we'll have to let the courts decide exactly what constitutes
commercial use.

Dallan

DallanQ

Jan 16, 2013, 11:20:19 AM
to root...@googlegroups.com
I'm also interested in working with others in this area BTW.  WeRelate data is all cc-by-sa, but most of it is conclusion data, although one of our members recently got permission to upload Savage's Genealogical Dictionary of the First Settlers of New England, which is pretty interesting.

It would be nice to have primary records available in an open-content license.  

Dallan

GeneJ

Jan 16, 2013, 11:21:19 AM
to root...@googlegroups.com
I'd only add that the notion of "minimal creativity" comes from Feist v
Rural (1991), and that was a ruling by the U.S. Supreme Court.

http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service

Ben Brumfield

Jan 16, 2013, 11:23:58 AM
to root...@googlegroups.com
On Wed, Jan 16, 2013 at 10:05 AM, Dallan Quass <dal...@gmail.com> wrote:
>> Better, the
>> scans and transcripts of those records are (usually) governed by
>> Bridgeman v. Corel, so are also Public Domain.
>
> This ruling was made by a district court, not the US Supreme Court, so I
> believe it applies only to the Southern New York district. It has been
> influential in other cases, but it's not yet a matter of law. I believe
> that if you were to download and start publishing Ancestry's census images
> for example, and they were to sue you to get you to stop (which they
> probably would), you would have to hire a lawyer and defend yourself. You'd
> probably win in the end, but you'd need to have enough money to cover your
> legal costs in the meantime. See
>
> http://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel_Corp.#Subsequent_jurisprudence
>
I agree that Bridgeman is not entirely settled, globally applicable
law, but any time you deal with copyright you find you can't rely on what
seems to be the clearest of uses. To quote a university copyright
librarian of my acquaintance, "the only way to know whether what
you're doing is allowed is to be sued and win."

Ancestry wouldn't need to make a copyright challenge at all if I were
to do that. They are forthright enough to state in their TOS that
they do not claim copyright over public domain works, but they
restrict users to download/republication of some number (200 the last
time I looked) of their documents per year.

>> Let me join the bandwagon against NonCommercial licenses.
>> Non-commercial use is quite hard to pin down. As an example, imagine
>> that you were a non-profit dedicated to providing free access to
>> census records on your website, and that you were funded by donations
>> and the occasional banner ad. Can you use material scanned by/hosted
>> on the Internet Archive? Sorry -- according to
>> http://archive.org/post/111591/what-is-non-commercial-use those banner
>> ads turn your non-profit, free-access website into 'commercial use'.
>
>
> I agree that the line between commercial and non-commercial use is fuzzy,
> but this statement is Brewster Kahle's opinion, not a statement from a
> lawyer trying to interpret what the courts are ruling is commercial vs
> non-commercial use.
>
True, but as you've written above, the opinion of someone who thinks
they can sue you and win is extremely important to most of us.

> In my opinion, the real benefit of saying non-commercial is that it gives
> genealogy organizations, who are often *loath* to think that anyone else
> would profit from their work, a better feeling about making their data
> available to others. It's a marketing/positioning issue for them to say:
> our intent is that this content not be used commercially. Then we'll have to
> let the courts decide exactly what constitutes commercial use.
>
So you're arguing that practically speaking, sharing under an NC
license shouldn't be compared to a free license alternative, but
rather to the data not being shared at all? I fear that you may be
right.

Ben

Dallan Quass

Jan 16, 2013, 11:26:18 AM
to root...@googlegroups.com, GeneJ
On 1/16/2013 10:21 AM, GeneJ wrote:
> I'd only add that the notion of "minimal creativity" comes from Feist v
> Rural (1991), and that was a ruling by the U.S. Supreme Court.
>
> http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service

I agree, and the fact that Bridgeman and Corel went to court even after
the Feist ruling says to me that if you were to do something similar,
you'd be going to court as well. I believe that you'd win in the end
(there's enough case law in your favor), but the copyright owner would
do everything that they could to stop you.

Dallan

Dallan Quass

Jan 16, 2013, 11:33:10 AM
to root...@googlegroups.com
>> In my opinion, the real benefit of saying non-commercial is that it gives
>> genealogy organizations, who are often *loath* to think that anyone else
>> would profit from their work, a better feeling about making their data
>> available to others. It's a marketing/positioning issue for them to say:
>> our intent is that this content not be used commercially. Then we'll have to
>> let the courts decide exactly what constitutes commercial use.
>>
> So you're arguing that practically speaking, sharing under an NC
> license shouldn't be compared to a free license alternative, but
> rather to the data not being shared at all? I fear that you may be
> right.

I'd certainly prefer a free license alternative. WeRelate data is
cc-by-sa. But yes, as a practical matter, I believe that not having NC
would be a show-stopper for many genealogy organizations and it's better
to have data under NC than not at all.

It's interesting to note that FamilySearch Wiki started off as NC and
then changed their license a couple of years later to remove it. So
maybe we could think of NC as a "gateway" license to eventual adoption
of a fully free license.

Dallan

Ben Brumfield

Jan 16, 2013, 11:40:36 AM
to root...@googlegroups.com, GeneJ
On Wed, Jan 16, 2013 at 10:26 AM, Dallan Quass <dal...@gmail.com> wrote:
>
> I agree, and the fact that Bridgeman and Corel went to court even after the
> Feist ruling says to me that if you were to do something similar, you'd be
> going to court as well. I believe that you'd win in the end (there's enough
> case law in your favor), but the copyright owner would do everything that
> they could to stop you.
>
That may be true for some copyright owners--if "owners" is the correct
term for someone asserting ownership over public domain data.
However, I've found Bridgeman v. Corel to be an argument which many
people within the archives community respect.

Last year, two hours of discussion on Twitter was enough to get the
Wisconsin Historical Society to remove their website's claim to
copyright over their scans of pre-1923 material. In that case, the
claim had been placed on the website years before, and simply bringing
it to the staff's attention (and perhaps supplying them with the
ammunition of _Bridgeman_ and the Twitter outcry) was sufficient to
make a change.

Not everyone we deal with is a litigious greed-head -- many people
within institutions believe strongly in public access and want to
share their collections.

Ben

Dallan Quass

Jan 16, 2013, 11:41:22 AM
to root...@googlegroups.com, Ben Brumfield
On 1/16/2013 10:23 AM, Ben Brumfield wrote:
> Ancestry wouldn't need to make a copyright challenge at all if I were
> to do that. They are forthright enough to state in their TOS that
> they do not claim copyright over public domain works, but they
> restrict users to download/republication of some number (200 the last
> time I looked) of their documents per year.

Good point Ben. Here is a section from FamilySearch's terms -- they're
even more restrictive than Ancestry's. You can't post any information
from their site on a website, your home computer network, or share it
with others in any way from my reading :-)

https://familysearch.org/terms/

All material found on this site (including visuals, text, icons,
displays, databases, media, and general information) is owned or
licensed by us. You may view, download, and print material from this
site only for your personal, noncommercial use unless otherwise
indicated. In addition, material may be reproduced by media personnel
for use in traditional public news forums unless otherwise indicated.
You may not post material from this site on another website or on a
computer network without our permission. You may not transmit or
distribute material from this site to other sites. You may not use this
site or information found at this site (including the names and
addresses of those who have submitted information) to sell or promote
products or services, to solicit clients, or for any other commercial
purpose.

Notwithstanding the foregoing, we reserve sole discretion and right to
deny, revoke, or limit use of this site, including reproduction of site
content. It is not our responsibility to determine what "fair use" means
for persons wishing to use materials from this site. That remains wholly
a responsibility of the user. Furthermore, we are not required to give
additional source citations. Also, in no case do we guarantee that
materials on this site are legally cleared for any use beyond personal,
noncommercial use. Such responsibility also ultimately remains with the
user. However, we do maintain the right to prevent infringement of our
materials and to interpret "fair use" as we understand the law.

Dallan

Dallan Quass

Jan 16, 2013, 11:44:21 AM
to root...@googlegroups.com, Ben Brumfield, GeneJ
On 1/16/2013 10:40 AM, Ben Brumfield wrote:
> Last year, two hours of discussion on Twitter was enough to get the
> Wisconsin Historical Society to remove their website's claim to
> copyright over their scans of pre-1923 material. In that case, the
> claim had been placed on the website years before, and simply bringing
> it to the staff's attention (and perhaps supplying them with the
> ammunition of _Bridgeman_ and the Twitter outcry) was sufficient to
> make a change.

I didn't know that. That's awesome!! Thank you for sharing that.

Dallan

Ben Brumfield

Jan 16, 2013, 11:58:37 AM
to root...@googlegroups.com
Those interested in the range of opinions among archivists might be
interested in skimming this thread on the Archives and Archivists
listserv:
http://forums.archivists.org/read/messages?id=60991
(a few straggler messages appear here:
http://forums.archivists.org/read/messages?id=60927 )

Another discussion thread on the subject among the digital humanities
crowd is here:
http://digitalhumanities.org/answers/topic/who-follows-best-practices-re-restrictions-on-digitized-public-domain-works
(caveat lector: I shoot my mouth off a bit in that one.)

Ben

Brooke Ganz

Jan 16, 2013, 2:24:58 PM
to root...@googlegroups.com
On Wednesday, January 16, 2013 8:33:10 AM UTC-8, DallanQ wrote:

I'd certainly prefer a free license alternative.  WeRelate data is
cc-by-sa.  But yes, as a practical matter, I believe that not having NC
would be a show-stopper for many genealogy organizations and it's better
to have data under NC than not at all.

I agree.  It is going to be hard enough to convince volunteer groups and organizations and societies to license their data *at all*.  It is going to be extremely hard to get them to agree to a license that could hypothetically (if extremely, extremely unlikely) lead to someone unscrupulous "taking" copies of their indices and sticking them behind a paywall and selling monthly access.  So getting organizations to at least agree to a sharealike but non-commercial license is a good step in the right direction, considering that we are essentially starting from scratch here.
 
It's interesting to note that FamilySearch Wiki started off as NC and
then changed their license a couple of years later to remove it.  So
maybe we could think of NC as a "gateway" license to eventual adoption
of a fully free license.

I like that metaphor.  :-) 


- Brooke

Brooke Ganz

Jan 16, 2013, 2:48:57 PM
to root...@googlegroups.com
On Tuesday, January 15, 2013 9:57:13 PM UTC-8, Robert Hoare wrote:

Most states do not consider vital records to be public records in the public domain (otherwise they wouldn't be able to restrict access as they are increasingly doing).

Yes, the actual individual BMD records may be protected by state privacy laws.  But isn't it debatable that the *indices* to US state vital records -- indices which were collated and published by the states, sometimes in paper/book form and sometimes in electronic form -- are public domain, and not copyrighted?

For example, several years ago New York City published in book form and in microfilm form (available at your local FHL, of course :-) ) the annual indices to its BMD records from the mid and late twentieth century.  And yet you cannot get ahold of those indices today from the NYC Department of Health, due to their post-9/11 rule changes, where in the name of "privacy" they actually claimed HIPAA (!) as an excuse to restrict access to these formerly-public records.  Aren't the old, already-published indices still in the public domain, no matter what the Department of Health would like to claim today?

I'm certainly interested in working with others (like you, Brooke) on methodically searching out and trying to improve the access terms for existing data from public, commercial, and non-profit sources (and helping process, improve and host such data).   Maybe a project on something like Github/Bitbucket (or an open project manager) would be a way to co-operate, by using issue tracking for who's approaching who, example emails, wiki for legal issues, example data, data handling techniques, etc.  It's all very relevant to the Open Genealogy software I'm working on.

I like this idea.

But before getting the ball rolling here, do most people reading this feel okay about such an effort being included under the general Roots-Dev quasi-organizational umbrella, or should this effort branch off into some new unnamed sister group?  Personally, I would like to see this become a long-term program, one of many that Roots-Dev takes on, in addition to its more traditional "let's talk about and build some cool software" focus.  Thoughts?


- Brooke

GeneJ

Jan 16, 2013, 2:53:37 PM
to root...@googlegroups.com
I would probably be advancing any involvement I had under FHISO (but creating genealogy friendly creative commons licensing has been on my wish list for quite some time).    

From: Brooke Ganz <aspar...@gmail.com>
Reply-To: <root...@googlegroups.com>
Date: Wednesday, January 16, 2013 12:48 PM
To: <root...@googlegroups.com>
Subject: Re: [rootsdev] Re: Creative Commons genealogy data


But before getting the ball rolling here, do most people reading this feel okay about such an effort being included under the general Roots-Dev quasi-organizational umbrella, or should this effort branch off into some new unnamed sister group?  Personally, I would like to see this become a long-term program, one of many that Roots-Dev takes on, in addition to its more traditional "let's talk about and build some cool software" focus.  Thoughts?


- Brooke


Robert Hoare

Jan 16, 2013, 2:54:43 PM
to root...@googlegroups.com
On Wednesday, January 16, 2013 12:24:58 PM UTC-7, Brooke Ganz wrote:

It is going to be extremely hard to get them to agree to a license that could hypothetically (if extremely, extremely unlikely) lead to someone unscrupulous "taking" copies of their indices and sticking them behind a paywall and selling monthly access.  So getting organizations to at least agree to a sharealike but non-commercial license is a good step in the right direction, considering that we are essentially starting from scratch here.

This unfortunately is the common misconception about "commercial use".  As the earlier links pointed out, commercial use is much broader, broad enough to make most data with a "non commercial" restriction unusable.

Take for example a blog posting that shows some items from an "NC" restricted dataset (more items than would be allowed under fair use).  If that blog has any ads, it's commercial use.  

If a non-profit uses the existence of a copy of the (free, unrestricted access) data set on their site for publicity to get more paying members, that's (more arguably) commercial use. 

If somebody uses the data as an important element of their PhD thesis then I would also say that's commercial use (or is a PhD of no value?).

I do agree that there's a common feeling that the free data will then be hidden behind a paywall if commercial use is allowed, but it's the fact it is free to all (not just Ancestry or Familysearch) that stops that (there's no commercial reason to do it, won't make money).  

How many paid subscription versions of Wikipedia exist?

Rob

Ben Brumfield

Jan 16, 2013, 3:04:53 PM
to root...@googlegroups.com
On Wed, Jan 16, 2013 at 1:48 PM, Brooke Ganz <aspar...@gmail.com> wrote:
> But before getting the ball rolling here, do most people reading this feel
> okay about such an effort being included under the general Roots-Dev
> quasi-organizational umbrella, or should this effort branch off into some
> new unnamed sister group? Personally, I would like to see this become a
> long-term program, one of many that Roots-Dev takes on, in addition to its
> more traditional "let's talk about and build some cool software" focus.
> Thoughts?
>
Not to make the best the enemy of the good, but I'd really like to
know more about similar efforts that are under way, both in the US and
elsewhere. For one thing, it seems likely that legal and policy work
on public access to public records is already being done by the EFF,
Creative Commons, the Open Access movement, those behind the SSDI
petitions, journalists, Wikimedia, the Internet Archive and others.
If there's an organization already funding lawyers to draft licenses,
file FOIA requests, or litigate test cases, I'm happy to contribute to
them rather than reinventing the wheel.

So who's working on public access to public records of interest to genealogists?

Ben

Robert Hoare

Jan 16, 2013, 3:22:29 PM
to root...@googlegroups.com
On Wednesday, January 16, 2013 1:04:53 PM UTC-7, Ben Brumfield wrote:
I'd really like to
know more about similar efforts that are under way, both in the US and
elsewhere.  ...


So who's working on public access to public records of interest to genealogists?

I agree that finding out what's already in progress is an important and essential first step.  Most of the existing initiatives are on a national (or more local) level, may not be widely known.  

Some may not be active: the Open Genealogy Alliance, for example, tweeted a couple of times last May and did a blog post over a year ago, but has otherwise shown no public progress (there may be something behind the scenes).

Genealogy is unusually global, has more in common with getting access to meteorology or GIS data than with many more local open data interests (like traffic, crime, property).  So a listing of what's relevant, across countries, will be very useful, will also help to show the gaps.

I do agree with you that existing foundations with money are the best people for the test cases! :-)  I'm thinking more of asking, getting permission with good arguments, rather than aggressively forcing change legally.  Hopefully in due course it'll seem the normal thing to do, to make historical data freely accessible and re-usable, as is already happening with GIS data.

Rob

GeneJ

Jan 16, 2013, 3:25:20 PM
to root...@googlegroups.com
Thank you, Robert.  I'm a blogger and do other creative work.  For me a creative commons license with the "non-commercial" tag is saying, "stay away or suffer the consequence."  Very few genealogical bloggers turn a profit on their blogs, even though many display ads. Separately I license some articles so they can be reproduced by societies and commercial news organizations.  While I/we might make a mis-step from time to time, most genealogy bloggers will go out of their way to avoid being on the wrong side of licensing or IP issues. :-)  
 
Down the road, suppose you publish a book about your family and sell it to a few libraries. I suggest there is no chance you'll actually "profit" from the work, but it is still commercial. 












From: Robert Hoare <robert...@gmail.com>
Reply-To: <root...@googlegroups.com>
Date: Wednesday, January 16, 2013 12:54 PM
To: <root...@googlegroups.com>
Subject: Re: [rootsdev] Re: Creative Commons genealogy data

This unfortunately is the common misconception about "commercial use".  As the earlier links pointed out, commercial use is much broader, broad enough to make most data with a "non commercial" restriction unusable.

Take for example a blog posting that shows some items from an "NC" restricted dataset (more items than would be allowed under fair use).  If that blog has any ads, it's commercial use.  


Brooke Ganz

Jan 16, 2013, 3:28:18 PM
to root...@googlegroups.com
On Wednesday, January 16, 2013 12:04:53 PM UTC-8, Ben Brumfield wrote:

Not to make the best the enemy of the good, but I'd really like to
know more about similar efforts that are under way, both in the US and
elsewhere.  For one thing, it seems likely that legal and policy work
on public access to public records is already being done by the EFF,
Creative Commons, the Open Access movement, those behind the SSDI
petitions, journalists, Wikimedia, the Internet Archive and others.

Unfortunately, none of these groups, as far as I know, has any deep knowledge of, nor specific interest in, expanding or maintaining public access to genealogical records.  :-(  To be fair, why *would* they care?  The push for open access to genealogical files is primarily going to have to be fought by genealogists, because if we don't care about it, who will?

RPAC is probably the one existing group that would best take up the mantle for something like this.  (RPAC is the Records Preservation and Access Committee -- a joint committee of the Federation of Genealogical Societies [FGS], the National Genealogical Society [NGS], and the International Association of Jewish Genealogical Societies [IAJGS].)  But a lot of what they do is just notifying the broader genealogy community of suddenly-closed-off records, after the fact.  Their main focus is on publicizing and addressing threats.  They have been particularly active and helpful in the fight over the threatened SSDI closure, a situation which has still not been resolved.  I personally called one of the heads of RPAC about a year ago to discuss the SSDI fight and told her to get in touch with the EFF, which she had not heard of.  I don't know if they did manage to make contact, and if so, whether the EFF was any help at all.  I don't see any mention of the SSDI on the EFF's website.

RPAC has a nice white paper online here from 2009-2010, "Open Access to Public Records: A Genealogical Perspective":

 
If there's an organization already funding lawyers to draft licenses,
file FOIA requests, or litigate test cases, I'm happy to contribute to
them rather than reinventing the wheel.

Me too.  But I really don't think one exists yet, alas.  And I don't think we need to think about litigation before we even start *asking* people to voluntarily change how their data is handled!  Let's try catching flies with honey first.


- Brooke

Justin York

Jan 16, 2013, 9:37:10 PM
to root...@googlegroups.com
I would help with an effort like this.

Enno Borgsteede

Jan 17, 2013, 9:05:29 AM
to root...@googlegroups.com
I'd certainly prefer a free license alternative.  WeRelate data is
cc-by-sa.  But yes, as a practical matter, I believe that not having NC
would be a show-stopper for many genealogy organizations and it's better
to have data under NC than not at all.
I see their point, but I believe that in the end free as defined by the FSF will benefit most. I mean, when you look at http://distrowatch.com/ you can see that some think that they can earn a small profit distributing free software. And if it can work for that, why not free data too?
 
File ‘sharing’ has changed the music industry, and it looks quite ridiculous to me that in daily reality it’s easier to download a full version of FTM than it is to get genealogy data.
 
cheers,
 
Enno
 

Robert Hoare

Jan 17, 2013, 8:00:35 PM
to root...@googlegroups.com
On Wednesday, January 16, 2013 7:37:10 PM UTC-7, Justin York wrote:
I would help with an effort like this.

Justin, would this be via your role in FamilySearch, or separately?

FamilySearch are very well placed to start a trend towards historical data being available under open licenses.  I appreciate there are (many) cases where the original provider has put restrictions on the data which make it impossible to share. 

But there will also be other cases (such as some crowdsourced transcriptions) where FamilySearch has "all rights" and can open it up under a copyright license that allows reuse (and share-alike, so that FamilySearch also benefits). 

What's needed is a process inside FamilySearch to identify the license (how it was obtained) for each particular data set, and then change the terms and conditions for those data sets where it's possible to relax the terms.  Having open data will be one of the major factors to help developers come up with new applications that haven't even been thought of yet.

What makes FamilySearch uniquely important for this: no subscription revenue to protect, a global set of data, and a mission to improve genealogy research (rather than pay shareholders). 

Rob

Justin York

Jan 17, 2013, 10:10:28 PM
to root...@googlegroups.com
It would have to be unofficially. I don't have anything to do with records. Though I agree that FamilySearch is in a unique position to lead in this effort.



Ben Laurie

Jan 18, 2013, 7:20:56 AM
to root...@googlegroups.com
On 16 January 2013 03:47, Robert Hoare <robert...@gmail.com> wrote:
> Ironic that FreeBMD currently have a more restrictive license than Ancestry
> though.

Rather late to this discussion ... FreeBMD has a restrictive licence
for historical reasons - at the time it was started Broderbund had
given the whole notion of uploading free content for free use a rather
bad name, and FreeBMD was treated with great suspicion. Unfortunately,
in retrospect, we decided at the time that the way to allay those
fears was to promise no commercial use.

I now think this is a rather bad idea, and hope in the long run to
change FreeBMD's (and FreeREG's and FreeCEN's) licence - and I agree
that unless commercial use is permitted the data is mostly useless.

I also agree that this does _not_ mean the data will end up behind
paywalls because no paywall can compete with free. It does, however,
raise the interesting question of how "free" is sustained as, sadly,
bandwidth, servers, disks and scanners are not free - nor are
sysadmins and coders reliably free.

Doug Blank

Jan 18, 2013, 8:40:41 AM
to root...@googlegroups.com
This has been a very interesting, and very detailed, thread. I thank
rootsdev for making this conversation possible.

Although the original question was about getting test data to include
in a redistributed commercial manner, we see how this question lies at
the center of creating sustainable open genealogy.

Much of this conversation reminds me of the original motivation for
the free software licenses, especially those that protect the ability
to redistribute, and to ensure continued access, regardless of use
(commercial, web portal, etc.). I wondered if the GNU GPL people had
such a license for data, and asked the question:

http://stackoverflow.com/questions/14364231/gnu-gpl-for-data

There are a couple of interesting links there, including a story about
a group moving away from Creative Commons. I'm not against CC, but I
think I would need some advice if I were to use CC By-SA (which seems
to be the most appropriate CC license for our uses): how does one
share the data? does it need to be provided in an easy to access
manner? how does one attribute the data? can it be a field in XML
data? can each entry have a different license?

It would be great if this discussion could be summarized on the
rootsdev website as best practices for creating and sharing
genealogical datasets, both from scratch and combining data from
others.

-Doug

Dallan Quass

Jan 18, 2013, 11:08:02 AM
to root...@googlegroups.com, Doug Blank
On 1/18/2013 7:40 AM, Doug Blank wrote:
> There are a couple of interesting links there, including a story about
> a group moving away from Creative Commons. I'm not against CC, but I
> think I would need some advice if I were to use CC By-SA (which seems
> to be the most appropriate CC license for our uses): how does one
> share the data? does it need to be provided in an easy to access
> manner? how does one attribute the data? can it be a field in XML
> data? can each entry have a different license?

Wikipedia and WeRelate make their data available online as an XML file
with the page content.

You can access different versions of Wikipedia from
http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia


You can access yesterday's version of WeRelate and some software to
process the xml file from https://github.com/DallanQ/WeRelateData
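
As a rough sketch of what processing such a dump might look like, assuming a simplified MediaWiki-style export: the sample XML and page titles below are invented for illustration, and real dumps carry an XML namespace and many more elements, so the tag names would need adjusting. Python's standard library can stream the pages one at a time:

```python
import io
import xml.etree.ElementTree as ET

# Invented sample; real dumps are far larger and use an XML namespace.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Person:John Smith (1)</title>
    <revision><text>born 1850</text></revision>
  </page>
  <page>
    <title>Family:John Smith and Mary Jones (1)</title>
    <revision><text>married 1875</text></revision>
  </page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, text) for each <page> element in the stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            text = elem.findtext("revision/text")
            yield title, text
            elem.clear()  # drop the finished page so large dumps stay cheap


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
for title, text in pages:
    print(title, "->", text)
```

Streaming with `iterparse` rather than loading the whole file is only a design suggestion; for a small extract, `ET.parse` would do just as well.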

Is it a definite requirement to make the data available via a public
URL? I don't know. FamilySearch wiki is cc-by-sa for example, and I'm
not sure if they make their database downloadable via a public URL.

For attribution it seems to be sufficient to include a link back to the
page where you got the information, and possibly also a link to the
history view for the list of page authors.
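
On the question of whether attribution can be a field in the XML data, with a license per entry, here is a minimal sketch of one way it could be done. The element names (`record`, `attribution`, `source`, `history`), the `license` attribute, and the example.org URLs are all hypothetical and not taken from any existing standard; whether this form of attribution satisfies a given license is a separate question.

```python
import xml.etree.ElementTree as ET


def make_record(name, event, source_url, history_url, license_id):
    """Build one record element that carries its own license and attribution."""
    record = ET.Element("record")
    ET.SubElement(record, "name").text = name
    ET.SubElement(record, "event").text = event
    # Hypothetical layout: license as an attribute, links as child elements.
    attribution = ET.SubElement(record, "attribution", license=license_id)
    ET.SubElement(attribution, "source").text = source_url
    ET.SubElement(attribution, "history").text = history_url
    return record


record = make_record(
    name="John Smith",
    event="birth 1850",
    source_url="https://example.org/wiki/Person:John_Smith",
    history_url="https://example.org/wiki/Person:John_Smith?action=history",
    license_id="CC-BY-SA",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the license travels with each record, entries from different sources could in principle sit side by side in one file.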

Dallan

Ben Brumfield

Jan 18, 2013, 3:21:29 PM
to root...@googlegroups.com
On Friday, January 18, 2013 7:40:41 AM UTC-6, Doug Blank wrote:
It would be great if this discussion could be summarized on the
rootsdev website as best practices for creating and sharing
genealogical datasets, both from scratch and combining data from
others.

The public policy page of the wiki ( https://github.com/rootsdev/rootsdev.github.com/wiki/Public-Policy ) might be the perfect place for this. As the open source folks say, "PDI".

Ben

Enno Borgsteede

Jan 19, 2013, 7:22:35 AM
to root...@googlegroups.com
Hi Ben,
> Rather late to this discussion ... FreeBMD has a restrictive licence
> for historical reasons - at the time it was started Broderbund had
> given the whole notion of uploading free content for free use a rather
> bad name, and FreeBMD was treated with great suspicion.
Right, and to tell you the truth, I think that this is happening again.
I mean, when I browse the web, download free software, I see lots of ads
promising free downloads, and from those ads I learned that free means
fraud. They bundle true free software with download managers, toolbars,
all sorts of things to earn money on us, and I really don't like that.
And Ancestry does the same by offering free content from Dutch archive
sites, just to attract visitors to their site, and My Heritage is no
better either.
> Unfortunately, in retrospect, we decided at the time that the way to
> allay those fears was to promise no commercial use. I now think this
> is a rather bad idea, and hope in the long run to change FreeBMD's
> (and FreeREG's and FreeCEN's) licence - and I agree that unless
> commercial use is permitted the data is mostly useless.
IMO, it depends on the sort of commercial use. I mean, I think that the
companies mentioned above actually prey on me, and at the same time I
understand that software like Ubuntu can only survive when it's backed
by a multi millionaire like Mark Shuttleworth, or a foundation with the
right amount of funds.
> I also agree that this does _not_ mean the data will end up behind
> paywalls because no paywall can compete with free. It does, however,
> raise the interesting question of how "free" is sustained as, sadly,
> bandwidth, servers, disks and scanners are not free - nor are
> sysadmins and coders reliably free.
The sad thing is that those companies that offer free downloads prove
that deceiving consumers is big business now, and Google ads make sure
that their message reaches every one of us. Do you have an idea on how we
can compete with that on the data front?

To me it looks like sponsorship is the answer, not free data that can be
abused like software is today.

regards,

Enno

Ben Laurie

Jan 19, 2013, 1:04:15 PM
to root...@googlegroups.com
On 19 January 2013 12:22, Enno Borgsteede <enno...@gmail.com> wrote:
> Hi Ben,
>
>> Rather late to this discussion ... FreeBMD has a restrictive licence for
>> historical reasons - at the time it was started Broderbund had given the
>> whole notion of uploading free content for free use a rather bad name, and
>> FreeBMD was treated with great suspicion.
>
> Right, and to tell you the truth, I think that this is happening again. I
> mean, when I browse the web, download free software, I see lots of ads
> promising free downloads, and from those ads I learned that free means
> fraud. They bundle true free software with download managers, toolbars, all
> sorts of things to earn money on us, and I really don't like that. And
> Ancestry does the same by offering free content from Dutch archive sites,
> just to attract visitors to their site, and My Heritage is no better either.

I think the answer to this kind of thing is to do better than they do, for less.

>> Unfortunately, in retrospect, we decided at the time that the way to allay
>> those fears was to promise no commercial use. I now think this is a rather
>> bad idea, and hope in the long run to change FreeBMD's (and FreeREG's and
>> FreeCEN's) licence - and I agree that unless commercial use is permitted the
>> data is mostly useless.
>
> IMO, it depends on the sort of commercial use. I mean, I think that the
> companies mentioned above actually prey on me, and at the same time I
> understand that software like Ubuntu can only survive when it's backed by a
> multi millionaire like Mark Shuttleworth, or a foundation with the right
> amount of funds.

I do kind of agree, but I have no idea how you'd make a licence that
captured the difference.

>> I also agree that this does _not_ mean the data will end up behind
>> paywalls because no paywall can compete with free. It does, however, raise
>> the interesting question of how "free" is sustained as, sadly, bandwidth,
>> servers, disks and scanners are not free - nor are sysadmins and coders
>> reliably free.
>
> The sad thing is that those companies that offer free downloads prove that
> deceiving consumers is big business now, and Google ads make sure that their
> message reaches every one of us. Do you have an idea on how we can compete
> with that on the data front?

Charities can get a certain amount of free advertising. And, indeed,
can make money from advertising.

> To me it looks like sponsorship is the answer, not free data that can be
> abused like software is today.

I work a lot in free software, and from where I sit it seems like
there's a lot more use as we intended than there is abuse.

Sponsorship is certainly part of the picture.

Robert Hoare

Jan 20, 2013, 3:43:02 AM
to root...@googlegroups.com
On Friday, January 18, 2013 5:20:56 AM UTC-7, Ben Laurie wrote:

I now think this is a rather bad idea, and hope in the long run to
change FreeBMD's (and FreeREG's and FreeCEN's) licence - and I agree
that unless commercial use is permitted the data is mostly useless.

That's great to hear.  I'm curious though how the advertising on FreeBMD.org.uk fits with being non-commercial - does the charity status help?  I've been considering whether setting up a non-profit (not the same as non-revenue) is worthwhile.  I'm surprised the deal with Ancestry.com for them to host a copy behind their paywall was considered non-commercial though.
 
It does, however,
raise the interesting question of how "free" is sustained as, sadly,
bandwidth, servers, disks and scanners are not free - nor are
sysadmins and coders reliably free.

True.  One of my websites (not genealogy) gets half a million human page views a day (plus about ten times that from robots, thousands of database queries a second), so I'm well aware of the challenge and cost of keeping a busy free-access site running.

When comparing that with FreeBMD though I see your searches per day have been steadily declining for the past few years.  Do you think that's partly because the index (the early part) became available on Ancestry as well?  That would be an argument for sharing data: when one organisation can't afford to keep it online, it's likely another will take their place, and/or having multiple places to search the data will ease the load (and reduce costs) on each site, plus increase reliability and provide alternate ways to search.

That still leaves the cost of producing the data.  Currently, Ancestry and Brightsolid (and some others, mainly in the US) have good reason to invest money in scanning and transcription: they hope it'll help them gain/retain paying subscribers.  So totally free access to everything they've done won't ever happen. 

But they both have deals with Familysearch (so far) to provide some transcriptions and Familysearch then sends users to them to subscribe to get access to the images.  So even their very commercial business model can and does sustain free access to some transcriptions (as a teaser, it's excellent advertising), so they wouldn't have any reason to stop.

Familysearch has a large sponsor paying the bills, and they don't get revenue from their site (just costs) so there's apparently little for them to lose by making the data more widely available - production costs are the same, distribution and promotion costs go down.

And then there's smaller scale or local projects.  If there's a nice easy way to transcribe (your new FreeReg/FreeCen are doing that?) and store data in a standard format (FHISO?) then there's likely to be several sites that will offer free hosting for community produced data, either as part of a larger site (to promote it), or a freemium model like Github.  Actually makes it easier and cheaper to create data (and more fun!), instead of each group having to maintain and pay for databases and websites.

Finally, I do agree with others that Creative Commons may not be a perfect fit; an Open Data license makes more sense.  The Open Data Commons Open Database License (ODbL) used by OpenStreetMap.org does look ideal (apart from the convoluted name!), especially the share-alike (can't add restrictions) and keep-open (can't put solely behind a paywall) clauses.  Plus how a data source is attributed can be clearly specified by that source.

Rob

Ben Laurie

Jan 20, 2013, 6:32:26 AM
to root...@googlegroups.com
On 20 January 2013 08:43, Robert Hoare <robert...@gmail.com> wrote:
> On Friday, January 18, 2013 5:20:56 AM UTC-7, Ben Laurie wrote:
>>
>>
>> I now think this is a rather bad idea, and hope in the long run to
>> change FreeBMD's (and FreeREG's and FreeCEN's) licence - and I agree
>> that unless commercial use is permitted the data is mostly useless.
>
>
> That's great to hear. I'm curious though how the advertising on
> FreeBMD.org.uk fits with being non-commercial - does the charity status
> help?

The licence applies to other users, not ourselves.

> I've been considering whether setting up a non-profit (not the same
> as non-revenue) is worthwhile. I'm surprised the deal with Ancestry.com for
> them to host a copy behind their paywall was considered non-commercial
> though.

It was not behind their paywall.

Robert Hoare

Jan 20, 2013, 12:28:40 PM
to root...@googlegroups.com
On Sunday, January 20, 2013 4:32:26 AM UTC-7, Ben Laurie wrote:

> I'm surprised the deal with Ancestry.com for
> them to host a copy behind their paywall was considered non-commercial
> though.

It was not behind their paywall.

Ah, thanks Ben, I hadn't noticed that as I'm usually logged in.  It requires their free login to see it, and those pages don't contain external advertising.  But they do have other links on the same page to promote their paid services. 

So freemium is considered non-commercial in this particular case.  A lot of possible uses could live within that limitation (keep access to the free data totally free, upsell to other data/services, including derived data such as the data match sidebar on Ancestry).

Rob