Fwd: [open-science] [SCHOLCOMM] Libre open access, copyright, patent law, and, other intellectual property matters


Bryan Bishop

Mar 22, 2012, 11:41:49 AM
to diybio, Bryan Bishop
Peter Murray-Rust
From: Peter Murray-Rust <pm...@cam.ac.uk>
Date: Thu, Mar 22, 2012 at 3:06 AM
Subject: Re: [open-science] [SCHOLCOMM] Libre open access, copyright, patent law, and, other intellectual property matters
To: john wilbanks <j...@del-fi.org>
Cc: Heather Morrison <hgmo...@sfu.ca>, open-science <open-s...@lists.okfn.org>


On Wed, Mar 21, 2012 at 11:58 PM, john wilbanks <j...@del-fi.org> wrote:
> I'm going by the BBB declarations.


Thanks John, [and Klaus] and so am I.
 
I'm happy to see robust discussion on this list - we should avoid flame wars.

It's somewhat unfortunate that there seems to be an operational division between science and humanities. It would be nice to have a one-size-fits-all "Open Access", but the reality may evolve to be different. The Harnad-Morrison-Thatcher approach could be summed up as:
* the primary goal is that humans can somehow find a Gratis copy of the work to read with their eyes. It is of secondary importance whether the community has any rights.

The science community, on the other hand, wishes to make complete use of the complete scholarly literature, using modern technology to discover, index, extract, re-use, recompute and re-assemble it in whatever way their imagination and technology run to. (I wish to build an artificially intelligent chemical amanuensis by semantic analysis of the complete literature, for example.)
* ANY licence other than BBB-compliant prevents this ABSOLUTELY. Any publisher's contract prevents this absolutely.

It is profoundly unhelpful to this cause to have people pontificating about absolute author's rights and quasi-religious approaches to solving the problem. Harnad and Morrison know nothing about high-throughput text mining, data extraction, eigenvector-based indexing, etc. If they wish to publish their own work under NC I shan't fight it.

UK PubMed Central is crippled by the lack of explicit full-libre permission to re-use it: 20 million scientific articles, of which about 1% are legally minable, and those are extremely difficult to discover. I spent my "research" effort trying to find these rather than actually DOING the science from them. Last week my tools read 500,000 chemical reactions from the patent literature, better as well as infinitely faster than any human on the planet. Those reactions can help to find new drugs, new ways of making drugs, and new insights into chemistry.

The reality is that science can operate extremely well with CC-BY. I am yet again preparing a clutch of articles for BioMed Central (a special issue with 17 APC-based articles). BMC have been running for 10 years. As far as I know there has been no serious misuse of the literature, so there is no need to "protect" CC-BY.

On a related point, institutional repositories are almost completely useless for modern literature analysis. They do not carry explicit machine-readable libre licences, so we cannot by right use any of their content. They are also fragmented: instead of the UK having ONE repository (say, in the British Library), which is the rational thing any scientist would do, they are spread over 200 universities at great additional cost.

All that leads up to my thanking the RCUK for insisting on CC-BY and, together with other scientific organizations such as Wellcome and the Libre science publishers, making BBB-OpenAccess a reality. There is a great deal more to do, but at least we have a model that works and that politicians are listening to.


--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

_______________________________________________
open-science mailing list
open-s...@lists.okfn.org
http://lists.okfn.org/mailman/listinfo/open-science




--
- Bryan
http://heybryan.org/
1 512 203 0507

Bryan Bishop

Mar 22, 2012, 11:42:06 AM
to diybio, Bryan Bishop

From: john wilbanks <j...@del-fi.org>
Date: Thu, Mar 22, 2012 at 10:35 AM

Subject: Re: [open-science] [SCHOLCOMM] Libre open access, copyright, patent law, and, other intellectual property matters
To: Peter Murray-Rust <pm...@cam.ac.uk>

Cc: Heather Morrison <hgmo...@sfu.ca>, open-science <open-s...@lists.okfn.org>


I realize that I didn't make my point clear enough actually.

And I don't lump Heather in with Harnad. Heather asked a good question that I answered obliquely. For that I apologize.

I do not just want the ability for academics to text mine. I want there to be a robust market for text mining that includes companies who mine open access content for their own reasons as well as academics, and I want there to be a robust market of startups who provide those text mining services (and thus must make and distribute copies of corpora as validation sets, as part of collaborations with academics that improve algorithms, and who also produce and sell the outputs of text mining). Right now text mining pretty much sucks, frankly, compared to what it ought to be.

Non-commercial licenses are not just a way to prevent other publishers from reselling content, which is often the focus of the conversation, but a tax on startups and companies who want to treat the literature as data. Here's a short list of companies trying to do just that who are being hamstrung by closed access, and who would be blocked under NC terms: Personalized Medicine (providing auto-annotation of genotypes to doctors' offices), Selventa (providing auto-created hypotheses explaining high-throughput experimental biological data), Ingenuity (providing large databases of assertions specific to diseases or tissues). Those three are simply the first ones that jump to mind in startup land. There are ~20 more I know of, and many more that I don't.

The uncertainty around content chills venture investment, to boot. If the web had been NC licensed, we would not have Google, and PageRank would have remained where it started, as an academic theory experiment. That would suck, in my opinion. And the big publishers know this, which is precisely why they add clauses that ban mining to existing licenses and want commercial restrictions. I don't think they're worried about resale. I think they're worried about getting their lunch eaten by new entrants who see the market differently, as Apple did with music, as Google did to Microsoft (and in turn Facebook did to Google). That's why Elsevier has an entire unit devoted to this stuff, run by extremely smart people.

Then there's all of big pharma and biotech, who all maintain libraries and subscriptions, but are often absent from these discussions because of their position on patents.

Non-commercial restrictions have *side effects* that are bad for innovation and bad for science. We need entrepreneurs and not just academics.

This is not nearly as much of a problem in the humanities at first blush, but the reality is that as text mining gets better, faster, cheaper, and more subtle in the hard sciences, it will bring amazing tools to the humanities as well.

jtw
--
------
john wilbanks
@wilbanks
http://del-fi.org



Bryan Bishop

Mar 22, 2012, 2:35:45 PM
to diybio, Bryan Bishop

From: Peter Murray-Rust <pm...@cam.ac.uk>
Date: Thu, Mar 22, 2012 at 12:12 PM

Subject: Re: [open-science] [SCHOLCOMM] Libre open access, copyright, patent law, and, other intellectual property matters
To: john wilbanks <j...@del-fi.org>

Cc: Heather Morrison <hgmo...@sfu.ca>, open-science <open-s...@lists.okfn.org>

On Thu, Mar 22, 2012 at 3:35 PM, john wilbanks <j...@del-fi.org> wrote:
> I realize that I didn't make my point clear enough actually.
>
> And I don't lump Heather in with Harnad. Heather asked a good question that I answered obliquely. For that I apologize.

I was probably simplistic as well. But, unfortunately, there is a large section of academia that creates "Open Access" policy implicitly and explicitly, and very little of it is informed by scientists. There are 1000+ institutional repositories (Peter Suber's figure) with ca. 2 FTE per repository, i.e. more than 2000 FTEs, and very little of this investment is informed by scientists' needs.

In my submission to the Hargreaves process (http://blogs.ch.cam.ac.uk/pmr/2012/03/21/my-response-to-hargreaves-on-copyright-reform-i-request-the-removal-of-contractual-restrictions-and-independent-oversight/) I have said very much the same as John.

> I do not just want the ability for academics to text mine.

Agreed
 
> I want there to be a robust market for text mining that includes companies who mine open access content for their own reasons as well as academics, and I want there to be a robust market of startups who provide those text mining services (and thus must make and distribute copies of corpora as validation sets, as part of collaborations with academics that improve algorithms, and who also produce and sell the outputs of text mining). Right now text mining pretty much sucks, frankly, compared to what it ought to be.

Totally agreed. I have wasted 3 years of my research life.

> Non-commercial licenses are not just a way to prevent other publishers from reselling content, which is often the focus of the conversation, but a tax on startups and companies who want to treat the literature as data. Here's a short list of companies trying to do just that who are being hamstrung by closed access, and who would be blocked under NC terms: Personalized Medicine (providing auto-annotation of genotypes to doctors' offices), Selventa (providing auto-created hypotheses explaining high-throughput experimental biological data), Ingenuity (providing large databases of assertions specific to diseases or tissues). Those three are simply the first ones that jump to mind in startup land. There are ~20 more I know of, and many more that I don't.

Exactly. The whole point is that a bright small group can create great tools and information in months. 3 of our group have done this - but none can be properly deployed as we have to fight restrictive practices.

> The uncertainty around content chills venture investment, to boot. If the web had been NC licensed, we would not have Google, and PageRank would have remained where it started, as an academic theory experiment. That would suck, in my opinion. And the big publishers know this, which is precisely why they add clauses that ban mining to existing licenses and want commercial restrictions. I don't think they're worried about resale. I think they're worried about getting their lunch eaten by new entrants who see the market differently, as Apple did with music, as Google did to Microsoft (and in turn Facebook did to Google). That's why Elsevier has an entire unit devoted to this stuff, run by extremely smart people.

Yes. It is a serious mistake to assume Elsevier is stupid or incompetent. I suspect they are unprepared in some technical areas as they hope to avoid having to deploy

> Then there's all of big pharma and biotech, who all maintain libraries and subscriptions, but are often absent from these discussions because of their position on patents.
>
> Non-commercial restrictions have *side effects* that are bad for innovation and bad for science. We need entrepreneurs and not just academics.

And they lead to bad decision making at all levels of science: the information doesn't get through. People often need the literature to be pushed to them, not just pulled. You cannot push NC material and you cannot push Green material. It's only used by those who know.

> This is not nearly as much of a problem in the humanities at first blush, but the reality is that as text mining gets better, faster, cheaper, and more subtle in the hard sciences, it will bring amazing tools to the humanities as well.

Yes. Actually, some of the linguistic groups were among the very early users of computing in the 1960s and 1970s, building classical corpora. But they never went outside a small group, for valid technical reasons: punched cards.
 




drllau

Mar 23, 2012, 3:04:37 PM
to diy...@googlegroups.com, Bryan Bishop
Copyright has been one of the more evolved IPR regimes; the intent was to allow the contents to enter the public domain after a respectable period. However, it seems to me that some interests want to leverage patents (20 yr) into copyright (life+30) and then brand it so that essentially it will be encumbered in some form or another. Creative Commons was an attempt to simplify contractual matters (between capital-intensive publishers) into attribution, commercial (or not), and fungible (mitosis v meiosis). Different combinations give different business models. Legally, courts have accepted space-shifting and (more reluctantly) time-shifting. Publishers differ from dinosaurs in having enough foresight to hunker down against a Cretaceous bolt from the blue.

Taking a deep breath and trying to work out why there is resistance:
1) Publishers are aware that only a small fraction of works are profitable (whether best-sellers, scraping headlines, or the key molecules/secret sauce).
2) They use the surplus from the fat end of the power-law distribution curve to subsidise the rest.
3) Economically, it comes down to a small number of high-quality works or mass replication (I've forgotten the name of the biological stratagem).

On the other hand, text miners, mashups, etc. are viewing it as the economic equivalent of the Cambrian explosion, looking at new combinations or specialisation. And because the existing stratagem of IPR holders (think of the energy in the ecosystem) prevails, the innovators are finding it hard to crawl out of the primordial soup (apologies for mixing up biological metaphors). Altering the legislation may or may not change the fitness landscape, as what favors one group (say, media) would offend others (archival/database). What is the solution? Well, I'm not Jesus Christ here to create fishes and loaves, but perhaps there may be an entry vector.

Space v time shifting. Accept that there are tradeoffs: if you have extensive geographical coverage, then any derived rights should be short in duration. If you want long terms of protection, allow some localisation (within, say, a single institution). Try to create orthogonal rights to minimise conflicts. Separate content and distribution and encourage diversity. This concept was motivated by a recent Australian court case in which one media giant paid for exclusive football rights, and then a competitor offered a TiVo-style replay after 3 seconds to the entire country, which effectively devalued the exclusivity arrangement.

How might this work out for science publishing? If you own huge publishing libraries, allow temporary borrowing for, say, 4 hours max. Data miners could use this to collect, say, aggregate stats but relinquish any copies after that 4-hour period. They could retain some residual rights (e.g. rebuttal http://knowledgerights.org/group/access/forum/topics/promotor-v-inhibitor-or-right-or-rebuttal), but anyone wanting to cross-reference the original will need to borrow it from their own library (enforcing the geographical diversity).

I do believe that emergent knowledge can benefit society but trying to get past the head-butting and come up with win-win scenarios is hard.

Lawrence
http://www.linkedin.com/in/drllau


Bryan Bishop

Mar 26, 2012, 2:39:22 PM
to drllau, open-science, Bryan Bishop, diy...@googlegroups.com
On Fri, Mar 23, 2012 at 2:04 PM, drllau <drlawr...@gmail.com> wrote:
> How might this work out for science publishing ... if you own huge publishing libraries, allow temporary borrowing on say 4 hr max. Data miners could use this to collect say aggregate stats but relinquish any copies after that 4hr period.

They won't allow that, because they know someone would just torrent it immediately. There are ways to detect and remove watermarks. I don't think they'd go for that. DeepDyve is not really a solution: they just post images of all their papers and then charge you per paper accessed anyway. On top of that, because each paper is an image (rather than a file with words), it is even more useless without OCR or something. Blah.