Gene patents, long-term status of NCBI data


Bryan Bishop

Apr 17, 2013, 5:36:53 AM
to diybio, Bryan Bishop
I was curious to imagine what might happen in the worst-case scenario
if SCOTUS happens to rule in favor of human gene patents. In
particular, what sort of guarantees are in place that the wonderful
resources at NCBI will continue to be maintained? What about the data
in GenBank or Entrez?

For example, what happens if there's an over-eager Bayh-Dole
Compliance Office that insists on removing records from these
databases?

As it turns out, the data on NCBI isn't in the public domain. Specifically:

"""
Databases of molecular data on the NCBI Web site include such examples
as nucleotide sequences (GenBank), protein sequences, macromolecular
structures, molecular variation, gene expression, and mapping data.
They are designed to provide and encourage access within the
scientific community to sources of current and comprehensive
information. Therefore, NCBI itself places no restrictions on the use
or distribution of the data contained therein. Nor do we accept data
when the submitter has requested restrictions on reuse or
redistribution. However, some submitters of the original data (or the
country of origin of such data) may claim patent, copyright, or other
intellectual property rights in all or a portion of the data (that has
been submitted). NCBI is not in a position to assess the validity of
such claims and since there is no transfer or rights from submitters
to NCBI, NCBI has no rights to transfer to a third party. Therefore,
NCBI cannot provide comment or unrestricted permission concerning the
use, copying, or distribution of the information contained in the
molecular databases.
"""

http://www.ncbi.nlm.nih.gov/About/disclaimer.html

Just to re-iterate:

"""
However, some submitters of the original data (or the country of
origin of such data) may claim patent, copyright, or other
intellectual property rights in all or a portion of the data (that has
been submitted). NCBI is not in a position to assess the validity of
such claims and since there is no transfer or rights from submitters
to NCBI, NCBI has no rights to transfer to a third party.
"""

In fact, there's a Bayh-Dole Compliance Office at NIH itself:

http://www.ott.nih.gov/licensing_royalties

"""
Each year, hundreds of new inventions are made in NIH and FDA
laboratories. The Office of Technology Transfer (OTT) transfers these
inventions – through licenses – to the private sector for further
research and development and eventual commercialization.
....
OTT generally seeks the broadest possible patent protection for
commercially valuable inventions and initiates this process by filing
an application for a patent in the U.S. Patent and Trademark Office
(USPTO).
....
"""

So anyway, I am a little startled about this. I thought that this data
was in the public domain. But instead, all of these companies doing
whole genome assembly and BLASTing based on this data are probably
exposed to an unknown level of legal risk. And this data isn't
free.

I find it a little strange that there's been no major push to focus
exclusively on public domain, commons or open source licensed data for
the world's genomic revolution. What's going on here??

http://en.wikipedia.org/wiki/Celera

"""
Celera sequenced the human genome at a fraction of the cost of the
public project, approximately $3 billion of taxpayer dollars versus
about $300 million of private funding. However, a significant portion
of the human genome had already been sequenced when Celera entered the
field, and thus Celera did not incur any costs with obtaining the
existing data, which was freely available to the public from GenBank.
Celera's use of the shotgun strategy spurred the public HGP to change
its own strategy, leading to a rapid acceleration of the public
effort.

...

Critics of initial efforts by Celera Genomics to hold back data from
sections of genome they sequenced for commercial exploitation felt
that it would retard progress in science as a whole. These critics
pointed to the open access policy for gene sequences from the publicly
funded Human Genome Project. Later, the company changed their policy
and made their sequences available for non-commercial use but set a
maximum threshold for amount of sequence data that a researcher could
download at any given time.

...

Celera initially announced that it would seek patent protection on
"only 200–300" genes, but later amended this to seeking "intellectual
property protection" on "fully-characterized important structures"
amounting to 100–300 targets. The firm eventually filed preliminary
("place-holder") patent applications on 6,500 whole or partial genes.
Celera also promised to publish their findings in accordance with the
terms of the 1996 "Bermuda Statement", by releasing new data annually
(the HGP released its new data daily), although, unlike the publicly
funded project, they would not permit free redistribution or
scientific use of the data.

...

In March 2000, President Clinton announced that the genome sequence
could not be patented, and should be made freely available to all
researchers. The statement sent Celera's stock plummeting and dragged
down the biotechnology-heavy Nasdaq. The biotechnology sector lost
about $50 billion in market capitalization in two days.
"""

So obviously, one of the biotech industry dreams was to license out
data about genes, genomes, proteins, epigenomics, etc. For the most
part, Big Pharma and Big Bio demonstrate (just like the rest of the
corporate world) high intrinsic motivation to make sure their data is
legal. For the most part, all of the assembly companies I've been
hearing about are just using raw gene information from NCBI's
servers... so basically, everyone. Seems pretty shaky. Maybe I am
misinformed.

How about an actually public database of bioinformatics data? Until
recently, the only business model floating around has been "charge
for licenses to bioinformatics data". But you could also choose to focus
on permissively licensed data, with such an entity perhaps eventually
becoming a force of its own against legally grey territory like NCBI.
I could imagine a variety of services that sort of organization could
offer, like bioinformatics legal auditing (license compliance, or
figuring out how much unlicensed data you've been exposed to), plus
core lab services, etc. There's probably something viable in that
direction.

Failing that, I think we should get Archive Team to back up NCBI's data.

Failing that too, I think we should at least monitor which genes,
proteins and other molecules are being removed, if any, and if there
is any removal then whether or not that rate is increasing or
decreasing.
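
As a rough sketch of what that monitoring could look like: keep a dated set of accession IDs from each mirror snapshot, then diff consecutive snapshots. Everything here (the function names, the snapshot dates, the accession IDs) is made up for illustration, and it assumes you already have some way of extracting the set of accessions from each snapshot.

```python
from datetime import date


def removed_between(older, newer):
    """Return accessions present in the older snapshot but missing from the newer one."""
    return set(older) - set(newer)


def removal_rates(snapshots):
    """Given {date: set_of_accessions}, return [(date, removals_per_day)]
    for each snapshot after the first, in chronological order."""
    days = sorted(snapshots)
    rates = []
    for prev, curr in zip(days, days[1:]):
        gone = removed_between(snapshots[prev], snapshots[curr])
        elapsed = (curr - prev).days
        rates.append((curr, len(gone) / elapsed))
    return rates


# Hypothetical snapshots; the accession IDs are placeholders, not real records.
snapshots = {
    date(2013, 4, 1): {"AAA0001", "AAA0002", "AAA0003", "AAA0004"},
    date(2013, 4, 11): {"AAA0001", "AAA0002", "AAA0003", "AAA0005"},
    date(2013, 4, 21): {"AAA0001", "AAA0005", "AAA0006"},
}

rates = removal_rates(snapshots)
for when, per_day in rates:
    print(when.isoformat(), "removed per day:", per_day)

# The removal rate counts as "increasing" if each interval's rate exceeds the last.
increasing = all(b[1] > a[1] for a, b in zip(rates, rates[1:]))
print("rate increasing:", increasing)
```

Records that are renamed or merged rather than deleted would show up as removals in a naive diff like this, so a real monitor would probably need to follow replacement accessions as well.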

- Bryan
http://heybryan.org/
1 512 203 0507

Nathan McCorkle

Apr 17, 2013, 5:58:08 AM
to diybio
starting an archiving effort might also be useful for the biobricks/partsregistry.org data

would we just start with scraping nucleotide and protein sequence data?

does NCBI/entrez/genbank have a 'how many gigabytes of space we currently use' statistic anywhere?

isn't NCBI/entrez/genbank all different portals to the same data? I could be wrong, but that was my impression.




--
-- You received this message because you are subscribed to the Google Groups DIYbio group. To post to this group, send email to diy...@googlegroups.com. To unsubscribe from this group, send email to diybio+un...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/diybio?hl=en
Learn more at www.diybio.org
---
You received this message because you are subscribed to the Google Groups "DIYbio" group.
To unsubscribe from this group and stop receiving emails from it, send an email to diybio+un...@googlegroups.com.
To post to this group, send email to diy...@googlegroups.com.
Visit this group at http://groups.google.com/group/diybio?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.





--
-Nathan

Bryan Bishop

Apr 17, 2013, 6:17:10 AM
to diybio, Bryan Bishop, Nathan McCorkle
On Wed, Apr 17, 2013 at 4:58 AM, Nathan McCorkle <nmz...@gmail.com> wrote:
> would we just start with scraping nucleotide and protein sequence data?

No, you would download it from their FTP server.

ftp://ftp.ncbi.nlm.nih.gov/

For example:

ftp://ftp-trace.ncbi.nih.gov/1000genomes/
http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/genomes/

Here's a partial overview:

http://www.ncbi.nlm.nih.gov/Ftp/

There are various mirrors already floating around but I am never quite
sure how stable they are, how updated they are, etc...

ftp://bio-mirror.net/biomirror/genbank/
http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/

The directory indexes seem to claim they were updated 2013-04-16.
Dunno if that means complete coverage.

> does NCBI/entrez/genbank have a 'how many gigabytes of space we currently
> use' statistic anywhere?

GenBank has doubled in size every 18 months since its creation, so you
could extrapolate from there. 1000genomes is over 280 terabytes.

This article seems to claim NCBI can accommodate up to 30
terabytes/day of new data:
https://biosci-batzerlab.biology.lsu.edu/Publications/Clarke_et_al_2012_Nature_Methods.pdf

Maybe a few petabytes?
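
The extrapolation is just compound doubling. A minimal sketch, assuming the 18-month doubling period keeps holding; the 1 TB starting size below is a made-up placeholder, not a real GenBank figure:

```python
def projected_size(size_now, months_ahead, doubling_months=18.0):
    """Size after months_ahead months if it doubles every doubling_months."""
    return size_now * 2 ** (months_ahead / doubling_months)


start_tb = 1.0  # hypothetical starting size in terabytes
for years in (0, 3, 6, 9):
    # 3 years is two doublings (4x), 6 years is four (16x), 9 years is six (64x)
    print(years, "years:", projected_size(start_tb, 12 * years), "TB")
```

Plug in whatever current size you trust for `start_tb`; the growth factor depends only on the doubling period.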

Cathal Garvey

Apr 17, 2013, 6:44:38 AM
to diy...@googlegroups.com
# --- Python code ensues
# Requires requests and beautiful soup four: install using
# "sudo pip-3.2 install bs4 requests" (preferably with "lxml" too)
import requests
import bs4

refseq_ftp = "http://ftp.ncbi.nih.gov/refseq/release/microbial/"
genomes = requests.get(refseq_ftp)
link_and_size = []
for x in genomes.text.splitlines():
    x = x.strip()
    if x[:8] == "<a href=":
        xline = bs4.BeautifulSoup(x)
        xsize = x.rsplit(None,1)[1]
        xline = xline.find("a")
        link_and_size.append((xline.attrs.get("href","not found"), xsize))

microbe_dict = {}
for entry in link_and_size:
    microbe, data, format = entry[0].split(".")[:3]
    size = entry[1]
    if size[len(size)-1] == "M":
        factor = 6
        if "." in size: factor = 5
        size = int(size.rstrip("M").replace(".","") + "0"*factor)
    elif size[len(size)-1] == "K":
        factor = 3
        if "." in size: factor = 2
        size = int(size.rstrip("K").replace(".","") + "0"*factor)
    else: size = int(size)
    microbe_subdict = microbe_dict.setdefault(microbe,{})
    data_subdict = microbe_subdict.setdefault(data, {})
    data_subdict[format] = {"link":refseq_ftp+entry[0],"size":size}

just_genbank_genomes = {}
for microbe in microbe_dict:
    microbed = microbe_dict[microbe]
    if "gbff" in microbed.get("genomic",{}):
        just_genbank_genomes[microbe] = microbed['genomic']['gbff']

total_megabytes = sum([x['size'] for x in
                       just_genbank_genomes.values()]) / 1000000

import json
with open("all_refseq_ftp_links","w") as OutF:
    json.dump(microbe_dict, OutF)
with open("all_genbank_refseq_genomes","w") as OutF:
    json.dump(just_genbank_genomes, OutF)
print("You would need",total_megabytes,"of space to store the refseq "
      "microbial genome database from NCBI, arguably one of the most important.")
# --- End Python Code
--
Please note my new email: cathal...@cathalgarvey.me
PGP Key: 988B9099
Bitmessage: BM-opSmZfNZHSzGDwdD5KzTnuKbzevSEDNXL
Twitter: @onetruecathal
Code: https://gitorious.org/~cathalgarvey
Blog: http://www.indiebiotech.com

Cathal Garvey

Apr 17, 2013, 6:48:20 AM
to diy...@googlegroups.com
Allow me to spoil my own party with the answer, for those too
unfortunate to be using Linux with a well-loaded python installation:
The answer, for refseq microbe genomes at least, is currently 5540.76
megabytes, small enough to fit on a cheap USB drive.

Refseq genomes are the select "cool gang" of well annotated genomes,
and should include most/all species that we care most about. This
includes agri/biotech, medicine and common gut flora microbes.

There are also refseq databases for other domains of life, but I feel
the microbial databases are most vulnerable to
stupid-but-powerful-people deciding they are "too dangerous" for
general academic consumption.


Cathal Garvey

Apr 17, 2013, 6:52:20 AM
to diy...@googlegroups.com
Hmm, have noticed an error in my code, where some sets of genomes may
get accidentally crushed into one dict under numeric keys due to
inconsistent use of characters in the filenames. So, actual figure may
be higher than what I quoted, but probably not much. I'm still
guessing less than 8 GB.
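
To illustrate the kind of collision described above (the filenames here are hypothetical, made up to show the effect, and may not match what is actually in the NCBI directory): splitting from the left on every dot breaks as soon as the name part itself contains dots, whereas splitting from the right keeps it whole.

```python
names = [
    "microbial1.genomic.gbff.gz",     # hypothetical: no dots in the name part
    "microbial.1.1.genomic.gbff.gz",  # hypothetical: dots inside the name part
    "microbial.2.1.genomic.gbff.gz",
]

for name in names:
    left = name.split(".")[:3]       # split on every dot, keep the first three parts
    right = name.rsplit(".", 3)[:3]  # split at most three times, from the right
    print(name, "->", left, "vs", right)

# With the left split, the last two names both get the key "microbial",
# so they land in the same sub-dict under numeric "data" keys.
left_keys = {n.split(".")[0] for n in names}
right_keys = {n.rsplit(".", 3)[0] for n in names}
print(len(left_keys), "distinct keys from the left,", len(right_keys), "from the right")
```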

Cathal Garvey

Apr 17, 2013, 7:02:13 AM
to diy...@googlegroups.com
I stand corrected! The total microbial refseq database would in fact
take 14834.76 megabytes, requiring a capital investment of (in rural
Ireland) ~€30 of bandwidth and €8 for a 16GB USB drive.

Updated code:

# === Python code Starts ===
import requests
import bs4

refseq_ftp = "http://ftp.ncbi.nih.gov/refseq/release/microbial/"
genomes = requests.get(refseq_ftp)
link_and_size = []
for x in genomes.text.splitlines():
    x = x.strip()
    if x[:8] == "<a href=":
        xline = bs4.BeautifulSoup(x)
        xsize = x.rsplit(None,1)[1]
        xline = xline.find("a")
        link_and_size.append((xline.attrs.get("href","not found"), xsize))

microbe_dict = {}
for entry in link_and_size:
    microbe, data, fformat = entry[0].rsplit(".",3)[:3]
    size = entry[1]
    if size[len(size)-1] == "M":
        factor = 6
        if "." in size: factor = 5
        size = int(size.rstrip("M").replace(".","") + "0"*factor)
    elif size[len(size)-1] == "K":
        factor = 3
        if "." in size: factor = 2
        size = int(size.rstrip("K").replace(".","") + "0"*factor)
    else: size = int(size)
    microbe_subdict = microbe_dict.setdefault(microbe,{})
    data_subdict = microbe_subdict.setdefault(data, {})
    data_subdict[fformat] = {"link":refseq_ftp+entry[0],"size":size}

just_genbank_genomes = {}
for microbe in microbe_dict:
    microbed = microbe_dict[microbe]
    if "gbff" in microbed.get("genomic",{}):
        just_genbank_genomes[microbe] = microbed['genomic']['gbff']

total_megabytes = sum([x['size'] for x in
                       just_genbank_genomes.values()]) / 1000000

import json
with open("all_refseq_ftp_links","w") as OutF:
    json.dump(microbe_dict, OutF)
with open("all_genbank_refseq_genomes","w") as OutF:
    json.dump(just_genbank_genomes, OutF)
print("It would take",total_megabytes,"MB of bandwidth/storage to "
      "download a backup copy of the Microbial Refseq database from NCBI.")
# === Python code Ends ===


Iván Esteban Araya

Apr 17, 2013, 7:05:12 AM
to diy...@googlegroups.com

^.^ Don't worry about it. In case you don't know, the EU and Japan each have an independent copy of the NCBI database that is updated every 24 hours. For example, if I submit a sequence to the Japanese database, it will eventually appear in the US and EU databases.

And with respect to the intellectual property issues, I think you misunderstand how patents work. The core idea of this kind of protection is this: the government gives me the legal right to license the invention, to profit from it exclusively, and to take action against people who use it for commercial (and only commercial) purposes without my consent; in exchange, the government demands that the information be made public in a document that we know as a patent.

In summary, if any DNA, protein, etc. is patented, that information is public, and the patent owner can't ban it from being used in the NCBI database (in the case of the US).

lol, companies know that if they want to be the only people who know about a discovery or invention they developed, the only way is to keep it as a trade secret, because patents don't work that way.

If you aren't using a patented invention for commercial purposes, you are free to use it.

Cathal Garvey

Apr 17, 2013, 8:24:18 AM
to diy...@googlegroups.com
> ^.^ Don't worry about it. In case you don't know, the EU and Japan each
> have an independent copy of the NCBI database that is updated every 24 hours.

No guarantee there. Every time the USG goes nuts about some improbable
"terrorism" risk, the EU slavishly obeys; look at how similar
"security" theater is between the two continents. Likely the same
story for Japan. If the US said "B.cereus genome is a security threat,
and we'll be leery about trading/granting visas/expediting transport
with anyone who makes it public", I *guarantee* you my own simpering
idiot government would leap to block NCBI mirrors that still had said
genomes on file.

> If you aren't using a patented invention for commercial purposes,
> you are free to use it.

Not much of a guarantee either, sadly. For one thing, AFAIK this isn't
true at all in the US; patent violations there include personal use of
patented inventions.

For another, "commercial use" is so poorly defined that it's been a
sticking point for Creative Commons licensing of cultural matter for
years now; if I want to do x, and x is vaguely or tangentially related
to the movement of money, does that count as commercial? Often the
answer is "you'll find out if the copyright holder decides to sue you".

I imagine it's a similar situation with patents; what if I make a
blogpost about something that would require a patent license for
commercial use, but it's my own personal project.. on a website with
ads? A website affiliated with or sponsored by companies? A website on
which I also discuss my commercial work (could therefore be argued
that I am drawing traffic with the unlicensed patent-related work to
commercial material, benefiting materially from it)?

Patents violate innovation, and whenever you are innovative, you must
be wary of patents. The only long-term solution is abolition: join a
Pirate Party today! ;)

SC

Apr 17, 2013, 9:35:22 AM
to diy...@googlegroups.com, Bryan Bishop
Hi everyone,
 
A sequence doesn't have to be in the public domain for someone to be able to use it. NCBI also hosts PubMed, a huge collection of journal articles and references. The articles are not in the public domain (they are copyright-protected), but anyone can use them like any other publication. What you can't do is copy one verbatim and put your name on it as the author. Sequence data is like that, too. There is a separate patent division in Genbank for sequences with patents associated with them, but the only restriction is on commercial use. Also, patents don't last forever. I think the patent for Taq polymerase has already expired.

Regarding removal of sequences deemed "dangerous", I think the genie is already out of the bottle on that one. NCBI data is backed up on hundreds, maybe thousands, of servers worldwide, both public and private. It would be impossible to censor them all.

Anyone is welcome to download the data from the FTP site, although as Cathal mentioned, you may want to select the data types you are interested in for ease of download and storage. I'm a big Refseq fan. While Genbank is a primary archive, Refseq is a curated set which has more consistent and accurate annotation. Both are updated daily, but also have quarterly releases, which I personally prefer.

Cathal Garvey

Apr 17, 2013, 10:27:33 AM
to diy...@googlegroups.com
I would only dispute your use of the word "already" to describe the
expiry of one of modern mol-bio's oldest tools. That it's taken 20
years for biotech to be allowed to homebrew the most basic enzymes in
the cookbook is a huge hindrance.

There will be many more such bottlenecks, and wherever a bottleneck is
too onerous it will be circumvented; I know labs who illegally brewed
their own taq, as do many on this list. Might an ACTA-style treaty ban
access to sequence data to protect big biotech's profits? It's not an
unthinkable outcome these days.

I'd be amused to see Refseq on The Pirate Bay; a pre-emptive Streisand
Effect. :)

Iván Esteban Araya

Apr 17, 2013, 10:52:45 AM
to diy...@googlegroups.com

I just have to add that industrial property protection (patents) is different from copyright. Don't mix them up >.<. I don't agree with most of the protection practices used in the US, but come on, companies don't bother killing ants (individual researchers/users like us) because that's inefficient and useless. We can do all the development we want, and make the information public along the way, without anybody being able to prohibit it. Patent owners can only say something if you bring a product or service to the marketplace based on their patented invention.

What's more, patents are territorial, and the US isn't the most attractive market for most biotechnology sectors. lol ... I and hundreds of companies use patented inventions from the US and other countries that the patent owner hasn't protected in our countries, and we are free to use them for commercial purposes in these markets; just look at China, Southeast Asia, India, Eastern Europe, Latin America, etc. In these regions companies can't apply for a DNA patent or anything similar because the law prohibits it.

Anyway, yes ... patent protection in the US is a problem for companies but not for research purposes. A patent is a commercial tool, not a limitation on development ... although it's also true that in biotechnology and related areas a lot of money is needed, and the only way to support that in the long term is to take a bit from the market as soon as possible (patents on DNA, etc., which to me are basic tools, not products/services) ... but that's another story.

But well, this is my point of view.

Josiah Zayner

Apr 17, 2013, 1:28:53 PM
to diy...@googlegroups.com, Bryan Bishop
Hey Cathal, try Perl, it works better:

#!/usr/bin/perl
$url = "http://ftp.ncbi.nih.gov/refseq/release/microbial/";
`curl "$url" -o microbe`;
open(FILE, "microbe");
@file = <FILE>;

foreach $line (@file)
{
 if($line =~ m/href/g)
 {
  @stuff = split(/\s+/,$line);
  if($stuff[4] =~ m/M/g)
  {
    $stuff[4] =~ s/[a-z]//ig;
    $total += ($stuff[4] * 1000000);
  }
  elsif($stuff[4] =~ m/K/g)
  {
     $stuff[4] =~ s/[a-z]//ig;
     $total += ($stuff[4] * 1000);
  }
  else
  {
     $stuff[4] =~ s/[a-z]//ig;
     $total += ($stuff[4]);
  }
  $a++;
 }
}
$tot = $total / 1000000;
$av = $tot / $a;
print "\nTotal: $tot MB\nAverage File Size: $av MB\n";

--------------------------------------------------------

Sorry. I couldn't pass up a chance to take a shot at Python.

Cathal Garvey

Apr 17, 2013, 2:23:31 PM
to diy...@googlegroups.com
Strength in diversity! Of course, I wasn't using Python's regex, so
it's not a fair match. For such a trivial scraping task, beautiful
soup was overkill.. ;)

Actually, the code you posted looks almost identical to the Python
regex code I'd have used; interpreted languages ftw.
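
For comparison, a regex-based Python version might look something like the sketch below. It runs on a made-up sample of directory-listing lines rather than fetching the real index, so the filenames, dates, and sizes are placeholders, and the assumed line layout (link, date, time, then a size with an optional K or M suffix) may not match what the server actually returns.

```python
import re

# Hypothetical sample of an HTML directory listing.
sample = """\
<a href="microbial1.genomic.gbff.gz">microbial1.genomic.gbff.gz</a>  16-Apr-2013 10:00  5.5M
<a href="microbial1.protein.faa.gz">microbial1.protein.faa.gz</a>  16-Apr-2013 10:00  800K
<a href="README">README</a>  16-Apr-2013 10:00  512
"""

# Capture the href and the trailing size token (number plus optional K/M suffix).
LINE = re.compile(r'<a href="([^"]+)">.*?</a>.*?\s(\d+(?:\.\d+)?)([KM]?)\s*$')
MULTIPLIER = {"": 1, "K": 1000, "M": 1000000}

sizes = {}
for line in sample.splitlines():
    match = LINE.search(line)
    if match:
        href, number, suffix = match.groups()
        sizes[href] = float(number) * MULTIPLIER[suffix]

total_mb = sum(sizes.values()) / 1000000
average_mb = total_mb / len(sizes)
print("Total:", total_mb, "MB; average file size:", average_mb, "MB")
```

Like the scripts above, this treats K and M as powers of 1000, which is only an approximation if the server reports sizes in powers of 1024.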

Patrik D'haeseleer

Apr 18, 2013, 2:59:37 AM
to diy...@googlegroups.com


On Wednesday, April 17, 2013 4:05:12 AM UTC-7, Iván E. Araya wrote:

> ^.^ Don't worry about it. In case you don't know, the EU and Japan each have an independent copy of the NCBI database that is updated every 24 hours. For example, if I submit a sequence to the Japanese database, it will eventually appear in the US and EU databases.
>
> And with respect to the intellectual property issues, I think you misunderstand how patents work. The core idea of this kind of protection is this: the government gives me the legal right to license the invention, to profit from it exclusively, and to take action against people who use it for commercial (and only commercial) purposes without my consent; in exchange, the government demands that the information be made public in a document that we know as a patent.

Yeah - what Iván said.

Keep in mind that patenting of gene sequences is *already* the status quo right now! So if the SCOTUS rules in favor of Myriad, exactly nothing will change.

Frankly, the sequester and similar budgeting shenanigans are a much greater threat to the survival of NCBI as a resource. But yes, all of that data is mirrored dozens of times over in genome centers across the world.

Patrik

Tom Randall

Apr 18, 2013, 11:32:53 AM
to diy...@googlegroups.com

SRA alone is >1000 TB (http://www.ncbi.nlm.nih.gov/Traces/sra/), where are you going to put it?

Bryan Bishop

Apr 18, 2013, 3:33:08 PM
to diy...@googlegroups.com, Iván Esteban Araya, Bryan Bishop
On Wed, Apr 17, 2013 at 6:05 AM, Iván Esteban Araya wrote:
> ^.^ Don't worry about it. In case you don't know, the EU and Japan each
> have an independent copy of the NCBI database that is updated every 24
> hours. For example, if

You might be referring to their UDP-packet-streaming mirroring
service. Are there any third party mirrors that are not in direct
collaboration with NCBI?

> I submit a sequence to the Japanese database, it will eventually appear
> in the US and EU databases.

Do you mean you uploaded it to the Japanese FTP mirror? or something else?

> And with respect to the intellectual property issues, I think you misunderstand
> how patents work. The core idea of this kind of protection is this: the
> government gives me the legal right to license the invention, profit from it
> exclusively, and take action against people who use it for commercial (and only commercial) purposes

Nope. In the United States (and other countries), a patent also includes
the right to stop individuals from using the invention or patented
material. The scope of the monopoly that a patent grants extends well
beyond commercial use.

"""
For example, if a patent is filed in the United States, then anyone in
the United States is prohibited from making, using, selling or
importing the patented item, while people in other countries may be
free to make the patented item in their country.
...
In United States law, an infringement may occur where the defendant
has made, used, sold, offered to sell, or imported an infringing
invention or its equivalent.
...
In the United States, a patent provides its proprietor with the right
to exclude others from utilizing the invention claimed in that patent.
Should a person utilize that invention, without the permission of the
patent proprietor, they may infringe that patent.
...
Research for "purely philosophical" inquiry is not an infringement,
but research directed to commercial purposes is - unless the research
is directed toward obtaining approval of the Food and Drug
Administration (FDA) for introduction of a generic version of a
patented drug (see Research exemption and Hatch-Waxman Act).
"""
https://en.wikipedia.org/wiki/Patent_infringement
https://en.wikipedia.org/wiki/Patent_infringement_under_United_States_law

Also:

"""
The only thing you don't need a license for is for purely speculative
work, generally interpreted as nothing more than idle curiosity. Even
research (such as trying to build a new invention out of an old one)
is technically patent infringement unless you have a license.

Here's the actual law:

> Except as otherwise provided in this title [35 USCS Sects. 1 et
> seq.], whoever without authority makes, uses or sells any patented
> invention, within the United States during the term of the patent
> therefor, infringes the patent.

http://www.law.cornell.edu/patent/35uscs271.html

It says nothing about noncommercial use.

If you want confirmation, ask Duke University. It tried to argue that
non-commercial research use should be protected in the case Madey v.
Duke University. It lost. The Federal Circuit (the court that deals
with patent appeals) held that research use is still use of the
invention and still violates the law.
http://www.bakerbotts.com/infocenter/publications/detail.aspx?id=b7930f1d-b945-4f95-b825-fa9ac70c16af

Here's another good summary of the Duke case:

> The U.S. Court of Appeals for the Federal Circuit denied an
> "experimental use defense" in a patent infringement lawsuit against
> Duke University, signaling that academic researchers may be liable
> for use of patented equipment and processes even without use for
> commercial purposes. The court declared that the noncommercial
> character of the research in Madey v. Duke University was
> irrelevant. What matters is whether the research "is in keeping
> with the alleged infringer's legitimate business, regardless of
> commercial implications." In the case of a university,
> noncommercial research is "legitimate business," subject to the
> patent laws.

http://www.sciencemag.org/cgi/content/summary/299/5609/1018?siteid=sci&ijkey=kOiAnw9uhtbsM&keytype=ref
"""
https://news.ycombinator.com/item?id=162467

Here's some text about research "exemptions":
http://blog.patentology.com.au/2011/03/patent-reform-exposed-part-vi.html

But it is unclear whether or not research exemptions only apply to
institutional research. That last link notes that the "experimental
exemption" in UK law has been interpreted very narrowly to mean
"interrogating the invention" (like testing it) and not other uses of
the invention in a research setting. But most of us aren't in a
research institution anyway, so that doesn't even apply...

And yes, I agree that it's unlikely that a random patent's owner is
going to want to litigate against non-commercial, non-research use, but
even a few owners enforcing their exclusive rights can do massive
damage (see examples in copyright litigation).

> In summary, if any DNA, protein, etc. is patented, that information is
> public; the patent owner can't ban it from being used in the NCBI database (in
> the case of the US).

I don't think that's true; there's no indication that NCBI resists
takedown requests from patent owners. Additionally, someone who owns
the copyright on some data could probably get some things taken down
with a DMCA notice if they bothered to try.

> If you aren't using a patented invention for commercial purposes, you are
> free to use it

IANAL.