Hi,
I'd like to offer a restructured version of your data as part of the examples for my library (see announcement on Biostars). The processing would likely resemble this code, though with more "wide" transformations involved (notably joins), and the result would look something like this gist. The idea is to make the data more search-indexable, which in turn allows creating faceted searches such as ICGC's (a project I formerly worked on).
In light of this, I have a few questions:
- Is there any specific license agreement I should be aware of? Any attribution format you would prefer?
- Are all interactions symmetrical and if not, what is the best way to determine which ones are not?
- I initially wanted to use the "combined network" files, but I don't see how the two files could be reliably joined, as they only share the weight column (at least for the current version). Unlike the uncombined counterparts, the filenames cannot be leveraged to determine group/network.
Regards,
Anthony
Hi Anthony,
Sounds great. The license follows that of the original source - almost all are free to use. All the interactions are symmetric. The combined network is a new type of network that combines all the other networks using this method: https://academic.oup.com/bioinformatics/article/26/14/1759/177586
Best,
Gary
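On the symmetry point above: if every interaction is symmetric, a pair listed as (A, B) and the same pair listed as (B, A) describe one undirected edge. The following is a minimal Python sketch of how a consumer of the data might normalize that. The (gene_a, gene_b, weight) tuple layout, the gene identifiers, and the choice to keep the larger weight when both directions appear are assumptions made for illustration; they are not taken from the GeneMANIA files or from gallia-genemania.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical edge layout: (gene_a, gene_b, weight)
Edge = Tuple[str, str, float]


def canonical_edges(edges: Iterable[Edge]) -> Dict[Tuple[str, str], float]:
    """Collapse (a, b) and (b, a) into one undirected edge keyed by the sorted pair."""
    merged: Dict[Tuple[str, str], float] = {}
    for a, b, weight in edges:
        key = (a, b) if a <= b else (b, a)
        # Assumption: if both directions are present, keep the larger weight.
        merged[key] = max(weight, merged.get(key, weight))
    return merged


def neighbours(merged: Dict[Tuple[str, str], float]) -> Dict[str, Dict[str, float]]:
    """Expand undirected edges into a per-gene lookup that works from either end."""
    adjacency: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (a, b), weight in merged.items():
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return adjacency
```

Storing each pair once and expanding it on read keeps the stored data small while still letting a lookup start from either gene.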
--
You received this message because you are subscribed to the Google Groups "genemania-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genemania-disc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/genemania-discuss/CAAS0GwBHxhrqr-wy8Jf2HRJGskUtoux_5zcvrOJbjPz%3DbkT53w%40mail.gmail.com.
Hi Gary,
Thanks for clarifying what the combined network is; I had misunderstood it to be a naive concatenation. I ended up crawling the full set of files instead, and just pushed the resulting code: https://github.com/galliaproject/gallia-genemania
The result data file for Homo_sapiens is 3.5GB (gzipped) and each row basically looks like the following (dummified+prettified):
Would it be OK for me to post the full data somewhere, or a subset thereof?
Notes:
1- The code for gallia-genemania will switch to an Apache2 license once gallia-core switches to BSL (in the works). The result data could be Apache2-licensed right away, however.
Anthony
Hi Anthony - great! Sure, feel free to remix the data and share. FYI, we haven't updated the database in a while, but are working on an update now - definitely this year.
Best,
Gary
Hi Gary,
I uploaded the full set of results: https://github.com/galliaproject/gallia-genemania/tree/init/result
I also included the first JSON object in a prettified form: https://github.com/galliaproject/gallia-genemania/tree/init/result/data/ENSG00000000003.pretty.json
For those unfamiliar with git-lfs:
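# requires git and git-lfs to be installed
git clone https://github.com/galliaproject/gallia-genemania
cd gallia-genemania
git checkout init # the branch the links above point at
git lfs pull # fetches all LFS-tracked files
git lfs pull --include "result/data/genemania.jsonl.gz" # alternatively, if you only want the one file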
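Once pulled, the file can be streamed without decompressing it to disk first. Below is a minimal Python sketch, assuming only that the file holds one JSON object per line (as the .jsonl.gz extension suggests). The function names are made up for this example, and no field names are assumed since the row layout is not reproduced here.

```python
import gzip
import json
from itertools import islice
from typing import Iterator, List


def iter_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def peek(path: str, n: int = 1) -> List[dict]:
    """Return the first n objects without reading the rest of the file."""
    return list(islice(iter_jsonl_gz(path), n))
```

For instance, peek("result/data/genemania.jsonl.gz") would return a one-element list holding the first object, which is handy for inspecting the structure before committing to a full pass over a multi-gigabyte file.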
I opted for CC-BY-4.0 in the end (for the data): https://github.com/galliaproject/gallia-genemania/blob/init/result/LICENSE-CC-BY-4.0.txt; in a nutshell, this means everyone can share/adapt the data as long as they provide adequate credit (see details). I encourage anyone interested in re-using the data to reach out to me regardless.
A few more comments about the actual run:
- correction: the full run takes 4.5h on my machine, not 8h as previously said
- clarification: the hack for parallelizing Iterator processing is not being used, mostly because the grouped objects are quite big (which makes this sub-grouping too prone to OOM errors)
- wrapping GNU sort: I added more details to the documentation in this section
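To illustrate the general idea behind the last two points (this is not the gallia implementation, which is written in Scala): delegating the sort to the external GNU sort command keeps the sorting out of the program's own memory, and once the rows arrive sorted by key, each group can be assembled from adjacent rows so that only one group is held at a time. A minimal Python sketch follows, assuming tab-separated input with the grouping key in the first column; the function names and the file layout are assumptions made for this example.

```python
import os
import subprocess
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def gnu_sort(path: str, key_field: int = 1) -> Iterator[str]:
    """Yield the lines of a tab-separated file, sorted on one field by the external `sort`."""
    cmd = ["sort", "-t", "\t", f"-k{key_field},{key_field}", path]
    # LC_ALL=C gives byte-wise ordering, which is consistent across machines.
    env = dict(os.environ, LC_ALL="C")
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, env=env) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    if proc.returncode != 0:
        raise RuntimeError(f"sort exited with status {proc.returncode}")


def group_sorted(
    lines: Iterable[str], key_field: int = 1
) -> Iterator[Tuple[str, List[List[str]]]]:
    """Group adjacent rows sharing a key; the input must already be sorted on that key."""
    rows = (line.split("\t") for line in lines)
    for key, group in groupby(rows, key=lambda cols: cols[key_field - 1]):
        # Only the current group is materialized, never the whole input.
        yield key, list(group)
```

Chaining the two, as in group_sorted(gnu_sort("edges.tsv")) with a hypothetical file name, streams one group at a time. A single very large group can still exhaust memory, which is consistent with the OOM concern noted above.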
Regards,
Anthony
PS: happy to rerun it once the new data is available
Hi Anthony - awesome! Thanks so much - I hope people will use it. We’ll announce the new data here when ready.
Best,
Gary
Thanks Anthony. Hopefully people see it in this email forum (or elsewhere) and find it valuable.
Best,
Gary