Why are we leaving it to Google?

Peter Kraker

Sep 10, 2018, 9:29:10 AM
to open-science, opencon-dis...@googlegroups.com
As you may have heard, Google is building a search engine for datasets
(https://www.blog.google/products/search/making-it-easier-discover-datasets/).
This means yet another proprietary index on top of our own data that
nobody can reuse - and another inferior list-based interface that will
be pushed onto scientists by Google's sheer market dominance.

At the same time, there is no funding for a community-driven open source
alternative. We should not leave the field to Google, or science will be
poorer for it! It is high time we invested in a true open science
contender for research data discovery. This would bring meaningful
competition to this space and drive innovation.

This is why we have launched the #DontLeaveItToGoogle campaign. I
encourage you to add your own voice to the discussion using this
hashtag. It’s time to change the way we discover research - not
perpetuate the same proprietary model time and time again!

https://twitter.com/PeterKraker/status/1039122274518007808
https://twitter.com/PeterKraker/status/1039122276623503360

Best,
Peter

--
Dr. Peter Kraker
Founder & Chairman, Open Knowledge Maps
Keilgasse 11/20, 1030 Vienna
https://openknowledgemaps.org

Skype: peter_kraker
Twitter: https://twitter.com/PeterKraker
Newsletter: http://eepurl.com/dvQeGP

It's time to change the way we discover research!


Peter Kraker

Sep 10, 2018, 1:01:21 PM
to Thomas Krichel, Emanuil Tolev, open-science, opencon-dis...@googlegroups.com
I agree that DOAJ is great and has a great search engine. We even used
it as our default data source for Open Knowledge Maps for some time.
However, it lacks coverage. Researchers will always choose the search
engine with the largest (perceived) coverage, and that's Google Scholar.
And no one has the resources to beat them in that area. So as long as
Google doesn't allow others to reuse their index, no real innovation in
scholarly discovery will happen.

Aren't you upset by the fact that in every other area you can easily get
an overview of what exists and find exactly what you are looking for
(websites, products etc), but when it comes to research, you are left
with huge piles of papers with little to no context?

And now they are going to do the exact same thing with datasets. I don't
think that we should let that happen.

Best,
Peter

On 10/09/2018 18:39, Thomas Krichel wrote:
> Emanuil Tolev writes
>
>> Well, there's value in having people funded to think about those things
>> independently on medium and long term. That's why https://doaj.org exists.
> I am not disputing the value of DOAJ. DOAJ is a dataset. I'm disputing
> the value of putting resources into building a search engine when
> the same resources could be put into building datasets, and when
> Google already are building a search engine.
>
>> It probably provides a better search experience in some ways and worse in
>> other ways/contexts. Either way the org behind it is very useful and the
>> people on the editorial team have gathered quite a lot of interesting
>> knowledge that orgs like SCOSS and SPARC, foundations, funders etc. then
>> make use of.
> I agree.
>
>> Google eng wouldn't think of the problems in the same way as
>> the open science people involved in the open* mailing lists and DOAJ.
> I'm not sure what Google "eng" is but if you mean engineers, I for
> one would not stereotype them in this way.
>
> BTW, I would be grateful if a maintainer of opencon could add me to
> that group.
>

Peter Kraker

Sep 11, 2018, 6:18:59 AM
to Peter Murray-Rust, Bosman, J.M. (Jeroen), Thomas Krichel, Emanuil Tolev, open-science, opencon-dis...@googlegroups.com
I am all for building a contender to Dataset Search! But let's build it on top of existing services, such as BASE, which already indexes datasets. Then integrate it with Open Knowledge Maps, Hypothes.is, ContentMine, WikiCite, OSF and the rest of the open science ecosystem.
Then we would have a true contender to the proprietary Googleverse.

The pieces are already there - but who will fund their integration?

Best,
Peter

> On 10.09.2018, at 22:49, Peter Murray-Rust <pm...@cam.ac.uk> wrote:
>
>
>
> I don't have a problem with Google indexing public datasets. I work with Crystallography Open Database which has indexed 350K data sets. What I take exception to is the way that Big Corporations can buy privileged access to paywalled datasets and publications.
>
> I have a tool which will index science (chemistry, crystallography, phylogenetic trees etc.) much better than Google (which doesn't do these) but I am not allowed to use it. So Google and Clarivate are handed a monopoly on indexing the literature even though I can do it better. What is even worse is the way that some publishers (Elsevier) take public data (crystallography) and put it behind the accessWall of the Cambridge Crystallographic Data Centre. Authors think they are making their data Open. They are not. It's being monopolised by CCDC, which sells it by subscription and lets 1% or less out to the rest of the world.
>
> I am sure there are many more of these cartels and monopolies.
>
> I am happy to hear from others who want to build an alternative search engine to closed monopolists of the scholarly literature because we can do it better.
>
>
>
> --
> Peter Murray-Rust
> Reader Emeritus in Molecular Informatics
> Unilever Centre, Dept. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069

Naomi Penfold

Sep 11, 2018, 6:54:30 AM
to Peter Kraker, Peter Murray-Rust, Bosman, J.M. (Jeroen), Thomas Krichel, Emanuil Tolev, open-science, opencon-dis...@googlegroups.com
Hi Peter, and all,

I share your concerns about Google's market dominance potentially steering researchers to a product that is not helpful for increasing transparency and access to science.

When considering an alternative, DataCite and Dataverse seem to me to be well positioned in this space.

Is there anything about https://www.re3data.org/search that you find insufficient for these goals? What are your requirements for improving discoverability of open research data? Which of these are met and not met with Google dataset search and re3data?

Best,
Naomi

-- 

Naomi Penfold
Innovation Officer

Explore open-source tools and technologies for research communication at elifesciences.org/labs

Have ideas for new tools and resources to improve the research workflow?
Find me @eLifeInnovation on Twitter and @npscience on Gitter.



Peter Kraker

Sep 11, 2018, 9:46:53 AM
to Naomi Penfold, Peter Murray-Rust, Bosman, J.M. (Jeroen), Thomas Krichel, Emanuil Tolev, open-science, opencon-dis...@googlegroups.com
Hi Naomi,

I do agree that many of the building blocks are already in place. But what researchers are eventually looking for is an interface where they can discover all of the datasets in the dataverse. 

However, a list-based search alone will never be sufficient to get a useful overview of an exponentially growing corpus of scholarly outputs. Even now, only a third are satisfied with current discovery engines. To change this, we need advanced techniques such as visualization (as we do at Open Knowledge Maps), recommendation, and collaborative features.
And whatever is developed should of course be reusable for the rest of the ecosystem.

For that, we also have building blocks. But there is a big risk that they get stopped in their tracks, as there is currently no funding for this. After two years of lobbying for a project in this direction, I am convinced that many are indeed content to just leave it to Google.

Best,
Peter

Peter Kraker

Sep 27, 2018, 11:57:58 AM
to open-science, opencon-dis...@googlegroups.com
Dear all,

Thank you for the thought-provoking discussion, shows of support and practical proposals around #DontLeaveItToGoogle. In the meantime, I have been interviewed by Elephant in the Lab about the campaign's motivations and goals. You can find the interview here: http://elephantinthelab.org/google-and-research-data/

I'd be interested in your feedback and comments!

Best,
Peter

Peter Kraker

Sep 27, 2018, 4:58:06 PM
to Peter Murray-Rust, open-science, opencon-dis...@googlegroups.com

Thanks Peter! And I completely agree - this creates a huge barrier to market entry that makes it harder - and in some cases impossible - for open infrastructures to catch up.

Best,
Peter


On 27/09/2018 at 18:14, Peter Murray-Rust wrote:
Excellent article - I tried to reply but the Elephant trashed it.
The worst aspect for me is the differential access to create indexes of publisher and other sites. Google gets treated like royalty, whereas we (who are often technically better) get banned or get threats. For example, I could index much of today's chemistry but I am forbidden.

P.

