View this page "EC-TEL database"


Sten

Aug 13, 2009, 11:51:34 AM
to SciTEL2.0
The XML files are now in a relational database!

Click on http://groups.google.com/group/scitel20/web/ec-tel-database -
or copy & paste it into your browser's address bar if that doesn't
work.

tillnm

Aug 15, 2009, 3:18:24 PM
to SciTEL2.0
Sweet. Great work!

I'd love to discuss the schema, but first I need to get my hands on
the database. So far I haven't managed to load it into my
PostgreSQL db. When trying to restore it with pg_restore I get
"unsupported version (1.11) in file header". Any ideas on what I'm
doing wrong?
(I wonder if it has something to do with the Google Groups file
download, as it seems no MIME type is set.)
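
One way to tell a version mismatch from a damaged download is to look at the first bytes of the file. The sketch below is only a rough check: it assumes that a pg_dump custom-format archive starts with the magic string "PGDMP" followed by one byte each for the major, minor and revision number of the archive format. The function names are made up for this example.

```python
def read_archive_version(header: bytes):
    """Return (major, minor, revision) of a custom-format archive header,
    or None if the bytes do not start with the assumed "PGDMP" magic."""
    magic = b"PGDMP"
    if len(header) < len(magic) + 3 or not header.startswith(magic):
        return None
    # Assumed layout: three single-byte version fields right after the magic.
    return (header[5], header[6], header[7])


def describe_dump(path: str) -> str:
    """Read the first 8 bytes of a dump file and describe what they look like."""
    with open(path, "rb") as f:
        version = read_archive_version(f.read(8))
    if version is None:
        return "no PGDMP header (plain SQL dump, or a damaged download)"
    return "custom-format archive, format version %d.%d.%d" % version
```

If the header reads 1.11 and pg_restore still rejects it, the download is probably intact and the file was most likely written by a newer pg_dump than the local pg_restore understands.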

Sten

Aug 15, 2009, 3:27:45 PM
to SciTEL2.0
I will ask Bram to make a SQL dump and post it here. That will be
easier to import. I cannot do it myself, because PostgreSQL is picky and
does not let me make a plain dump due to a version conflict. It has
to be done on the server, and I need someone with root access.

I also plan to add more data to it next week, especially
location data, because I need that for my map. I will also try to post
the Hibernate files early next week; then you can play with it in Java.

Sten

Aug 17, 2009, 5:56:20 AM
to SciTEL2.0
Dear SciTELers,

I uploaded a new backup of the database in plain SQL. This will make
it easier to import into more versions of PostgreSQL.

You can find it here: http://scitel20.googlegroups.com/web/scitel2_backup_20090817.rar

Enjoy!

Sten

Sten

Aug 17, 2009, 11:30:17 AM
to SciTEL2.0
Hi all,

I just uploaded the Hibernate package to manipulate the database. You
can find it here:

http://scitel20.googlegroups.com/web/scitel-hibernate.zip

The only thing I still have to add is convenience mappings to get
bidirectional mappings out of the ternary relation between paper -
citation - context.
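
To illustrate what such convenience mappings could give you, here is a small language-neutral sketch in Python; it is not the Hibernate mapping itself. It assumes each row of the ternary relation links a citing paper, a cited paper and the context text, which may not match the real schema. The function name, ids and context strings are invented for the example.

```python
from collections import defaultdict


def build_citation_indexes(rows):
    """Build lookups in both directions from ternary rows.

    rows: iterable of (citing_paper_id, cited_paper_id, context_text),
    an assumed shape for the paper - citation - context relation.
    """
    cites = defaultdict(list)     # citing paper -> [(cited paper, context)]
    cited_by = defaultdict(list)  # cited paper -> [(citing paper, context)]
    for citing, cited, context in rows:
        cites[citing].append((cited, context))
        cited_by[cited].append((citing, context))
    return dict(cites), dict(cited_by)


# Made-up sample rows, only to show the two directions.
rows = [
    (1, 2, "as shown in [3]"),
    (1, 3, "following [5]"),
    (4, 2, "see [1]"),
]
cites, cited_by = build_citation_indexes(rows)
```

With this, `cites[1]` lists what paper 1 cites and `cited_by[2]` lists who cites paper 2, each together with the context.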

Regards from Aachen,

Sten

tillnm

Aug 18, 2009, 7:59:50 AM
to SciTEL2.0
Sten, thanks for the Hibernate mapping! I played around with it and
could integrate some of my cleaning and geocoding mechanisms.

You can find an altered affiliation table (with additional data) at
http://groups.google.com/group/scitel20/web/new-affiliation.sql?hl=en
It contains cleaned-up names as well as lat/lng positions (315 out of
332 could be geocoded, some possibly wrongly, due to insufficient
original data).

Also, I uploaded an updated project with all necessary libraries
(if you have trouble with Sten's, download this):
http://groups.google.com/group/scitel20/web/scitel-db-alllibs.zip?hl=en


As expected, the data is not perfect, so some relations do not
work properly yet. For instance, there are no relations between
authors and affiliations, because the footnote-relationship mechanism was
not parsed, and some fields contain wrong or no data. Some of these
may be solvable.

How do we want to proceed here? Does everybody fix what he/she needs and
send around new tables / data?


Next, I am going to work on unifying affiliations to get publication /
author distributions, as I need this for my visualization.

Here is a quickly made world map with positioned affiliations. It is very
rough, but you can already see the rather Eurocentric orientation of
EC-TEL.
http://groups.google.com/group/scitel20/web/scitel-map1.png?hl=en
http://groups.google.com/group/scitel20/web/scitel-map2.png?hl=en

ggmendez

Aug 24, 2009, 4:58:02 PM
to SciTEL2.0
Hi all,

I have just imported and taken a look at the database uploaded by
tillnm and, as expected, there are some problems with the data.
At the following link you will find a screenshot of the output
that I got when executing an SQL query:

http://scitel20.googlegroups.com/web/screenshot.png?gsc=NW4MEgsAAAA-wnemEnl_J60jko3QMpOE

I was looking for names like "Xavier" and "Duval". The main problem is
that "Erik Duval", for instance, appears in five records of the author
table, each with a different ID. The same thing happens with
"Xavier Ochoa", which appears in two different records (actually
three if you look for "Ochoa"). A solution for this issue should
be implemented. I do not recommend manual cleaning of this kind of
error because, based on the Ed-Media experience, it would be a
very hard task.
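
A quick way to list the affected names is a GROUP BY query. The sketch below runs it against an in-memory SQLite table so it can be tried without the real dump. The table layout (author with id, name, email) and all rows are made up for illustration and may not match the actual schema.

```python
import sqlite3

# In-memory stand-in for the author table; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO author (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Erik Duval", "a@example.org"),
        (2, "Erik Duval", None),
        (3, "Erik Duval", None),
        (4, "Xavier Ochoa", "b@example.org"),
        (5, "Xavier Ochoa", None),
        (6, "Peter Smith", None),
    ],
)


def duplicate_names(connection):
    """Return (name, count) for every name stored in more than one record."""
    query = """
        SELECT name, COUNT(*) AS n
        FROM author
        GROUP BY name
        HAVING COUNT(*) > 1
        ORDER BY n DESC, name
    """
    return connection.execute(query).fetchall()


duplicates = duplicate_names(conn)
```

Note that this only catches identical spellings. A record stored under a different form of the name, like the third "Ochoa" one, needs looser matching.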

Sten

Aug 24, 2009, 5:10:36 PM
to SciTEL2.0
Hi,

There are multiple author records because I did not want to lose data
such as email addresses; there can be a lot of Peter Smiths, for
example.

I am currently working on this issue. I am trying to identify the authors by
their DBLP page. DBLP does some smart stuff with the names. For
example, if you search for E Duval, you end up on the same page
as when searching for Erik Duval. As we speak, my computer should be
collecting these URLs. I will try to clean it up a bit tomorrow and
post a new db dump when it is done...
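
The merging idea can be sketched roughly as follows. This is only an illustration under assumptions, not the actual cleaning code: the record fields (id, name, email, dblp_url), the function name and the placeholder URLs are all hypothetical. Records that share a DBLP URL are collapsed into one author while every email address is kept, and records without a DBLP URL stay separate, since two Peter Smiths may be different people.

```python
def merge_by_dblp_url(records):
    """Collapse author records that share a DBLP URL.

    records: list of dicts with the hypothetical keys
    "id", "name", "email" and "dblp_url".
    """
    merged = {}
    for rec in records:
        key = rec["dblp_url"]
        if key is None:
            # No DBLP page found: keep this record as its own author.
            key = ("unresolved", rec["id"])
        entry = merged.setdefault(
            key,
            {"ids": [], "names": [], "emails": [], "dblp_url": rec["dblp_url"]},
        )
        entry["ids"].append(rec["id"])
        if rec["name"] not in entry["names"]:
            entry["names"].append(rec["name"])
        if rec["email"] and rec["email"] not in entry["emails"]:
            entry["emails"].append(rec["email"])
    return list(merged.values())


# Made-up records with placeholder URLs, only to show the behaviour.
records = [
    {"id": 1, "name": "Erik Duval", "email": "erik@example.org",
     "dblp_url": "https://dblp.example/author-a"},
    {"id": 2, "name": "E. Duval", "email": None,
     "dblp_url": "https://dblp.example/author-a"},
    {"id": 3, "name": "Peter Smith", "email": "p1@example.org", "dblp_url": None},
    {"id": 4, "name": "Peter Smith", "email": "p2@example.org", "dblp_url": None},
]
authors = merge_by_dblp_url(records)
```

Keeping the list of original ids per merged author makes it possible to repoint rows in other tables afterwards.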

Cheers,

Sten



Sten

Sep 4, 2009, 11:50:55 AM
to SciTEL2.0
Hi,

I put a new dump of the database in the files section; there is a
plain SQL version and a compressed PostgreSQL version.

The authors are cleaned up and DBLP URLs are added. Duplicate authors
are removed using the DBLP URL as the ID. I will build a small REST API
on top of the database next week.

Cheers,
Sten

Sten

Sep 10, 2009, 12:14:02 PM
to SciTEL2.0
Hello,

I just uploaded the latest Hibernate binding to the files section. You
can find it here: http://groups.google.com/group/scitel20/web/scitel-hibernate-cleaned-authors.zip

The authors are cleaned up and contain the DBLP link.

If there are problems, please let me know. I used the cleaned-up libs
from Till; if you are missing some, let me know and I will send you my
lib dir.

Good luck with the merging,

Sten