Updated GCD DBDD


Daniel Nauschuetz

Aug 31, 2012, 10:29:26 AM
to gcd-...@googlegroups.com
I added an introduction and some basic database information.  Let me know if I missed anything, stated anything incorrectly or need to rethink some portion.

I am still missing some details about the data dump.  Alexandros mentioned that rows marked for deletion aren't added to the dump (which makes sense to me). 
(1)  Are all 13 tables listed in the DBDD dumped, or just the 10 data tables?  Are any more added, or some deleted?
(2)  Is the table structure of the dump tables identical to their parent?  Or is the data tweaked?  I've seen dumps of both varieties, so this is not an obvious answer to me.

Daniel 
GCD_DBDD_v02.pdf

Tony Rose

Aug 31, 2012, 11:13:31 AM
to gcd-...@googlegroups.com
typo in first paragraph:  "understand as well as easy contributors" should be "understand as well as easy for contributors"

Absolutely lovely work, Daniel.  Lovely.  And despite the tech guys' ambivalence, I think documentation of this sort is always worthwhile.

tony


From: "Daniel Nauschuetz" <nausc...@yahoo.com>
To: gcd-...@googlegroups.com
Sent: Friday, August 31, 2012 9:29:26 AM
Subject: [gcd-tech] Updated GCD DBDD
--
GCD-Tech mailing list - gcd-...@googlegroups.com
To unsubscribe send email to gcd-tech+u...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/gcd-tech

Lionel English

Aug 31, 2012, 12:58:24 PM
to gcd-...@googlegroups.com
Not sure if you want to make the addition, but per our recent board vote genre is targeted to become an enumerated drop-down like Type.  That change has not been made yet, though, so at present you're correct.  And now I'm not sure we *did* make that specific vote, so I'll have to go review.
 
I do not have an answer for you on the dump questions; I think Jochen or Henry will have to answer, or perhaps Ralf, Alexandros, or Peter (I think they have made use of the dumps).

--
Lionel English
San Diego, CA
lio...@beanmar.net

Peter Croome

Aug 31, 2012, 5:31:06 PM
to gcd-...@googlegroups.com
I never really used the dumps myself (or at least not in any meaningful way). I had a read-only username/password directly into the SQL database to find bad data, but that was back before we migrated to the Django website, so I'm not going to be any help.

I have been fascinated by the discussions around this though... please carry on...

Peter



From: Lionel English <lio...@beanmar.net>
To: gcd-...@googlegroups.com
Sent: Friday, August 31, 2012 12:58:24 PM
Subject: Re: [gcd-tech] Updated GCD DBDD

Henry Andrews

Aug 31, 2012, 6:11:39 PM
to gcd-...@googlegroups.com
As far as I can remember and tell from the script, the only change to the contents of the included tables is filtering out deletes.  You can read the script here (it's in Perl, which I tend to use for shell-script-ish things): http://grandcomic-book.svn.sourceforge.net/viewvc/grandcomic-book/database/scripts/gcd-export.pl?revision=1487&view=markup  The list of tables the script selects from is in the constants near the top.
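A minimal sketch of that filtering step, written in Python rather than the script's Perl. It is not the actual export script: the table list, the `deleted` column name, and the in-memory SQLite database are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical subset of tables to export; the real list lives in the
# constants near the top of gcd-export.pl.
EXPORT_TABLES = ["gcd_series", "gcd_indicia_publisher"]


def export_rows(conn, table):
    """Return all rows of `table` that are not marked for deletion."""
    if table not in EXPORT_TABLES:
        raise ValueError(f"table not in export list: {table}")
    # Table names cannot be bound as SQL parameters, hence the check above.
    # The column name `deleted` is an assumption for this sketch.
    cursor = conn.execute(f"SELECT * FROM {table} WHERE deleted = 0")
    return cursor.fetchall()


def build_demo_db():
    """Create a tiny in-memory database with one deleted row per table."""
    conn = sqlite3.connect(":memory:")
    for table in EXPORT_TABLES:
        conn.execute(
            f"CREATE TABLE {table} "
            "(id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER)"
        )
        conn.executemany(
            f"INSERT INTO {table} (id, name, deleted) VALUES (?, ?, ?)",
            [(1, "kept", 0), (2, "marked for deletion", 1)],
        )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for table in EXPORT_TABLES:
        print(table, export_rows(demo, table))
```

In this sketch the table structure is passed through unchanged and only the rows marked for deletion are dropped, which matches the behaviour described above.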

There's also a name-value.py script in that same directory which we used for eBay's data feed (and theoretically could use for something else, except it's such a trim-down of things that I doubt anyone else would ever care).

You want the "pydjango" and (to a lesser degree) "database" directories.  Ignore the "gcd" directory- it was an alternate approach that never got off the ground, and we should probably svn delete it.  The branches are or were used for various stable releases or temporary projects, and won't tell you anything you can't already figure out from "pydjango" (which was the only part of any branch that ever had any meaningful difference).

It's also worth considering if we want to move the code over to GitHub.  SourceForge has become something of a backwater in recent years, and interested tech people would be much more likely to stumble across us on GitHub.

thanks,
-henry


From: Peter Croome <pbibl...@rogers.com>
To: "gcd-...@googlegroups.com" <gcd-...@googlegroups.com>
Sent: Friday, August 31, 2012 2:31 PM

Daniel Nauschuetz

Sep 1, 2012, 8:32:35 PM
to gcd-...@googlegroups.com
Comments incorporated!  Thank you for taking the time to get back to me.

Daniel


From: Tony Rose <tonyr...@comcast.net>
To: gcd-...@googlegroups.com
Sent: Friday, August 31, 2012 11:13 AM

Subject: Re: [gcd-tech] Updated GCD DBDD

Daniel Nauschuetz

Sep 1, 2012, 9:02:25 PM
to gcd-...@googlegroups.com
Thank you for the information.  It was very helpful, and I have nearly all I need: we do a straight data dump of 14 tables where "Delete = 0" (just like you and Alexandros said).

I only have 13 tables identified.  I am missing "gcd_classification".  What is "gcd_classification"?  It seems important enough to add to the Data Dump, but I am not sure how it fits into the rest of the database.

Daniel


From: Henry Andrews <hh...@cornell.edu>
To: "gcd-...@googlegroups.com" <gcd-...@googlegroups.com>
Sent: Friday, August 31, 2012 6:11 PM

Henry Andrews

Sep 1, 2012, 9:41:27 PM
to gcd-...@googlegroups.com
The Classification concept was one that I added but was unable to ever sell to the rest of the GCD.  It's best ignored and dropped.  Most people either didn't like it, or didn't understand it (arguably a failure on my part to effectively communicate the concept), or both.  I'd prefer not to beat that particular dead horse any further.

-henry


From: Daniel Nauschuetz <nausc...@yahoo.com>
To: "gcd-...@googlegroups.com" <gcd-...@googlegroups.com>
Sent: Saturday, September 1, 2012 6:02 PM

Daniel Nauschuetz

Sep 1, 2012, 9:53:36 PM
to gcd-...@googlegroups.com
I am all for simplification!  Should it be removed from the weekly data dumps?  It seems that somebody might later ask the same question.

Daniel


Sent: Saturday, September 1, 2012 9:41 PM

Fred Wilson

Sep 4, 2012, 3:03:40 AM
to gcd-...@googlegroups.com

Trivial typo.

Table C should be gcd_indicia_publisher.

Worried that without linking these you’ll have a bear maintaining this.

I have also wondered what use the first_issue_id and last_issue_id fields in the series table are.

Previously they contained “human readable” information regarding the number of the first and last published issues.  Since it got changed to an index, is it useful?

From: gcd-...@googlegroups.com [mailto:gcd-...@googlegroups.com] On Behalf Of Daniel Nauschuetz
Sent: Friday, August 31, 2012 10:29 AM
To: gcd-...@googlegroups.com
Subject: [gcd-tech] Updated GCD DBDD

 


Henry Andrews

Sep 4, 2012, 10:12:58 AM
to gcd-...@googlegroups.com
first/last issue id is very useful for speeding up certain pages of the web site, including series search results.  An index is more useful for this.
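A rough sketch of why the stored ids help, using an in-memory SQLite database. Only `first_issue_id` and `last_issue_id` come from this thread; the simplified `series` and `issue` tables, the `sort_code` column, and the sample rows are assumptions for illustration and not the actual GCD schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE series (
        id INTEGER PRIMARY KEY,
        name TEXT,
        first_issue_id INTEGER,
        last_issue_id INTEGER
    );
    CREATE TABLE issue (
        id INTEGER PRIMARY KEY,
        series_id INTEGER,
        number TEXT,
        sort_code INTEGER
    );
    INSERT INTO issue VALUES
        (10, 1, '1', 0), (11, 1, '2', 1), (12, 1, 'Annual 1', 2);
    INSERT INTO series VALUES (1, 'Example Series', 10, 12);
    """
)

# With stored ids, each end of the run is a direct primary-key join.
WITH_STORED_IDS = """
    SELECT s.name, f.number, l.number
    FROM series s
    JOIN issue f ON f.id = s.first_issue_id
    JOIN issue l ON l.id = s.last_issue_id
"""

# Without them, the first and last issues must be found per series by sorting.
WITHOUT_STORED_IDS = """
    SELECT s.name,
           (SELECT number FROM issue
             WHERE series_id = s.id ORDER BY sort_code ASC LIMIT 1),
           (SELECT number FROM issue
             WHERE series_id = s.id ORDER BY sort_code DESC LIMIT 1)
    FROM series s
"""

stored = conn.execute(WITH_STORED_IDS).fetchall()
computed = conn.execute(WITHOUT_STORED_IDS).fetchall()
print(stored)
print(computed)
```

Both queries return the same human-readable issue numbers; the first one just avoids a per-series sort, which matters when a search results page lists many series.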
-henry


From: Fred Wilson <fwil...@gmail.com>
To: gcd-...@googlegroups.com
Sent: Tuesday, September 4, 2012 12:03 AM
Subject: RE: [gcd-tech] Updated GCD DBDD

Daniel Nauschuetz

Sep 4, 2012, 10:44:44 AM
to gcd-...@googlegroups.com
Updated.  Thank you for taking the time to read and provide feedback.

This will absolutely be a bear to maintain.  Documenting the database (and the Interfaces) was the primary reason I spent time attempting to document the GCD Tech Process.  Ideally, I would simply inject the documentation requirement into that process.  No dice, so I've decided to focus on developing a set of baseline documents.

Daniel

Sent: Tuesday, September 4, 2012 3:03 AM

Subject: RE: [gcd-tech] Updated GCD DBDD

Jochen G.

Sep 15, 2012, 4:34:01 AM
to gcd-...@googlegroups.com
Hmm, what is missing, since it is not in the dump, are the cover tables
(this also applies to the just introduced image tables).

We do not distribute the cover table because we do not distribute the
cover scans, same for the new generic images.

I guess this should be mentioned in the introduction as well.

Jochen

Jochen G.

Sep 20, 2012, 11:42:30 PM
to gcd-...@googlegroups.com
The other post about keywords got me to realize that I forgot to mention
them here. I (or someone else) also need to check the public dump; I
guess they are not included in that one either.

Keywords are done in a separate table using an external library. They
are connected to every object via "keywords = TaggableManager()" in the
django model definition.
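
As a simplified illustration of what "a separate table" means here: `TaggableManager` comes from the django-taggit library, which stores tags apart from the tagged objects. The sketch below mimics that kind of generic layout with plain SQLite; the table names, column names, and sample keywords are illustrative assumptions and not the library's or the GCD's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gcd_series (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- One row per (tag, object) pair; object_type says which table
    -- object_id refers to, so any kind of object can carry keywords.
    CREATE TABLE tagged_item (
        id INTEGER PRIMARY KEY,
        tag_id INTEGER REFERENCES tag(id),
        object_type TEXT,
        object_id INTEGER
    );
    INSERT INTO gcd_series VALUES (1, 'Example Series');
    INSERT INTO tag VALUES (1, 'western'), (2, 'reprint');
    INSERT INTO tagged_item VALUES (1, 1, 'series', 1), (2, 2, 'series', 1);
    """
)


def keywords_for(conn, object_type, object_id):
    """Return the sorted keyword names attached to one object."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM tagged_item ti
        JOIN tag t ON t.id = ti.tag_id
        WHERE ti.object_type = ? AND ti.object_id = ?
        ORDER BY t.name
        """,
        (object_type, object_id),
    )
    return [name for (name,) in rows]


print(keywords_for(conn, "series", 1))
```

Because the keywords sit in their own tables rather than in a column of the object's table, a dump that only selects the object tables would not carry them along, which is why the public dump needs checking.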

Jochen

On 31.08.2012 16:29, Daniel Nauschuetz wrote: