Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

SchemaSpy on SourceForge

7 views
Skip to first unread message

John Currier

unread,
Jun 5, 2005, 2:03:06 AM6/5/05
to
I recently released my database schema analysis tool on SourceForge and
would like to know what you guys think. It's available at
http://schemaspy.sourceforge.net and some sample output is available at
http://schemaspy.sourceforge.net/sample/.

It focuses on trying to help you understand the relationships between
various tables/views. Yes, it can show relationships with views
through implied relationships. An implied relationship is where a
table or view has a column with the same name and type of a primary key
of another table.

It generates graphs that depict relationships within either one or two
degrees of separation from the selected table/view. It also generates
insertion-order and deletion-order lists of the tables in the schema to
help when you're trying to populate or purge data with scripts.

I really need some feedback on the anomalies section. Here are the
anomalies that it currently detects:

- Columns whose name and type imply a relationship to another table's
primary key.
- Tables without indexes (includes number of records).
- Columns that are flagged as both 'nullable' and 'must be unique'.
- Unreferenced primary keys.
- Tables that contain a single column.
- Tables with incrementing column names, potentially indicating
denormalization.

Are there other "not quite right" things that you've seen that it could
detect? I've never seen the last one but had a DBA suggest that it'd
be cool to be able to detect it.

Note that I'm not a DBA...just a programmer that needed to understand a
schema.

Thanks,
John

Bob Hairgrove

unread,
Jun 5, 2005, 7:46:05 AM6/5/05
to
On 4 Jun 2005 23:03:06 -0700, "John Currier" <md4c...@gmail.com>
wrote:

>I recently released my database schema analysis tool on SourceForge and
>would like to know what you guys think. It's available at
>http://schemaspy.sourceforge.net and some sample output is available at
>http://schemaspy.sourceforge.net/sample/.

Looks good! I'm going to try it soon -- does it also work with
MS-Access databases? I'd like to see a tool that can extract DDL from
the schema objects in an Access database for migrating a schema to
other RDBMS.

>It focuses on trying to help you understand the relationships between
>various tables/views. Yes, it can show relationships with views
>through implied relationships. An implied relationship is where a
>table or view has a column with the same name and type of a primary key
>of another table.
>
>It generates graphs that depict relationships within either one or two
>degrees of separation from the selected table/view. It also generates
>insertion-order and deletion-order lists of the tables in the schema to
>help when you're trying to populate or purge data with scripts.
>
>I really need some feedback on the anomalies section. Here are the
>anomalies that it currently detects:
>
>- Columns whose name and type imply a relationship to another table's
> primary key.
>- Tables without indexes (includes number of records).
>- Columns that are flagged as both 'nullable' and 'must be unique'.
>- Unreferenced primary keys.
>- Tables that contain a single column.
>- Tables with incrementing column names, potentially indicating
> denormalization.
>
>Are there other "not quite right" things that you've seen that it could
>detect? I've never seen the last one but had a DBA suggest that it'd
>be cool to be able to detect it.

Exposing the incrementing column names is a good idea. It happens a
lot, unfortunately (but not on schemas that I design<g>). However, I
can always spot this just by looking at the schema or examining the
CREATE TABLE statements. Lots of other 1NF violations can also happen
(columns with names like "Jan", "Feb", "Mar" -- "Home_Phone",
"Business_Phone", "Mobile_Phone", etc. -- your tool wouldn't catch
those).

As always, there is then the question of what you are going to do
about it when you find it? Just because it's not my design, doesn't
mean that I can go in and change it. <sigh>

What do you mean by "unreferenced primary keys"? Does this mean there
are no other tables with foreign keys referencing that PK? That would
hardly be an anomaly IMHO because the PK fulfils other
responsibilities, too.

Tables with a single column which is also the primary key can be used
to implement domains on RDBMS which do not support them. It's better
to use a domain if they are available, though.

Relatively small lookup tables often do not have indexes because the
database can usually cache the entire table in memory. Doing a full
table scan of such tables is much faster than using an index, so this
is not always an anomaly.

Also, although perhaps unusual, there is no rule that says columns
with a unique index should not be nullable. I suppose it would be
helpful to point it out, because it might be a mistake in the design.
OTOH, having a unique index on a non-nullable column makes it a
candidate key, and perhaps THIS could be an anomaly if there is also a
primary key present.

As clever as this tool is, I am wary of getting into the habit of
relying on automated tools to report such anomalies. I always look at
the DDL which tells me much more than any graphical tool possibly
could. For example, do the FK's cascade, and how (on update, on
delete)? There are usually some integrity rules which are implemented
in triggers because there is no other way to do it. What can I tell
about those? Autoincrement, for example, cannot be implemented without
a trigger and a sequence (or sequence table) on some RDBMS. What about
domains?

--
Bob Hairgrove
NoSpam...@Home.com

John Currier

unread,
Jun 5, 2005, 2:54:54 PM6/5/05
to
Thanks for the feedback Bob. It should work with MS-Access if someone
has a JDBC driver for Access that provides database metadata. I
haven't found a free one yet.

The anomalies detected aren't necessarily things that are wrong with
the database design, they're just things that might require a closer
look. The tables w/o indexes anomaly that you mentioned is a prime
example where I include the number of records that were in the table so
the user of the tool can use their own judgement on whether an index is
appropriate. In one of our systems we had a table with 1.5 million
rows without an index. That one might require a second look.

> What do you mean by "unreferenced primary keys"? Does this
> mean there are no other tables with foreign keys referencing that
> PK? That would hardly be an anomaly IMHO because the PK fulfils
> other responsibilities, too.

A co-worker asked me to add that one. His argument was that people
blindly think that every table must have a primary key when a unique
index may be more appropriate. I'm not sure what the performance or
overhead implications are between a primary key and a unique index.

> Also, although perhaps unusual, there is no rule that says columns
> with a unique index should not be nullable. I suppose it would be
> helpful to point it out, because it might be a mistake in the design.

We had it all over the place in one of our databases...it didn't make
sense there. There can only be one null. It might make sense in some
situations but it probably requires some review.

> OTOH, having a unique index on a non-nullable column makes it a
> candidate key, and perhaps THIS could be an anomaly if there is
> also a primary key present.

Still thinking about this one.....you might be right.

The tool was never designed to be any form of crutch for a real DBA,
but the anomalies page does attempt to step into that territory. What
it does attempt to do is to keep developers from incessantly bugging
their DBA with relationship questions.

Thanks again for the feedback,
John

Bob Hairgrove

unread,
Jun 5, 2005, 4:13:14 PM6/5/05
to
On 5 Jun 2005 11:54:54 -0700, "John Currier" <md4c...@gmail.com>
wrote:

>> What do you mean by "unreferenced primary keys"? Does this


>> mean there are no other tables with foreign keys referencing that
>> PK? That would hardly be an anomaly IMHO because the PK fulfils
>> other responsibilities, too.
>
>A co-worker asked me to add that one. His argument was that people
>blindly think that every table must have a primary key when a unique
>index may be more appropriate. I'm not sure what the performance or
>overhead implications are between a primary key and a unique index.

A primary key must be unique and non-nullable; a unique index can
conain nulls. Although some RDBMS might implement PK with a unique
index, they are conceptually different. As to overhead and
performance, this is very RDBMS-specific.

>> Also, although perhaps unusual, there is no rule that says columns
>> with a unique index should not be nullable. I suppose it would be
>> helpful to point it out, because it might be a mistake in the design.
>
>We had it all over the place in one of our databases...it didn't make
>sense there. There can only be one null. It might make sense in some
>situations but it probably requires some review.

You could have a table of people's name, address etc. including SSN.
Some people might not yet have a SSN (children or foreign visitors),
yet the ones that do should be unique (although there have been cases
of duplicate SSN's, they probably wouldn't ever show up in the same
database [your database or mine, that is -- mabe the FBI's would have
them<g>]). Null works better than a default value here because you
can't have a unique index if more than one person has the same default
value in that column.

>> OTOH, having a unique index on a non-nullable column makes it a
>> candidate key, and perhaps THIS could be an anomaly if there is
>> also a primary key present.
>
>Still thinking about this one.....you might be right.
>
>The tool was never designed to be any form of crutch for a real DBA,
>but the anomalies page does attempt to step into that territory. What
>it does attempt to do is to keep developers from incessantly bugging
>their DBA with relationship questions.

I think they should either let the DBA design the database or send the
develpers to school so they can learn to do it right. The real
problems occur, however, when you let the customers design their own
databases. ;)

Good luck!

--
Bob Hairgrove
NoSpam...@Home.com

Lucy

unread,
Jun 5, 2005, 5:29:27 PM6/5/05
to

"John Currier" <md4c...@gmail.com> wrote in message
news:1117951386.3...@g43g2000cwa.googlegroups.com...

This seems to be a great thing. I tried it with MySQL and Oracle. I am new
to Oracle, but got the 'thin' driver to work. My problem is this.

1. There are a whole bunch of tables that exist just by having Oracle
installed.
I have only one database (toy) and one table (jones). Can I somehow tell
your program
to ignore the Oracle generated ones?

2. I am getting a whole bunch of errors out of "dot". I looked at a couple
of the "dot"
files and ran dot on them independently. The problem seems to be that it
doesn't like to find the dollar sign as part of the table (or constraint or
whatever) name.
Is there a work around for this? I have dot version 2.2.1 (Thu Apr 7
19:37:52 UTC 2005)


--CELKO--

unread,
Jun 5, 2005, 5:41:06 PM6/5/05
to
>> You could have a table of people's name, address etc. including SSN. Some people might not yet have a SSN (children or foreign visitors), <<

For those those, you use the open numbers in the SSN scheme.

>> Yet the ones that do should be unique (although there have been cases of duplicate SSN's, they probably wouldn't ever show up in the same database [your database or mine, that is -- maybe the FBI's would have them<g>]). <<

~ 5% duplication is the high estimate -- illegals with fake cards,
errors thanks to the poor design of SSN with no check digit, etc. That
is going to change with Homeland Security. It is an ugly problem.

John Currier

unread,
Jun 5, 2005, 6:20:35 PM6/5/05
to
> Can I somehow tell your program to ignore the Oracle generated ones?

Your schema is probably "toy", so you'll need to specify it on the
command line with "-s toy". Note that the tool won't be of much help
if you've only got one table. You won't get any real graphs since
there aren't any relationships.

I've never let it run against tables with dollar signs in their names
or in column names. I'm not sure what the ramificiations would be, but
either dot or my code is choking on them. I'll look into it.

Thanks,
John

John Currier

unread,
Jun 5, 2005, 6:21:55 PM6/5/05
to
Yep, we've got to deal with one person with multiple SSNs as well as
duplicates.

Ugh,
John

Lucy

unread,
Jun 5, 2005, 11:03:23 PM6/5/05
to

"John Currier" <md4c...@gmail.com> wrote in message
news:1118010035....@o13g2000cwo.googlegroups.com...

I looked at some of the dot files you generate and they look fine. My dot
documentation says that names can be any string. However, if there is
a dollar sign in a name it generates an error. If I change the name manually
and replace the dollar sign with some other letter, dot works fine.
So I think it is dot. My schema is 'toy'.
If I specify both -s toy -db toy I get a java error.
---
C:\Shareware\schemaspy>java net.sourceforge.schemaspy.Main -t orathin
-p chi -s TOY -db TOY -host 192.168.1.122 -port 1521 -u system -o toy
-cp C:\oracle\product\10.1.0\Db_1\jdbc\lib\classes12.zip
Using database properties:
[.]/net/sourceforge/schemaspy/dbTypes/orathin.properties
Connected to Oracle - Oracle Database 10g Release 10.1.0.2.0 - Production

Gathering schema details(2sec)
Writing/graphing results(0sec)
Writing/graphing summary...
java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(Unknown Source)
at
net.sourceforge.schemaspy.view.HtmlMainIndexFormatter.write(HtmlMainIndexFor
matter.java:2
1)
at net.sourceforge.schemaspy.Main.main(Main.java:200)
---
If I specify just -s toy and not -db, I get a message that db is missing.
If I specify just -db toy, it runs, but I am getting the errors out of dat.
---
C:\Shareware\schemaspy>java net.sourceforge.schemaspy.Main
-t orathin -p chi -db TOY -host 192.168.1.122 -port 1521 -u system
-o toy -cp C:\oracle\product\10.1.0\Db_1\jdbc\lib\classes12.zip
Using database properties:
[.]/net/sourceforge/schemaspy/dbTypes/orathin.properties
Connected to Oracle - Oracle Database 10g Release 10.1.0.2.0
- Production

Gathering schema
details.....................................................................
.....
.........................................................................(12
7sec)
Writing/graphing
results..toy\graphs\AQ$_SCHEDULES.1degree.dot: Error:
toy\graphs\AQ$_SCHEDULES.1deg ree.dot:11: syntax error near
line 11 toy\graphs\AQ$_SCHEDULES.1degree.dot: context: >>>
AQ$ <<< _SCHEDULES [ 'dot -Tpng
toy\graphs\AQ$_SCHEDULES.1degree.dot
-otoy\graphs\AQ$_SCHEDULES.1degree.png' failed with return code 3

.toy\graphs\REPCAT$_AUDIT_COLUMN.1degree.dot: Error:
toy\graphs\REPCAT$_AUDIT_COLUMN.1degree.dot:11:
syntax error near line 11
toy\graphs\REPCAT$_AUDIT_COLUMN.1degree.dot: context: >>>
REPCAT$ <<< _AUDIT_COLUMN [ 'dot -Tpng
toy\graphs\REPCAT$_AUDIT_COLUMN.1degree.dot
-otoy\graphs\REPCAT$_AUDIT_COLUMN.1degree.png ' failed with return code 9

.toy\graphs\DEF$_PROPAGATOR.1degree.dot: Error:
toy\graphs\DEF$_PROPAGATOR.1degree.dot:11: syntax er
ror near line 11 toy\graphs\DEF$_PROPAGATOR.1degree.dot:
context: >>> DEF$ <<< _PROPAGATOR [ 'dot -Tpng
toy\graphs\DEF$_PROPAGATOR.1degree.dot
-otoy\graphs\DEF$_PROPAGATOR.1degree.png' failed with return code 3

John Currier

unread,
Jun 6, 2005, 12:43:24 AM6/6/05
to
Somehow the "no tables in schema" case slipped through the cracks and
resulted in your NoSuchElementException. I've put a new release on
sourceforge that doesn't fail so horribly in that situation.

Mapping tables/columns with dollar signs in their names will take some
research into dot.

Just e-mail me directly for additional help. This newsgroup isn't an
appropriate forum for this discussion.

Thanks,
John

John Currier

unread,
Jun 6, 2005, 11:48:41 AM6/6/05
to
Note that the latest version out there will tell you what schemas are
available in the database if it can't see any tables/views in the one
you're pointing at.

One thing that I discovered while playing with this is that your user
has to have appropriate rights in that schema or you won't see any
tables. It makes sense, but I guess I was expecting something more
definitive.

John

Lee Fesperman

unread,
Jun 6, 2005, 3:42:41 PM6/6/05
to
John Currier wrote:
>
> > What do you mean by "unreferenced primary keys"? Does this
> > mean there are no other tables with foreign keys referencing that
> > PK? That would hardly be an anomaly IMHO because the PK fulfils
> > other responsibilities, too.
>
> A co-worker asked me to add that one. His argument was that people
> blindly think that every table must have a primary key when a unique
> index may be more appropriate. I'm not sure what the performance or
> overhead implications are between a primary key and a unique index.

It's not blindly done. It's a requirement of the Relational Model ... every table must
have a primary key, without exception. SQL is just blind on that requirement.

> > OTOH, having a unique index on a non-nullable column makes it a
> > candidate key, and perhaps THIS could be an anomaly if there is
> > also a primary key present.
>
> Still thinking about this one.....you might be right.

Nope. Candidate keys are perfectly reasonable. For instance, employee_number as primary
key and social_security_number as candidate key. Also part of RM. Any table can have
multiple candidate keys, one of which is chosen as the primary key.

(References provided on request)

--
Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com)
==============================================================
* The Ultimate DBMS is here!
* FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)

John Currier

unread,
Jun 6, 2005, 5:01:18 PM6/6/05
to
Lee, can you post those references? You have to alter an existing
table to add a primary key to it, so it's definitely possible to have a
table without a primary key. If that's not a valid table then I can
easily axe the anomaly.

Thanks,
John

John Currier

unread,
Jun 6, 2005, 5:28:48 PM6/6/05
to
> It's not blindly done. It's a requirement of the Relational Model ... every table must
> have a primary key, without exception. SQL is just blind on that requirement.

I think the requirement is that each normalized table must include a
key that uniquely identifies each record. That key doesn't have to be
a primary key, does it?

John

Thomas Kellerer

unread,
Jun 6, 2005, 5:26:46 PM6/6/05
to
John Currier wrote on 06.06.2005 23:01:

> Lee, can you post those references? You have to alter an existing
> table to add a primary key to it

Hmm, there is no need to alter a table to create a PK. You can define the
primary key together with the table.

The following syntax if valid for several DBMS I know (Oracle, Postgres,
HSQLDB, Firebird and even SQL Server, to name a few)

CREATE TABLE foo
(
bar integer not null primary key,
bla varchar(100)
);

or multi-column PKs

CREATE TABLE foobar
(
foo integer not null,
bar integer not null,
bla varchar(100),
primary key (foo, bar)
);

Thomas

Gene Wirchenko

unread,
Jun 6, 2005, 7:47:29 PM6/6/05
to
On 6 Jun 2005 14:28:48 -0700, "John Currier" <md4c...@gmail.com>
wrote:

>> It's not blindly done. It's a requirement of the Relational Model ... every table must

That is the definition of "candidate key". A primary key is just
a candidate key. (The candidacy in question is whether the key can
serve as a PK.)

Sincerely,

Gene Wirchenko

Lee Fesperman

unread,
Jun 7, 2005, 1:31:03 AM6/7/05
to

I think others have covered the issue of adding a primary key.

The reference is a book -- "Practical Issues in Database Management" by Fabian Pascal
(See http://www.dbdebunk.com/books.html). The specific section is "A Matter of Identity:
Keys" (I believe it is Chapter 3). It covers the requirement for a primary key and the
meaning of candidate keys as well as other 'key' issues. The book is a good reference in
general ... highly recommended.

John Currier

unread,
Jun 7, 2005, 3:04:06 AM6/7/05
to
Lucy, I just released version 1.3.0 of SchemaSpy. It fixes your
dollars in dot problem, provides decent diagnostics when it can't
detect any tables in the schema you've specified (it lists all the
schemas in the database), adds a handful of usability enhancements and
fixes some misc issues.

Let me know if it helps to resolve the problems you ran into (probably
a permissions issue).

John
SchemaSpy: http://schemaspy.sourceforge.net
Sample output: http://schemaspy.sourceforge.net/sample

--CELKO--

unread,
Jun 8, 2005, 6:05:49 PM6/8/05
to
>> the requirement is that each normalized table must include a key that uniquely identifies each record. <<

Rows are not records. If you cannot learn the difference, you will
never understand the relational model. Bit of history here.

When Dr. Codd defined the Relational Model, the world was still on a
file system model of data (physically contigous storage, sequential
record numbers, etc.). Tape files depend on all files being sorted in
the same order for merges. The concept of a PRIMARY KEY grew out of
that.

A few years later, Dr. Codd realized that any key is a key in the
abstraction and that the best way to store data is not physically
contigous locations. He dropped the PRIMARY KEY idea.

Unfortunately, the early SQL products were built on top of either
existing file systems or with that same mindset (look at the "ID"-iots
who use IDENTITY as a PK in SQL Server Family, etc.) So the early
products used the PRIMARY KEY concept and then SQL came along and made
it special for DRI. Opps! Then ANSI X3H2 added the UNIQUE constraint
and DRI actions to patch over this early error in the model.

A table must have at least one key. If a table has more than one key
and they are independent, then they must have a deterministic function
that shows they map to one and only one row in the table (i.e. Dewey
Decimal and LOC classification codes).

If a table has more than one (multi-column) key and they overlap, then
they must be internally consistent. That gets tricky.

Lee Fesperman

unread,
Jun 9, 2005, 2:57:02 AM6/9/05
to
--CELKO-- wrote:
> A few years later, Dr. Codd realized that any key is a key in the
> abstraction and that the best way to store data is not physically
> contigous locations. He dropped the PRIMARY KEY idea.

When was that? In 1985, he published his 12 rules which still required a Primary Key for
all tables (2. Guaranteed access rule).

John Currier

unread,
Jun 9, 2005, 7:44:24 PM6/9/05
to
Where exactly does rule 2 require a primary key? Doesn't it say "can
be accomplished" and not "must be accomplished"?

John

Lee Fesperman

unread,
Jun 10, 2005, 2:37:05 PM6/10/05
to
John Currier wrote:
>
> Where exactly does rule 2 require a primary key? Doesn't it say "can
> be accomplished" and not "must be accomplished"?

I don't understand why you're having a problem with this. However, I don't recognize the
wording you are using. Can you post the complete text of Rule 2 that you are using? That
will make it much easier for me to explain.

For whatever reason, there seems to be a number of different versions of the 12 Rules,
with varied wording. I couldn't even find your version. I have a copy of the original
ComputerWorld articles somewhere. I guess I'll have to dig them out and publish on our
site.

The Guaranteed Access Rule is very important. All by itself, Rule 2 basically
distinguishes the Relational Model from all other data models ... or lack of data model.
See the (accursed) other thread on Multi-Valued columns on comp.databases for a sterling
example.

John Currier

unread,
Jun 13, 2005, 1:26:13 PM6/13/05
to
Google for "codd's 12 rules" and go to the first hit. It's possible
that this is a mis-interpretation of the rules.

I'd agree with you if it said "must be accomplished" or some other
strongly worded statement.

John

Lee Fesperman

unread,
Jun 13, 2005, 5:20:30 PM6/13/05
to
John Currier wrote:
>
> Google for "codd's 12 rules" and go to the first hit. It's possible
> that this is a mis-interpretation of the rules.

It is emphatically a misinterpration! For reference, the link is
http://www.itworld.com/nl/db_mgr/05072001/. The text at this link for Rule 2 is:

>>>Rule 2: Guaranteed Access Rule
"All data should be accessible without ambiguity. This can be accomplished through a
combination of the table name, primary key, and column name."
<<<

He calls it a 'simplified description'. Unfortunately, it is a distortion of the 12
Rules to advance his own agenda.

> I'd agree with you if it said "must be accomplished" or some other
> strongly worded statement.

The original articles are:

Codd, E. F. (1985), "Is Your DBMS Really Relational?" and "Does Your
DBMS Run By the Rules?", ComputerWorld, October 14 and October 21.

Apparently, ComputerWorld does not publish the articles online. For now, I suggest you
use this link -- http://www.cse.ohio-state.edu/~sgomori/570/coddsrules.html. It includes
Rule 0 and seems to be accurate, however I still need to dig up my copy of the
ComputerWorld articles. At that link, Rule 2 is:

>>>2. Guaranteed access rule. Each and every datum (atomic value) in a relational database
is guaranteed to be logically accessible by resorting to a combination of table name,
primary key value, and column name.
<<<

This is pretty unambiguous.

Also, for you multi-valued adherents, note the use of the word -- datum. The atomic
values in multi-valued columns must be accessed differently from all other values, and
there is no comprehensive way to do that. Most extant MV DBMSs have a laundry list of
limitations on MV columns. Codd was acutely aware of these problems, thus 1NF (1st
Normal Form) prohibits such constructs.

0 new messages