[ExamVT13] Block 7: Seni-structured data and XML

67 views

Skip to first unread message

Niklas Broberg

unread,

Mar 10, 2014, 2:17:17 AM3/10/14

to tda357...@googlegroups.com

============================================

The following DTD attempts to as faithfully as possible model the same domain and constraints for the music streaming site as the relations used in the previous blocks.

<!DOCTYPE MusicStream [

<!ELEMENT MusicStream (Track+,Album+,Artist+,User*)>

<!ELEMENT Track (Participant+)>

<!ELEMENT Participant EMPTY >

<!ELEMENT Album (TrackOnAlbum+)>

<!ELEMENT TrackOnAlbum EMPTY >

<!ELEMENT Artist EMPTY >

<!ELEMENT User (Playlist*,PlayedTrack*)>

<!ELEMENT Playlist (InList*)>

<!ELEMENT InList EMPTY >

<!ELEMENT PlayedTrack EMPTY >

<!ATTLIST Track

trackId ID #REQUIRED

title CDATA #REQUIRED

length CDATA #REQUIRED >

<!ATTLIST Participant

artist IDREF #REQUIRED>

<!ATTLIST Album

albumId ID #REQUIRED

title CDATA #REQUIRED

yearReleased CDATA #IMPLIED>

<!ATTLIST TrackOnAlbum

trackNr CDATA #REQUIRED

track IDREF #REQUIRED>

<!ATTLIST Artist

artistId ID #REQUIRED

name CDATA #REQUIRED>

<!ATTLIST User

username ID #REQUIRED

email CDATA #REQUIRED

name CDATA #IMPLIED>

<!ATTLIST Playlist

name CDATA #REQUIRED>

<!ATTLIST InList

number CDATA #REQUIRED

track IDREF #REQUIRED>

<!ATTLIST PlayedTrack

time CDATA #REQUIRED

track IDREF #REQUIRED>

============================================

7A (5p)

(i) (2p)

Give an example XML document that is valid with respect to the DTD above.

(ii) (3p)

For a document conforming to the schema given above, what would the following XQuery expression compute? Answer in plain text:

{ for $d in ( doc("musicstream.xml") )

for $u in $d//User[/Playlist]

let $c := count (for $x in $u/Playlist[/InList]

return $x)

order by (-$c)

return <User username="{$u/@username}">{$c}</User> }

</Result>

============================================

7B (8p)

When answering this question, disregard types and usage (required vs NULL etc).

Compare the DTD schema given above to the relational schema presented in block 3. For each of the four kinds of constraints listed below, give one concrete example of a constraint that is enforced by one of the schemas but not the other. For each, state which of the two schemas that enforces it, and give an example of unwanted data that could be entered into the one that does not enforce it.

(i) (1p)

A value constraint.

(ii) (2p)

A dependency constraint.

(iii) (2p)

A reference constraint.

(iv) (3p)

A multiplicity constraint.

============================================

Grégoire Détrez

unread,

Mar 13, 2014, 11:17:11 AM3/13/14

to tda357...@googlegroups.com

A correct answer for this block:

7A i

To create the xml document, a simple strategy is to start from the root::

<?xml version="1.0" standalone="yes" ?>

This is a valid XML document but incorrect with regard to our DTD because there

should be at least one Track element, one Album element and one Artist element

under MusicStream. Let's add those::

<?xml version="1.0" standalone="yes" ?>

</MusicStream>

Now the MusicStream element has enough children but each of them require

children of their own, still according to the DTD. By following this process

recursively, we end up with the following document::

<?xml version="1.0" standalone="yes" ?>

<Track>

</Track>

<Album>

</Album>

</MusicStream>

Now that we have our document structure, let's look at which attribute we need

to add to comply with our DTD. MusicStream doesn't require any attribute but

Track does::

<?xml version="1.0" standalone="no" ?>

</Track>

<Album>

</Album>

</MusicStream>

By checking each tags one by one, we end up with the following document

(validated with xmllint)::

<?xml version="1.0" standalone="no" ?>

</Track>

</Album>

</MusicStream>

A few notes:

prefixed the IDs with the element type to make sure that they are globally unique. Note that the value of an ID attribute cannot start with a digit, so writing trackId="1" would me syntactically incorrect.
Every IDREF correspond to an existing ID. In this document, writing ``<Participant artist="artist:0" />`` would have been incorrect because there is no element with ID ``artist:0``. Note on the other hand, that although it would have been semantically wrong, having ``<Participant artist="album:1" />`` would have been valid...
I didn't add a user because the schema didn't require one: the ``*`` meaning 0 or more, having no User element is perfectly valid. You could, however, add User element(s), or more data if you wish (and enjoy writing inequality signs!)

7A ii

By simply looking at the XML elements in the query, we can already say that the

it will return a list of non empty User element with a username attribute

enclosed in a Result root tag. Now, looking at the query more closely:

The first line just opens the xml database which should be in the file

``musicstream.xml`` in the current directory::

{ for $d in ( doc("musicstream.xml") )

then, we select all User elements having at least one playlist::

for $u in $d//User[/Playlist]

and for each of those users, we count the number of playlists with at least one

song::

let $c := count (for $x in $u/Playlist[/InList] return $x)

and we order the list of users by this number, starting with the biggest::

order by (-$c)

and the last line of the query just construct the result tags. So in English,

this query will return the username and number of non-empty playlist for user

having at least one playlist (potentially empty...) and sorted according to the

number of non-empty playlist the have in descending order.

7B i

*Value constraints* are constraint that restrict the kind of values an attribute

can take and are typically easier to enforce in a relational schema, often (but

not always!) with a CHECk constraint. In this example, we see that the

relational schema enforces the length of a track to be non negative whereas the

DTD schema has no such constraint. In fact, in the XML document the value of

``length`` could be anything and doesn't even have to be an number. So any

column type (except maybe for BLOB) can be seen as a value constraint that is

enforced by the relational schema but not the DTD.

7B ii

*Dependency constraints* are constraints expressed by functional dependencies

and typically enforced by key. Now, as we saw in the tutorial, DTD schema can't

express compound keys so that might be a good place to start looking. For

instance, the FD::

album, track -> trackNr

is enforced by the relational schema but not the DTD.

7B iii

*Reference constraints* are quite easy. Since the DTD schema doesn't express

what an IDREF can reference to, you can pick any of them and you will get a

reference constraint that's better enforced by the relational schema.

Note that the opposite is not true: you cannot pick any reference from the

schema and say that it is not enforced by the DTD. This is because it is

possible to enforce *some* of the references by using the parent-child construct.

This means that, if the topology of your data is a tree, a DTD schema will

enforce reference constraints as good as the relational schema (and maybe even

better regarding multiplicity, see next point)

7C iv

Finally, for *multiplicity constraints*, I'd first look at the DTD for elements

that must contains one-or-more of a certain other element. For instance, in the

DTD, ``<!ELEMENT Album (TrackOnAlbum+)>`` makes sure that an album has at least

one track. The relational schema on the other hand will (most probably

incorrectly) allow empty albums.

Reply all

Reply to author

Forward

0 new messages