[ExamVT13] Block 7: Seni-structured data and XML

67 views
Skip to first unread message

Niklas Broberg

unread,
Mar 10, 2014, 2:17:17 AM3/10/14
to tda357...@googlegroups.com
============================================
The following DTD attempts to as faithfully as possible model the same domain and constraints for the music streaming site as the relations used in the previous blocks.

<!DOCTYPE MusicStream [
 <!ELEMENT MusicStream (Track+,Album+,Artist+,User*)>
 <!ELEMENT Track (Participant+)>
 <!ELEMENT Participant EMPTY >
 <!ELEMENT Album (TrackOnAlbum+)>
 <!ELEMENT TrackOnAlbum EMPTY >
 <!ELEMENT Artist EMPTY >
 <!ELEMENT User (Playlist*,PlayedTrack*)>
 <!ELEMENT Playlist (InList*)>
 <!ELEMENT InList EMPTY >
 <!ELEMENT PlayedTrack EMPTY >

 <!ATTLIST Track
   trackId ID #REQUIRED
   title CDATA #REQUIRED
   length CDATA #REQUIRED >
 <!ATTLIST Participant
   artist IDREF #REQUIRED>
 <!ATTLIST Album
   albumId ID #REQUIRED
   title CDATA #REQUIRED
   yearReleased CDATA #IMPLIED>
 <!ATTLIST TrackOnAlbum
   trackNr CDATA #REQUIRED
   track IDREF #REQUIRED>
 <!ATTLIST Artist
   artistId ID #REQUIRED
   name CDATA #REQUIRED>
 <!ATTLIST User
   username ID #REQUIRED
   email CDATA #REQUIRED
   name CDATA #IMPLIED>
 <!ATTLIST Playlist
   name CDATA #REQUIRED>
 <!ATTLIST InList
   number CDATA #REQUIRED
   track IDREF #REQUIRED>
 <!ATTLIST PlayedTrack
   time CDATA #REQUIRED
   track IDREF #REQUIRED>
]>

============================================
7A (5p)

(i) (2p)
Give an example XML document that is valid with respect to the DTD above.

(ii) (3p)
For a document conforming to the schema given above, what would the following XQuery expression compute? Answer in plain text:

<Result>
 { for $d in ( doc("musicstream.xml") )
   for $u in $d//User[/Playlist]
   let $c := count (for $x in $u/Playlist[/InList]
                          return $x)
   order by (-$c)
   return <User username="{$u/@username}">{$c}</User> }
</Result>

============================================
7B (8p)

When answering this question, disregard types and usage (required vs NULL etc).
Compare the DTD schema given above to the relational schema presented in block 3. For each of the four kinds of constraints listed below, give one concrete example of a constraint that is enforced by one of the schemas but not the other. For each, state which of the two schemas that enforces it, and give an example of unwanted data that could be entered into the one that does not enforce it.

(i) (1p)
A value constraint.

(ii) (2p)
A dependency constraint.

(iii) (2p)
A reference constraint.

(iv) (3p)
A multiplicity constraint.
============================================

Grégoire Détrez

unread,
Mar 13, 2014, 11:17:11 AM3/13/14
to tda357...@googlegroups.com
A correct answer for this block:

7A i

To create the xml document, a simple strategy is to start from the root::

    <?xml version="1.0" standalone="yes" ?>
    <MusicStream/>

This is a valid XML document but incorrect with regard to our DTD because there
should be at least one Track element, one Album element and one Artist element
under MusicStream. Let's add those::

    <?xml version="1.0" standalone="yes" ?>
    <MusicStream>
      <Track></Track>
      <Album></Album>
      <Artist></Artist>
    </MusicStream>

Now the MusicStream element has enough children but each of them require
children of their own, still according to the DTD. By following this process
recursively, we end up with the following document::

    <?xml version="1.0" standalone="yes" ?>
    <MusicStream>
      <Track>
        <Participant/>
      </Track>
      <Album>
        <TrackOnAlbum/>
      </Album>
      <Artist/>
    </MusicStream>

Now that we have our document structure, let's look at which attribute we need
to add to comply with our DTD. MusicStream doesn't require any attribute but
Track does::

    <?xml version="1.0" standalone="no" ?>
    <MusicStream>
      <Track trackId="track:1" length="4:33" title="Four thirty-three">
        <Participant/>
      </Track>
      <Album>
        <TrackOnAlbum/>
      </Album>
      <Artist/>
    </MusicStream>

By checking each tags one by one, we end up with the following document
(validated with xmllint)::

    <?xml version="1.0" standalone="no" ?>
    <MusicStream>
      <Track trackId="track:1" length="4:33" title="Four thirty-three">
        <Participant artist="artist:1" />
      </Track>
      <Album albumId="album:1" title="Zen Buddhism">
        <TrackOnAlbum track="track:1" trackNr="1" />
      </Album>
      <Artist artistId="artist:1" name="John Cage" />
    </MusicStream>

A few notes:

  •  prefixed the IDs with the element type to make sure that they are globally unique. Note that the value of an ID attribute cannot start with a digit, so writing trackId="1" would me syntactically incorrect. 
  • Every IDREF correspond to an existing ID. In this document, writing ``<Participant artist="artist:0" />`` would have been incorrect because there is no element with ID ``artist:0``. Note on the other hand, that although it would have been semantically wrong, having ``<Participant artist="album:1" />`` would have been valid...
  • I didn't add a user because the schema didn't require one: the ``*`` meaning 0 or more, having no User element is perfectly valid. You could, however, add User element(s), or more data if you wish (and enjoy writing inequality signs!)

7A ii

By simply looking at the XML elements in the query, we can already say that the
it will return a list of non empty User element with a username attribute
enclosed in a Result root tag. Now, looking at the query more closely:

The first line just opens the xml database which should be in the file
``musicstream.xml`` in the current directory::

    { for $d in ( doc("musicstream.xml") )

then, we select all User elements having at least one playlist::

    for $u in $d//User[/Playlist]

and for each of those users, we count the number of playlists with at least one
song::

    let $c := count (for $x in $u/Playlist[/InList] return $x)

and we order the list of users by this number, starting with the biggest::

    order by (-$c)

and the last line of the query just construct the result tags. So in English,
this query will return the username and number of non-empty playlist for user
having at least one playlist (potentially empty...) and sorted according to the
number of non-empty playlist the have in descending order.

7B i

*Value constraints* are constraint that restrict the kind of values an attribute
can take and are typically easier to enforce in a relational schema, often (but
not always!) with a CHECk constraint. In this example, we see that the
relational schema enforces the length of a track to be non negative whereas the
DTD schema has no such constraint. In fact, in the XML document the value of
``length`` could be anything and doesn't even have to be an number. So any
column type (except maybe for BLOB) can be seen as a value constraint that is
enforced by the relational schema but not the DTD.

7B ii

*Dependency constraints* are constraints expressed by functional dependencies
and typically enforced by key. Now, as we saw in the tutorial, DTD schema can't
express compound keys so that might be a good place to start looking. For
instance, the FD::

    album, track -> trackNr

is enforced by the relational schema but not the DTD.

7B iii

*Reference constraints* are quite easy. Since the DTD schema doesn't express
what an IDREF can reference to, you can pick any of them and you will get a
reference constraint that's better enforced by the relational schema.

Note that the opposite is not true: you cannot pick any reference from the
schema and say that it is not enforced by the DTD. This is because it is
possible to enforce *some* of the references by using the parent-child construct.
This means that, if the topology of your data is a tree, a DTD schema will
enforce reference constraints as good as the relational schema (and maybe even
better regarding multiplicity, see next point)

7C iv

Finally, for *multiplicity constraints*, I'd first look at the DTD for elements
that must contains one-or-more of a certain other element. For instance, in the
DTD, ``<!ELEMENT Album (TrackOnAlbum+)>`` makes sure that an album has at least
one track. The relational schema on the other hand will (most probably
incorrectly) allow empty albums.
 
Reply all
Reply to author
Forward
0 new messages