input dataset name

Ramy Baly
Dec 12, 2018, 12:56:11 AM
to PAN Workshop Series on Digital Text Forensics
Hi,

As I understand it, each inputDataset is basically an XML file.

In my reader, should I include the ".xml" extension in the filename? For instance, should I be reading pan19-hyperpartisan-news-detection-by-article-test-dataset-2018-12-07 or pan19-hyperpartisan-news-detection-by-article-test-dataset-2018-12-07.xml?

Also, are all these files located in the "/media/training-datasets/" directory?

Thanks
-Ramy

Johannes
Dec 12, 2018, 2:48:47 AM
to pan-works...@googlegroups.com
Hi Ramy,

This is the path your software will receive for the test data:

/media/test-datasets/hyperpartisan-news-detection/pan19-hyperpartisan-news-detection-by-article-test-dataset-2018-12-07

As you can see, it is a directory, so your software has to read the single
XML file inside that directory.
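
A minimal Python sketch of that step, assuming only what is stated here: the path is a directory that contains exactly one XML file. The helper name `find_input_xml` is hypothetical (it is not part of any PAN or TIRA API); it finds the file by its extension instead of hard-coding the file name:

```python
import os


def find_input_xml(input_dir):
    """Return the path of the single XML file inside input_dir.

    Assumption (from this thread): the dataset path is a directory that
    holds exactly one XML file, so its name is looked up, not hard-coded.
    """
    xml_files = sorted(
        name
        for name in os.listdir(input_dir)
        if name.lower().endswith(".xml")
        and os.path.isfile(os.path.join(input_dir, name))
    )
    if len(xml_files) != 1:
        # Fail loudly rather than silently picking one of several files.
        raise ValueError(
            f"expected exactly one XML file in {input_dir}, found {len(xml_files)}"
        )
    return os.path.join(input_dir, xml_files[0])
```

The returned path can then be handed to an XML parser, e.g. `xml.etree.ElementTree.parse(find_input_xml(input_dir))`, or to `iterparse` if the file is large.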

And as the path shows, the test datasets are in /media/test-datasets
(which is only accessible while your software is running on the test
dataset; you cannot access it while logged in to your virtual machine).

Hope that helps!
Johannes

--
Johannes Kiesel

Bauhaus-Universität Weimar
Bauhausstr. 11, Room 109
99423 Weimar, Germany

Phone: +49 (0)3643 - 58 3720

Martin Potthast
Dec 12, 2018, 5:11:40 AM
to pan-workshop-series
PS: It is important that you do not hard-code that path into your software; instead, use the path provided via the $inputDataset variable. Otherwise, your software will fail if we change the dataset location.
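
A minimal Python sketch of this advice, assuming the software is registered with a command along the lines of `python3 predict.py --inputDataset $inputDataset`, so that the platform substitutes the real directory for `$inputDataset` at run time. The script name, the `--inputDataset` flag, and the `resolve_input_dir` helper are hypothetical illustrations; the task's submission instructions define the actual command format:

```python
import argparse
import os


def parse_args(argv=None):
    # Hypothetical flag: the registered command would pass the dataset
    # directory here, e.g. --inputDataset $inputDataset.
    parser = argparse.ArgumentParser(description="Hyperpartisan news detector (sketch)")
    parser.add_argument("--inputDataset", required=True,
                        help="directory that contains the input XML file")
    return parser.parse_args(argv)


def resolve_input_dir(argv=None):
    """Return the dataset directory given on the command line.

    No /media/... path appears in the code, so nothing breaks if the
    organizers move the datasets.
    """
    input_dir = parse_args(argv).inputDataset
    if not os.path.isdir(input_dir):
        raise SystemExit(f"input dataset directory not found: {input_dir}")
    return input_dir


if __name__ == "__main__":
    print(resolve_input_dir())
```

Because the directory arrives as an argument, the same code runs unchanged on the training data in the virtual machine and on the test data under /media/test-datasets.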
