Documentation of Filters used in DataMachine?


Shashank Gupta

Mar 14, 2017, 2:54:21 PM
to jwpl-users
Is there any easily available documentation of the filters that you use inside DataMachine? Specifically, I'm interested in figuring out things like:
  1. Do you have any filter on the minimum article/document length somewhere?
  2. Do you remove the hidden categories from the system?
  3. Do you expand the templates?
  4. Do you filter out pages from namespaces other than 0 ("articles") and 14 ("categories")?
  5. When queried for the categories of a page, do you recursively follow the path to the root and return the parents of directly mentioned categories as well, or just return the directly mentioned categories? For instance, an editor might have just put "Political System" as Category for a particular page, but "Political System" category might itself be a child of "Politics". So, will your system return just "Political System" here, or "Politics" as well?

If possible, please also tell me which files in the source code helped you figure out answers to these.


Thanks,
Shashank

Johannes Daxenberger

Mar 16, 2017, 8:20:53 AM
to jw...@googlegroups.com

Hi,

 

As far as I can see, what you refer to as "filters" is handled in different places in JWPL. So: no, there is no single documentation covering all of them.

 

The DataMachine basically contains and works with what you imported into the database (i.e., anything in the dump should also end up in the database). Later on, the API helps you to access that data with predefined interfaces, but it is not meant to act as a filter in a deeper sense.

 

However, since the API does nothing more than translate your calls into MySQL queries, you can customize them in any way you like (either directly via MySQL or from within JWPL using Hibernate).

 

Regarding some of your specific questions:

 

- article length: not that I know of; you would need to do that yourself (see above)

- hidden categories: are they contained in the dump? If not, they will not be accessible with JWPL

- templates: JWPL doesn't handle templates; this needs to be tackled with a parser (we recommend Sweble)

- namespaces: again, everything contained in the initial dump ends up in the database
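
Since no length or namespace filter is applied for you, such filtering has to happen in your own code. A minimal sketch of that idea follows. It assumes pages have already been read into a hypothetical `PageRecord` type (not a JWPL class), and it uses the namespace numbers from the question (0 for articles, 14 for categories); how the pages are loaded is left open.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Stand-alone illustration; none of these types belong to JWPL.
final class PageFilterSketch {

    // Hypothetical stand-in for a page that was read from the database.
    static final class PageRecord {
        final String title;
        final int namespace;
        final String text;

        PageRecord(String title, int namespace, String text) {
            this.title = title;
            this.namespace = namespace;
            this.text = text;
        }
    }

    // Keeps pages whose namespace is in the allowed set and whose text
    // has at least minLength characters.
    static List<PageRecord> filter(List<PageRecord> pages,
                                   Set<Integer> namespaces,
                                   int minLength) {
        return pages.stream()
                .filter(p -> namespaces.contains(p.namespace))
                .filter(p -> p.text.length() >= minLength)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data for demonstration only.
        List<PageRecord> pages = List.of(
                new PageRecord("Politics", 0, "A long enough article text."),
                new PageRecord("Stub", 0, "Short"),
                new PageRecord("Talk:Politics", 1, "A long enough talk page text."));
        for (PageRecord p : filter(pages, Set.of(0, 14), 10)) {
            System.out.println(p.title);
        }
    }
}
```

The same conditions could instead be pushed into a custom database query, as mentioned above; filtering in memory is shown only because it needs no assumptions about the schema.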

 

Categories: this is a somewhat different issue. I'm not very familiar with how categories are dealt with in JWPL, maybe somebody else has more expertise with this?

 

Best,

Johannes

--
You received this message because you are subscribed to the Google Groups "jwpl-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jwpl+uns...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Torsten Zesch

Mar 16, 2017, 9:05:08 AM
to jw...@googlegroups.com
JWPL only gives you the categories that are directly assigned to a page.
To get other categories (such as "Politics" in your example), you can use the CategoryGraph object, which also handles things like breaking cycles in the graph.

-Torsten 
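
To make the difference between directly assigned categories and their ancestors concrete, here is a small stand-alone sketch. It is not JWPL's CategoryGraph and does not use its API; the `parentsOf` map and the `withAncestors` helper are hypothetical names chosen for illustration. It walks the parent links upward and uses a visited set so that cycles in the category graph cannot cause an endless loop.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-alone illustration; this is NOT JWPL's CategoryGraph class.
final class CategoryAncestors {

    // Returns the direct categories plus every category reachable by
    // following parent links. The visited set stops the walk on cycles.
    static Set<String> withAncestors(List<String> directCategories,
                                     Map<String, List<String>> parentsOf) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(directCategories);
        while (!queue.isEmpty()) {
            String category = queue.poll();
            if (!seen.add(category)) {
                continue; // already visited: duplicate path or cycle
            }
            queue.addAll(parentsOf.getOrDefault(category, List.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical parent links, based on the example in the question.
        Map<String, List<String>> parentsOf = Map.of(
                "Political System", List.of("Politics"),
                "Politics", List.of("Society"),
                "Society", List.of("Politics")); // artificial cycle
        System.out.println(withAncestors(List.of("Political System"), parentsOf));
    }
}
```

With only the direct assignment as input, the result contains "Political System" itself followed by the ancestors found along the parent links.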
