How to scrape data in a different format


Maciek Grodzki

Jul 4, 2016, 4:32:34 PM
to django-dynamic-scraper
Hello everyone,
I am new to Django, and perhaps my problem is easy to solve, but I can't find a way to resolve it in the docs. I want to scrape a datetime from a website and store it in a model as a datetime field. In the documentation I found something like a date preprocessor, but it works only for eng (and den?) language. For instance, the datetime on my page looks like 24 luty 20:00, which means 24 February 20:00. How can I scrape it? I will be grateful for any advice.

Holger Drewes

Jul 5, 2016, 4:42:42 AM
to django-dyna...@googlegroups.com
Hi Maciek,
I suppose there is no simple solution to your problem. The date preprocessor is a good start, but it uses the Python strptime function, and I don't know whether that works with the Polish (?) language. Did you try setting the locale in the Django settings, or maybe in the custom scraper classes?
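
As a rough illustration of the locale idea, here is a minimal sketch using only the Python standard library, not DDS itself. The helper name parse_with_locale, the format string and the locale name "pl_PL.UTF-8" are assumptions for illustration. Whether it works depends on that locale being installed on the system and on which form of the month name the locale expects, so the function returns None instead of failing when either does not fit.

```python
import locale
from datetime import datetime


def parse_with_locale(text, fmt="%d %B %H:%M", loc="pl_PL.UTF-8"):
    """Try to parse `text` with strptime under a temporary LC_TIME locale.

    Hypothetical helper: returns a datetime, or None if the locale is not
    installed or the text does not match the format under that locale.
    """
    saved = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, loc)
        # With no year in the format, strptime defaults the year to 1900.
        return datetime.strptime(text, fmt)
    except (locale.Error, ValueError):
        return None
    finally:
        # setlocale is process-wide, so always restore the previous value.
        locale.setlocale(locale.LC_TIME, saved)


result = parse_with_locale("24 luty 20:00")
print(result)  # a datetime or None, depending on the system
```

Because setlocale changes state for the whole process, this approach is fragile in multi-threaded code, which is one reason a locale-independent lookup table may be preferable.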

I know this is not optimal, but one solution would be to monkey-patch the DDS library itself and add a custom processor to the library's processors.py file, maybe using these approaches to parse the date:

The individual processor functions are very simple: they just take a text as a string and return a string. You could then use the function name as the processor.
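
A string-in, string-out function of that kind could look roughly like the sketch below. It does not depend on the system locale. The function name polish_datetime, the month table, the default year and the output format are all assumptions for illustration; the exact signature DDS expects for a processor is not confirmed here and should be checked against processors.py. The example text from the question contains no year, so one has to be supplied.

```python
import re
from datetime import datetime

# Polish month names (nominative and genitive forms) mapped to month numbers.
POLISH_MONTHS = {
    "styczeń": 1, "stycznia": 1,
    "luty": 2, "lutego": 2,
    "marzec": 3, "marca": 3,
    "kwiecień": 4, "kwietnia": 4,
    "maj": 5, "maja": 5,
    "czerwiec": 6, "czerwca": 6,
    "lipiec": 7, "lipca": 7,
    "sierpień": 8, "sierpnia": 8,
    "wrzesień": 9, "września": 9,
    "październik": 10, "października": 10,
    "listopad": 11, "listopada": 11,
    "grudzień": 12, "grudnia": 12,
}

# Matches text such as "24 luty 20:00": day, month name, hour, minute.
DATE_RE = re.compile(r"(\d{1,2})\s+(\w+)\s+(\d{1,2}):(\d{2})")


def polish_datetime(text, year=2016):
    """Hypothetical processor: turn '24 luty 20:00' into an ISO-like string.

    Returns the input unchanged if it cannot be parsed.
    """
    match = DATE_RE.search(text.strip().lower())
    if match is None:
        return text
    day, month_name, hour, minute = match.groups()
    month = POLISH_MONTHS.get(month_name)
    if month is None:
        return text
    try:
        parsed = datetime(year, month, int(day), int(hour), int(minute))
    except ValueError:  # e.g. day 31 in February
        return text
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(polish_datetime("24 luty 20:00"))  # 2016-02-24 20:00:00
```

Returning the input unchanged on failure keeps the function safe to chain with other processors, at the cost of passing unparsed text downstream.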

Of course this is not super pretty, since these changes will be overwritten by the next library update. Maybe I will find some time in the future to implement a more elegant way to integrate custom processors into your own projects.

Let me know how you approach this; I would be interested in your solution!

Cheers
Holger

Holger Drewes

Aug 5, 2016, 11:35:13 AM
to django-dynamic-scraper
I have now integrated an official way to add new custom processors in the DDS v.0.11.1 release; see the documentation for usage information:

Holger