Add files to load in Portia [Feature]


Prabhakar D

May 8, 2015, 3:01:19 AM
to portia-...@googlegroups.com

I have converted a list of PDF URLs to HTML files using the pdf2htmlEX application and stored them on a local drive. I want to add this as a feature in Portia:
if I enter a PDF URL on the start page, Portia should load the converted HTML content. I have a Python script that runs pdf2htmlEX and stores the converted HTML files on the local drive.
Where in the Portia source can I add this feature?

I also want to deploy those HTML files using scrapyd. How can I do that?

Ruairi Fahy

May 9, 2015, 4:17:02 AM
to portia-...@googlegroups.com
This would mean adding PDF as a type supported by Portia, which I'm reluctant to do. It could instead be achieved with a middleware that checks the file's MIME type and, if it is a PDF, converts it to HTML so that Portia can then process it as normal.
If you already have the HTML files, you can host them on localhost by running `python -m SimpleHTTPServer 8080` in the folder containing the HTML files, and then add 'http://localhost:8080/HTMLFILE' to your start URLs.
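To make the middleware idea a bit more concrete, here is a rough sketch of what such a Scrapy downloader middleware might look like. It is not part of Portia, and I have not tested it against the Portia UI. The names `is_pdf`, `pdf_to_html` and `PdfToHtmlMiddleware` are made up for illustration, and it assumes Python 3 and that the `pdf2htmlEX` binary is on your PATH.

```python
import os
import subprocess
import tempfile


def is_pdf(content_type, body):
    """Return True if the Content-Type header or the magic bytes indicate a PDF."""
    mimetype = content_type.split(b";")[0].strip().lower()
    return mimetype == b"application/pdf" or body.startswith(b"%PDF-")


def pdf_to_html(pdf_bytes, converter="pdf2htmlEX"):
    """Convert PDF bytes to HTML bytes by shelling out to pdf2htmlEX.

    Assumption: the converter binary is installed and on the PATH.
    """
    with tempfile.TemporaryDirectory() as workdir:
        pdf_path = os.path.join(workdir, "input.pdf")
        with open(pdf_path, "wb") as f:
            f.write(pdf_bytes)
        # pdf2htmlEX [options] <input.pdf> [<output.html>]
        subprocess.check_call(
            [converter, "--dest-dir", workdir, pdf_path, "output.html"]
        )
        with open(os.path.join(workdir, "output.html"), "rb") as f:
            return f.read()


class PdfToHtmlMiddleware:
    """Hypothetical downloader middleware: swaps PDF responses for HTML ones."""

    def process_response(self, request, response, spider):
        # Imported here so the helpers above can be used without Scrapy installed.
        from scrapy.http import HtmlResponse

        content_type = response.headers.get("Content-Type", b"")
        if not is_pdf(content_type, response.body):
            return response  # leave non-PDF responses untouched
        return HtmlResponse(
            url=response.url,
            body=pdf_to_html(response.body),
            encoding="utf-8",
            request=request,
        )
```

In a plain Scrapy project this would be enabled through the `DOWNLOADER_MIDDLEWARES` setting, for example `{'myproject.middlewares.PdfToHtmlMiddleware': 543}`, where the module path and the priority are placeholders. Checking the magic bytes as well as the header is there because some servers send PDFs with a generic content type.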

Prabhakar D

May 11, 2015, 10:57:12 AM
to portia-...@googlegroups.com
Thanks for your valuable reply. I am interested in creating a middleware for this.
Where can I plug in the middleware so that it renders the local HTML files converted from the PDF links?
That is, if I enter a PDF link as the start URL, the middleware should convert the PDF to HTML and open the result in the Portia UI.