import Network.HTTP
import Data.Maybe
import Data.List

main :: IO ()
main = do
    x <- getLine
    -- open url
    htmlpage <- getResponseBody =<< simpleHTTP (getRequest x)
    -- print . words $ htmlpage
    -- index of the first occurrence of a marker string in the page
    let indexOf pat = fromJust . findIndex (pat `isPrefixOf`) . tails $ htmlpage
        ind_1 = indexOf "<!-- content -->"
        ind_2 = indexOf "<!-- /content -->"
        -- keep only the text between the two markers
        tmphtml = drop ind_1 $ take ind_2 htmlpage
    writeFile "down.html" tmphtml
and it's working fine, except that some symbols are not rendered as they
should be. Could someone please suggest how to accomplish this task?
Thank you
Mukesh Tiwari
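The garbled symbols are most likely a character-encoding problem: Network.HTTP's String interface effectively treats the response body as Latin-1, while Wikipedia serves UTF-8. A minimal sketch of one workaround, fetching the body as raw bytes and decoding it explicitly (this assumes the bytestring and text packages are available, and that the page really is UTF-8):

import Network.HTTP
import Network.URI ( parseURI )
import Data.Maybe ( fromMaybe )
import qualified Data.ByteString as B
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO

main :: IO ()
main = do
    x <- getLine
    let uri = fromMaybe (error "bad URL") (parseURI x)
    -- fetch the body as raw bytes, so no Latin-1 decoding is baked in
    bytes <- getResponseBody =<< simpleHTTP (mkRequest GET uri :: Request B.ByteString)
    -- decode explicitly as UTF-8 before writing the file out
    TIO.writeFile "down.html" (TE.decodeUtf8 bytes)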
--Max
import Network.HTTP
import Text.HTML.TagSoup
import Data.Maybe

-- if this opening tag carries the "Download a PDF version ..." title,
-- return the (absolute) link it points to
parseHelp :: Tag String -> Maybe String
parseHelp (TagOpen _ attrs)
    | any (\(_, v) -> v == "Download a PDF version of this wiki page") attrs
        = Just $ "http://en.wikipedia.org" ++ snd (head attrs)
    | otherwise = Nothing
parseHelp _ = Nothing

-- scan the tag stream for the first open tag that parseHelp accepts
parse :: [Tag String] -> Maybe String
parse [] = Nothing
parse (x : xs)
    | isTagOpen x = case parseHelp x of
          Just s  -> Just s
          Nothing -> parse xs
    | otherwise   = parse xs

main :: IO ()
main = do
    x <- getLine
    -- open url
    tags_1 <- fmap parseTags $ getResponseBody =<< simpleHTTP (getRequest x)
    let lst = head . sections (~== "<div class=portal id=p-coll-print_export>") $ tags_1
        url = fromJust . parse $ lst  -- rendering url
    putStrLn url
    tags_2 <- fmap parseTags $ getResponseBody =<< simpleHTTP (getRequest url)
    print tags_2
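To go one step further and actually save the generated PDF instead of printing its tags, the final response could be fetched as bytes and written straight to disk. A sketch along the same lines (savePdf is a hypothetical helper, and it assumes url already points at the PDF itself):

import Network.HTTP
import Network.URI ( parseURI )
import Data.Maybe ( fromMaybe )
import qualified Data.ByteString as B

-- hypothetical helper: fetch a URL as raw bytes and save it to disk
savePdf :: String -> FilePath -> IO ()
savePdf url path = do
    let uri = fromMaybe (error "bad URL") (parseURI url)
    bytes <- getResponseBody =<< simpleHTTP (mkRequest GET uri :: Request B.ByteString)
    B.writeFile path bytes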
On Sep 9, 2011 7:33 AM, "mukesh tiwari" <mukeshtiw...@gmail.com> wrote:
>
> Thank you for the reply, Daniel. Given my limited knowledge of web programming and Javascript, do I first need to simulate some sort of browser in my program, which will run the Javascript and generate the PDF, so that I can then download it? Is this what you mean? Is Network.Browser any help for this purpose? Is there a way to solve this problem?
> Sorry for the many questions, but this is my first web-application program and I am trying hard to finish it.
>
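For what it's worth, Network.Browser (from the same HTTP package) manages cookies, proxies and redirects for you, but it does not execute Javascript, so it cannot by itself trigger a script-generated download. A minimal sketch of how it is typically driven:

import Network.Browser
import Network.HTTP

-- fetch a page while letting the browser layer follow redirects;
-- note that no Javascript is executed at any point
fetchWithBrowser :: String -> IO String
fetchWithBrowser url = do
    (_uri, rsp) <- browse $ do
        setAllowRedirects True   -- follow 3xx responses automatically
        request (getRequest url)
    return (rspBody rsp)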
Have you tried finding out whether simple URLs exist for this that don't require Javascript? Does Wikipedia have a policy on this?
Conrad.
http://en.wikipedia.org/wiki/Wikipedia:Database_download
There is also text on that site saying, "Please do not use a web
crawler to download large numbers of articles. Aggressive crawling of
the server can cause a dramatic slow-down of Wikipedia."
Matti
I've actually used wkhtmltopdf[1] for this kind of stuff in the past.
[1] http://code.google.com/p/wkhtmltopdf/
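If shelling out to an external tool is acceptable, wkhtmltopdf can be driven from Haskell in a couple of lines. A sketch (this assumes the wkhtmltopdf binary is installed and on the PATH):

import System.Process ( rawSystem )
import System.Exit ( ExitCode )

-- render a URL straight to PDF via the external wkhtmltopdf binary
htmlToPdf :: String -> FilePath -> IO ExitCode
htmlToPdf url out = rawSystem "wkhtmltopdf" [url, out]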