Wget multiple PDF files

Brush Fire

Jan 24, 2012, 8:43:24 AM
to id-sla...@googlegroups.com
Hello,

Today I felt like downloading a newspaper on a whim, and at the same time trying out a link I bookmarked a long time ago but never got around to testing :D

This is the link:
http://www.master.web.id/mwmag/issue/04/content/hack-wget/hack-wget.html

I want to download ALL the PDF files in the folder "republika.pressmart.com/PUBLICATIONS/RP/RP/2012/01/24/PagePrint/".
I have tried the options and searched Google, but I still can't get it to work...

At this point all the PDFs have been downloaded. My temporary solution was to create a file with the URLs and then just run wget -i on it, but I'm fairly sure this can be done directly :P...
Please help, guys... Thank you.

>>> Example that does NOT work:
+++snip+++
XXX@darkstar:~/Republika$ wget -r -l1 -A.pdf republika.pressmart.com/PUBLICATIONS/RP/RP/2012/01/24/PagePrint      
Resolving republika.pressmart.com (republika.pressmart.com)... 174.120.184.50
Connecting to republika.pressmart.com (republika.pressmart.com)|174.120.184.50|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Connecting to republika.pressmart.com (republika.pressmart.com)|174.120.184.50|:80... connected.
HTTP request sent, awaiting response... 302 Redirected
--2012-01-24 19:50:50--  http://ads.xlxtra.com/errors/?type=403
Resolving ads.xlxtra.com (ads.xlxtra.com)... 46.137.240.81, 46.137.240.202
Connecting to ads.xlxtra.com (ads.xlxtra.com)|46.137.240.81|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 989 [text/html]

100%[====================================================>] 989         --.-K/s   in 0.02s   

2012-01-24 19:50:52 (55.5 KB/s) - "ads.xlxtra.com/errors/index.html?type=403" saved [989/989]

Removing ads.xlxtra.com/errors/index.html?type=403 since it should be rejected.

FINISHED --2012-01-24 19:50:52--
Downloaded: 1 files, 989 in 0.02s (55.5 KB/s)
+++snap+++

>>> Example that works:
+++snip+++
Resolving republika.pressmart.com (republika.pressmart.com)... 174.120.184.50
Connecting to republika.pressmart.com (republika.pressmart.com)|174.120.184.50|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 491180 (480K) [application/pdf]
Saving to: "24_01_2012_001.pdf"

100%[====================================================>] 491,180     60.7K/s   in 8.4s    

2012-01-24 19:31:10 (57.2 KB/s) - "24_01_2012_001.pdf" saved [491180/491180]

Connecting to republika.pressmart.com (republika.pressmart.com)|174.120.184.50|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 638028 (623K) [application/pdf]
Saving to: "24_01_2012_002.pdf"

100%[====================================================>] 638,028     59.3K/s   in 11s     

2012-01-24 19:31:23 (54.3 KB/s) - "24_01_2012_002.pdf" saved [638028/638028]

FINISHED --2012-01-24 19:31:23--
Downloaded: 2 files, 1.1M in 20s (55.5 KB/s)
+++snap+++
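
The workaround mentioned above (put the URLs in a file, then run wget -i on it) can be sketched roughly as follows. The file name pattern is taken from the working log; the page count (32) and the output file name urls.txt are assumptions made only for illustration.

```shell
#!/bin/sh
# Sketch: generate the list of full PDF URLs that wget -i can consume.
# Assumption: pages follow the DD_MM_YYYY_NNN.pdf pattern seen in the log.
base="http://republika.pressmart.com/PUBLICATIONS/RP/RP/2012/01/24/PagePrint"
pages=32   # hypothetical page count, adjust to the actual edition

make_list() {
    # $1 = base URL, $2 = date prefix, $3 = number of pages
    i=1
    while [ "$i" -le "$3" ]; do
        printf '%s/%s_%03d.pdf\n' "$1" "$2" "$i"
        i=$((i + 1))
    done
}

make_list "$base" "24_01_2012" "$pages" > urls.txt
```

Running `wget -i urls.txt` afterwards fetches every URL in the list; `make_list ... | wget -i -` should do the same without the intermediate file, since wget reads the list from standard input when the file name is `-`.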

Wow, this got long :D... Sorry about that :P

--
Sometimes asking better than googling...Hehehehe

Widya Walesa

Jan 24, 2012, 9:26:10 AM
to id-sla...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/24/2012 08:43 PM, Brush Fire wrote:
> Hello,
>
> Today I felt like downloading a newspaper on a whim, and at the same
> time trying out a link I bookmarked a long time ago but never got
> around to testing :D
>
> This is the link:
> http://www.master.web.id/mwmag/issue/04/content/hack-wget/hack-wget.html
>
> I want to download ALL the PDF files in the folder
> "republika.pressmart.com/PUBLICATIONS/RP/RP/2012/01/24/PagePrint/".
> I have tried the options and searched Google, but I still can't get
> it to work...
>
> At this point all the PDFs have been downloaded. My temporary solution
> was to create a file with the URLs and then just run wget -i on it,
> but I'm fairly sure this can be done directly :P...
> Please help, guys... Thank you.
>
>>>> Example that does NOT work:
> +++snip+++
> XXX@darkstar:~/Republika$ wget -r -l1 -A.pdf republika.pressmart.com/PUBLICATIONS/RP/RP/2012/01/24/PagePrint
> --2012-01-24 19:50:46--  http://republika.pressmart.com/PUBLICATIONS/RP/RP/2012/01/24/PagePrint
> Resolving republika.pressmart.com (republika.pressmart.com)... 174.120.184.50
> Connecting to republika.pressmart.com (republika.pressmart.com)|174.120.184.50|:80... connected.
> HTTP request sent, awaiting response... 301 Moved Permanently
> Location: http://republika.pressmart.com/PUBLICATIONS/RP/RP/2012/01/24/PagePrint/ [following]
> --2012-01-24 19:50:49--

That usually happens because the website has directory listing
disabled (no-dir-listing), meaning clients are not allowed to view
the contents of the directory directly.
So the client has to use the full path/URL.

- --
Slacker's Dent - http://www.walecha.net/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJPHr+CAAoJELIMgxZHBHke8fcH/2GczyP69Hi2V19V0sbnkNMR
2n7j9pXzUGl5z+DPV+gdmH9nEqHfanqM9sqI/GlcjMPlnS3vlJ30QryRorZuETUj
7ChaJX0NrYjJHBiWjUuxwA7ghccB74a4QB6rRh4KEtsMMSt7nFvuq1Hh7UJ16imu
Kk22Up9eYuShsHbUfhrlapxuTKplXov3iRhzqdzkxJQEi236JlCV3BwsS/r43yQo
1U8s5Ipj/1JEQwfWdJENpJjHhTA711fYQxkrwoG1ZfqItnMlqnwT1c2uQtDSjlrk
x/bQorZiGLRrn0vYp2mqGf3N0Jgh1sLeGrLEekooQJvMpBK3JcKY2jZoASDleIo=
=Rkik
-----END PGP SIGNATURE-----
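
To illustrate the point in the reply above: a recursive wget run can only follow links it finds in the page the server returns for the directory URL. The sketch below uses two made-up HTML samples (listing.html and error.html are hypothetical, not the site's real responses) and a hypothetical helper, pdf_links, to show that a listing page yields PDF links to follow while an error page yields none.

```shell
#!/bin/sh
# Hypothetical helper: print the .pdf link targets found in a saved HTML page.
pdf_links() {
    grep -o 'href="[^"]*\.pdf"' "$1" | sed 's/^href="//; s/"$//'
}

# Made-up sample of what a directory listing could look like.
cat > listing.html <<'EOF'
<a href="24_01_2012_001.pdf">24_01_2012_001.pdf</a>
<a href="24_01_2012_002.pdf">24_01_2012_002.pdf</a>
EOF

# Made-up sample of an error page returned when listing is disabled.
cat > error.html <<'EOF'
<html><body>403 Forbidden</body></html>
EOF

echo "links in listing.html:"
pdf_links listing.html
echo "links in error.html:"
pdf_links error.html   # prints nothing: there is nothing to recurse into
```

With no links in the returned page, options such as -r -l1 -A.pdf have nothing to match, which is consistent with the failing log: only an HTML error page was fetched and then rejected.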

Ridho Rachman

Jan 25, 2012, 9:36:15 PM
to id-sla...@googlegroups.com
Try again with the anti-robot options and a referer set; by default, sites these days reject robots such as wget....
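
A sketch of what that suggestion might look like on the command line. The user-agent string and the referer value are assumptions chosen for illustration, and whether the server accepts them is not known from this thread. The script only builds and prints the command; the final line is left commented out so nothing is downloaded unless you enable it.

```shell
#!/bin/sh
url="http://republika.pressmart.com/PUBLICATIONS/RP/RP/2012/01/24/PagePrint/"
ua="Mozilla/5.0 (X11; Linux x86_64)"        # assumed browser-like user agent
referer="http://republika.pressmart.com/"   # assumed referer value

# -e robots=off  : ignore robots.txt rules during recursive retrieval
# --user-agent   : send a browser-like identification instead of Wget/x.y
# --referer      : send a Referer header with the request
# -np -nd        : do not ascend to the parent; save files without directories
set -- wget -r -l1 -np -nd -A.pdf \
    -e robots=off \
    --user-agent="$ua" \
    --referer="$referer" \
    "$url"

cmd_preview=$*
printf '%s\n' "$cmd_preview"

# "$@"   # uncomment to actually run the command
```

Note that these options only change how wget identifies itself. If the directory URL still returns no page with links, recursion has nothing to follow and a list of full URLs is still needed.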

2012/1/24 Widya Walesa <wale...@gmail.com>

--
=============================ID-SLACKWARE=============================
Posting address : id-sla...@googlegroups.com
To unsubscribe: send an email to id-slackware...@googlegroups.com
List website : http://groups.google.com/group/id-slackware
Chat room : ##id-slackware (irc.freenode.net)
=============================ID-SLACKWARE=============================
