OW redirects from example.com to example.com/index.php and back again


webdep...@gmail.com

Aug 31, 2016, 5:13:58 AM
to openwayback-dev
       
      Hi,     

          I would like to ask you about a problem with OW redirections. We tried to harvest (Heritrix) the page www.olympics.sk , and in OW the page keeps redirect itself from www.olympic.sk to www.olympics.sk/index.php . No content is shown, only a blank space. Harvest reached the set volume of 2 GB harvested data per this domain.

      Thank you for any advice, best regards,

      Peter,
      Slovak webarchive,
      www.webdepozit.sk

   
Redirects.png

Kristinn Sigurðsson

Aug 31, 2016, 10:20:39 AM
to openway...@googlegroups.com
Hi Peter,

The most obvious issue would be if www.olympics.sk/index.php was not (correctly) harvested.

Probably best to look into the WARC files and extract the response record for both URLs. Probably worth checking how their CDX lines look as well.
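As a rough illustration of the CDX check, here is a minimal Python sketch. It assumes a space-delimited CDX file whose first line is a header such as " CDX N b a m s k r M S V g", and it takes the field letters from that header rather than hard-coding a layout. The function names are hypothetical, and the meanings assumed here ('b' timestamp, 'a' original URL, 's' status code, 'r' redirect) should be checked against your own index. Many indexers write "-" in the redirect column, so an empty value there is not conclusive on its own.

```python
def parse_cdx(lines):
    """Parse CDX text lines into dicts keyed by the field letters in the header."""
    fields = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(" CDX"):
            # Header line: the letters after "CDX" name the columns.
            fields = line.split()[1:]
            continue
        if fields is None or not line.strip():
            continue  # no header seen yet, or an empty line
        rows.append(dict(zip(fields, line.split(" "))))
    return rows


def capture_summary(rows, url_fragment):
    """Return (timestamp, original URL, status, redirect) for matching captures."""
    return [
        (row.get("b"), row.get("a"), row.get("s"), row.get("r"))
        for row in rows
        if url_fragment in row.get("a", "")
    ]
```

Called as `capture_summary(parse_cdx(open("index.cdx", encoding="utf-8")), "olympics.sk")` (the file name is a placeholder), this lists each capture with its status code, which should show whether both URLs were captured only as 3xx redirects.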

Assuming you're using gzipped WARCs, you can use the command-line utility 'zcat' to look into the WARC. I find it best to pipe the output into 'less' and then use less's search capabilities.
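If paging through the output by hand gets tedious, the same inspection can be sketched in Python using only the standard library. This is a minimal sketch, not a full WARC parser: it assumes well-formed records with a Content-Length header, the function names are hypothetical, and it matches URLs by substring only. It reports the HTTP status line and any Location header for each response record.

```python
import gzip


def iter_warc_records(stream):
    """Yield (headers, block) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # skip the blank lines that separate records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the WARC header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        yield headers, stream.read(length)


def summarize_responses(stream, url_fragment):
    """Return (target URI, HTTP status line, Location header) per matching response."""
    results = []
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        uri = headers.get("warc-target-uri", "")
        if url_fragment not in uri:
            continue
        # The record block holds the raw HTTP response: headers, blank line, body.
        http_head = block.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
        lines = http_head.split("\r\n")
        location = None
        for header_line in lines[1:]:
            name, _, value = header_line.partition(":")
            if name.strip().lower() == "location":
                location = value.strip()
        results.append((uri, lines[0], location))
    return results


def summarize_warc_file(path, url_fragment):
    # gzip.open reads the concatenated per-record gzip members as one stream.
    with gzip.open(path, "rb") as stream:
        return summarize_responses(stream, url_fragment)
```

If every record for both URLs comes back with a 3xx status and a Location header pointing at the other URL, the loop was captured at harvest time rather than introduced by OW at replay.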

Best,
Kris

-------------------------------------------------------------------------
Landsbókasafn Íslands - Háskólabókasafn | Arngrímsgötu 3 - 107 Reykjavík
Sími/Tel: +354 5255600 | www.landsbokasafn.is
-------------------------------------------------------------------------
fyrirvari/disclaimer - http://fyrirvari.landsbokasafn.is

webdep...@gmail.com

Sep 13, 2016, 4:55:36 AM
to openwayback-dev


   Thank you.