relative image links


Dominik Frey

Dec 13, 2019, 9:47:29 AM
to openwayback-dev
Hello,

I am struggling with relative links in the data-srcset attribute: OpenWayback does not rewrite these links, so the images are not displayed.

I am using OpenWayback 2.4.0.

for example:

 <picture>
<source data-srcset="/sport/mehr-sport/Sportlercollage%2C1574002596479%2Clandessportlerwahl2019-rheinland-pfalz-100~_v-1x1@2dXS_-9a91314e681c924f29ec89ca45e4e12f5b1d728e.jpg 320w,/sport/mehr-sport/Sportlercollage%2C1574002596479%2Clandessportlerwahl2019-rheinland-pfalz-100~_v-1x1@2dS_-4026684eab220e7fb9eb2f80f316da061f3d025c.jpg 480w" sizes="(min-width: 1900px) 371px, (min-width: 1200px) 257px, (min-width: 768px) 188px, (min-width: 520px) 490px, 100vw" srcset="/sport/mehr-sport/Sportlercollage%2C1574002596479%2Clandessportlerwahl2019-rheinland-pfalz-100~_v-1x1@2dXS_-9a91314e681c924f29ec89ca45e4e12f5b1d728e.jpg 320w,/sport/mehr-sport/Sportlercollage%2C1574002596479%2Clandessportlerwahl2019-rheinland-pfalz-100~_v-1x1@2dS_-4026684eab220e7fb9eb2f80f316da061f3d025c.jpg 480w">
 
<img data-src="http://webarchiv/openwayback/20191206090017im_/https://www.swr.de/sport/mehr-sport/Sportlercollage,1574002596479,landessportlerwahl2019-rheinland-pfalz-100~_v-1x1@2dXS_-9a91314e681c924f29ec89ca45e4e12f5b1d728e.jpg" data-spy="lazyload" class="rs_skip_always error" data-copyright="Foto: Imago, picture-alliance / Reportdienste, Mike Schmidt, Jens Büttner; Revierfoto; Sven Simon, Anke Waelischmiller/SVEN SIMON, BEAUTIFUL SPORTS/Nils Koepke; Michael Kappeler; Borjab.Hojas;" alt="Sportlercollage (Foto: Imago, picture-alliance / Reportdienste, Mike Schmidt, Jens Büttner; Revierfoto; Sven Simon, Anke Waelischmiller/SVEN SIMON, BEAUTIFUL SPORTS/Nils Koepke; Michael Kappeler; Borjab.Hojas;)" title="Foto: Imago, picture-alliance / Reportdienste, Mike Schmidt, Jens Büttner; Revierfoto; Sven Simon, Anke Waelischmiller/SVEN SIMON, BEAUTIFUL SPORTS/Nils Koepke; Michael Kappeler; Borjab.Hojas;" src="http://webarchiv/openwayback/20191206090017im_/https://www.swr.de/sport/mehr-sport/Sportlercollage,1574002596479,landessportlerwahl2019-rheinland-pfalz-100~_v-1x1@2dXS_-9a91314e681c924f29ec89ca45e4e12f5b1d728e.jpg" data-was-processed="true">
</picture>

OpenWayback then tries to open these links via, e.g., http://webarchiv/sport/mehr-sport/Sportlercollage%2C1574002596479%2Clandessportlerwahl2019-rheinland-pfalz-100~_v-1x1@2dXS_-9a91314e681c924f29ec89ca45e4e12f5b1d728e.jpg


Do you have any ideas how I could fix this issue?

Thank you and best regards
Dominik



Dominik Frey

Dec 13, 2019, 10:23:14 AM
to openwayback-dev
I just made a quick fix by including a custom JavaScript file in Toolbar.jsp that removes all <source> tags inside <picture> elements. The browser then falls back to the links in the <img> tag, which are rewritten, and the images are displayed.

$(function() {
$('picture source').remove();
});


However, it would still be good to have a proper solution for the relative image links in data-srcset.

Kind regards
Dominik
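
A less destructive variant of the quick fix above would rewrite the relative URLs in place instead of removing the <source> tags. The following is only a minimal sketch under stated assumptions: `rewriteSrcset` is a hypothetical helper (not part of OpenWayback), the replay URL is assumed to have the shape seen in the example (`<prefix>/<14-digit timestamp>[flag]/<original URL>`), and commas inside URLs are assumed to be percent-encoded (`%2C`), as they are in the data-srcset values quoted earlier.

```javascript
// Hypothetical helper, not part of OpenWayback: turn every URL in a
// srcset-style value into an archival URL under replayPrefix.
// Assumption: candidates are separated by commas and URLs contain no
// literal commas (they are percent-encoded as %2C in the example).
function rewriteSrcset(value, replayPrefix, originalPageUrl) {
  return value
    .split(',')
    .map((candidate) => candidate.trim())
    .filter((candidate) => candidate !== '')
    .map((candidate) => {
      const [url, ...descriptors] = candidate.split(/\s+/);
      if (url.startsWith(replayPrefix)) {
        return candidate; // already points into the archive
      }
      // Resolve against the ORIGINAL page URL, not the replay host.
      const absolute = new URL(url, originalPageUrl).href;
      return [replayPrefix + absolute, ...descriptors].join(' ');
    })
    .join(', ');
}

// Browser-only wiring; skipped when run outside a page.
function fixPictureSources() {
  // Assumed replay URL shape: <prefix>/<timestamp>[flag]/<original URL>
  const match = /^(.*?\/\d{14})(?:[a-z]{2}_)?\/(.+)$/.exec(window.location.href);
  if (!match) {
    return;
  }
  const replayPrefix = match[1] + 'im_/';
  const originalPageUrl = match[2];
  document.querySelectorAll('picture source').forEach((source) => {
    for (const name of ['srcset', 'data-srcset']) {
      const value = source.getAttribute(name);
      if (value) {
        source.setAttribute(name, rewriteSrcset(value, replayPrefix, originalPageUrl));
      }
    }
  });
}

if (typeof document !== 'undefined') {
  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', fixPictureSources);
  } else {
    fixPictureSources();
  }
}
```

Like the original snippet, this would be included from Toolbar.jsp. It runs once when the DOM is ready, so sources that a lazy-loading script adds later would not be covered.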

Sawood Alam

Dec 13, 2019, 10:32:19 AM
to openway...@googlegroups.com
Hi Dominik,

I am not sure what can be done to rewrite "data-*" HTML attributes, as they are custom attributes whose purpose and contents are application-specific. However, if we have some heuristic data that suggests treating certain custom attributes in a specific way, we can perhaps add rules for them, but we need to be careful not to cause false negatives in the process.

That said, you can use client-side rewriting to handle some of these issues. Here are a couple of tools that you can look into:

* Reconstructive - a Service Worker for client-side URL rerouting https://github.com/oduwsdl/Reconstructive
* Wombat - a client-side URL rewriting system https://github.com/webrecorder/wombat
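
To illustrate the general idea of client-side rerouting, here is a rough sketch of a Service Worker that catches requests that "leak" to the replay host (such as http://webarchiv/sport/...) and sends them back into the archive, using the referring page to infer the capture timestamp. This is not the actual API of Reconstructive or Wombat. The function name `rerouteLeakedRequest`, the `/openwayback/` path and the replay URL pattern are assumptions taken from the URLs quoted in this thread.

```javascript
// Hypothetical sketch; NOT the API of Reconstructive or Wombat.
// Assumed replay URL shape, e.g.
// http://webarchiv/openwayback/20191206090017/https://www.swr.de/sport/
const REPLAY_URL = /^(https?:\/\/[^/]+\/openwayback\/)(\d{14})([a-z]{2}_)?\/(.+)$/;

// Map a leaked same-origin request back into the archive, using the
// referring replay page to find the timestamp and the original site.
function rerouteLeakedRequest(requestUrl, referrerUrl) {
  if (REPLAY_URL.test(requestUrl)) {
    return requestUrl; // already an archival URL
  }
  const match = REPLAY_URL.exec(referrerUrl);
  if (!match) {
    return requestUrl; // cannot infer which capture this belongs to
  }
  const [, prefix, timestamp, , originalPage] = match;
  const request = new URL(requestUrl);
  if (request.origin !== new URL(referrerUrl).origin) {
    return requestUrl; // only handle leaks to the replay host itself
  }
  // Re-resolve the leaked path against the original page URL.
  const original = new URL(request.pathname + request.search, originalPage);
  return `${prefix}${timestamp}/${original.href}`;
}

// Service Worker wiring; skipped when run outside a worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    const target = rerouteLeakedRequest(event.request.url, event.request.referrer);
    if (target !== event.request.url) {
      event.respondWith(fetch(target));
    }
  });
}
```

This relies on the request carrying a referrer, which depends on the page's referrer policy. The projects listed above handle many more cases than this sketch does.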

Best,

--
Sawood Alam
PhD Candidate
Old Dominion University
Norfolk, Virginia - 23529



--
You received this message because you are subscribed to the Google Groups "openwayback-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openwayback-d...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openwayback-dev/012875f7-c0d1-401a-a094-dfecdc0725a1%40googlegroups.com.

Lauren Ko

Dec 16, 2019, 12:18:31 PM
to openway...@googlegroups.com
Hi Dominik,
If you switch to `archivalSAXDelegator` (in ArchivalUrlReplay.xml, import ArchivalUrlSaxReplay.xml and replace the reference to `fastArchivalSAXDelegator` with `archivalSAXDelegator`), you can specify in ArchivalUrlSaxReplay.xml which attributes should be rewritten for which tags. For example, to rewrite `data-srcset` attributes in `source` tags, you would include something like:

<bean class="org.archive.wayback.replay.html.rules.AttributeModifyingRule">
<property name="tagName" value="SOURCE" />
<property name="modifyAttributeName" value="DATA-SRCSET" />
<property name="transformer" ref="srcsetAttributeHandler" />
</bean>


I am not sure how switching to `archivalSAXDelegator` may affect the performance of your OpenWayback. Alternatively, if you want to keep using `fastArchivalSAXDelegator`, you could use the source code available on GitHub and add the attribute that should be rewritten to wayback-core/src/main/resources/org/archive/wayback/archivalurl/attribute-rewrite.properties:

*.DATA-SRCSET.type=ss

As Sawood says, "data-*" HTML attributes are custom data attributes. However, we have added rewrites for some such attributes in OpenWayback. If you would like to open a PR similar to https://github.com/iipc/openwayback/pull/382 to handle rewriting `data-srcset` attributes, it would be welcome.

Lauren Ko
UNT Libraries
