Hi Patrik,
You might be interested in this article about 'fuzzy anchoring': https://web.hypothes.is/blog/fuzzy-anchoring/ and this (somewhat dated, but still useful) reading list: https://web.hypothes.is/robust-anchoring/. You can also see some simple advice for publishers that helps them understand the meta/microdata that they can include to make their documents more friendly for annotation even if they do not share exactly matching URLs: https://web.hypothes.is/for-publishers/
All best,
Steel
--
You received this message because you are subscribed to the Google Groups "dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dev+uns...@list.hypothes.is.
To post to this group, send email to d...@list.hypothes.is.
To view this discussion on the web visit https://groups.google.com/a/list.hypothes.is/d/msgid/dev/e4f50148-2077-4c60-a4b6-b9a5e3ee3a77%40list.hypothes.is.
Thanks, Steel, for those links. I must admit I had not read those pages carefully before; I have now taken a closer look. I noticed that the pages on fuzzy/robust anchoring are more than 6 years old. Are they still representative of the current state?
The selectors listed do seem quite strongly targeted at a "relatively static" document, as opposed to fluid content, so I would not expect these types of selectors to work so well on social media sites.
This can (nearly always) be determined using "Matt's Rule of Text", which is as follows: "Any 8 to 10 word string uniquely identifies a document". Yep, mind-boggling, I know.
Here's an example (for this page): https://www.google.com/search?q=%22Assuming%20that%20we%20can%20determine%20that%20we%20are%22
Another example for this page: https://www.google.com/search?q=%22this%20fast%20and%20straightforward%20method%20will%20find%20a%22
And for completeness, one more: https://www.google.com/search?q=%22This%20is%20an%20old%20problem,%20and%20over%20the%22
Note that the last example returns two results. The second result does not include the quoted ",". The first result is (nearly always) the source document.
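For anyone who wants to try the rule themselves, here is a minimal sketch in Python of how the search URLs above can be built. The function names (word_windows, exact_phrase_url, documents_containing) are my own, purely for illustration, and the corpus is assumed to be a plain dict of document name to text. It does not call Google; it only builds the quoted-phrase URL and checks uniqueness against a local set of documents.

```python
from urllib.parse import quote


def word_windows(text, size=8):
    """Yield every run of `size` consecutive whitespace-separated words."""
    words = text.split()
    for i in range(len(words) - size + 1):
        yield " ".join(words[i:i + size])


def exact_phrase_url(phrase):
    """Build an exact-phrase search URL by wrapping the phrase in double quotes.

    The URL shape mirrors the example links in this thread; safe="" makes
    quote() percent-encode every reserved character, including the quotes.
    """
    return "https://www.google.com/search?q=" + quote(f'"{phrase}"', safe="")


def documents_containing(phrase, corpus):
    """Return the names of documents in a local corpus that contain the phrase.

    `corpus` is an assumed dict mapping a document name to its full text.
    """
    return [name for name, body in corpus.items() if phrase in body]
```

Calling exact_phrase_url("Assuming that we can determine that we are") reproduces the first example link above. To test the rule on your own documents, run each phrase from word_windows(text) through documents_containing and see how often the list has exactly one entry.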
This algorithm has some very important ramifications, I feel. One is its use here, in this type of application (e.g. documents returning a 404 could be passed to Google for reattachment of annotations); another pressing one is determining the source of fake news. There are clearly other uses.
Please share with those who care. Thank me later. :)
ASIDE: I "discovered" this algorithm in 2003 during my work with PurpleSlurple and QuIP.