url shortener revisited


Miguel Freitas

Dec 9, 2015, 5:57:04 AM
to twist...@googlegroups.com
Hi,

After a while, I've finally decided to give the URL shortener a try.

The main concept we discussed last time is pretty simple: the long URL is stored in a separate post inside one's own torrent. That post is then referenced by a shortened URL whose format we have yet to decide, but basically it must include just the username and the post number k (optimizations are possible, see below).

My concern however is about the recoverability of such information. We don't want the url resolver to take ages to complete.

The first obvious way to get the long URL is to fetch it from the DHT. While that may work for a while after the post is produced, the DHT's copy of the post will eventually expire following our network rules.

The second easy way (which is actually faster) is when we are following the user. In this case, the original post has probably already been downloaded as a local copy and the fetch is immediate.

The real problem is when the DHT copy has expired and we are not following the user. In this case we must start a new torrent to obtain just a single piece.

This somewhat conflicts with BitTorrent's anti-leech mechanisms, which foster cooperation. A new peer connection starts "choked", so it may not perform piece requests until it is "unchoked". Only a few connections are unchoked for a given time, until they contribute something back.

Here I'm worried about the possibly long time the client will wait in the queue until it is unchoked. This can make the resolver take a long time to complete. At the same time, I cannot simply provide an exception that bad clients could abuse.

So what I'm currently experimenting with is adding a "peek" extension, inspired by hashcash, to our BitTorrent protocol fork. Hashcash is a simple proof-of-work scheme (originally intended for anti-spam) that the client must perform in order to be allowed to make the request. The PoW difficulty starts very small (about 10 ms on an i7) but is increased if twister detects abusive behavior.
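
To illustrate the general idea only, here is a minimal hashcash-style sketch in Python. The wire format, hash function and difficulty schedule of the actual "peek" extension are not described in this message, so the SHA-256 choice, the challenge bytes and the helper names `solve`, `verify` and `leading_zero_bits` are all assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # 8 - bit_length() is the number of leading zeros inside this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap check on the serving side (hypothetical message layout)."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "little")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Brute-force search on the requesting side."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


# Hypothetical challenge; each extra difficulty bit doubles the expected work.
challenge = b"peek:someuser:42"
nonce = solve(challenge, difficulty=12)
print(nonce, verify(challenge, nonce, 12))
```

The asymmetry is the point: the client needs many hash attempts on average, while the serving peer verifies with a single hash, so raising the difficulty for an abusive client costs the honest peer almost nothing.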

In my tests it seems to be working fine. Of course, we will only be sure once it gets deployed to all peers on the network. Until then, the URL shortener will work, but possibly a bit slowly.

Another interesting discussion will be about the actual shortened url format:

Something like "xxxx://user:k" is the most obvious candidate, but we can do better: since the username is limited to a small set of allowed (lowercase) characters and k is an integer, we may encode them in a binary format and then base64 it. Another possibility is using the user registration number instead.

Suggestions are welcome!

regards,

Miguel

Vincent Olivier

Feb 24, 2016, 10:07:19 AM
to twister-dev


On Wednesday, December 9, 2015 at 5:57:04 AM UTC-5, Miguel Freitas wrote:
> Hi,
>
> After a while I've decided to finally give it a try to the url shortener thing.

How is this going? Need help?

Miguel Freitas

Feb 25, 2016, 6:28:11 AM
to twist...@googlegroups.com
On Wed, Feb 24, 2016 at 12:07 PM, Vincent Olivier <vin...@up4.com> wrote:
> How is this going along ? Need help ?

I'll try to continue this weekend, but I'm not sure it will be finished before I travel next week...

One thing that would be nice to discuss though is the official format(s) we should adopt for the shortened URLs (or URI?). I mean, something like:

twister://username.k   (where k is an integer, the post number)

Actually, I'd suggest we accept two formats from the start: one with the username as a string and another with the uid (another integer), which is possibly shorter. So we might define:

twst1://username.k
twst2://uid.k

Later optimizations (twst3, twst4 etc) may include further encoding uid and k integers in a shorter representation (integer => binary => base64?).

I don't know... I'm not good with these URI standard rules...

regards,

Miguel

Miguel Freitas

Mar 19, 2016, 3:11:37 PM
to twist...@googlegroups.com
I've been reading about URIs, and it seems we don't really need "//" since we are not using any "authority" in this scheme.

Also, the list of registered protocols is actually shorter than I'd have guessed:


So I'm currently inclined toward something like this:

twist:12334566788=

Here 3, 6 and 8 are doubled just to represent the overhead of base64 encoding (8 bytes become 12 characters).

That's 18 bytes, shorter than a t.co link (which is 19, AFAIK). So we have a fixed-size element to replace in the text editor.

The base64-encoded part would be two 32-bit integers specifying the user and the post number k.
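
To make the size arithmetic concrete, here is a small Python sketch. It assumes little-endian byte order (which is what is described later in this thread); `encode_short_uri` is a hypothetical helper for illustration, not twisterd's code, and the input values are just samples.

```python
import base64
import struct


def encode_short_uri(uid: int, k: int) -> str:
    """Pack (uid, k) as two unsigned 32-bit little-endian ints, then base64."""
    payload = struct.pack("<II", uid, k)  # 4 + 4 = 8 bytes
    # 8 bytes -> 12 base64 characters, the last one being "=" padding.
    return "twist:" + base64.b64encode(payload).decode("ascii")


uri = encode_short_uri(67, 989)
print(uri, len(uri))  # twist:QwAAAN0DAAA= 18
```

The 6-character prefix plus 12 base64 characters gives the fixed 18-character length mentioned above, regardless of the values of uid and k.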

Future URL gateways may be created by passing the string to a server, e.g. "twist:12334566788=" => http://twister.resolver.com/12334566788=
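
A possible sketch of that mapping, in Python. The host is just the example from the line above, and `gateway_url` is a hypothetical helper. One detail worth noting: standard base64 output may contain "+" and "/", so the sketch percent-encodes those while leaving "=" untouched.

```python
from urllib.parse import quote

GATEWAY = "http://twister.resolver.com/"  # example host from the message above


def gateway_url(short_uri: str) -> str:
    """Map 'twist:<payload>' to a gateway URL (hypothetical helper)."""
    scheme, sep, payload = short_uri.partition(":")
    if scheme != "twist" or not sep or not payload:
        raise ValueError(f"not a twist: URI: {short_uri!r}")
    # Escape '+' and '/' from standard base64; keep '=' readable.
    return GATEWAY + quote(payload, safe="=")


print(gateway_url("twist:12334566788="))
```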

comments?

regards,

Miguel



Сёма Мрачный

Mar 19, 2016, 5:08:39 PM
to twister-dev
> so, i'm currently inclined to something like that:
>
> twist:12334566788=

What about trl:// (Twister Resource Locator) or irl:// (Internal Resource Locator, or In Real Life)?

Does the "=" character mark the end of the URL? If so, why?

> The encoded base64 thing would be two 32 integers to specify user and post number k.

Help me decode your "QwAAAN0DAAA" into two integers.

Miguel Freitas

Mar 19, 2016, 5:21:13 PM
to twist...@googlegroups.com
On Sat, Mar 19, 2016 at 6:08 PM, Сёма Мрачный <scarylit...@gmail.com> wrote:
>> so, i'm currently inclined to something like that:
>>
>> twist:12334566788=
> what about trl:// (Twister Resource Locator) or irl:// (Internal Resource Locator or In Real Life)

Sure, we may use a different "protocol", but I think the "//" is not required.

> does "=" character mark end of URL? why, if so?

That's actually part of the base64 string. Every 3 bytes encode to 4 characters, but we have 8 bytes, so the encoder adds "=" as padding.

>> The encoded base64 thing would be two 32 integers to specify user and post number k.
> help me decode your "QwAAAN0DAAA" to 2 integers.

You know you don't have to, right? Just pass it to twisterd's decodeshorturl RPC and we're done.

But still, if you'd like to do it step by step to check (not something we need to do in twister-html), you must first base64-decode "QwAAAN0DAAA=". This will give you 8 bytes. The first 4 are the user ID, little endian. The last 4 are the post number (k), little endian.
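
The manual steps above can be sketched in Python as follows. This is only a check of the byte layout just described, not twisterd's decodeshorturl implementation; `decode_short_payload` is a hypothetical helper name.

```python
import base64
import struct
from typing import Tuple


def decode_short_payload(payload: str) -> Tuple[int, int]:
    """Return (uid, k) from the base64 part of a twist: URI."""
    # Restore "=" padding if it was dropped, so the length is a multiple of 4.
    payload += "=" * (-len(payload) % 4)
    raw = base64.b64decode(payload)
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    # Two unsigned 32-bit little-endian integers: user ID, then post number k.
    return struct.unpack("<II", raw)


print(decode_short_payload("QwAAAN0DAAA="))  # (67, 989)
```

The decoded bytes are 43 00 00 00 DD 03 00 00, i.e. user ID 67 and post number k = 989.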

---

Well... Now I know I will really need your help :-)

I tried to code a simple test here in applyHtml(), but I failed miserably... I thought I'd have a jQuery element reference to change later, but then I realized the HTML is concatenated as text...

I don't know exactly how to get a reference into the formatted post that can later be changed by a callback or something. I'm confused.

regards,

Miguel

Miguel Freitas

Mar 19, 2016, 5:26:45 PM
to twist...@googlegroups.com
On Sat, Mar 19, 2016 at 6:20 PM, Miguel Freitas <mfre...@gmail.com> wrote:
> I've tried to code some simple test here into applyHtml() but i've failed miserably... I thought I'd have a jquery element reference to change later, but then i realized the html is concatened as text...

Oops, not applyHtml(), but rather htmlFormatMsg().

It was easy to detect a "twist:xxxxxx" link and add it, of course, but not easy to make it update after resolving :(

regards,

Miguel

 

Сёма Мрачный

Mar 19, 2016, 5:45:00 PM
to twister-dev
> Sure, we may use a different "protocol" but i think the "//" is not required.

I just suppose it's a bit easier to parse (and by eye too).

> That's actually part of the base64 string. every 3 bytes encodes to 4, but we have 8. then it adds "=" as padding.

Why does it need to be added by hand? Can't twisterd add it by itself?

> You know you don't have to, right? Just pass into twisterd's decodeshorturl RPC and we're done.

Oh, silly me.

> I don't know exactly how to get a reference into the formatted post that can be later changed by a callback or something.

When we match a shortened twister URL and perform "msg = msgAddHtmlEntity()", we may add some id to a so-called templateShortenedTwisterURL, which we put in newHtmlEntityLink(), and then search for the element with that id from the callback function to manage it. It's not so good, but I don't see clearly your need.

Сёма Мрачный

Mar 19, 2016, 5:50:00 PM
to twister-dev
> I don't see clearly your need

Damn, what am I talking about here? I mean I don't see your code.

Сёма Мрачный

Mar 19, 2016, 5:59:15 PM
to twister-dev
OK, I have twisterd compiled.

So the question

> why is it need to add it by hand? can twisterd add it by itself?

now becomes: why do we need to pass both "twist:" and "=" in "twisterd decodeshorturl twist:QwAAAN0DAAA="?

Miguel Freitas

Mar 19, 2016, 9:42:45 PM
to twist...@googlegroups.com
On Sat, Mar 19, 2016 at 6:45 PM, Сёма Мрачный <scarylit...@gmail.com> wrote:
>> That's actually part of the base64 string. every 3 bytes encodes to 4, but we have 8. then it adds "=" as padding.
> why is it need to add it by hand? can twisterd add it by itself?

Exactly, I'm not adding it by hand!

twisterd uses a standard base64 encode function, and the string it returns includes the "=".

>> I don't know exactly how to get a reference into the formatted post that can be later changed by a callback or something.
> when we match shortened twister URL and perform "msg = msgAddHtmlEntity()" we may add some id to so called templateShortenedTwisterURL which we put in newHtmlEntityLink() and then search element with that id from callback function to manage it. it's not so good but I don't see clearly your need.

Great idea!

regards,

Miguel

Miguel Freitas

Mar 19, 2016, 9:45:31 PM
to twist...@googlegroups.com
On Sat, Mar 19, 2016 at 6:59 PM, Сёма Мрачный <scarylit...@gmail.com> wrote:
> so question
>> why is it need to add it by hand? can twisterd add it by itself?
> now sounds like: why we need put both "twist:" and "=" to "twisterd decodeshorturl twist:QwAAAN0DAAA="?

Because my idea is that, after we devise shorter or alternative URIs, the frontend would still simply pass the whole string to twisterd to be decoded. twister-html doesn't need to care.

twisterd will have to distinguish between possibly different variations, and for that it is better to always have the whole thing.
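
A rough Python sketch of that design: the caller always passes the full URI, and the decoder picks the variant from the prefix. Only the "twist" format is taken from this thread; the `DECODERS` table and function names are hypothetical, and the sketch covers only the parsing step, not fetching the post that holds the long URL.

```python
import base64
import struct
from typing import Callable, Dict, Tuple


def decode_twist(payload: str) -> Tuple[int, int]:
    """Current format: base64 of two little-endian 32-bit ints (uid, k)."""
    return struct.unpack("<II", base64.b64decode(payload))


# Future variants would be registered here; the frontend never changes.
DECODERS: Dict[str, Callable[[str], Tuple[int, int]]] = {
    "twist": decode_twist,
}


def decode_short_uri(uri: str) -> Tuple[int, int]:
    """Dispatch on the scheme prefix of the whole URI string."""
    scheme, sep, payload = uri.partition(":")
    if not sep or scheme not in DECODERS:
        raise ValueError(f"unsupported short URI: {uri!r}")
    return DECODERS[scheme](payload)


print(decode_short_uri("twist:QwAAAN0DAAA="))  # (67, 989)
```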


Сёма Мрачный

Mar 20, 2016, 5:42:30 AM
to twister-dev
> twisterd will have to identify between possibly different variations, for that is better to always have the whole thing.

Oh, I see, it's a versioning thing. We could pass the API version number or the URI prefix as a second parameter, which may be optional for the initial version of the URL shortener.

Сёма Мрачный

Mar 20, 2016, 2:07:22 PM
to twister-dev
>>> I don't know exactly how to get a reference into the formatted post that can be later changed by a callback or something.
>> when we match shortened twister URL and perform "msg = msgAddHtmlEntity()" we may add some id to so called templateShortenedTwisterURL which we put in newHtmlEntityLink() and then search element with that id from callback function to manage it. it's not so good but I don't see clearly your need.
> Great idea!

It was just one way. I've decided to go another way: https://github.com/miguelfreitas/twister-html/commit/3e43fdb5946a10e956b3f5c3610ad17d39517dfd

We use fillElemWithTxt() to apply formatting to elements, so now we search there for .link-shortened elements, disable clicks on them and fetch the URIs. Then we put the fetched URIs on all the related links in the document and enable clicks.

BTW, I suppose we could cache shortened URIs in localStorage.

Miguel Freitas

Mar 20, 2016, 2:47:35 PM
to twist...@googlegroups.com
On Sun, Mar 20, 2016 at 3:07 PM, Сёма Мрачный <scarylit...@gmail.com> wrote:

Works great! Thanks! :-)

There is, of course, room for future improvement: because 'decodeshorturl' may block, the best thing would be to serialize it together with the dhtget requests (which are limited in the number of simultaneous requests). Otherwise, multiple short URLs in the timeline will make the interface less responsive.

This may require some surgery in twister_io.js though...

> we use fillElemWithTxt() to apply formatting to elements so now we search there for .link-shortened elements, disable clicks on them and fetch URIs. then we put fetched URIs on all related to them links on document and enable clicks.

So the link is unclickable until it resolves? Good idea!

Another possibility, perhaps more user friendly, could be opening a dialog box to inform the user that the URL is still being fetched.

> btw I suppose we may cache shortened URIs to localStorage.

Yes, good idea.

regards,

Miguel

Сёма Мрачный

Mar 20, 2016, 3:18:35 PM
to twister-dev
> There is, of course, room for future improvements: because 'decodeshorturl' may block, the best thing would be to serialize it together with the dhtget (which are limited in number of simultaneous requests). Otherwise, multiple shorturls in timeline will cause the interface to get less responsive.

Yep, I missed that. Can you explain "may block" in a bit more detail? And why doesn't the daemon dhtget the requested resources by itself?

I got

can't fetch URI "twist:CQAAAG4AAAA=": resource busy, try again

from Vegos's link. Is it related?

> Another possibility, perhaps more user friendly, could be opening a dialog box to inform the user the URL is still being fetched.

Yeah, and we also need to grey out unfetched links via CSS.

Since there may be trouble with fetching, and some links may not be fetchable at all, I think we always need to be able to choose whether links in a post get shortened or not. I want to add a toolbar to the active textarea with buttons to format text and paste links. There could be a checkbox in the link-pasting dialog to shorten the link or not.

Miguel Freitas

Mar 20, 2016, 3:30:29 PM
to twist...@googlegroups.com
On Sun, Mar 20, 2016 at 4:18 PM, Сёма Мрачный <scarylit...@gmail.com> wrote:
> yep, I've missed that. can you explain it a bit more detailed about "may block"? and why does daemon not dhtget requested resources by itself?

It does exactly that ;-)

But remember that 'dhtget' requests also block (i.e. they take some time to complete). That's why we have an enqueuing mechanism specifically for them in twister_io.js, since the browser only allows a limited number of parallel requests to the same server.

For all other RPCs we assume completion is immediate, or at least very fast, so we don't bother.
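
The enqueuing idea can be sketched as a small concurrency limiter. This is an illustrative Python/asyncio version, not the actual twister_io.js code (which is JavaScript); the `LimitedQueue` class, the limit of 2 and the `slow_rpc` stand-in are assumptions.

```python
import asyncio


class LimitedQueue:
    """Run slow calls with at most `limit` in flight; extra calls wait."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._sem = None  # created lazily, inside the running event loop
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for demonstration

    async def run(self, func, *args):
        if self._sem is None:
            self._sem = asyncio.Semaphore(self._limit)
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await func(*args)
            finally:
                self.in_flight -= 1


async def slow_rpc(method: str, arg: str) -> str:
    """Stand-in for a blocking RPC such as dhtget or decodeshorturl."""
    await asyncio.sleep(0.01)
    return f"{method}({arg}) done"


async def demo():
    queue = LimitedQueue(limit=2)
    calls = [queue.run(slow_rpc, "decodeshorturl", f"twist:{i}") for i in range(5)]
    results = await asyncio.gather(*calls)
    return results, queue.peak


results, peak = asyncio.run(demo())
print(peak, results[0])
```

Routing both 'dhtget' and 'decodeshorturl' through one such queue would keep the total number of slow requests under the browser's connection limit, leaving room for the fast RPCs.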

 

> I got
> can't fetch URI "twist:CQAAAG4AAAA=": resource busy, try again
> from Vegos's link. is it related?

Oops, not really. That's a limitation of my implementation, sorry :(

It means we cannot perform two simultaneous decoding requests for the same resource. So that's another reason we may want to serialize/enqueue 'decodeshorturl'.
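
One client-side way to avoid hitting that limitation is to share a single in-flight request among all callers asking for the same URI. The following Python/asyncio sketch shows the pattern only; `CoalescingResolver` and `fake_fetch` are hypothetical names, and the returned URL is a placeholder.

```python
import asyncio


class CoalescingResolver:
    """Share one in-flight request among callers asking for the same URI."""

    def __init__(self, fetch):
        self._fetch = fetch   # coroutine function: short URI -> long URL
        self._pending = {}    # short URI -> asyncio.Task currently running

    async def resolve(self, uri: str) -> str:
        task = self._pending.get(uri)
        if task is None:
            task = asyncio.ensure_future(self._fetch(uri))
            self._pending[uri] = task
            # Forget the task once it finishes, so a later retry is possible.
            task.add_done_callback(lambda _t: self._pending.pop(uri, None))
        return await task


async def demo():
    calls = []

    async def fake_fetch(uri):
        # Stand-in for the decodeshorturl RPC.
        calls.append(uri)
        await asyncio.sleep(0.01)
        return "http://example.com/long"

    resolver = CoalescingResolver(fake_fetch)
    uri = "twist:CQAAAG4AAAA="
    results = await asyncio.gather(*(resolver.resolve(uri) for _ in range(3)))
    return results, calls


results, calls = asyncio.run(demo())
print(len(calls), results)
```

Three concurrent lookups of the same short URI result in a single underlying request, so the "resource busy" case would not be triggered by the frontend itself.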


> since there may be troubles with fetching and some links cannot be fetched at all even, I think we need to be always able to choose if we want to have links in a post shortened or not. I want to add tool bar to active textarea with buttons to format text and paste links. there may be a checkbox in link pasting dialog to short link or not.

Great idea, that would be very cool indeed!

You are right: where possible, not shortening the URL is the best (hassle-free) solution. So we should always prefer having the URL within the post text, and only resort to the shortener if the user requests it.

regards,

Miguel


Сёма Мрачный

Mar 20, 2016, 4:25:56 PM
to twister-dev

>> yep, I've missed that. can you explain it a bit more detailed about "may block"? and why does daemon not dhtget requested resources by itself?
> It does exactly that ;-)
>
> But remember 'dhtget' requests do also block (ie. they take some time to complete).

So both 'decodeshorturl' and 'dhtget' requests block. I need to dig into this tomorrow; tell me more about request serialization for that.

For today, I've added caching of fetched URIs: https://github.com/miguelfreitas/twister-html/commit/7d0ee60abacd380d6a27608344c295d30d885322

Сёма Мрачный

Mar 20, 2016, 4:33:54 PM
to twister-dev
> there may be a checkbox in link pasting dialog to short link or not.

I also think there will be a button to show a "URIs Shortener Center" or something, with the list of links.

Сёма Мрачный

Mar 21, 2016, 2:21:24 PM
to twister-dev


> we use fillElemWithTxt() to apply formatting to elements so now we search there for .link-shortened elements, disable clicks on them and fetch URIs. then we put fetched URIs on all related to them links on document and enable clicks.

Actually not on all of them, because $('.link-shortened[href="' + short + '"]') can't select elements that are a) detached or b) not attached yet, so we may miss some URIs in some cases. https://github.com/miguelfreitas/twister-html/commit/08856095b098a57b301dc3a02778da896e360e45 covers case a, but b is more complicated; maybe we need to attach all created elements to an element attached to twister.html.detached immediately after creation.

Сёма Мрачный

Mar 21, 2016, 3:03:20 PM
to twister-dev
> maybe it's need to attach all created elements to element attached to twister.html.detached immediately after creation.

Fix for twists' elements: https://github.com/miguelfreitas/twister-html/commit/21ee46a466cb6333aebd307a1e49d2a8c339680a. To see it in action, click on the quote in @mfreitas's twist k=1002 and check the link in the original twist in the conversation window, before and after applying this commit.

Сёма Мрачный

Mar 21, 2016, 5:07:48 PM
to twister-dev

Сёма Мрачный

unread,
Mar 22, 2016, 7:23:14 PM3/22/16
to twister-dev
> I want to add tool bar to active textarea with buttons to format text and paste links. there may be a checkbox in link pasting dialog to short link or not.

Today I've added a "shorten URL" link to new-post textareas. The shortened link is pasted at the caret position of the last focused textarea of the closest .post-area-new form: https://github.com/miguelfreitas/twister-html/commit/085108cc58112c6833636437a51923943bc8b75a

Сёма Мрачный

unread,
Mar 23, 2016, 12:56:20 PM3/23/16
to twister-dev
>> I want to add tool bar to active textarea with buttons to format text and paste links. there may be a checkbox in link pasting dialog to short link or not.
> today I've added shorten URL link to new post textareas

The "shorten URL" link was tuned and moved to that bar; the bar is attached to the bottom of the textarea that is currently in focus: https://github.com/miguelfreitas/twister-html/commit/6cdc6cc511da95d50f030a37a93bd44877cfcd41

Сёма Мрачный

unread,
Mar 23, 2016, 4:57:26 PM3/23/16
to twister-dev
A brief explanation was added to the warning in the enter-URL-to-shorten prompt: https://github.com/miguelfreitas/twister-html/commit/5c5f262ca01e66f35f7e19836bb2c55ed2dbf088