On 09/05/12 04:16, Laurent Savaëte wrote:
> It took almost a week to figure where the bug was, but I finally nailed
> it :)
>
> You create links the same way for both internal and external links, you
> can search internal ones by url (eg. type "rest" in the url input and
> you'll be shown a list of pages named "/en/Restaurant_vocabulary" and so
> on).
> Search *should* work whether you type a urn or url (ie: both "en:rest"
> or "/en/rest" should bring up the same results).
> If you don't pick any result from the list, aloha will assume you're
> doing an external link (like if you type in "
http://wikiotics.org" and
> hit enter).
>
> For now, we need to store two data-xxx microdata attributes in the links
> to work out the difference between internal anexternal ones.
> I think there's a way to do without these extra attributes, but I'll
> work on this once more pressing stuff is solved.
> A CSS class is added to each link, so we can display them accordingly in
> view mode.
> The good news is, it works with the standard aloha link plugin and just
> a simple "repository" (about 100 lines of code) requesting links from
> the backend. So the volume of (our own) code is very limited.
>
> Jim, it would be great if you had a quick look at the few commits that
> contain python code. Just skip the js code, it's pretty clean (I think).
>
> Edit
http://laurent.dev.wikiotics.net/en/visual_example2 for a test drive.
I think the code looks great. From my perspective it is ready to be
merged into master whenever you think is good. I am really excited
about this, and appreciate you taking the time to look into it and get
everything right.
I have only a few comments.
In d92b35cf3 you changed the API of search_pages(), as now it is
supposed to accept a dictionary of various things. However, I don't
think you updated all the existing code to the new API, so some things
are probably broken. Actually, my vote would be to stick with the old
API, but it doesn't really matter. If you need to pass a dict to the
function, you can do it by calling search_pages(**d). For more
information on this syntax, see
http://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/
(I recall that you were asking about it recently, and it's quite
simple/elegant once you get the hang of it). If you want me to propose
a patch of how I would do the API, let me know, it's only a few lines.
I would argue that the function urn_from_url() is misnamed as it does
not return a urn. I would call it fqpagename_from_url().
Obviously we need to start sanitizing input for real before we deploy to
the main site. I think it makes most sense to sanitize on output, for
one reason: it is possible that new attacks will be found over time,
which will require more aggressive sanitization in the future. In this
case, we would likely want to store the sanitized version in cache
(memcache or disk cache), as the operation can be somewhat expensive
(right?). Even with this enabled, we would probably want to sanitize on
input, just to keep ridiculous stuff out of the database altogether.
Actually, for now, sanitizing on input is sufficient, though we should
acknowledge that in the future we will probably end up sanitizing on
output as well (as soon as we find the need to sanitize something more
aggressively). So those are my thoughts, though they require no action.
If you haven't made one already, it would be very nice to have a test
case to make sure sanitization is actually working...