Position calculations, range handling and offsets

Chris Searle

Oct 14, 2009, 4:22:57 AM
to google-...@googlegroups.com
Am having an issue that I don't understand.

I have a bot that looks for URLs (regexp matching) and replaces them with an image.

Here's a code piece:

def OnBlipSubmitted(properties, context):
  blip = context.GetBlipById(properties['blipId'])
  doc = blip.GetDocument()
  
  m = r.search(doc.GetText())
  
  while m != None:
    doc.DeleteRange(document.Range(m.start(0), m.end(0)))
    image = getImage(m.group(2))
    newimage = document.Image(image.thumb.source, caption=image.title, width=image.thumb.width, height=image.thumb.height)
    doc.InsertElement(m.start(0), newimage)
    m = r.search(doc.GetText())

So - I grab the first match - remove it, insert an image then search again - on the new text.

This works fine for the first URL - it gets it exactly right.

However - if I add the following to a blip:

http://www.example.com/foo/bar bla bla http://www.example.com/bar/foo

I expect

IMG bla bla IMG

I get

IMG bla bIMGfoo

That is - the second one is three characters (always 3) too early in both start and end. From what I can see, it's the values of m.start and m.end that are at fault. The match itself is correct, since the values of m.group(0) and m.group(2) are exactly right for the input. I'm not sure why this is - what could be in the text that is confusing it? Can anyone give me any hints?

Oh - and while asking - what is the caption param of an Image for? It certainly doesn't appear in the blip even though it's being set.

Chris C.

Oct 14, 2009, 10:46:30 AM
to Google Wave API
I was going to suggest an offset variable, but as you say, it appears
that the subsequent regex search is on the just-updated text. I'm not
sure what the issue is (I have a dice rolling bot that uses regex and
inserts results after the relevant dice expression, and I use an
offset variable to adjust the insertion points by an amount equal to
the length of what was inserted on the previous iteration). Perhaps
there's a mutability issue with doc.getText()? I use Java, so I don't
know how that works.
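
The offset-variable idea described above could be sketched in Python roughly as follows. This is only an illustration on a plain string, not the dice bot's actual code (which is in Java and is not shown in this thread): the `DICE` pattern, the `annotate` function and the stand-in `roll` argument are all hypothetical names. The matches are found once on the original text, and each insertion point is shifted by the total length inserted so far.

```python
import re

# Hypothetical pattern for dice expressions such as "2d6".
DICE = re.compile(r"\b(\d+)d(\d+)\b")


def annotate(text, roll=lambda count, sides: count):
    """Insert ' [result]' after each dice expression.

    `roll` is a stand-in for a real dice roller; the default just
    returns the number of dice so the output is deterministic.
    """
    out = text
    offset = 0  # total number of characters inserted so far
    for m in DICE.finditer(text):  # positions refer to the ORIGINAL text
        result = " [%d]" % roll(int(m.group(1)), int(m.group(2)))
        pos = m.end() + offset  # shift by everything inserted earlier
        out = out[:pos] + result + out[pos:]
        offset += len(result)
    return out
```

For example, `annotate("roll 2d6 and 1d20")` inserts after both expressions even though the second match position was computed before the first insertion happened.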

Be interested to know the answer, though!

On Oct 14, 4:22 am, Chris Searle <chrisdsea...@gmail.com> wrote:
> However - if I add the following to a blip:
>
> http://www.example.com/foo/bar bla bla http://www.example.com/bar/foo

Chris Searle

Oct 15, 2009, 3:56:13 AM
to Google Wave API
On Oct 14, 4:46 pm, "Chris C." <yclept.ch...@gmail.com> wrote:
> I was going to suggest an offset variable, but as you say, it appears
> that the subsequent regex search is on the just-updated text. I'm not
> sure what the issue is (I have a dice rolling bot that uses regex and
> inserts results after the relevant dice expression, and I use an
> offset variable to adjust the insertion points by an amount equal to
> the length of what was inserted on the previous iteration). Perhaps
> there's a mutability issue with doc.getText()? I use Java, so I don't
> know how that works.

I could try this - but - how much offset is an image?

It seems a little strange to me - but then I feel a little unclear
about the blip's document structure. What is the result of getText()
when the blip contains media, for example? The same applies to knowing
what formatting you can add (I want to have a nice title, the image and
a couple of links - one to the image page, one to the image license -
but I don't yet know how to apply this kind of formatting
programmatically).

I could do the "get all matches in list, reverse it and then apply" -
but that seems less elegant.

Chris C.

Oct 15, 2009, 10:37:10 AM
to Google Wave API
I have no idea. Images may count as one "character" position, but I'm
not sure. You could experiment on that basis, and just subtract from
the start point of each match the cumulative length of all previous
matches, and see if that works. I basically had to mess with numbers
and offsets (and do manual counting of characters to see how things
moved) to get it right for my robot.
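
As a rough sketch of that experiment, here is the cumulative-offset calculation on a plain Python string. It is not Wave API code: `URL`, `replace_with_offsets`, `element_width` and `marker` are hypothetical names, and since how many positions an inserted image really occupies is exactly what is unknown here, it is left as a parameter to experiment with.

```python
import re

# Hypothetical, deliberately simple URL pattern for illustration.
URL = re.compile(r"http://\S+")


def replace_with_offsets(text, element_width=1, marker="#"):
    """Replace each URL with `element_width` copies of `marker`.

    Matches are found once on the original text; every later match
    is shifted left by the net number of characters removed so far.
    """
    out = text
    shift = 0  # net characters removed by earlier replacements
    for m in URL.finditer(text):
        start = m.start() - shift
        end = m.end() - shift
        out = out[:start] + marker * element_width + out[end:]
        # Each replacement removes the URL and adds the element's width.
        shift += (m.end() - m.start()) - element_width
    return out
```

Trying `element_width=0` and `element_width=1` against what the robot actually produces would be one way to find out which assumption matches.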

I think that one of the things that would be most helpful - especially
for me, as an amateur, but probably for everyone - is more clarity on
what kind of content is being seen. I know that we only want to work
through the interfaces provided - and that Wave is changing under the
hood all the time - but some more information on document structure,
annotation capabilities, available markup, etc. would be a great
resource.

Chris C.

Oct 15, 2009, 11:19:18 AM
to Google Wave API
Quick follow up - I was working with Watexy, which replaces text with
LaTeX-rendered images of the same text (you surround the string to be
replaced with $$), and it looks like images may take up 0 character
"spaces." If you experiment with the Watexy bot, you'll find that if
you replace 2 separate equations in a single line of text, one of them
- the first - will work properly, and the second will not. However, if
you count the length of the _text_ in each, the difference comes out
to the total length of the replaced text.

The text I used to test was the 34 character string:

Testing $$10x2=20$$ and $$3^2 = 9$$

When modified by Watexy, you get 12 characters of text, with the
images inserted where noted.

Testing (Image) a(Image)9$$

The total length of the text replaced by Watexy is 22 characters,
which means - if I've got my head on straight - the two images take up
no string space. Thus, your offset would be based solely on the length
of the URLs being replaced, and doesn't have to account for the size
of the image. Hope a) I got that right, and b) it helps!

Chris Searle

Oct 16, 2009, 5:54:11 AM
to google-...@googlegroups.com
Starting to wonder if it would be better for the bot to replace the link not with a picture but with a gadget (served from the same appspot app) - wondering if that would give better display control.


Chris Searle

Oct 16, 2009, 9:36:11 AM
to google-...@googlegroups.com
Right now I've caved in and done a reverse search (get all matches, reverse the list and then apply) - so that we start at the end of the string and work back. It works - but it seems wildly inelegant.
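
For reference, the reverse approach could look something like the sketch below, shown on a plain string rather than a Wave document. The `URL` pattern, `replace_from_end` and the `placeholder` text are hypothetical; in the robot, the slice-and-join step would presumably be the `doc.DeleteRange(...)` and `doc.InsertElement(...)` calls from the original post. Working from the last match to the first means no edit ever moves a position that is still waiting to be processed, so no offset bookkeeping is needed.

```python
import re

# Hypothetical, deliberately simple URL pattern for illustration.
URL = re.compile(r"http://\S+")


def replace_from_end(text, placeholder="IMG"):
    """Replace every URL with `placeholder`, last match first."""
    matches = list(URL.finditer(text))  # collect all matches up front
    for m in reversed(matches):
        # Everything before m.start() is untouched by later edits,
        # so the original match positions stay valid.
        text = text[:m.start()] + placeholder + text[m.end():]
    return text
```

With the input from the first post, `replace_from_end("http://www.example.com/foo/bar bla bla http://www.example.com/bar/foo")` gives `"IMG bla bla IMG"`.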

