mod_archive mixed to/from

63 views
Skip to first unread message

Hugo Osvaldo Barrera

unread,
May 20, 2013, 1:33:25 AM5/20/13
to proso...@googlegroups.com
Hi,

I've started using mod_archive, and I noticed that my client displayed
sent messages as received, and viceversa.

After looking at the module's code, and the pidgin plugin's code, and
XEP-0136, I realized that mod_archive was mixing up "to" and "from".

In XEP-0136, messages marked "from" mean "it came *from* this contact",
while it seems that what the mod's author had in mind was "this message
came *from* the local user".

Here's a patch that fixes the issue. [see attached file]

I know that mod_archive is unsupported, but after playingaround with it
for a while, I didn't notice it either unstable nor broken (save this
small issue). Did I miss some major issue here?
I'd be willing to maintain this mod, if it's ok with everyone. By
"maintain" I mean "fix important issues", since I don't think there's
any huge work that needs to be done.

Cheers,

--
Hugo Osvaldo Barrera
to_from.patch

Matthew Wild

unread,
May 20, 2013, 3:59:45 AM5/20/13
to Prosody IM Developers Group
Hi Hugo,

On 20 May 2013 06:33, Hugo Osvaldo Barrera <hu...@osvaldobarrera.com.ar> wrote:
> I know that mod_archive is unsupported, but after playingaround with it
> for a while, I didn't notice it either unstable nor broken (save this
> small issue). Did I miss some major issue here?
> I'd be willing to maintain this mod, if it's ok with everyone. By
> "maintain" I mean "fix important issues", since I don't think there's
> any huge work that needs to be done.

First, a bit of background on the archiving situation...

mod_archive was written as part of a Google Summer of Code project. It
implements XEP-0136, but has some outstanding bugs that didn't get
fixed. While the project was ongoing and I was helping the student, I
realised what an ugly complex protocol XEP-0136 actually was (thus
mod_archive, and other implementations I know, only implements a
small part of it).

I decided the answer was a new, simpler, archiving specification
(XEP-0313). My hope was (and is) that by making it as easy to
implement as possible, more clients and servers will implement it than
implemented XEP-0136 (of which I can count very very few...). XEP-0313
is implemented in mod_mam.

Now, regardless of protocols and modules, there is another problem.
Prosody's current storage API does not have any efficient way of
storing a random-access, and potentially very large, list of stanzas.
The closest we currently have is offline messages, but we never need
to query those, modify them, or access them out of order.

We are (I am) working on an extension to our storage API, which is
also crucial for a number of other things, including pubsub
persistence. Until it is implemented, large archives have the
potential to bring a running Prosody to its knees, which is why none
of the archiving modules are ready for production use at the moment
(except mod_mam_sql, which bypasses the storage API). The modules at
this point exist with the goal of helping client developers test
against something when implementing the protocols.

With the above issues, and people frequently installing and requesting
support for mod_archive which was buggy, implemented a protocol we
want to move away from, and also not recommended for general use, we
decided to no longer support it.

And now here you are :)

As long as you understand all of the above, I think it's great if you
want to fix and maintain mod_archive. I haven't yet looked at your
patch, but I know at least one thing that causes issues and is
probably related to what you see. The way mod_archive catches messages
is wrong. It has a single event handler and tries to determine if it's
an incoming or outgoing message. The correct thing to do is to have
two handlers, and attach them to the correct events. I believe the
current code will choke on any messages from remote servers, and
actually cause them to be dropped. This is the most serious issue that
people were hitting.

You can take a look at mod_mam for the correct way to catch messages.

If you have a Google account then let me know, and I can give you push
access to prosody-modules.

Regards,
Matthew

Hugo Osvaldo Barrera

unread,
May 21, 2013, 8:42:32 AM5/21/13
to proso...@googlegroups.com
On 2013-05-20 08:59, Matthew Wild wrote:
> Hi Hugo,

Hi Matthew,

Sorry for the rather late reply. I wanted to take the time to reply
properly and I had a rather busy monday.

>
> On 20 May 2013 06:33, Hugo Osvaldo Barrera <hu...@osvaldobarrera.com.ar> wrote:
> > I know that mod_archive is unsupported, but after playingaround with it
> > for a while, I didn't notice it either unstable nor broken (save this
> > small issue). Did I miss some major issue here?
> > I'd be willing to maintain this mod, if it's ok with everyone. By
> > "maintain" I mean "fix important issues", since I don't think there's
> > any huge work that needs to be done.
>
> First, a bit of background on the archiving situation...
>
> mod_archive was written as part of a Google Summer of Code project. It
> implements XEP-0136, but has some outstanding bugs that didn't get
> fixed. While the project was ongoing and I was helping the student, I
> realised what an ugly complex protocol XEP-0136 actually was (thus
> mod_archive, and other implementations I know, only implements a
> small part of it).

I know that XEP-0136 is rather huge, and I assumed it wasn't fully
implemented. However, as far as I can see, the important bits (I least
the bits I care about the most, and the bits I belive most users care
about the most are there).

>
> I decided the answer was a new, simpler, archiving specification
> (XEP-0313). My hope was (and is) that by making it as easy to
> implement as possible, more clients and servers will implement it than
> implemented XEP-0136 (of which I can count very very few...). XEP-0313
> is implemented in mod_mam.

While it would be great to have XEP-0313, client support is still missing
in most cases. While there's a XEP-0136 plugin for pidign, there's still
no suport for XEP-0313 yet. The same applies to a few other plugins. I
think it would be nice to have proper XEP-0313 support it prosody though,
since server support generally motivates clients to implement them,
however, in the meantime, partial-XEP-0136 is quite useful.

>
> Now, regardless of protocols and modules, there is another problem.
> Prosody's current storage API does not have any efficient way of
> storing a random-access, and potentially very large, list of stanzas.
> The closest we currently have is offline messages, but we never need
> to query those, modify them, or access them out of order.
>
> We are (I am) working on an extension to our storage API, which is
> also crucial for a number of other things, including pubsub
> persistence. Until it is implemented, large archives have the
> potential to bring a running Prosody to its knees, which is why none
> of the archiving modules are ready for production use at the moment
> (except mod_mam_sql, which bypasses the storage API). The modules at
> this point exist with the goal of helping client developers test
> against something when implementing the protocols.

Would a single-user installation, or an installation with just 2 or
three users still have that effect on Prosody?

>
> With the above issues, and people frequently installing and requesting
> support for mod_archive which was buggy, implemented a protocol we
> want to move away from, and also not recommended for general use, we
> decided to no longer support it.
>
> And now here you are :)
>
> As long as you understand all of the above, I think it's great if you
> want to fix and maintain mod_archive. I haven't yet looked at your
> patch, but I know at least one thing that causes issues and is
> probably related to what you see. The way mod_archive catches messages
> is wrong. It has a single event handler and tries to determine if it's
> an incoming or outgoing message. The correct thing to do is to have
> two handlers, and attach them to the correct events. I believe the
> current code will choke on any messages from remote servers, and
> actually cause them to be dropped. This is the most serious issue that
> people were hitting.
>
> You can take a look at mod_mam for the correct way to catch messages.

Actually, I will. I'll try to catch messages in mod_archive the same
way mod_mam does. Basically, for few reasons:

1) So that it doesn't eat up people's messages in any cases.
2) Because the current techique has some wierd issues (ie: when a message
is sent to oneself).
3) So as to clean up any bugs in mod_archive, laying some groundwork so
as to use the storage API in future.

What I don't honestly intend to do, is implement *all* of the XEP-0136
features. At least not for the time being.

>
> If you have a Google account then let me know, and I can give you push
> access to prosody-modules.
>
> Regards,
> Matthew
>
> --
>

By the way, do you think it would be wise for mod_archive and mod_mam to
use the same storage format in future? I haven't read XEP-0313 though,
so I don't even know it that's feasable, but I think that would be
useful, since it would allow users to easily move between XEP-0136 and
XEP-313 clients.

Matthew Wild

unread,
May 21, 2013, 11:04:56 AM5/21/13
to Prosody IM Developers Group
On 21 May 2013 13:42, Hugo Osvaldo Barrera <hu...@osvaldobarrera.com.ar> wrote:
> On 2013-05-20 08:59, Matthew Wild wrote:
>> We are (I am) working on an extension to our storage API, which is
>> also crucial for a number of other things, including pubsub
>> persistence. Until it is implemented, large archives have the
>> potential to bring a running Prosody to its knees, which is why none
>> of the archiving modules are ready for production use at the moment
>> (except mod_mam_sql, which bypasses the storage API). The modules at
>> this point exist with the goal of helping client developers test
>> against something when implementing the protocols.
>
> Would a single-user installation, or an installation with just 2 or
> three users still have that effect on Prosody?

The main factor (with the current code) is the number of messages in
the archive. Storing a message reads the whole archive from disk, adds
the new message to the data, and saves it all back to disk again.
Whether this can be avoided easily while still supporting XEP-0136
features like collections, I don't know. mod_mam at least doesn't have
*this* issue (though it still has to load everything from disk to run
queries).

Anyway, since I couldn't say for certain, I ran some quick (ok, not so
quick) tests, this resulting graph shows the time taken to add a
single message to the archive as the archive size increases:
http://matthewwild.co.uk/uploads/mod_archive_performance.png

A (very) quick survey suggests 100 messages/day might be typical usage
(if there is any such thing). Therefore after only a few months of
archiving, you will start to see a noticeable lag in your message
delivery.

> Actually, I will. I'll try to catch messages in mod_archive the same
> way mod_mam does. Basically, for few reasons:
>
> 1) So that it doesn't eat up people's messages in any cases.
> 2) Because the current techique has some wierd issues (ie: when a message
> is sent to oneself).
> 3) So as to clean up any bugs in mod_archive, laying some groundwork so2
> as to use the storage API in future.
>
> What I don't honestly intend to do, is implement *all* of the XEP-0136
> features. At least not for the time being.

I don't blame you :)

> By the way, do you think it would be wise for mod_archive and mod_mam to
> use the same storage format in future? I haven't read XEP-0313 though,
> so I don't even know it that's feasable, but I think that would be
> useful, since it would allow users to easily move between XEP-0136 and
> XEP-313 clients.

The new storage API is being designed with archives of stanzas in
mind, but I don't know how easy it will be to map some features of
XEP-0136 to it (that is, I really don't know... it might end up being
easy). But making the implementation simple was a large motivation for
XEP-0313, so I won't be going out of my way to make room for advanced
XEP-0136 support if nothing else needs them.

Regards,
Matthew

Hugo Osvaldo Barrera

unread,
May 22, 2013, 4:11:05 AM5/22/13
to proso...@googlegroups.com
This can be mitigated by having each collection in a separate file until
the storage API is in place. (A collection is basically a
"conversation"). This will keep files way smaller, and will reduce a
lot of load when just storing messages.
... Or, it could be left as-is until the storage API is in place, and then
mod_archive can me migrated to it.

>
> > Actually, I will. I'll try to catch messages in mod_archive the same
> > way mod_mam does. Basically, for few reasons:
> >
> > 1) So that it doesn't eat up people's messages in any cases.
> > 2) Because the current techique has some wierd issues (ie: when a message
> > is sent to oneself).
> > 3) So as to clean up any bugs in mod_archive, laying some groundwork so2
> > as to use the storage API in future.
> >
> > What I don't honestly intend to do, is implement *all* of the XEP-0136
> > features. At least not for the time being.
>
> I don't blame you :)
>
> > By the way, do you think it would be wise for mod_archive and mod_mam to
> > use the same storage format in future? I haven't read XEP-0313 though,
> > so I don't even know it that's feasable, but I think that would be
> > useful, since it would allow users to easily move between XEP-0136 and
> > XEP-313 clients.
>
> The new storage API is being designed with archives of stanzas in
> mind, but I don't know how easy it will be to map some features of
> XEP-0136 to it (that is, I really don't know... it might end up being
> easy). But making the implementation simple was a large motivation for
> XEP-0313, so I won't be going out of my way to make room for advanced
> XEP-0136 support if nothing else needs them.

In any case, even if the same archives can't be shared by mod_archive
and mod_mam, there's no reason for XEP-0136 to be incompatible with
the storage api, right? I think I got a bit confused with that last
parragraph there.

>
> Regards,
> Matthew
>

Thanks, cheers,

--
Hugo Osvaldo Barrera

Matthew Wild

unread,
May 22, 2013, 4:55:43 AM5/22/13
to Prosody IM Developers Group
That seems like it would work, yes. It could also be a way to map to
the future archive storage API (see additional comments below).

>> The new storage API is being designed with archives of stanzas in
>> mind, but I don't know how easy it will be to map some features of
>> XEP-0136 to it (that is, I really don't know... it might end up being
>> easy). But making the implementation simple was a large motivation for
>> XEP-0313, so I won't be going out of my way to make room for advanced
>> XEP-0136 support if nothing else needs them.
>
> In any case, even if the same archives can't be shared by mod_archive
> and mod_mam, there's no reason for XEP-0136 to be incompatible with
> the storage api, right? I think I got a bit confused with that last
> parragraph there.

It's been a while since I read XEP-0136, but my point was simply that
if it requires we store extra data in stanza archives, or that we use
a more complex schema for that data, I don't see us supporting it. My
guess is that you could implement a subset of XEP-0136 that *is*
compatible though, and then yes, XEP-0313 and XEP-0136 could work on
the same data store.

Regards,
Matthew
Reply all
Reply to author
Forward
0 new messages