
A few notes


Matthieu Rakotojaona

May 15, 2012, 6:29:29 PM
to Newebe
Hello everyone,

First, thank you for Newebe. I haven't tested it yet, but I already
like its philosophy very much. I also believe that everyone's data
should sit on their own computer, shared with others only as needed,
and that storing everyone's life in a central place is a bad design.
Just like IRL. Thus, the approach was very relevant to me.

I have a few questions/notes on the overall implementation :

1/

When a node has some activities related to a contact, it sends them to
the contact and stores them in the database. If, at any moment, a
long-asleep contact wakes up, he fetches all the information he has
missed.

I was thinking that it would be more efficient to directly store the
activities in the DB, and let the participants of this activity
replicate them, instead of sending them to everyone. This is already
implemented anyway.

Going further, we could let CouchDB handle it directly. It has a
wonderful `_changes` API that gives you every change in the database
since any update sequence in the past, and can feed you continuously.
With that, we could store all the participants' URLs in a field
alongside the activity, and let them retrieve what's related to them
when they want.
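
A minimal sketch of this idea in Python, under a few assumptions: the activity documents carry a `participants` list of contact URLs (a hypothetical field name), and the caller supplies the lines of a continuous `_changes` feed. The `since`, `feed=continuous` and `include_docs` parameters are part of the CouchDB `_changes` API; the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode


def changes_url(db_url, since=0, continuous=True):
    """Build a _changes URL that resumes from the update sequence `since`."""
    params = {"since": since, "include_docs": "true"}
    if continuous:
        params["feed"] = "continuous"
    return f"{db_url}/_changes?{urlencode(params)}"


def activities_for(feed_lines, my_url):
    """Yield (seq, doc) for changed docs that list `my_url` as a participant.

    `participants` is a hypothetical field name, not Newebe's actual schema.
    """
    for line in feed_lines:
        line = line.strip()
        if not line:
            # Continuous feeds send empty lines as heartbeats.
            continue
        change = json.loads(line)
        doc = change.get("doc") or {}
        if my_url in doc.get("participants", []):
            yield change["seq"], doc
```

Note that the filtering here happens on the reader's side, so it only illustrates the retrieval part and does nothing for access control.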

This poses 2 problems :
- I don't know if we can listen to many `_changes` feeds in parallel.
Even if a user typically doesn't have 100k contacts, we can assume
that a number of 100 is somewhat reasonable
- Authentication : everyone shouldn't be able to retrieve all the
activities of a contact if he is not in the participants. There are
some

But it also has some nice points :
- It doesn't require a 99.9% uptime. This is already managed by asking
a contact for all missed information, so why not reuse it ?
- It only requires pulls, and no pushes. This eliminates the need for
routing/opening ports, and makes it far easier to set up and
use (think about mobile use, or using a public hotspot)
- What I had in mind is a typical user who fires up Newebe as software,
but more as a client than as a server (no uptime needed, no
maintenance, no port forwarding, etc) on his computer, just like he
would launch his MUA, which fetches all the mails (-> activities) from
his mail server (-> contacts). This is much more resilient to crashes
too =]
- Suppose A has an activity with B and C. At the moment, A sends it to
B and C. The message is not considered delivered until all the
recipients have received it (or so I think). In the reactive approach
I was thinking about, B pulls from A, and C can pull from B or A; if A
stops after B pulled the activity, C can still retrieve it from B.
After all, it's exactly the same activity for everyone (participants
are the same, news/picture/note is the same).
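
The A/B/C scenario can be sketched as a toy in-memory simulation in Python. Everything here (the `Node` class, its methods, the `participants` field) is hypothetical and only models the pull-with-fallback idea; it does not talk to CouchDB and does not check that a relayed activity is authentic.

```python
class Node:
    """A toy contact that stores activities and serves them on request."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.activities = {}  # activity id -> activity dict

    def publish(self, activity_id, content, participants):
        self.activities[activity_id] = {
            "content": content,
            "participants": sorted(participants),
        }

    def serve(self, requester):
        """Return the activities `requester` takes part in; fail if offline."""
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return {
            aid: act
            for aid, act in self.activities.items()
            if requester in act["participants"]
        }

    def pull(self, peers):
        """Try each peer in turn and keep whatever is not stored yet."""
        for peer in peers:
            try:
                fetched = peer.serve(self.name)
            except ConnectionError:
                continue  # this peer is down, try the next one
            for aid, act in fetched.items():
                self.activities.setdefault(aid, act)


a, b, c = Node("A"), Node("B"), Node("C")
a.publish("act1", "hello", ["A", "B", "C"])
b.pull([a])          # B pulls while A is up
a.online = False     # A stops
c.pull([a, b])       # C falls back to B
```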

2/

I see that the data is distributed among multiple databases in
CouchDB. Is there a reason for that ? Instinctively, I prefer stashing
everything in one database and adding a `doc_type` field, which is
already done anyway.

3/

I see that we use Node.js for developing client-side stuff. Is it
used only as a development tool, not as a brick of the final software
stack ? This is what I understood.

4/

I see in some of your views that you emit the full `doc` as a value.
The pro hint is to emit a `null` value and retrieve the doc at query
time using `&include_docs=true`. This eliminates the need to duplicate
the doc in the database _and_ in the view. The little added latency is
invisible.
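
A small sketch of what this looks like from the Python side. The design document name `micropost`, the view name `by_date` and the helper names are assumptions for illustration; `include_docs` and the `rows`/`doc` layout of the response are standard CouchDB view behaviour.

```python
import json
from urllib.parse import urlencode

# Hypothetical design document: the map function emits null as the value,
# so the view index stores only the key and the document id.
DESIGN_DOC = {
    "_id": "_design/micropost",
    "views": {
        "by_date": {
            "map": "function(doc) {"
                   " if (doc.doc_type === 'MicroPost') { emit(doc.date, null); }"
                   " }"
        }
    },
}


def view_url(db_url, design, view, **params):
    """Build a view URL; every parameter value is JSON-encoded."""
    query = urlencode({k: json.dumps(v) for k, v in params.items()})
    return f"{db_url}/_design/{design}/_view/{view}?{query}"


def docs_from_response(body):
    """Extract the full documents that include_docs=true attaches to rows."""
    return [row["doc"] for row in json.loads(body)["rows"]]
```

For example, `view_url(db, "micropost", "by_date", include_docs=True, limit=10)` asks for ten rows with their documents attached.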

---

As you may have seen, I am more oriented towards CouchDB, mainly
because I like it (and I have made a few toys to play with it, too :
see https://github.com/rakoo/MultiBin or https://github.com/rakoo/dml).
In fact, I think that the approach of Newebe is essentially what drove
the development of CouchDB, and that most of its functionality can be
directly provided by it.

Anyway, keep up the good work !

--
Matthieu Rakotojaona

Gelnior

May 16, 2012, 6:32:22 AM
to new...@googlegroups.com
Hi Matthieu,

Thank you for your email, I'm glad you like the concept behind Newebe. My answers follow:


1/

> When a node has some activities related to a contact, it sends them to
> the contact and stores them in the database. If, at any moment, a
> long-asleep contact wakes up, he fetches all the information he has
> missed.

You're right.

> I was thinking that it would be more efficient to directly store the
> activities in the DB, and let the participants of this activity
> replicate them, instead of sending them to everyone. This is already
> implemented anyway.

That could be good; it is almost the way picture sharing works. You send a small thumbnail of your picture to all your contacts. If someone is interested, he requests the full-size image.

> Going further, we could let CouchDB handle it directly. It has a
> wonderful `_changes` API that gives you every change in the database
> since any update sequence in the past, and can feed you continuously.
> With that, we could store all the participants' URLs in a field
> alongside the activity, and let them retrieve what's related to them
> when they want.

One year ago, I tried to use "_changes" to send data to contacts every time a change occurred. My Python skills were too weak and I couldn't manage to listen to it without crashing couchdbkit (the ODM I use to talk to CouchDB). I finally gave up because I thought that every document type would require a specific behavior, so I prefer to handle that logic in the Newebe server. Anyway, I agree with you, there is probably something to do with "_changes".

> This poses 2 problems :
> - I don't know if we can listen to many `_changes` feeds in parallel.
> Even if a user typically doesn't have 100k contacts, we can assume
> that a number of 100 is somewhat reasonable

Don't worry, currently I am not sure Newebe can handle 100 contacts properly.

> - Authentication : everyone shouldn't be able to retrieve all the
> activities of a contact if he is not in the participants. There are
> some

That's another reason why a server between contact and user DB is required.

> But it also has some nice points :
> - It doesn't require a 99.9% uptime. This is already managed by asking
> a contact for all missed information, so why not reuse it ?

Currently the sync implementation is naive and would need some improvements, but you're right, it could be a better start than the changes API.

> - It only requires pulls, and no pushes. This eliminates the need for
> routing/opening ports, and makes it far easier to set up and
> use (think about mobile use, or using a public hotspot)

But how do you make pulls if you don't know where your contact is ? Initiating a pull requires a first push.

> - What I had in mind is a typical user who fires up Newebe as software,
> but more as a client than as a server (no uptime needed, no
> maintenance, no port forwarding, etc) on his computer, just like he
> would launch his MUA, which fetches all the mails (-> activities) from
> his mail server (-> contacts). This is much more resilient to crashes
> too =]

Doing pull requests requires good availability of your contacts. I agree that with your system this availability can drop from 99% to 70%, but it still requires some uptime. Or maybe I missed something.
> - Suppose A has an activity with B and C. At the moment, A sends it to
> B and C. The message is not considered delivered until all the
> recipients have received it (or so I think). In the reactive approach
> I was thinking about, B pulls from A, and C can pull from B or A; if A
> stops after B pulled the activity, C can still retrieve it from B.
> After all, it's exactly the same activity for everyone (participants
> are the same, news/picture/note is the same).

Nice idea, it could be good to implement it in the sync algorithm. But this will lead to a security problem : how do you trust that C really sends you B's activities and not fake ones ? This will probably require that you retrieve C's data from another contact (D) to ensure that they are OK.

 
2/

> I see that the data is distributed among multiple databases in
> CouchDB. Is there a reason for that ? Instinctively, I prefer stashing
> everything in one database and adding a `doc_type` field, which is
> already done anyway.

Normally everything is in the same DB. In settings.py, the variable names are probably not well chosen. If you have enough time, could you set up Newebe locally, look at the DB and tell us if the DB is OK ?
 
3/

> I see that we use Node.js for developing client-side stuff. Is it
> used only as a development tool, not as a brick of the final software
> stack ? This is what I understood.

You're right. To develop the client side, you need a CoffeeScript compiler and a Stylus compiler.

4/

> I see in some of your views that you emit the full `doc` as a value.
> The pro hint is to emit a `null` value and retrieve the doc at query
> time using `&include_docs=true`. This eliminates the need to duplicate
> the doc in the database _and_ in the view. The little added latency is
> invisible.

Thank you for that hint, could you open an issue about it ?
https://github.com/gelnior/newebe/issues?direction=desc&sort=created&state=open
 
---

> As you may have seen, I am more oriented towards CouchDB, mainly
> because I like it (and I have made a few toys to play with it, too :
> see https://github.com/rakoo/MultiBin or https://github.com/rakoo/dml).
> In fact, I think that the approach of Newebe is essentially what drove
> the development of CouchDB, and that most of its functionality can be
> directly provided by it.

That's fine, we need expertise in every domain. If you have some CouchDB security advice, feel free to share it !

Frank

Jul 3, 2012, 7:11:48 PM
to new...@googlegroups.com
Hi all,

I have a little problem with a CouchDB view.
Can someone help me ?

Here is the problem description:

Requirements

I would like to add the equivalent of Diaspora aspects to Newebe.

1/ Each contact can be tagged with simple words. All contacts have the tag "all" by default.
2/ When someone writes a micropost or posts a picture, he has to select a tag to tell which contacts will receive the micropost or the picture.
3/ Then, when a user selects a given tag, I would like only the posts related to this tag to appear in the timeline.

Problem

1/ + 2/ are ok.
3/ When I want to retrieve 30 microposts which are tagged with a given tag ordered by date since a given date, I don't know how to express that with CouchDB.
SQL equivalent would be :

SELECT microposts.content, microposts.title, microposts.author
FROM microposts, tags
WHERE microposts.date < '23/06/2012'
AND tags.name = 'friends'
AND tags.micropost_id = microposts.id
ORDER BY microposts.date DESC
LIMIT 30

Here is my CouchDB schema:

class MicroPost(NewebeDocument):
    # inherited from NewebeDocument
    authorKey = StringProperty()
    date = DateTimeProperty(required=True)
    attachments = ListProperty()
    tags = ListProperty(default=["all"])
    # micropost specific fields
    author = StringProperty()
    content = StringProperty(required=True)
    isMine = BooleanProperty(required=True, default=True) 

and my current map function for my micropost/tags view :

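function(doc) {
  // Only index microposts.
  if ("MicroPost" == doc.doc_type) {
    // Fall back to the default tag without modifying the document,
    // which the view server may hand over read-only.
    var tags = (doc.tags === undefined || doc.tags === null) ? ["all"] : doc.tags;

    // One row per tag, keyed by [tag, date] so rows sort by date within a tag.
    tags.forEach(function(tag) {
      emit([tag, doc.date], doc);
    });
  }
}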

Additional information

My research leads me to think that it's not feasible directly with CouchDB and that some filtering should be done on the Tornado side. But I don't know CouchDB well enough to be sure of that.
The code of this feature is available in the branch features/tags.


Thanks for your attention,

Frank

Matthieu Rakotojaona

Jul 4, 2012, 4:03:38 AM
to new...@googlegroups.com
Instead of querying the posts with no arguments, you can query them
either with no key, which will return all the docs, or with specific
keys regarding what you emitted in the view [0].

So what you want to do is query along those lines :

GET /path/to/view?key=["friends", {}]
GET /path/to/view?key=["foes", {}]

GET /path/to/view?keys=[["friends", {}],["foes",{}]]

[0] http://wiki.apache.org/couchdb/HTTP_view_API#Access.2BAC8-Query
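
For reference, a hedged Python sketch of how such a query string could be built. Since `key` matches an emitted key exactly, a `startkey`/`endkey` range is one way to select every `[tag, date]` row of a tag: `[tag]` sorts before any `[tag, date]` and `[tag, {}]` sorts after it in CouchDB's collation. The `tag_range_url` helper and the example view path are made up for illustration.

```python
import json
from urllib.parse import urlencode


def tag_range_url(view_url, tag, limit=None):
    """Build a query selecting every [tag, <date>] row of the view.

    Key parameters must be JSON-encoded, then URL-encoded.
    """
    params = {
        "startkey": json.dumps([tag]),
        "endkey": json.dumps([tag, {}]),
    }
    if limit is not None:
        params["limit"] = limit
    return f"{view_url}?{urlencode(params)}"


# Hypothetical view path, following the micropost/tags view of this thread.
url = tag_range_url(
    "http://localhost:5984/newebe/_design/micropost/_view/tags", "friends", limit=30
)
```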
--
Matthieu RAKOTOJAONA

Frank

Jul 7, 2012, 8:07:00 AM
to new...@googlegroups.com
Thanks for that tip, it solves one of my problems : getting the last
posts for a given tag.
But how could I get 10 posts until a given date for a given tag ?

Matthieu Rakotojaona

Jul 7, 2012, 12:27:00 PM
to new...@googlegroups.com
On Sat, Jul 7, 2012 at 2:07 PM, Frank <gel...@free.fr> wrote:
> But how could I get 10 posts until a given date for a given tag ?

You already emit the correct array. If you want to retrieve 10 posts
after said date :

GET /path/to/view?startkey=["friends",<date in correct format>]&limit=10

If you want 10 posts BEFORE the date :

GET /path/to/view?startkey=["friends",<date in correct format>]&limit=10&descending=true
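
A small Python sketch of both directions, assuming dates are stored as ISO 8601 strings (an assumption about the "correct format") and adding an `endkey` so the page stays within one tag. With `descending=true` the range is walked backwards, so the upper bound becomes the `startkey`. The `page_url` helper is hypothetical.

```python
import json
from urllib.parse import urlencode


def page_url(view_url, tag, date, limit=10, before=False):
    """Build a query for `limit` posts of `tag` after (or before) `date`."""
    if before:
        # Walk backwards from [tag, date] down to the first row of the tag.
        params = {
            "startkey": json.dumps([tag, date]),
            "endkey": json.dumps([tag]),
            "descending": "true",
        }
    else:
        # Walk forwards from [tag, date] up to the last row of the tag.
        params = {
            "startkey": json.dumps([tag, date]),
            "endkey": json.dumps([tag, {}]),
        }
    params["limit"] = limit
    return f"{view_url}?{urlencode(params)}"


after = page_url("/path/to/view", "friends", "2012-06-23T00:00:00Z")
earlier = page_url("/path/to/view", "friends", "2012-06-23T00:00:00Z", before=True)
```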

--
Matthieu RAKOTOJAONA

Frank

Jul 14, 2012, 1:20:43 PM
to new...@googlegroups.com
OK, I finally got it. endkey was missing:

GET /path/to/view?startkey=["friends",<date in correct format>]&endkey=["friends0"]&limit=10
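
To illustrate why the upper bound matters, here is a small in-memory sketch in Python. The sample rows and the `query` helper are made up, and Python tuple ordering only approximates CouchDB's collation for simple ASCII keys like these. Without an end key, `limit` keeps reading past the last "friends" row into the next tag.

```python
from bisect import bisect_left

# Rows as a view index would hold them: sorted by the emitted key (tag, date).
ROWS = sorted([
    (("family", "2012-06-01"), "post-1"),
    (("friends", "2012-06-10"), "post-2"),
    (("friends", "2012-06-20"), "post-3"),
    (("work", "2012-06-05"), "post-4"),
    (("work", "2012-06-25"), "post-5"),
])


def query(rows, startkey, endkey=None, limit=10):
    """Return up to `limit` values with startkey <= key <= endkey."""
    keys = [key for key, _ in rows]
    start = bisect_left(keys, startkey)
    selected = []
    for key, value in rows[start:]:
        if endkey is not None and key > endkey:
            break
        selected.append(value)
        if len(selected) == limit:
            break
    return selected


# Without an end key the result spills into the "work" tag.
unbounded = query(ROWS, ("friends", "2012-06-15"))
# ("friends0",) sorts after every ("friends", <date>) key and before "work".
bounded = query(ROWS, ("friends", "2012-06-15"), endkey=("friends0",))
```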


Thank you.

Frank