Fetch more unread items?


Grant Barrett

Mar 1, 2009, 5:25:11 PM
to NewsRob User Group
I have been using NewsRob for a couple of days and find it to be
almost exactly what I need. The problem is that I have thousands of
unread items but NewsRob fetches only the 500 most recent items, both
read and unread. There seems to be no way to make NewsRob fetch
only the older unread items. When I've gone through the 500 feed
items, there's no way at all to access the other ones.

The easiest solution would be to remove the 500-item limit. Let users
download as many items as they want.

Thank you!

Grant Barrett

Mariano Kamp

Mar 2, 2009, 2:35:42 PM
to NewsRob User Group
Hey Grant,

> I have been using NewsRob for a couple
> of days and find it to be almost exactly
> what I need.
Great. I love to hear that.

> The problem is that I have thousands of
> unread items but NewsRob fetches only the
> 500 most recent items, both read and unread.
Yeah, I hear you. Unfortunately there doesn't seem to be an easy way
around it.

The problem is that NewsRob needs to fetch all articles on every sync. So
when you set the capacity to 500 and the schedule to once an hour,
NewsRob will load 24 * 500 articles' metadata per day (not the pages and
images though). That might still be ok bandwidth-wise, but it is rather
unfortunate, and the more items NewsRob requests from Google Reader, the
more likely it is to hit a server error. So for the time being this is
the maximum in NewsRob.

For most use cases this is enough and I hope that most people are fine
with a limit way below 500.

I can understand that for your use case this doesn't work all that
well, but until I find a way to query Google Reader more efficiently for
just the new items, there isn't all that much I can do about it. Lifting
the limit to 1000 wouldn't solve your problem either.

I unsuccessfully asked the Google Reader team on Twitter about this
and I posted my question to a couple of sites, e.g.
http://stackoverflow.com/questions/384771/how-to-skip-known-entries-when-syncing-with-google-reader.
At least as of today I haven't been able to obtain any more info
though.

And I can't really blame Google. They said a couple of years back that
they would open up the API, but as they haven't, there just isn't any
officially supported API for their service.

Through reverse-engineering their protocol I found some more bits and
pieces that would help me a little, but NewsRob would still miss state
changes on the server, i.e. an item that you mark as read in the Google
Reader web interface would remain unread in NewsRob even after a sync.

And it is not an option for me to make NewsRob complicated, at least
not more complicated than it is now ;-)


> There seems to be no way to make NewsRob to
> fetch only the older unread items. When I've
> gone through the 500 feed items, there's no
> way at all to access the other ones.
True. NewsRob downloads old and new items; it just downloads the images
etc. for the new items only.
There are a couple of problems with a different approach. One is, again,
that once I have downloaded an item with NewsRob and it is then marked
as read on the server, I wouldn't see the state change from unread to
read, because I would only be querying Google Reader for the unread
items.
Also, referring to the problem above, I would still download the maximum
number of items with every sync.

There is a bit of a silver lining on the horizon though, but nothing in
the near future.

For one, if I really thought this problem through and spent a lot of
time implementing a complicated sync approach, it might work for most
cases. But as this requires lots of work, and there are a couple of
features that are likely needed by more people, I would like to postpone
it until May.
Also, Google Reader has an offline mode. It uses Gears and is very
straightforward, i.e. it doesn't even download images, but maybe it uses
some special API calls to deal with the problems above. I haven't
reverse-engineered that traffic yet.
Or it might be that Google officially opens their APIs with
documentation and proper synchronization facilities in place :-)
Don't hold your breath though.

Long story short: I can't do much about it at the moment, but I see the
problem and will invest more time into solving it in May. I hope that
you can live with that?

Cheers,
Mariano

Marcus

Mar 12, 2009, 1:24:40 PM
to NewsRob User Group
Hi Mariano,

Gulp ... Did I understand that right, NewsRob downloads 500 articles on
every single sync? I was just calculating: 500 articles x 3 syncs per
hour x 24 hours per day x 30 days gives a download of about 1 million
articles per month. Even if every article averages only 100 bytes, that
is about 100,000,000 bytes per month. And I have a monthly full-speed
limit of 300 MB which I supposed I would never reach. With this
calculation, it does not seem unreachable anymore ...
(Ok, perhaps at least half of this will be via WiFi.)
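
In code, as a back-of-the-envelope sketch (the 100 bytes per article is
just my guess for the average metadata size):

    // Back-of-the-envelope estimate of the monthly sync volume.
    // Assumptions: 500 articles per sync, 3 syncs per hour, ~100 bytes
    // of metadata per article.
    public class SyncVolumeEstimate {
        public static void main(String[] args) {
            long articlesPerMonth = 500L * 3 * 24 * 30;   // 1,080,000
            long bytesPerArticle = 100;                   // guessed average
            long bytesPerMonth = articlesPerMonth * bytesPerArticle;
            System.out.println(articlesPerMonth + " articles/month");
            System.out.println(bytesPerMonth / (1024 * 1024) + " MB/month"); // ~100 MB
        }
    }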

I think I'll have to reduce frequency or number of articles or
both ...

> Through reverse-engineering their protocol I found some more bits and
> pieces that would help me a little bit, but NewsRob would still miss
> state changes on the server, i.e. an item that you mark as read in the
> Google Reader web interface would even after a sync remain unread in
> NewsRob.
>

I would like the following procedure:

During the day, regularly download only the new articles. Assume that
all cached articles which are no longer in the "unread" feed have been
read via Google Reader, and mark them as read in the cache as well.
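
Very roughly, something like this (all type and method names are made
up just to illustrate the idea, this is not NewsRob's actual code):

    import java.util.List;
    import java.util.Set;

    // Sketch of a partial sync: fetch only the ids of the items that are
    // currently unread on the server; anything unread locally but missing
    // from that set is assumed to have been read via the web interface.
    public class PartialSyncSketch {

        interface ArticleStore {
            List<String> findLocallyUnreadIds(); // ids of articles unread in the cache
            void markRead(String articleId);     // flip the local state to read
        }

        static void reconcile(Set<String> serverUnreadIds, ArticleStore store) {
            for (String id : store.findLocallyUnreadIds()) {
                if (!serverUnreadIds.contains(id)) {
                    store.markRead(id);
                }
            }
        }
    }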

If you think that's not enough, once every 24 hours you could do an
(optional) full download, but for me I don't think this will be
necessary, since I don't want to archive articles on the G1, I just want
to read new stuff. If I want to look up something old (starred, for
example), I will use the desktop PC. In fact, I use starring only when I
want to review something on the desktop PC.

> And it is not an option for me to make NewsRob complicated, at least
> not more complicated than it is now ;-)

You simply shouldn't tell anybody when it is fetching all items and
when only the unread ones :-) Let it happen automagically.

> A long story short: Can't do much about it at the moment, but I see
> the problem and will invest more time to solve it in May. I hope that
> you can live with that?

As far as I'm concerned - don't hurry!


Marcus.

Mariano Kamp

Mar 12, 2009, 2:41:09 PM
to NewsRob User Group
Hey Marcus,

thank you for following my suggestion to take the discussion here. I
didn't expect you to do that right away ;-), but here we are ;-)

Yes, that sounds about right, plus the actual content like images etc.,
but those are only downloaded once.

As a side note: a 20-minute sync interval doesn't make much sense in
most cases, as afaik Google doesn't poll the feeds all that frequently.
So once an hour should be ok; my gf, for example, syncs once every four
hours, because she is looking for content, not breaking news.
The only need for frequent updates I can see is syncing back to Reader,
because you might switch back to the Google Reader web interface and
expect your changes from the phone to be reflected already. For this
particular case though, there is already a solution in place: the
upload-only sync, which happens automatically 5 minutes after your last
change.
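
Under the hood that is presumably just a delayed one-shot alarm. A
minimal sketch of how such a delayed upload-only sync can be scheduled
on Android (the class names are made up, not NewsRob's actual
implementation):

    import android.app.AlarmManager;
    import android.app.PendingIntent;
    import android.app.Service;
    import android.content.Context;
    import android.content.Intent;
    import android.os.IBinder;

    // Schedules a one-shot "upload only" sync five minutes after the last
    // local change. Re-setting the same PendingIntent replaces any earlier
    // alarm, so it only fires 5 minutes after the *last* change.
    public class UploadSyncScheduler {
        private static final long DELAY_MS = 5 * 60 * 1000;

        public static void scheduleAfterLastChange(Context context) {
            Intent intent = new Intent(context, UploadOnlySyncService.class);
            PendingIntent pi = PendingIntent.getService(
                    context, 0, intent, PendingIntent.FLAG_UPDATE_CURRENT);
            AlarmManager am =
                    (AlarmManager) context.getSystemService(Context.ALARM_SERVICE);
            am.set(AlarmManager.RTC_WAKEUP,
                    System.currentTimeMillis() + DELAY_MS, pi);
        }

        // Made-up stub; a real implementation would push local read/starred/
        // shared state changes to Google Reader here.
        public static class UploadOnlySyncService extends Service {
            @Override public IBinder onBind(Intent intent) { return null; }
            @Override public void onStart(Intent intent, int startId) {
                // ... upload pending state changes ...
                stopSelf();
            }
        }
    }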

One way to shrink the MBs would be for NewsRob to use gzip when talking
to Google Reader. But as most users are on a flat rate, that doesn't
seem like the right tradeoff in most cases, because gunzipping is very
CPU intensive, and performance is not an aspect NewsRob is all that good
at right now. Furthermore, CPU usage means burning through the battery
faster and producing more heat, which also reduces battery life.
And it would only lessen the problem, not solve it.
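
For reference, requesting a gzip-compressed response is simple enough
in plain Java; a minimal sketch (the URL is whatever endpoint gets
fetched, not spelled out here):

    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.zip.GZIPInputStream;

    // Asks the server for a gzip-compressed response and transparently
    // decompresses it if the server honored the request.
    public class GzipFetch {
        public static String fetch(String urlString) throws Exception {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(urlString).openConnection();
            conn.setRequestProperty("Accept-Encoding", "gzip");
            InputStream in = conn.getInputStream();
            if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
                in = new GZIPInputStream(in); // the CPU cost is paid here, on the phone
            }
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, "UTF-8"));
            StringBuilder sb = new StringBuilder();
            for (String line; (line = reader.readLine()) != null; ) {
                sb.append(line).append('\n');
            }
            reader.close();
            return sb.toString();
        }
    }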

Ok, now to the proposed approach; please consider this. You have a
capacity of 500 articles, and 500 articles are in the database after a
"full sync". 250 of them are read, 250 unread. Ok so far?

Now you want to do a "partial sync". You ask Google Reader for the
latest 500 articles that are unread.

Now the 250 articles that are already unread in the local database get
fetched again too. So with this approach you would only have cut the
problem in half: instead of 1,000,000 articles (taken from your mail) it
would be 500,000 articles.
And as this doesn't solve my problem, not even reduce it by an order of
magnitude, I would stop here. But for the sake of argument, and because
I might be missing something here: now that we got 500 unread articles
from Google, the 250 articles that we don't have in our local database
yet would need to be added to it, right? When doing that, we would reach
a capacity of 750. The limit was 500 though. Which 250 to delete? The
read ones? The user might still want them, in particular might just have
marked one (accidentally) as read and would be annoyed if s/he couldn't
undo that action later on.
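
Just to spell the numbers out:

    // The numbers from the example above.
    public class CapacityExample {
        public static void main(String[] args) {
            int capacity = 500;
            int localRead = 250, localUnread = 250;        // after the full sync
            int fetchedUnread = 500;                       // latest unread items on the server
            int newlyAdded = fetchedUnread - localUnread;  // 250 articles we haven't seen yet
            int total = localRead + localUnread + newlyAdded;
            System.out.println("articles in the database: " + total);      // 750
            System.out.println("over capacity by: " + (total - capacity)); // 250
        }
    }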

The last issue might be circumvented somewhat by decoupling the storage
capacity (let's say we increase that to 5,000) from the download chunk
size (500 articles). But this would mean that for an old article that is
in the read state locally, NewsRob would not notice when the article is
set back to unread in Google Reader. At least not until the next full
sync.

Also, having a full sync and a partial sync is a complicated concept.
Some users told me that they don't read the release notes nor the
website and wouldn't even want to watch a video. So everything that is
non-intuitive needs explaining, and will therefore be lost on many
users.
Btw, this is exactly the reason why I will release a version of NewsRob
on Sunday that has a new navigation: the old way was too complicated to
explain. You'll see what I am talking about on Sunday.

But continuing with the original problem: Users will be po'ed,
because they marked something old as read (or unread, or starred, or
shared, etc.) and it doesn't show up with the new state on their
phone. They will hate NewsRob for it, write flame mails to me and post
1-star ratings on the Android Market ;-(

The same problem exists when using another undocumented feature of the
Google Reader API, where I can specify when I last accessed their
service, but then it also swallows state changes. After discovering that
I didn't investigate this route further, but maybe there are even more
issues waiting for me.


As stated in my other reply, it comes down to this: I can try to invest
a huge amount of time in ugly, complicated workarounds, but that is also
time that could be spent on ugly performance problems ;-)
So for the time being performance and some missing features (feeds as
first-class citizens) are my priority, but I will get back to this
issue, if only to allow raising the capacity beyond 500 articles. During
my performance tests I was working with 5000 articles and that was ok,
or at least roughly as ok as 500 is, but until I get the "intelligent
sync" problem solved I can't go down that route.

I still hope that Google will add something helpful to their
protocol, but I am not holding my breath ;-) If you happen to know
somebody in the Google Reader team, please point them to this
problem ;-)

Thanks for taking the time to share your thoughts. I am sure that
using this process we will eventually come up with a solution that we
like or at least accept!

Cheers,
Mariano

Mariano Kamp

Mar 13, 2009, 10:54:51 AM
to NewsRob User Group
Just yesterday I found out about a Google Reader feedback group.

I posted a link to our conversation there:
http://groups.google.com/group/google-reader-howdoi/browse_thread/thread/c895a942ce6f4676