[Wikitech-l] page view stats redux

Ariel T. Glenn

Sep 15, 2011, 6:38:41 AM
to wikit...@lists.wikimedia.org
I think we finally have a complete copy from December 2007 through
August 2011 of the pageview stats scrounged from various sources, now
available on our dumps server.

See http://dumps.wikimedia.org/other/pagecounts-raw/

Ariel


MZMcBride

Sep 15, 2011, 6:16:27 PM
to Wikimedia developers
Ariel T. Glenn wrote:
> I think we finally have a complete copy from December 2007 through
> August 2011 of the pageview stats scrounged from various sources, now
> available on our dumps server.
>
> See http://dumps.wikimedia.org/other/pagecounts-raw/

This is a great step in the right direction! Thanks!

MZMcBride

Diederik van Liere

Sep 16, 2011, 11:26:32 AM
to Wikimedia developers
This is really cool! Thanks Ariel and team for making this available.
best,
Diederik

--
Check out my about.me profile: http://about.me/diederik

Howie Fung

Sep 16, 2011, 1:14:32 PM
to Wikimedia developers
Awesome! I'm so glad we have this. Thanks for making this happen.

Howie

Harry Burt

Sep 18, 2011, 6:55:03 AM
to wikit...@lists.wikimedia.org
Ariel T. Glenn wrote:
> I think we finally have a complete copy from December 2007 through
> August 2011 of the pageview stats scrounged from various sources, now
> available on our dumps server.

Great news!

I do think there should be a note about the systemic under-reporting
that made statistics from the last quarter of 2009 and first half of
2010 unreliable, however.

--
Harry (User:Jarry1250)

Ariel T. Glenn

Sep 18, 2011, 7:04:51 AM
to Wikimedia developers
Yes, and I've already been getting the information on that together so
it can be documented. :-)

Ariel

emijrp

Sep 18, 2011, 1:34:05 PM
to Wikimedia developers
Thanks Ariel. That is important data to preserve.

Sean Timm

Nov 7, 2011, 1:41:08 PM
to wikit...@lists.wikimedia.org
Ariel T. Glenn <ariel <at> wikimedia.org> writes:

>
> I think we finally have a complete copy from December 2007 through
> August 2011 of the pageview stats scrounged from various sources, now
> available on our dumps server.
>
> See http://dumps.wikimedia.org/other/pagecounts-raw/
>
> Ariel
>


This is very cool. Thanks for the work, Ariel. I'm interested to look at the
historical data.

It appears that page view data is pushed to dumps.wikimedia.org daily. dammit.lt
used to push page view stats hourly, but it appears to be down now. Are hourly
pushes still available somewhere?

Thanks,
Sean

Ariel T. Glenn

Nov 7, 2011, 1:59:15 PM
to Wikimedia developers
On Mon, 07-11-2011 at 18:41 +0000, Sean Timm wrote:
> Ariel T. Glenn <ariel <at> wikimedia.org> writes:
>
> >
> > I think we finally have a complete copy from December 2007 through
> > August 2011 of the pageview stats scrounged from various sources, now
> > available on our dumps server.
> >
> > See http://dumps.wikimedia.org/other/pagecounts-raw/
> >
> > Ariel
> >
>
>
> This is very cool. Thanks for the work, Ariel. I'm interested to look at the
> historical data.
>
> It appears that page view data is pushed to dumps.wikimedia.org daily. dammit.lt
> used to push page view stats hourly, but it appears to be down now. Are hourly
> pushes still available somewhere?
>
> Thanks,
> Sean

I had thought to do a daily update. If it turns out that hourly updates
are indeed useful, I'll set that up. I don't know of anyone else that
has a current mirror.

Ariel

Domas Mituzas

Nov 7, 2011, 2:49:11 PM
to Wikimedia developers
Hi!

> I had thought to do a daily update. If it turns out that hourly updates
> are indeed useful, I'll set that up. I don't know of anyone else that
> has a current mirror.

Yeah, don't believe anything I say; wait for someone on the mailing list to tell you the same before drawing conclusions.

Domas

Ikuya Yamada

Nov 9, 2011, 8:21:24 AM
to wikit...@lists.wikimedia.org
> I had thought to do a daily update.  If it turns out that hourly updates
> are indeed useful, I'll set that up.  I don't know of anyone else that
> has a current mirror.

I had been using the hourly updated data previously provided
on dammit.lt to detect real-time trending topics on
Wikipedia. It is highly accurate, and it seems the data can
serve a variety of use cases.

So, I'd greatly appreciate it if you set it up.

Thanks,
Ikuya

Sean Timm

Nov 9, 2011, 10:07:14 AM
to wikit...@lists.wikimedia.org
On 11/9/2011 8:21 AM, Ikuya Yamada wrote:
>> I had thought to do a daily update. If it turns out that hourly updates
>> are indeed useful, I'll set that up. I don't know of anyone else that
>> has a current mirror.
>
> I had been using the hourly updated data previously provided
> in dammit.lt in order to detect the real-time trending topics in
> Wikipedia. It is highly accurate and it seems that the data can
> be used for various use cases.
>
> So, I'd greatly appreciate it if you set it up.
>
> Thanks,
> Ikuya

That is my use case as well.

Thanks,
Sean

Ariel T. Glenn

Nov 12, 2011, 1:22:03 PM
to Wikimedia developers
On Wed, 09-11-2011 at 10:07 -0500, Sean Timm wrote:
> On 11/9/2011 8:21 AM, Ikuya Yamada wrote:
> >> I had thought to do a daily update. If it turns out that hourly updates
> >> are indeed useful, I'll set that up. I don't know of anyone else that
> >> has a current mirror.
> >
> > I had been using the hourly updated data previously provided
> > in dammit.lt in order to detect the real-time trending topics in
> > Wikipedia. It is highly accurate and it seems that the data can
> > be used for various use cases.
> >
> > So, I'd greatly appreciate it if you set it up.
> >
> > Thanks,
> > Ikuya
>
> That is my use case as well.
>
> Thanks,
> Sean

The files should now be available automatically within the hour.

Ariel

Ikuya Yamada

Nov 16, 2011, 6:14:58 AM
to Wikimedia developers
2011/11/13 Ariel T. Glenn <ar...@wikimedia.org>:

> On Wed, 09-11-2011 at 10:07 -0500, Sean Timm wrote:
> > On 11/9/2011 8:21 AM, Ikuya Yamada wrote:
> > > > I had thought to do a daily update. If it turns out that hourly updates
> > > > are indeed useful, I'll set that up. I don't know of anyone else that
> > > > has a current mirror.
> > > >
> > >
> > >
> > > I had been using the hourly updated data previously provided
> > > in dammit.lt in order to detect the real-time trending topics in
> > > Wikipedia. It is highly accurate and it seems that the data can
> > > be used for various use cases.
> > >
> > > So, I'd greatly appreciate it if you set it up.
> > >
> > > Thanks,
> > > Ikuya
> > >
> >
> >
> > That is my use case as well.
> >
> > Thanks,
> > Sean
> >
>
>
> The files should now be available automatically within the hour.
>
> Ariel

Thanks!
But it seems that updates to the pagecounts files have been stopped for
the past few hours. Is this a temporary problem?

Thanks,
Ikuya

Ariel T. Glenn

Nov 16, 2011, 1:27:52 PM
to Wikimedia developers

> Thanks!
> But it seems that the update of pagecounts files is stopped for the
> past few hours. Is this a temporary problem?
>
> Thanks,
> Ikuya

Yes, very temporary. A mistaken side-effect of taking Domas' server out
of the loop; fixed.

Ariel

Fred Zimmerman

Nov 16, 2011, 1:58:15 PM
to Wikimedia developers
Very cool! Is there a README or project page somewhere that explains what
all these files are?

Ariel T. Glenn

Nov 16, 2011, 2:05:25 PM
to Wikimedia developers
Yes, the index page :-P ;-)

http://dumps.wikimedia.org/other/pagecounts-raw/

Perhaps you have specific questions that aren't answered here? If so,
spill 'em and we'll try to add that information or links to it.

Ariel

On Wed, 16-11-2011 at 13:58 -0500, Fred Zimmerman wrote:

Fred Zimmerman

Nov 16, 2011, 2:18:27 PM
to Wikimedia developers
Blush. I just found that page. I was spending all my time looking at the
directory of the derived products. Got it!

Fred Zimmerman

Nov 16, 2011, 2:23:05 PM
to Wikimedia developers
Has anyone already done the work of determining which pages are the "fastest
movers" in a way that can be shared? Comparing the hour-over-hour stats for
each page would require a lot of resources ...
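
For readers wondering what such an hour-over-hour comparison might look like, here is a minimal sketch; it was not posted in this thread. It assumes each line of an hourly pagecounts file has four space-separated fields (project code, page title, view count, bytes transferred). The function names, the `min_views` threshold, and the add-one smoothing are hypothetical choices made for illustration only.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_counts(lines: Iterable[str], project: str = "en") -> Counter:
    """Sum view counts per page title for one project.

    Assumed line layout: "<project> <title> <views> <bytes>".
    Lines that do not match this layout are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project:
            continue
        try:
            counts[parts[1]] += int(parts[2])
        except ValueError:
            continue  # non-numeric view count; ignore the line
    return counts


def fastest_movers(
    prev: Counter, curr: Counter, top: int = 10, min_views: int = 50
) -> List[Tuple[float, str, int, int]]:
    """Rank pages by growth ratio between two consecutive hours.

    Add-one smoothing keeps the ratio finite for pages with no views in
    the earlier hour; min_views filters out low-traffic noise.
    Returns tuples of (ratio, title, views_before, views_now).
    """
    movers = []
    for title, now in curr.items():
        if now < min_views:
            continue
        before = prev.get(title, 0)
        ratio = (now + 1) / (before + 1)
        movers.append((ratio, title, before, now))
    movers.sort(reverse=True)
    return movers[:top]
```

To try this on real data, one could pass two consecutive hourly files through `gzip.open(path, "rt", encoding="utf-8", errors="replace")` and feed the resulting file objects to `parse_counts`. Holding two hours of a single project in memory is usually manageable; comparing every project across many hours is where the resource cost mentioned above starts to bite.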