Collecting Install Statistics


Chad Windnagle

Jun 3, 2012, 12:42:03 AM
to joomla-...@googlegroups.com
Hello JoomlaSphere:

I'd like to open a discussion about where the community stands on enabling some statistics collection on Joomla installs, with data that the project can collect and tabulate.

We frequently have discussions on this and other lists about how we think or believe Joomla site users "use" Joomla. For example:
  • Hosting Environment
  • Number of Installs
  • Number of Updates
And, I'm sure, many other bits of data as well. The most common topic is probably the "hosting environment": the CMS and Platform developers are continually discussing what users are doing "nowadays". A perfect example of this is the current discussion on whether we should retain MySQL support in CMS 3.0. Other examples could be:
  • Server operating system
  • PHP version
  • Database Driver Used
And the list goes on.

Perhaps rather than guessing and only getting feedback from developers, who probably have a much different usage and operating environment than the more-common Joomla user does, we can enable a utility that sends data back to a collection server. This of course would be optional to send, kept private, and not contain any sensitive server data to retain security.

Possible Concerns
The idea has some very real and possibly idea-killing concerns that I have identified already, and I'm sure that there are more:
  • Security
  • Privacy
  • Legal Ramifications
  • License Compliance (Not sure if this sort of functionality conflicts with the GPL)
Possible Benefits
I see the benefit of this, assuming we can successfully navigate the above concerns, as answering a great many of the questions we have to ask ourselves when making changes. Specifically, we could actually answer questions like "What version of PHP are users most commonly installing Joomla on?" or "Do users install using MySQL or MySQLi?".
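
As a rough illustration of how such questions could be answered once reports exist, here is a minimal Python sketch that tabulates one field across submitted reports. The field names (`php_version`, `db_driver`) and the sample reports are invented for illustration only; they are not real data and not an agreed report format.

```python
from collections import Counter


def tabulate(reports, field):
    """Count how often each value of `field` appears across reports.

    Reports that lack the field are skipped rather than counted as "unknown".
    """
    return Counter(r[field] for r in reports if field in r)


# Hypothetical sample reports, invented purely for illustration.
sample_reports = [
    {"php_version": "5.3", "db_driver": "mysqli"},
    {"php_version": "5.3", "db_driver": "mysql"},
    {"php_version": "5.2", "db_driver": "mysql"},
    {"db_driver": "mysqli"},  # this site did not share its PHP version
]

php_counts = tabulate(sample_reports, "php_version")
driver_counts = tabulate(sample_reports, "db_driver")
```

`php_counts.most_common(1)` then gives the most frequently reported PHP version, which is the kind of answer the questions above are after.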

Again, it's important to note that this would be an entirely optional, opt-in data collection program that administrators must enable, understanding what they are doing.

There are some examples of software that does this already. Many operating systems and browsers (Chrome, Firefox) do this. I think that it could be a valuable asset to the community.

I'd appreciate some thoughts on this. Thanks all for your time.

-Chad

elin

Jun 3, 2012, 3:48:27 PM
to joomla-...@googlegroups.com
GPL has no relationship to this except that people who redistribute Joomla have the right to remove or add to or modify or study the functionality.

I think it would be a really great idea to add this, even if to start with it only covers the environment that the initial install occurs in. Of course, users should be able to opt out.

To me, database type and version, PHP version, and server would be the most important baseline pieces of information.

Elin 

Mark Dexter

Jun 3, 2012, 5:00:07 PM
to joomla-...@googlegroups.com
I agree that collecting some stats would be very helpful. If possible,
it would be great to also get a snapshot each time people do an auto
update (again, of course with an opt in). It could be very useful for
example to know what extensions are being used in a site and what
version of Joomla is being used.

Mark
> --
> You received this message because you are subscribed to the Google Groups
> "Joomla! CMS Development" group.
> To view this discussion on the web, visit
> https://groups.google.com/d/msg/joomla-dev-cms/-/chHRTlH3d74J.
>
> To post to this group, send an email to joomla-...@googlegroups.com.
> To unsubscribe from this group, send email to
> joomla-dev-cm...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/joomla-dev-cms?hl=en-GB.

Radek Suski

Jun 4, 2012, 4:58:34 AM
to joomla-...@googlegroups.com
We had something similar in one version of Sobi2. 
We asked Sobi2 users to send us (from the backend) an auto-generated XML file with their environment data.
We then published this data at: http://www.sigsiu.net/statistics/

Basically, we needed this data to work out the best requirements for SobiPro at that time.
But people keep asking us even today whether we are going to repeat this, because many Joomla! users and developers are very interested in such statistics.

So I think this is a very good idea.

Regards,
Radek

brian teeman

Jun 4, 2012, 6:21:44 AM
to joomla-...@googlegroups.com
For best practice this should be:
1. Entirely optional, opt-in only
2. The data to be sent should be displayed to the user first, and confirmation required again before it is sent
3. The explanation of how, where, when and why the data is being collected and who it will be used for MUST be in simple language and NOT legalese, and of course translated into all languages
4. It must be completely anonymous
(The latter condition might affect where in Joomla the data collection can take place, i.e. the installation is not truly translated into many languages.)
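
The first two points can be sketched in a few lines of Python. This is only an illustration of the control flow, not a proposed implementation: the function names and the `confirm` and `send` callbacks are hypothetical, and a real version would live in the Joomla backend rather than in Python.

```python
import json


def preview(payload):
    """Render the exact data that would be sent, in a human-readable form."""
    return json.dumps(payload, indent=2, sort_keys=True)


def submit_if_confirmed(payload, opted_in, confirm, send):
    """Send `payload` only if the admin opted in AND confirmed the preview.

    `confirm` is shown the preview text and returns True or False.
    `send` performs the actual transmission. Both are hypothetical callbacks.
    Returns True if the data was sent, False otherwise.
    """
    if not opted_in:
        return False
    if not confirm(preview(payload)):
        return False
    send(payload)
    return True
```

Nothing leaves the site unless both checks pass, and the text shown to the admin is generated from the same payload that would be transmitted.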

Chris Davenport

Jun 4, 2012, 6:23:38 AM
to joomla-...@googlegroups.com
I think this is basically a good idea and is something that has been suggested many times before.  I'd like to suggest that people interested in moving the idea forwards should form a working group that will

1. Define how it will work.  Questions to be answered include how the opt-in/out mechanism will work, what data will be gathered, how security and privacy issues will be addressed, what infrastructure will be needed, and so on.
2. Write the code for inclusion in Joomla.
3. Write the code that will gather the data and present the statistics.

This is not a trivial exercise but I do agree that it will be very worthwhile if you can pull it off.

Chris.





--
Chris Davenport
Joomla Leadership Team - Production Working Group
Joomla Documentation Coordinator

Radek Suski

Jun 4, 2012, 6:43:23 AM
to joomla-...@googlegroups.com
I fully agree with Brian on every point. However, it would be necessary to have a unique ID for each server.

It would also be necessary to define which data we want to collect. For example, available libraries which we normally don't use but maybe could (e.g. spell checkers).

Regards,
Radek


Victor Drover

Jun 4, 2012, 7:43:45 AM
to joomla-...@googlegroups.com
As Brian's post indicates, privacy is a big concern here. Tagging individual servers with IDs seems like a slippery slope unless that server is tagged temporarily and, again, leaves the user and server completely anonymous.

Honestly, this feels a bit intrusive despite my personally wanting these statistics.

Jon Neubauer

Jun 4, 2012, 7:47:47 AM
to joomla-...@googlegroups.com
If any tracking, tagging, or identifying were done solely during installation (we may want to expand that in the future, but it sounds like that's a good start) that information could easily be removed on that final step of removing the installation directory.  Obviously personally identifiable information wouldn't be collected at all to begin with.

Radek Suski

Jun 4, 2012, 7:58:58 AM
to joomla-...@googlegroups.com
> Tagging individual servers with IDs seems like a slippery slope unless that server is tagged temporarily and, again, leaves the user and server completely anonymous.

This ID was fully anonymous and could not be traced back.
If you have a better idea for how to prevent duplicate reports from the same server/site, then let us know :)

Victor Drover

Jun 4, 2012, 8:11:14 AM
to joomla-...@googlegroups.com
If it's anonymous, that's the most important thing. But rather than considering them duplicate reports, perhaps this info would be good for understanding how many times a user installs on the same server, as a rough measure of failure rate?

-V

Matt Thomas

Jun 4, 2012, 8:16:17 AM
to joomla-...@googlegroups.com
This sounds like a great idea to me as well. There's no doubt that the more information we have to base decisions on, the better.

Some sort of random unique ID to prevent duplicate reports is fully understandable, especially if it can't be traced back to the originator.
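
A minimal Python sketch of such an ID, assuming it is generated once and then stored with the site. The key name `stats_id` and the dictionary standing in for the site's configuration are made up for this example.

```python
import secrets


def new_install_id():
    """Return a random 128-bit identifier as 32 hex characters.

    It is derived from no server property (no IP, hostname, or URL),
    so the ID by itself reveals nothing about where it came from.
    """
    return secrets.token_hex(16)


def load_or_create_id(config):
    """Reuse the ID stored in `config`, creating it on first use.

    `config` stands in for the site's configuration storage; the key
    name "stats_id" is hypothetical.
    """
    if "stats_id" not in config:
        config["stats_id"] = new_install_id()
    return config["stats_id"]
```

Because the value is random rather than computed from the server's details, there is nothing to reverse; the trade-off is that the ID is lost if the stored value is deleted.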

I fully agree with Brian's points, especially that the data sent should be displayed to the user before it is sent.

@Chad, can you explain what you mean by "kept private"? Are you suggesting that this data not be shared with everyone, or that it be anonymous?

Best,

Matt Thomas
Founder betweenbrain
Phone: 203.632.9322
Twitter: @betweenbrain





Chad Windnagle

Jun 4, 2012, 8:50:07 AM
to joomla-...@googlegroups.com
I totally agree with this. In my mind I was picturing displaying the exact record data to the admin with the "send" button. This way they could review the exact amount of data being sent, and if something came up they were uncomfortable with they could cancel at any time.

@Radek, thanks for your comments. Actually your implementation of this on SOBI has been my reference point for the idea so far, so I appreciate your comments as you've actually done this! 

> As Brian's post indicates, privacy is a big concern here. Tagging individual servers with IDs seems like a slippery slope unless that server is tagged temporarily and, again, leaves the user and server completely anonymous.

Vic, I totally agree with you. The problem is, I don't know how else to do it. I was thinking of exactly what Radek posted, which is a kind of hashed tag of some sort. An idea that crossed my mind is perhaps the "secret" param in the CMS config file.

The reason I think that this would be a nice-to-have is because then after an install, if someone does an update we can re-check (with permission!) and update the record. This would provide more metrics like, how many people after their first install actually keep the site updated.

As far as backward-tracing, I don't think that will be possible. We won't be checking or recording IPs that will allow us to "go back" and find where that server is. 

> @Chad, can you explain what you mean by "kept private"? Are you suggesting that this data not be shared with everyone, or that it be anonymous?

Rethinking what I said here, "kept private" isn't really what I meant :D I do think it's important to make this information available. Extension developers, blog-post-writers, media, etc... all deserve access to this. I think a front-end somewhere which allows people to report on the data would be important.

By kept private I was trying to say that we would ensure that the reporting servers would be kept private, anonymous, and untraceable.

-Chad

Regards,
Chad Windnagle
Fight SOPA

Beat

Jun 5, 2012, 2:49:02 AM
to Joomla! CMS Development
Great idea. That would give us a precise picture, plus people not
participating can only blame themselves if their config is not
taken into account in future support.

While I agree with Brian, I wouldn't make the privacy issue a big one
in this case.

The upgrade checker would typically be a good place to give back only
the information that truly matters but that doesn't really affect
the privacy of users:
1) Joomla version (actually, it's implicitly given back already!)
2) PHP version
3) Database type and version
4) identification by IP address (that's already the case) since you
want to track how servers evolve.

The advantage would be that the upgrade checker already exists, that it
already has a service, that there would be a single point of
service and of decision, and that 2 out of the 4 basic pieces of info are
already disclosed there. Plus it's a good incentive: "Want service?
OK, then please help us improve it."

Additionally, depending on PHP and SQL versions, not only Joomla
itself but also extension providers would get the same info, and could
serve specific upgrades or NOT depending on the versions. Thus an admin
would not discover AFTER the upgrade that he doesn't meet the new
prerequisites, but BEFORE the upgrade.
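
The "check BEFORE upgrade" idea can be sketched as a simple version comparison. This is a hypothetical Python illustration, not how the Joomla update service works; the component names and the assumption that versions are plain dotted integers (no suffixes such as "-beta") are mine.

```python
def parse_version(text, width=3):
    """Turn a dotted version such as "5.3" into a tuple like (5, 3, 0).

    Short versions are padded with zeros so that "5.3" and "5.3.0"
    compare as equal. Assumes every part is a plain integer.
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def unmet_requirements(environment, minimums):
    """Return the names of components that are missing or too old.

    `environment` maps a component name to its installed version;
    `minimums` maps a component name to the minimum version an
    upgrade needs. An empty result means the upgrade can be offered.
    """
    failed = []
    for name, minimum in minimums.items():
        installed = environment.get(name)
        if installed is None or parse_version(installed) < parse_version(minimum):
            failed.append(name)
    return failed
```

An update server holding the reported versions could run a check like this and offer an upgrade only when the returned list is empty.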

Plus, it's super-easy to add the missing info there.

My 2 cents :)
Beat

Scoop

Jun 5, 2012, 10:30:16 AM
to joomla-...@googlegroups.com

To maintain better anonymity, perhaps identification could be through a hash of unique variables associated with the installation…just an idea. An IP Address is not exactly anonymous.

Data collection can be a touchy subject and all of the points Brian made are good ones. 

Scott

Daniele Rosario

Jun 5, 2012, 10:33:05 AM
to joomla-...@googlegroups.com
Totally agreed. The IP address should not be stored, IMHO. It can be traced back and could lead to information disclosure.

Daniele Rosario


Chad Windnagle

Jun 5, 2012, 10:39:38 AM
to joomla-...@googlegroups.com
Just to reiterate, my original post doesn't suggest storing an IP address. The idea is to be anonymous / non-traceable. If you check Radek's post you'll see an example of a hashed index that is anonymous.

That said, I think that the hash needs to come from the "client" (the server / Joomla Instance) sending the information. This means we can update the record later. If we don't have some sort of unique (unique doesn't mean traceable) identifier, we can't look at things again later when the Joomla Instance is installing an upgrade. 

In fact, with this requirement using an IP address would be a bad idea because in the case where several Joomla instances are from the same server and IP address, there would be some data integrity issues. So most definitely, the IP address idea is out (and was never really "in").
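
To illustrate why a client-supplied ID helps here, the following Python sketch keeps one record per ID and overwrites it when the same installation reports again. `ReportStore` is a hypothetical in-memory stand-in for the collection server, not an existing service.

```python
class ReportStore:
    """In-memory stand-in for the collection server's storage."""

    def __init__(self):
        self._latest = {}  # install ID -> most recent report

    def submit(self, install_id, report):
        """Store a report; return True if this ID had reported before."""
        seen = install_id in self._latest
        self._latest[install_id] = dict(report)
        return seen

    def count(self):
        """Number of distinct installations that have reported."""
        return len(self._latest)

    def get(self, install_id):
        """Return the latest report for an ID, or None if unknown."""
        return self._latest.get(install_id)
```

Two installations on the same server send different IDs and so stay separate records, while a later report from the same installation replaces its earlier one instead of being counted twice.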

-Chad

Regards,
Chad Windnagle
Fight SOPA


Scoop

Jun 5, 2012, 11:22:22 AM
to joomla-...@googlegroups.com
I think it's fine -- and a good idea to use the IP address as one of the "variables" I mentioned to be hashed (it was intentionally plural). But I think you're right that alone, it's not sufficiently unique.

Scott



Radek Suski

Jun 5, 2012, 11:42:07 AM
to joomla-...@googlegroups.com
IP Address + Live Site URL -> md5 :)
I think it's unique enough and not possible to track back.
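
Spelled out in Python, the scheme above looks roughly like this. The function name and the `|` separator between the two values are my own assumptions (the formula above does not specify how the values are joined), and the address and URL used in the usage note are documentation placeholders.

```python
import hashlib


def site_fingerprint(ip_address, live_site_url):
    """Return the MD5 hex digest of the IP address and site URL joined together.

    A separator is placed between the two values so that different
    (ip, url) pairs cannot collapse into the same input string.
    """
    raw = f"{ip_address}|{live_site_url}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()
```

For example, `site_fingerprint("192.0.2.10", "http://www.example.com")` returns a 32-character hex string. The digest is deterministic, so the same site always produces the same ID, but it also means anyone who can guess both inputs can recompute it.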

Regards,
Radek 

Matt Thomas

Jun 5, 2012, 11:45:29 AM
to joomla-...@googlegroups.com
If someone were to obtain or intercept that data, couldn't they decrypt it?

Best,

Matt Thomas
Founder betweenbrain
Phone: 203.632.9322
Twitter: @betweenbrain





Sam Moffatt

Jun 5, 2012, 11:46:59 AM
to joomla-...@googlegroups.com
Decrypt the hash?

Sam Moffatt
http://pasamio.id.au

Radek Suski

Jun 5, 2012, 11:48:59 AM
to joomla-...@googlegroups.com
As far as I know, an MD5 hash still cannot be decoded.
That is basically the main purpose of a hash.

Matt Thomas

Jun 5, 2012, 11:49:17 AM
to joomla-...@googlegroups.com
Sorry, I meant a reverse MD5 hash lookup like http://tools.benramsey.com/md5/.

Best,

Matt Thomas
Founder betweenbrain
Phone: 203.632.9322
Twitter: @betweenbrain




Daniele Rosario

Jun 5, 2012, 11:51:55 AM
to joomla-...@googlegroups.com
SHA-1 or whatever non-reversible method you know :)

Daniele Rosario

Radek Suski

Jun 5, 2012, 11:54:06 AM
to joomla-...@googlegroups.com
Matt,

I may be wrong, but this is rather a database of the most commonly used passwords and not really a decoder.

Regards,
Radek


Chad Windnagle

Jun 5, 2012, 11:56:11 AM
to joomla-...@googlegroups.com
I'm pretty certain MD5 can be reversed if it's not salted.

Does anyone know what algorithm generates the "secret" value in the Joomla configuration file? I think that is pretty randomly generated (not based on IP or live site URL), and would stay with the site even if it is moved to a different server or something of that nature.

-Chad

Sam Moffatt

Jun 5, 2012, 11:58:41 AM
to joomla-...@googlegroups.com
I find it unlikely that a reverse MD5 lookup is going to have a copy of:
- an IP address (dotted quad, 7 to 15 characters)
- a web site URL (let's say 15 characters: www. is four, .com is
another four, so that leaves 7 characters for the domain)

That'd be a rather long password and not based on a dictionary for the
most part.
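
For anyone wondering what such a lookup service does, here is a minimal Python sketch under the assumption that it is simply a precomputed table of digests. The word list is made up, and this is not a description of how any particular site is built.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_lookup(candidates):
    """Precompute digest -> input for every candidate string."""
    return {md5_hex(c): c for c in candidates}


def reverse_lookup(table, digest):
    """Return the original input if it was among the candidates, else None."""
    return table.get(digest)


# A tiny made-up word list standing in for a lookup site's database.
table = build_lookup(["password", "123456", "joomla"])
```

A digest is only "reversed" when its input was already in the table, so `reverse_lookup(table, md5_hex("joomla"))` succeeds while the digest of an IP address plus URL that nobody precomputed returns `None`.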

Cheers,

Sam Moffatt
http://pasamio.id.au

Matt Thomas

Jun 5, 2012, 11:59:16 AM
to joomla-...@googlegroups.com
I ask because using this information as the basis of the data to be transmitted is likely to be of concern. I'm certainly not a security or encryption expert, but I can see some people having reservations about submitting usage information if their IP address or URL is mentioned anywhere. It might just be a matter of perception, but it could skew the results.

Best,

Matt Thomas
Founder betweenbrain
Phone: 203.632.9322
Twitter: @betweenbrain




Matt Thomas

Jun 5, 2012, 12:01:31 PM
to joomla-...@googlegroups.com
Radek,

Thanks. It may very well be a dictionary look-up. I don't know if it is possible; I was asking out of ignorance, as I've heard that it is.

Best,

Matt Thomas
Founder betweenbrain
Phone: 203.632.9322
Twitter: @betweenbrain





Daniele Rosario

Jun 5, 2012, 12:01:16 PM
to joomla-...@googlegroups.com
I surely wouldn't like the IP address to be stored anywhere. But I think the Joomla password
storing method is safe enough, so if we can reuse it, that would be fine IMHO.

Daniele Rosario

Sam Moffatt

Jun 5, 2012, 12:02:55 PM
to joomla-...@googlegroups.com
MD5 can be brute-forced; salting basically adds more source data. So instead of MD5(password), it's MD5(salt + password). It puts in an extra step. All that these lookups do is hash a whole bunch of stuff. The secret value might be worth using as a salt to further hinder brute forcing. However, if they're intercepting the request they're going to know the IP address already, and can then just use various DNS lookup tools to scan the IP address for relevant domains. More importantly, if someone is able to do a MITM between you and us, you've got bigger problems and a hash ain't one.
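
A hedged Python sketch of the "use the secret as a salt" idea. Note that it swaps in HMAC-SHA256 from the standard library rather than the plain MD5(salt + value) construction described above, and the function name, the `|` separator and the choice of inputs are assumptions made for illustration.

```python
import hashlib
import hmac


def salted_site_id(secret, ip_address, live_site_url):
    """Derive an identifier using HMAC-SHA256 keyed with the site secret.

    Without the secret, knowing (or guessing) the IP address and URL is
    not enough to recompute the identifier.
    """
    message = f"{ip_address}|{live_site_url}".encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
```

The same site with the same secret always yields the same 64-character ID, while a different secret yields an unrelated one. This only protects the ID itself; as noted above, someone intercepting the request still sees the sender's IP address.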

Cheers,

Chad Windnagle

Jun 5, 2012, 12:03:17 PM
to joomla-...@googlegroups.com
Yeah, I agree with Matt here. I'd prefer not to bring the IP or live site URL into the equation. Unless there is a specific reason we need to use them as a UID, we shouldn't do it. There's no reason to bring extra concern to an already sensitive topic.

-Chad

Radek Suski

Jun 5, 2012, 12:04:49 PM
to joomla-...@googlegroups.com
Just checked: it does indeed seem to be possible to reverse an MD5 hash these days, although it needs a lot of resources.
The IP address can be tracked every time someone uses the one-click update function in Joomla! anyway.

Even if it were possible to decode this information, as we are certainly not going to publish the ID hash anyway (assumption), no one else would be able to do this, and I don't think that we want to.

Regards,
Radek 

On Tuesday, 5 June 2012 17:59:16 UTC+2, betweenbrain wrote:
> ...but I can see some people having reservation submitting usage information if their IP address or URL is mentioned anywhere. It might just be a perspective issue, but could alter the results.

Rouven Weßling

Jun 5, 2012, 12:04:06 PM
to joomla-...@googlegroups.com

On 05.06.2012, at 16:39, Chad Windnagle wrote:

> In fact, with this requirement using an IP address would be a bad idea because in the case where several Joomla instances are from the same server and IP address, there would be some data integrity issues. So most definitely, the IP address idea is out (and was never really "in").

That raises the question of whether we really want to track installations or servers running Joomla. The latter will of course be error-prone since you can have multiple IP addresses on a single server, but I fail to see the advantage of storing information for every single installation when we're mainly interested in server specs.

Best regards
Rouven

Rouven Weßling

Jun 5, 2012, 12:05:20 PM
to joomla-...@googlegroups.com
One argument for storing the IP address is that, in the case of shared hosting, it would allow us to identify which hoster was used.

Best regards
Rouven

Sam Moffatt

Jun 5, 2012, 12:08:00 PM
to joomla-...@googlegroups.com
The point is to work out the difference between a new site and
existing site and being able to remove stale data when underlying
specs change.

Cheers,

Sam Moffatt
http://pasamio.id.au


Chad Windnagle

Jun 5, 2012, 12:10:19 PM
to joomla-...@googlegroups.com
Rouven, you're right; that raises a question about what exactly we want to get data on.

Personally I'd like to see data on both servers AND single installations. The benefit of getting server specs is of course obvious, but with individual installations we can see if people are keeping the installation up to date, how many extensions are installed, etc. I think unique installations are just as important as the server.

-Chad

Regards,
Chad Windnagle
Fight SOPA


Donald Gilbert

Jun 5, 2012, 2:25:58 PM
to joomla-...@googlegroups.com
These are all great ideas, but let's not reinvent the wheel if we don't need to. How do WordPress, Drupal or other open source CMSs handle this? I know WordPress does some pretty aggressive stats gathering in the core upgrade check and plugin upgrades. Let's see how they do it and at least consider it a good reference.

PS - Don't kill me because I mentioned WP.

Aaron Wood

unread,
Jun 5, 2012, 2:30:39 PM6/5/12
to joomla-...@googlegroups.com
I think that’s a very good idea.
 
And let’s acknowledge that while being extremely limited, WP is very good at what it does. It’s no Joomla, granted, but it gives its intended users a very easy-to-use interface, which is important for their demographic. Believe it or not, people who spend time developing the Joomla CMS can learn a lot from WP and how it satisfies its core users (if not much from the software itself) ;-)

Chad Windnagle

unread,
Jun 5, 2012, 2:57:55 PM6/5/12
to joomla-...@googlegroups.com
Hi Donald and Aaron:

I don't disagree that we can learn a thing or two from other FOSS projects. Unfortunately I'm not aware of them doing this sort of thing, or how they're doing it. I did just take a look at one of my very few WordPress sites and I don't see any sort of feedback checkboxes. If it's built into the updater I don't know about it (and thus I didn't opt into it).

That said, if you guys have seen these features somewhere, please point me to where they are so I can take a look. Would definitely love to see how it's being done by other folks. As already mentioned, SOBI is doing (or did) this in one of their projects, so we do have a little bit of experience on this thread to look to as well.

-Chad

brian teeman

unread,
Jun 5, 2012, 3:12:13 PM6/5/12
to joomla-...@googlegroups.com
I really don't see the benefit of having the level of detail that requires the ip/server - are we really interested in tracking changes on a server, or is the thing that is most valuable a snapshot of server capabilities at a specific time?

Personally I believe it's the latter - KISS, and we don't have an issue with backtraces or debatable encryption technologies.

Radek Suski

unread,
Jun 5, 2012, 3:15:16 PM6/5/12
to joomla-...@googlegroups.com


On Tuesday, 5 June 2012 21:12:13 UTC+2, brian teeman wrote:
I really don't see the benefit of having the level of detail that requires the ip/server

I fully agree on that. I'm just saying we should have a unique identification id, at best based on the IP and website URL.
The IP and the URL themselves are not part of the data we need. But they can be used to generate such a unique id.
 

Scoop

unread,
Jun 5, 2012, 3:34:51 PM6/5/12
to joomla-...@googlegroups.com
Radek has the right idea, except that MD5 is no longer considered secure. It's not so much that it's reversible as it is vulnerable to collisions ( http://en.wikipedia.org/wiki/Md5 ). I don't know the inner workings of Joomla or the practical implications, but almost any application where security and/or privacy (and perhaps integrity in general) is a concern should be using SHA-2 now. It's actually mandated in the U.S. government.

While I understand the intent and benefit, I do not think transmitting ANY identity bearing information is a good idea, whether it's stored or not and whether it is identifying the host or the site owner/admin. It's tempting but it is a very bad idea, IMHO.

Regarding the use of the URL as one of the variables, I'm not sure that's very practical. When someone is installing/setting up a site, it's not uncommon for them to be accessing it via something other than the FQDN. They may not even have a domain registered. I think it is best to use something that is less likely to be variable or change. Perhaps the site name that is entered during the install? Of course, I'm assuming it would be combined with the IP, hashed, and only the hash is sent.

Scott

Radek Suski

unread,
Jun 5, 2012, 3:46:01 PM6/5/12
to joomla-...@googlegroups.com


On Tuesday, 5 June 2012 21:34:51 UTC+2, Scoop wrote:
Radek has the right idea, except that MD5 is no longer considered secure. It's not so much that it's reversible as it is vulnerable to collisions ( http://en.wikipedia.org/wiki/Md5 ). I don't know the inner workings of Joomla or the practical implications, but almost any application where security and/or privacy (and perhaps integrity in general) is a concern should be using SHA-2 now. It's actually mandated in the U.S. government.

We can of course try to find a better hash method.
I think however we should also try to see the other side. As a Joomla! user I'm going to send this information to Joomla!
This info has been encrypted and I know it can eventually be decrypted. But in the end it's just my URL and server IP.
Decrypting this information takes a lot of computing power, and the information is not that valuable, so I wouldn't hesitate.
If I considered the team I'm sending this information to untrustworthy, I wouldn't send it anyway.
If I trust them, I simply hope no one will try to decrypt it, and I send it.
 

Scoop

unread,
Jun 5, 2012, 3:49:51 PM6/5/12
to joomla-...@googlegroups.com
I wouldn't worry too much about differencing and purging stale data. That has a bit of an "I may not know who you are but I can trace your every move" feel to it, from a privacy perspective. There's always going to be noise. I've installed Joomla dozens of times using the same hosting account/url/site name, etc. when testing and playing around with things. Sometimes in a local environment, sometimes not.

Why not just benchmark the data every x number of months, consider it all stale, and restart the stats? Something like this should only be done for general trend analysis, IMHO.

Scott


On Tuesday, June 5, 2012 12:08:00 PM UTC-4, Samuel Moffatt wrote:
The point is to work out the difference between a new site and
existing site and being able to remove stale data when underlying
specs change.

Cheers,

Sam Moffatt
http://pasamio.id.au



Chad Windnagle

unread,
Jun 5, 2012, 4:10:02 PM6/5/12
to joomla-...@googlegroups.com
I believe it would be best to generate a totally unique, unrepeatable, value that is entirely unconnected to IP address, sitename, or livesite URL.

The requirement we have here is to have a unique identity for each Joomla site. (Assuming that's the requirement; I think we all mostly agree that we want Joomla instance data, not just server data.)

To be clear, I am talking about one single value for identifying a site. This has nothing to do with the actual transmission or encryption of the data. That (to me) is a different conversation.

To meet this requirement we do not need:
  • The server IP address
  • The livesite
  • The site name
We only need:
  • A random value
  • Saved in the config file
The benefits of this random, "innocent" value are:
  • Unique
  • Not backtraceable
  • Not encrypted (and thus, no need to decrypt)
  • "Private" & "secure" in that it's not sensitive data
  • Allows a record to be kept up to date
  • Security problem on data on the server would not reveal any sensitive data
The problems in using the IP / Live Site would be:
  • Could be used to locate the server / joomla instance
  • Does not allow us to say the data is anonymous
  • A security problem on the data server side could lead to this data -no matter how encrypted- becoming a threat (no matter how small / unlikely, a threat it remains)
Conclusion: I don't believe there is a good reason to use the IP, site name, host name, live URL etc. to create this UID. It's not needed to meet the requirement as stated above. There are plenty of ways to generate a unique value; things like the IP / site name that are backtraceable should be left out of the equation as much as possible to achieve the highest level of security and privacy.
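
As a rough illustration of the random-value approach, here is a minimal sketch in Python (chosen only for brevity; Joomla itself is PHP). The function name `get_or_create_site_uid`, the `stats_uid` key, and the JSON file standing in for the site's config file are all assumptions made for this example, not part of any Joomla API.

```python
import json
import secrets
from pathlib import Path


def get_or_create_site_uid(config_path: Path) -> str:
    """Return the site's random UID, generating and saving it on first use.

    The JSON file is a hypothetical stand-in for the site's config file.
    """
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())
    if "stats_uid" not in config:
        # 16 random bytes from the OS CSPRNG, hex-encoded (32 characters).
        # Nothing about the IP, URL, or site name goes into the value.
        config["stats_uid"] = secrets.token_hex(16)
        config_path.write_text(json.dumps(config))
    return config["stats_uid"]
```

Because the value is stored rather than derived, it stays the same when the site moves to another server, and knowing it reveals nothing about where the site lives.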

-Chad

Regards,
Chad Windnagle
Fight SOPA



Scoop

unread,
Jun 5, 2012, 5:26:20 PM6/5/12
to joomla-...@googlegroups.com
Chad, I just think you're overthinking it. It doesn't need to be "unconnected" to the IP, sitename, URL, or any other information that IS identifiable, as long as you're not collecting any of that information. The hash is done by and at the host, and the hash is sent as the unique id. The hash IS unique and, if based on two or more variables and a secure algorithm, is [theoretically] non-backtraceable (word?).

Try to tell me the IP and site name that generated this:  08c9d68f7b45d5aceab692a797b216ba4b48936a 

Storing any identifier would not be a good practice and there are all sorts of issues with random number generation.

Scott

Rouven Weßling

unread,
Jun 5, 2012, 5:31:19 PM6/5/12
to joomla-...@googlegroups.com
The difference is we'll have to disclose the algorithm in our source ;)

But to get back to my question: do we wanna track sites or servers? And if we track sites, do we wanna track them as one even as they move to other servers? If so, we can just generate a unique ID when installing Joomla and transmit it. As for how to generate a really random ID - we have a method for that in JCrypt.

If we wanna track servers, I think I'd just store the IP address. There isn't really much harm in it and it gives us additional information (the hosting provider and rough geographic location).

Best regards
Rouven

Beat

unread,
Jun 5, 2012, 5:39:16 PM6/5/12
to Joomla! CMS Development
Woah, away during 12 hours, and 25 posts on a misunderstanding:

Looks like I need to reiterate this as it looks as if I got misread
between the lines:

> 4) identification by IP address (that's already the case) since you
> want to track servers evolution.

I don't think that I wrote or suggested "store raw IP address as is",
I just said the server itself can be *identified* by its IP address
(and then re-identified with the IP address).
(And we are interested in the PHP and SQL versions installed on the servers,
and not in the site.)

I was just stating that we *already* have IP address *today* *with
each version check*. And it's hard to make an http request without
revealing the originating IP address ;-)
Btw, the Joomla server is *probably* storing it *already* in its
rolling Apache access logs for a few months probably.
Did anyone complain about that ? Should we make a problem where nobody
saw one up to now (compared to the huge benefit of the version checks
and upgrader) ?
I don't think so.

E.g. As most sites are on heavily shared servers, IP address is not
really a privacy issue imho, however transmitting e.g. site name (even
if it's not stored) could be seen as a privacy issue.
We really want to track and make stats on the server environments, and
not really on the sites themselves, right ?

Also we probably want to exclude localhosts from the stats completely,
independently of opt-in, as localhost versions are under the admin's
own control, which is not the case for hosted sites.

That said, for storing individual stats data (beyond rolling access
logs), it's a good idea to do an advanced hashing of the IP address
for the indexing, with additional salting to keep "identification"
next time you get a version check, and be able to update your stats. I
would not use site-provided info, but the real IP address of the HTTP
request as the hash input, to avoid any kind of stats attack on our stats
server or, worse, our updates server.

As Brian suggested, what's transmitted (beyond obvious HTTP protocols
basics) should be clearly stated, e.g. PHP version, SQL server type
and version, and that anonymized storage of those is kept.
Importantly, the PHP and SQL versions should be in POST and not in
URL, so that the server access log does not store IP+versions ;-)
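
To make the POST-versus-URL point concrete, here is a small sketch using Python's standard `urllib` (Python is used only for illustration). The endpoint URL and the function name `build_stats_request` are hypothetical; the thread does not name a real collection URL.

```python
import json
import urllib.request

# Hypothetical endpoint; the thread does not name a real collection URL.
STATS_URL = "https://stats.example.org/submit"


def build_stats_request(php_version: str, sql_type: str, sql_version: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the stats in its body."""
    body = json.dumps({
        "php_version": php_version,
        "sql_type": sql_type,
        "sql_version": sql_version,
    }).encode("utf-8")
    return urllib.request.Request(
        STATS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Since the versions travel in the request body and the URL carries no query string, a typical access log line would record the IP and path but not the version values.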

Our stats database could have columns: id, created_datetime,
last_updated_datetime, one_way_sha_hashed_salted_IP,
latest_php_version, latest_sql_type, latest_sql_version
I think that's enough and not sensitive storage as it does not contain
any private information.
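
A minimal sketch of such a table, assuming SQLite and a keyed hash (HMAC-SHA256) as one way to do the "salted one-way hash"; the column names follow the list above, while `SERVER_SALT`, `hashed_ip`, `open_stats_db` and `record_check` are names invented for this example.

```python
import hashlib
import hmac
import sqlite3

# Assumption: a secret salt known only to the stats server.
SERVER_SALT = b"replace-with-a-long-random-secret"


def hashed_ip(ip: str) -> str:
    """One-way keyed hash (HMAC-SHA256) of the request IP."""
    return hmac.new(SERVER_SALT, ip.encode("ascii"), hashlib.sha256).hexdigest()


def open_stats_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS stats (
               id INTEGER PRIMARY KEY,
               created_datetime TEXT NOT NULL,
               last_updated_datetime TEXT NOT NULL,
               one_way_sha_hashed_salted_ip TEXT NOT NULL UNIQUE,
               latest_php_version TEXT,
               latest_sql_type TEXT,
               latest_sql_version TEXT
           )"""
    )
    return db


def record_check(db, ip, php_version, sql_type, sql_version, now):
    """Insert a row for a new server, or refresh the existing one."""
    key = hashed_ip(ip)
    cur = db.execute(
        "UPDATE stats SET last_updated_datetime = ?, latest_php_version = ?, "
        "latest_sql_type = ?, latest_sql_version = ? "
        "WHERE one_way_sha_hashed_salted_ip = ?",
        (now, php_version, sql_type, sql_version, key),
    )
    if cur.rowcount == 0:
        db.execute(
            "INSERT INTO stats (created_datetime, last_updated_datetime, "
            "one_way_sha_hashed_salted_ip, latest_php_version, "
            "latest_sql_type, latest_sql_version) VALUES (?, ?, ?, ?, ?, ?)",
            (now, now, key, php_version, sql_type, sql_version),
        )
    db.commit()
```

The same IP always maps to the same key, so a later version check updates the existing row instead of adding a new one, and the raw IP is never written to the table.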

An alternative is to just store the last 3 months of accesses (supposing
most active sites have a version check at least every 3 months), and
only store:
last_updated_datetime, latest_php_version, latest_sql_type,
latest_sql_version
in database, but that would mean that sites with more backend accesses
and version checks would have more weight in the stats (which could be
fine too).

Hope the above helps understand my original post better :-)

Best Regards,
Beat

Rouven Weßling

unread,
Jun 5, 2012, 6:00:52 PM6/5/12
to joomla-...@googlegroups.com

On 05.06.2012, at 23:39, Beat wrote:

Our stats database could have columns: id, created_datetime,
last_updated_datetime, one_way_sha_hashed_salted_IP,
latest_php_version, latest_sql_type, latest_sql_version
I think that's enough and not sensitive storage as it does not contain
any private information.

I don't think that is enough data. Thinking back about past discussions, I'd include the server OS version, the web server used and, most importantly, the available PHP extensions. This would give us the whole stack.

Best regards
Rouven

Chad Windnagle

unread,
Jun 5, 2012, 7:55:17 PM6/5/12
to joomla-...@googlegroups.com
  1. Accomplish the same goal
  2. Not be tied to the IP
@Rouven:

I tried to answer your question, at least to what my ideal plan would be earlier:
Rouven, you're right, that raises a question about what exactly we want to get data on.
Personally I'd like to see data on both servers AND single installations. The benefits of getting server specs are of course obvious, but on individual installations we can see if people are keeping the installation up to date, how many extensions are installed, etc. I think unique installations are just as important as the server.

To answer specifically:
but to get back to my question, do we wanna track sites or servers?

Sites, and the server data corresponding to that site.

And if we track sites do we wanna track them as one even as they move to other servers?

We would update the stored record if the site moved to another server, on the site's next update. Meaning if on the installer I install to a localhost and send the host data, develop the site, push to a live server, and then use the Updater, the UID value would reference the existing database record and push the new hosting environment data to the storage database. We could either choose to update the entire record, or create just a new instance of that site's hosting environment (1:M relationship with Site:Environments).
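
The 1:M option could look roughly like this in-memory sketch (Python for illustration only). The `Site` class and `record_environment` function are hypothetical names; a real collection server would presumably use two database tables instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Site:
    uid: str
    # Oldest first; the last entry is the site's current environment.
    environments: List[dict] = field(default_factory=list)


def record_environment(sites: Dict[str, Site], uid: str, env: dict) -> Site:
    """Attach env to the site identified by uid (one site, many environments)."""
    site = sites.setdefault(uid, Site(uid))
    # Only add a new environment entry when something actually changed.
    if not site.environments or site.environments[-1] != env:
        site.environments.append(env)
    return site
```

With this shape, a site developed on localhost and later pushed live keeps a single `Site` record with two environment entries, so the move itself becomes visible in the data.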

If so we can just generate a unique ID when installing Joomla and transmit it. As for how to generate a really random ID - we have a method for that in JCrypt.

Precisely.

@Beat:

I was just stating that we *already* have IP address *today* *with
each version check*. And it's hard to make an http request without
revealing the originating IP address ;-)
Btw, the Joomla server is *probably* storing it *already* in its
rolling Apache access logs for a few months probably.
Did anyone complain about that ? Should we make a problem where nobody
saw one up to now (compared to the huge benefit of the version checks
and upgrader) ?
I don't think so.

Sorry if I misread your post!

The Apache / HTTP log came up in some of the research I did on this. Apparently Microsoft got in trouble once upon a time for claiming anonymous / untraceable records while forgetting about this log, and someone sued them. There are a few solutions:
  1. A legal "privacy policy" type of thing stating that, even though we never intend to track this sort of data, it's "impossible" to rule it out entirely (or we aren't taking the steps to do so), and that users agree not to sue if something happens.
  2. Create a cron script that regularly clears the log out so we can claim true anonymity.
There may be other options; the floor is open!

E.g. As most sites are on heavily shared servers, IP address is not
really a privacy issue imho, however transmitting e.g. site name (even
if it's not stored) could be seen as a privacy issue.

This is true... perhaps the IP address thing isn't "that big" of a deal. But I think for peace of mind's sake it's worth at least doing what we can to avoid the IP. There are other ways.

We really want to track and make stats on the server environments, and
not really on the sites themselves, right ?

See above what I wrote to Rouven, and also some of my previous posts. There's other valuable data out there that is Joomla-site-specific. Examples:
  1. Is the Joomla version up to date?
  2. How many extensions does a site install over a year's time?
  3. What is the average number of total extensions across all sites running Joomla 2.5?
Other granular data mining can be done as well.

As Brian suggested, what's transmitted (beyond obvious HTTP protocols
basics) should be clearly stated, e.g. PHP version, SQL server type
and version, and that anonymized storage of those is kept.

Yes, in one of my previous posts I suggested displaying an exact copy of all the data that would be sent, or at least making it available. Think of some other times you have seen this sort of function, e.g. Adobe software. The popup (at least in my recollection) for sending this data to Adobe would say something along the lines of "Click here to see all the data we will be sending". Users can opt to view the data, exactly as we will be storing it.
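
A sketch of the "show exactly what will be sent" idea, in Python purely for illustration. The function names `collect_payload` and `preview` are made up for this example, and the Python runtime version stands in for the PHP version a real Joomla implementation would report.

```python
import json
import platform
import sys


def collect_payload() -> dict:
    """Gather the (non-identifying) values that would be submitted."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        # Python's version stands in for the PHP version in this sketch.
        "runtime_version": platform.python_version(),
        "is_64bit": sys.maxsize > 2**32,
    }


def preview(payload: dict) -> str:
    """Render exactly what would be sent, for display before the user opts in."""
    return json.dumps(payload, indent=2, sort_keys=True)
```

The point of the design is that the preview is generated from the very same dictionary that gets transmitted, so what the administrator sees cannot drift from what is sent.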

I don't think that is enough data. Thinking back about past discussions, I'd include the server OS version, the web server used and, most importantly, the available PHP extensions. This would give us the whole stack.

Yes, agree with Rouven here. Whole stack + Joomla Specifics is what I believe to be ideal. 

-Chad

Regards,
Chad Windnagle
Fight SOPA



elin

unread,
Jun 5, 2012, 10:44:23 PM6/5/12
to joomla-...@googlegroups.com
So .... a few thoughts about the server/site issue from someone with a hosting account with 1.5, 2.5 and until quite recently (blush) a 1.0 site all on the same server, and who has sometimes used php.ini to specify some versions of elements of the stack for specific instances ... we do need to remember that yes, you can have, for example, mysql, mysqli and postgres sites all on the same apache2 server. Also we should keep in mind that we are only collecting data on use, not on what servers potentially support. So someone could have all mysql sites but the host would have no problem with mysqli.

Also, I think we should keep in mind the scale of data we are potentially talking about. There are 1 million downloads of the CMS per month from joomlacode. Even if only half of those, or a quarter of those, or a tenth of those result in a successful install on a live server ... that's a lot of records to talk about storing as individual records. Just the demo site would be 250,000+ records per year. Which is to say, I think as a practical matter it's just not realistic to think that you are doing anything but pulling in data at most for a few hours and updating aggregated records. I do think that it makes sense perhaps to store aggregated statistics for each month or week to deal with the staleness issue.
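
One way to picture "updating aggregated records" is a set of per-month counters instead of one row per site. This is only a sketch in Python; `add_submission` and `share` are hypothetical names, and the field names in a submission are whatever the project decides to collect.

```python
from collections import Counter
from typing import Dict


def add_submission(totals: Counter, month: str, submission: Dict[str, str]) -> None:
    """Fold one submission into the monthly counters, then discard it."""
    for field_name, value in submission.items():
        # Key: (month, field, value) -> number of submissions seen.
        totals[(month, field_name, value)] += 1


def share(totals: Counter, month: str, field_name: str, value: str) -> float:
    """Fraction of that month's submissions reporting value for field_name."""
    seen = sum(n for (m, f, _), n in totals.items() if m == month and f == field_name)
    return totals[(month, field_name, value)] / seen if seen else 0.0
```

Storage then grows with the number of distinct values per month rather than with the number of sites, and old months can be compared or dropped as a whole.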

I suspect trying to have every Joomla site that updates for a specific critical security release push data to a central site would be somewhat of a nightmare. If you want to check in some ongoing way, it would need to be structured to have a uniform distribution of pinging over a time period instead of Poisson-y massive numbers in the first hours after a release, trailing off to almost nothing for most of the interval between releases.
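
A sketch of how check-ins might be spread evenly, assuming each site already has some stable identifier: hash the identifier into a fixed slot within a reporting window, independent of when a release happens. The 30-day window and the names `checkin_offset` and `is_due` are assumptions for this example (Python for illustration only).

```python
import hashlib

PERIOD_SECONDS = 30 * 24 * 60 * 60  # assumed 30-day reporting window


def checkin_offset(site_uid: str, period: int = PERIOD_SECONDS) -> int:
    """Map a site UID to a stable offset (in seconds) within the period."""
    digest = hashlib.sha256(site_uid.encode("utf-8")).digest()
    # First 8 bytes as an integer, reduced into [0, period).
    return int.from_bytes(digest[:8], "big") % period


def is_due(site_uid: str, now: int, last_sent: int, period: int = PERIOD_SECONDS) -> bool:
    """True once the site's slot in the current period has passed and nothing was sent since."""
    period_start = now - (now % period)
    slot = period_start + checkin_offset(site_uid, period)
    return now >= slot and last_sent < slot
```

Because the hash output is effectively uniform, sites end up spread across the whole window rather than all reporting in the hours after a release.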

Elin

P.S. WP is not really a relevant comparison in that Automattic hosts about half of the sites and knows by definition exactly what the environment is. 

Chad Windnagle

unread,
Jun 5, 2012, 10:55:51 PM6/5/12
to joomla-...@googlegroups.com
So .... a few thoughts about the server/site issue from someone with a hosting account with 1.5, 2.5 and until quite recently (blush) a 1.0 site all on the same server, and who has sometimes used php.ini to specify some versions of elements of the stack for specific instances ... we do need to remember that yes, you can have, for example, mysql, mysqli and postgres sites all on the same apache2 server. Also we should keep in mind that we are only collecting data on use, not on what servers potentially support. So someone could have all mysql sites but the host would have no problem with mysqli.

This is precisely why it would be a good idea to track both what the server is capable of and what the Joomla instance is using. As you say, someone may have installed using mysql, but their server may support mysqli. I think we would want to know both pieces of information. (So basically think of it like pulling in parts of their phpinfo() output, and parts of their config file.)

Also, I think we should keep in mind the scale of data we are potentially talking about. There are 1 million downloads of the CMS per month from joomlacode. Even if only half of those, or a quarter of those, or a tenth of those result in a successful install on a live server ... that's a lot of records to talk about storing as individual records. Just the demo site would be 250,000+ records per year. Which is to say, I think as a practical matter it's just not realistic to think that you are doing anything but pulling in data at most for a few hours and updating aggregated records. I do think that it makes sense perhaps to store aggregated statistics for each month or week to deal with the staleness issue.

I did put a bit of thought into this. I know that it's possible that this could become a lot of data. All I can think of is potentially having records for roughly 3% of the internet :P 

As far as 'staleness' goes, this is a new term for me, but I see it's been mentioned a few times here already. Can someone explain what the issue is and what the monthly / weekly timeframes do to solve it?

I suspect trying to have every Joomla site that updates for a specific critical security release push data to a central site would be somewhat of a nightmare. If you want to check in some ongoing way, it would need to be structured to have a uniform distribution of pinging over a time period instead of Poisson-y massive numbers in the first hours after a release, trailing off to almost nothing for most of the interval between releases.

Good point here. I'm not sure what else we can do about it though. I think this may be just part of the problem. We have the capability to provide those updates during these nightmarish security release pushes, so I can only assume that somehow there is a capability to also track it. 

-Chad

Regards,
Chad Windnagle
Fight SOPA



elin

unread,
Jun 5, 2012, 11:14:55 PM6/5/12
to joomla-...@googlegroups.com
What I mean by staleness is that the data get out of date. Say we only had data collected on installation. Data from the past month would tell us much more about where we are now, with users doing new installs AND updates, than data about installations from a year ago.
For example, servers change, and sites are taken offline. You would not want to include data from a year ago in thinking about where we are right now ... or at least you would want to think differently about data from a year ago. But you would want data from a year ago to help understand trends, and maybe if you are a data-obsessed person like me you start to think about how to weight newer data more heavily than older data, and about doing spatial autocorrelation and hierarchical linear models because it would be so cool to do. In practical reality, though, that would be insane, and really ... if we had a margin of error of +/- 5% on PHP version in the last month, I think that is good enough for developers to know what they are working with.
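
For a sense of scale on the +/- 5% figure, here is the standard sample-size arithmetic for a proportion, as a small Python sketch. It assumes a 95% confidence level and simple random sampling; opt-in submissions are self-selected, so the real-world error could well be larger. The function names are made up for this example.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of the confidence interval for an observed share p from n responses."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin: float, p: float = 0.5, z: float = Z_95) -> int:
    """Smallest n whose margin of error is at most `margin` (worst case p = 0.5)."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))
```

Under those assumptions, `required_sample_size(0.05)` comes out at 385 responses, which is tiny next to the download numbers mentioned earlier in the thread; the harder problem is how representative the responses are, not how many there are.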

Elin


Matt Lipscomb

unread,
Jun 6, 2012, 1:28:34 AM6/6/12
to Joomla! CMS Development
I like the idea of being able to opt-in to statistic gathering for
server setup, version, etc. I do not like the idea of IP or URL
inclusion in collected data. It should be strictly anonymous and
disabled by default.

On a side note, this could cause issues within the JED as we are
strongly against any type of call home functionality and some
developers would try this as a use-case for "the Joomla project does
it, so I can too" and that throws a lot of extra time required into
reviewing each of these cases.

~Matt Lipscomb

Radek Suski

unread,
Jun 6, 2012, 1:55:48 AM6/6/12
to joomla-...@googlegroups.com


On Wednesday, 6 June 2012 07:28:34 UTC+2, Matt Lipscomb wrote:
I like the idea of being able to opt-in to statistic gathering for
server setup, version, etc.  I do not like the idea of IP or URL
inclusion in collected data.  It should be strictly anonymous and
disabled by default.


But no one is talking about collecting the IP or URL at all ;) 

Chad Windnagle

unread,
Jun 6, 2012, 3:31:09 AM6/6/12
to joomla-...@googlegroups.com, Joomla! CMS Development
Matt,

Isn't the fact that developers have the option to include an update server XML file already a violation of said policy?

Furthermore, why does the policy allow hundreds of extensions to connect and share data with Google, Facebook, Twitter, Flickr, and umpteen other services, but the second it is a connection to the developer him/herself, it is a "call home" problem?

Perhaps the JED should reexamine this policy for this and the reasons stated above.

-Chad

Regards,
Chad Windnagle
s-go Consulting, LLC
http://www.s-go.net
Office: 607-330-2574 x 103
Mobile: 607-229-6260

Andreas Tasch

unread,
Jun 6, 2012, 7:42:06 AM6/6/12
to joomla-...@googlegroups.com
Hi,

a very good idea and good points already made.

My 2ct on this:
1) IP & Livesite hash
I think this would not be unique enough; think of local dev environments using e.g. NAT:
a) local dev machine IPs of e.g. 192.168.1.1 and a livesite of http://localhost/
b) even if only public IPs are stored, there may be several dev machines with different setups behind the same NAT/router
So, besides other issues, I think using IP+livesite is not ideal as the basis for a unique hash.

2) unique id generated on installation and stored in configuration
This is imho a good idea, but the id should be generated on each stats submit (the id should be the same if the host does not change). (What if I copy the whole dir + db to another host?)

3) how
a) additional (last) step in the install process (agree to send stats, like Mozilla does)
b) based on a plugin which (may) be installed on older versions too?
c) manual execution if you moved hosts

Greets

On Sunday, 3 June 2012 06:42:03 UTC+2, Chad Windnagle wrote:
Hello JoomlaSphere:

I'd like to open an discussion about what the thoughts and positions the community would have on enabling some statistic collection on Joomla installs that the project can collect and tabulate.

We frequently have discussions on this and other lists about how we think or believe Joomla site users "use" Joomla. For example:
  • Hosting Environment
  • Number of Installs
  • Number of Updates
And I'm sure, many other bits of data as well. I think most commonly is the "hosting environment" situation. Wherein, the CMS and Platform developers are continually discussing what users are doing "now days". A perfect example of this is the current discussion on whether we should retain MySQL support in CMS 3.0. Other examples could be:
  • Server operating system
  • PHP version
  • Database Driver Used
And the lists goes on.

Perhaps rather than guessing and only getting feedback from developers, who probably have a much different usage and operating environment than the more-common Joomla user does, we can enable a utility that sends data back to a collection server. This of course would be optional to send, kept private, and not contain any sensitive server data to retain security.

Possible Concerns
The idea has some very real and possibly idea-killing concerns that I have identified already, and I'm sure that there are more:
  • Security
  • Privacy
  • Legal Ramifications
  • License Compliance (Not sure if this sort of functionality conflicts with the GPL)
Possible Benefits
I see the benefits of this, assuming it can be done successfully navigating the above concerns, as solving a great deal of many problems and questions we have to ask ourselves when making changes. Specifically we can actually answer questions like "what version of PHP are users most commonly installing Joomla on?" or "Do users install using MySQL or MySQLi?".

Again, it's important to note that this would be an entirely optional, opt-in data collection program that administrators must enable, understanding what they are doing.

There are some examples of software that does this already. Many operating systems and browsers (chrome, firefox) do this. I think that it could be a valuable asset to the community.

I'd appreciate some thoughts on this. Thanks all for your time.

-Chad

Matt Lipscomb

Jun 7, 2012, 2:14:05 AM
to Joomla! CMS Development
Chad,
Not at all - the JED TOS allows for the extension to call home for
version checks which is exactly what the core updater does if there is
an update server specified in the manifest.

As to third party services sharing data, these professional
organizations have very defined privacy policies and it's a user's
choice as to whether they agree to them. For example, on the JED
itself we use Twitter, Facebook and Google+ share buttons, but if a
user isn't signed up at those, they can't do anything with them. If
they are, then they have already agreed to the privacy policies of
those respective companies.

There are currently no plans to update a policy that has protected the
community for a long time. Since it has been in place it has
prevented developers from farming data from sites such as usernames
and emails as well as protected against domain usage limitations,
encryption, license checks for usage and much more. There are many
cases that this rule has been used in and it works.

You are misconstruing what a call home is defined as - it does not
encompass API keys to external services except in the event that users
have not agreed to data sharing with those services. For example an
extension in the JED that used an API couldn't grab access to your
Outlook email addresses and send them all a chain letter without your
express permission and a clear privacy policy.

Regardless - my primary point is that we shouldn't open a door to
allow more call-homes without clearly defining what that means for
other areas of the project.

~Matt


Chad Windnagle

Jun 7, 2012, 7:09:32 AM
to joomla-...@googlegroups.com
> Chad,
> Not at all - the JED TOS allows for the extension to call home for
> version checks which is exactly what the core updater does if there is
> an update server specified in the manifest.

I understand this is how it works right now. What I'm saying is that the JED already allows some call-home functionality. Can we get the Joomla Project itself added to the list?

> As to third party services sharing data, these professional
> organizations have very defined privacy policies and it's a user's
> choice as to whether they agree to them.  For example, on the JED
> itself we use Twitter, Facebook and Google+ share buttons, but if a
> user isn't signed up at those, they can't do anything with them.  If
> they are, then they have already agreed to the privacy policies of
> those respective companies.

The way I see it:
  • Joomla is a professional organization (most of the time :P)
  • A privacy policy has been suggested
  • Agreeing to it could be a condition of sending the proposed data
I don't see how we would be any different from another professional service, except that the data we care about isn't social, it's technical.

> There are currently no plans to update a policy that has protected the
> community for a long time.  Since it has been in place it has
> prevented developers from farming data from sites such as usernames
> and emails as well as protected against domain usage limitations,
> encryption, license checks for usage and much more.  There are many
> cases that this rule has been used in and it works.
>
> You are misconstruing what a call home is defined as - it does not
> encompass API keys to external services except in the event that users
> have not agreed to data sharing with those services.  For example an
> extension in the JED that used an API couldn't grab access to your
> Outlook email addresses and send them all a chain letter without your
> express permission and a clear privacy policy.

I apologize for not being clear. I wasn't saying don't protect the community. (In fact... I didn't say that, hah.) But to me it sounded like there were parts of the policy that weren't clear, especially from a developer's perspective. E.g.:
  • We don't allow calls to home
  • Unless you have an update script
  • And unless you're Facebook
  • and unless you're Google
  • and unless you're Twitter
    • and these organizations only get away with it because they have a privacy policy and are "professional"
I was simply trying to say that refining the policy to be clearer about what is and isn't allowed (and maybe it already is; I haven't read this piece recently) would then provide the "access point" the Joomla project needs to get the above proposal okayed.

> Regardless - my primary point is that we shouldn't open a door to
> allow more call-homes without clearly defining what that means for
> other areas of the project.

I agree with you, it would be rather tricky for us to say J! can call home while a JED policy says it's not allowed. What is the solution here, then? I think it would be really sad to lose this proposal because of a JED policy, that's all I'm saying.

-Chad

Regards,
Chad Windnagle
Fight SOPA


Donald Gilbert

Jun 7, 2012, 8:30:56 AM
to joomla-...@googlegroups.com
WP is a perfectly relevant comparison. That Automattic hosts about half of the sites (a figure you just made up) would have no bearing at all on install stats from WP.org. Those are all self-hosted (i.e., not hosted by Automattic) install stats. WordPress core sends environment data back to wordpress.org every time it does a core update check. All of the data is completely anonymous and gives devs a good view of what the WP server ecosystem looks like. That is EXACTLY what this whole discussion is about.

As I said - a perfectly relevant comparison.

Chad Windnagle

Jun 7, 2012, 8:52:57 AM
to joomla-...@googlegroups.com
Hi Donald:

Could you provide a link to that data? I'd like to see it.

-Chad

Regards,
Chad Windnagle
Fight SOPA


--
You received this message because you are subscribed to the Google Groups "Joomla! CMS Development" group.
To view this discussion on the web, visit https://groups.google.com/d/msg/joomla-dev-cms/-/HjDoZj4lEDQJ.

Daniele Rosario

Jun 7, 2012, 8:56:12 AM
to joomla-...@googlegroups.com
I know of this:

Chad Windnagle

Jun 7, 2012, 9:01:01 AM
to joomla-...@googlegroups.com
Okay, so I'm really confused by that link and by what your previous email said.

The page at that link says:
> This is a collection of stats from around WordPress.com that we’ve decided to share with the world because there’s no good reason not to. Interested in your own stats? Every WordPress.com blog includes an integrated stats system, also available for self-hosted WordPress sites with Jetpack.
>
> The following stats are for WordPress.com only, not including all of the activity on self-hosted blogs. (Yet!)

So this is WordPress.com, not WP.org?

Either way, those are cool stats, and this is exactly what I'm talking about. The only difference would be that I do want it for self-hosted J! sites. 

-Chad

Regards,
Chad Windnagle
Fight SOPA


Daniele Rosario

Jun 7, 2012, 9:03:03 AM
to joomla-...@googlegroups.com
I'm not a WP expert, I just found that link in my link collection ;)

From what I gather, those are the stats for the hosted sites on wordpress.com, and they'll be integrating those of self-hosted WP websites soon.

And yes, that's the kind of stats we need, plus details of the environment, so we can make data-backed decisions on PHP version support, MySQL support, etc.

Daniele Rosario

Chad Windnagle

Jun 7, 2012, 9:28:33 AM
to joomla-...@googlegroups.com
Oh, I'm sorry, Daniele, I mistook you and Donald for the same person! My apologies. So there may be other WP.org data; we just don't know where it is. Okay, great.

-Chad

Regards,
Chad Windnagle
Fight SOPA


Donald Gilbert

Jun 7, 2012, 10:18:16 AM
to joomla-...@googlegroups.com
I was trying to locate the article I read about the server environment stats - couldn't find it.

Here's the WP.org stats though - http://wordpress.org/about/stats/

Daniele Rosario

Jun 7, 2012, 10:25:05 AM
to joomla-...@googlegroups.com
Those are exactly the kind of stats we need!

P.S. OMG, 76% on PHP 5.2. That's AWFUL!

Daniele Rosario



Donald Gilbert

Jun 7, 2012, 10:26:33 AM
to joomla-...@googlegroups.com
Looks like in the core update check, this is the data that gets sent:

  • Your IP
  • Blog URL
  • WordPress version
  • PHP version
  • Locale setting if there is one
  • Plugin title, description, author, including all URLs that form part of this.
  • Full list of all plugins on your site, whether they are active or not.
While not as "anonymous" as I thought (I didn't realize it sent the blog URL), it's still a starting point.

A good idea for unique identifiers could be a salted and stretched hash of the IP address and URL of the Joomla site. It would still be completely unique, since "127.0.0.1+www.example.com" carries quite a bit of entropy by itself.
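
As a rough sketch of what a "salted and stretched" hash might look like, the example below uses PHP's `hash_pbkdf2()`, which mixes in a salt and repeats the hash a configurable number of times. The function name `siteIdentifier`, the normalization of the URL, and the iteration count are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: derive an opaque site identifier from IP + URL.
function siteIdentifier(string $ip, string $siteUrl, string $salt, int $iterations = 100000): string
{
    // Normalize so trivial differences (case, trailing slash) do not change the ID.
    $input = $ip . '+' . rtrim(strtolower($siteUrl), '/');

    // PBKDF2 applies the salt and repeats the hash $iterations times ("stretching").
    // A length of 64 with the default hex output yields 64 hex characters.
    return hash_pbkdf2('sha256', $input, $salt, $iterations, 64);
}

$id = siteIdentifier('127.0.0.1', 'www.example.com', 'example-salt', 1000);
```

The salt has to stay the same for a given site, otherwise the identifier changes between submissions. Stretching makes guessing the inputs slower but not impossible, since IP addresses and site URLs are both fairly guessable.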




Chad Windnagle

Jun 7, 2012, 10:31:35 AM
to joomla-...@googlegroups.com
Awesome Donald! That is a great example of what I want to see. 

So as far as I can tell, in my (limited) WordPress experience, they do that data collection without asking people to agree to it (at least not explicitly). I think we all agree we don't wish to go that route. But in terms of the data they collect and how they display it, that is really close to what I was thinking.

At this point, I haven't heard (many) people totally against this idea. There are some questions of implementation, policy, and technical requirements, but all in all, I think those can be overcome.

Could those who would be interested in creating a work group of some sort please post back here? I'm willing to help anywhere I can with this. But I think we need a few key talents:
  • PHP programmer (duh)
  • Database guru
  • CLT person (set up servers and handle a site..could go on developer.joomla.org)
  • OSM someone (legal / privacy policy)
I'm willing to work up two basic components that handle sending / receiving data to start things off, but I would love some help with it all.

-Chad

Regards,
Chad Windnagle
Fight SOPA



Daniele Rosario

Jun 7, 2012, 10:37:01 AM
to joomla-...@googlegroups.com
If you need help there, I can pitch in, though I can't say how much time I can commit right now. But if I can help, count me in!

Daniele Rosario

brian teeman

Jun 7, 2012, 11:06:39 AM
to joomla-...@googlegroups.com
Maybe this is a crazy thought, but since the main things we want to collect are PHP and MySQL versions, and wordpress.org sites are typically going to be hosted on the same type of accounts, do we need to bother collecting our own stats?



Daniele Rosario

Jun 7, 2012, 11:11:06 AM
to joomla-...@googlegroups.com
Well, Joomla versions would be great to have, to check how many people do NOT update, which extensions are most commonly installed, etc.

Daniele Rosario



Sander

Jun 7, 2012, 11:19:55 AM
to joomla-...@googlegroups.com
Stats will be very helpful in making decisions for the Joomla project, developers, users etc... so great idea and would love to see this happen! Thanks for starting this discussion Chad.

Chad Windnagle

Jun 7, 2012, 12:24:57 PM
to joomla-...@googlegroups.com
@Brian:

Check out this thing that Drupal has done:

http://drupal.org/project/usage/drupal

and:

http://drupal.org/project/usage/572834

I had to read this to understand what the numbers meant:
http://drupal.org/node/329620

Think of what I (and others) are proposing as collecting both Joomla site info and hosting environment info. That's not something we can rely on WordPress's information to provide (though it is useful; if only it had stuff on mysql / mysqli for the current discussion on DB engines).

-Chad

Regards,
Chad Windnagle
Fight SOPA



Steve

Jun 7, 2012, 12:44:35 PM
to joomla-...@googlegroups.com
The Drupal and WordPress examples are really valuable for this in what they collect.

It's very useful info and very scary info:

http://drupal.org/project/usage/drupal Around 90% of Drupal 7 sites are out-of-date.

http://wordpress.org/about/stats/ Over 45% of WordPress sites are out-of-date. Only 22% of WordPress users are running PHP 5.3.

Mark Dexter

Jun 7, 2012, 12:47:54 PM
to joomla-...@googlegroups.com
It looks like Drupal's Update Status module does a weekly "phone
home". Is that correct? So it's not just tied to when a site is
updated? Mark

Paul D. Bain

Jun 7, 2012, 1:21:43 PM
to joomla-...@googlegroups.com, Chad Windnagle
On 6/7/2012 12:24 PM, Chad Windnagle wrote:
> @Brian:
>
> Check out this thing that Drupal has done:
>
> http://drupal.org/project/usage/drupal
>
> and:
>
> http://drupal.org/project/usage/572834
>
> I had to read this to understand what the numbers meant:
> http://drupal.org/node/329620
>
> Think of what I (and others) are proposing as collecting both
> Joomla-Site info, and Hosting Environment info. That's not something we
> can rely on WordPress's information to provide (but it is useful, if
> only it had stuff on mysql / mysqli for the current discussion on DB
> engines)
>
> -Chad

Chad Windnagle,

Years ago, I followed developments relating to the CMS Alfresco, which
is also open source software (OSS). At that time, Alfresco also had a
"telephone home" feature, which Matt Asay (Alfresco's chief of sales at
the time) conceded and discussed. Asay insisted that the extent of
information that was gathered was limited, but it is possible that this
feature may have become more intrusive since that time (about 2006-2007,
IIRC).

Sincerely,
Paul Bain



Sam Moffatt

Jun 9, 2012, 2:29:20 PM
to joomla-...@googlegroups.com
I would also read it that way, looking at those graphs. I also find it
interesting that there is still a 5.0-rc1 site out there. So many
security vulnerabilities...

The flip side of doing weekly phone homes is that you just ignore
whatever drops off over time.
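
One way to read that: with periodic check-ins, an install only counts while it keeps reporting, so abandoned sites fall out of the numbers without any explicit cleanup. A minimal sketch, assuming the collector keeps a last-seen timestamp per install (the function name, the 14-day window, and the sample data are illustrative only):

```php
<?php
// Hypothetical helper: count installs that have reported recently.
// $lastSeen maps an install ID to the Unix timestamp of its latest report.
function countActiveInstalls(array $lastSeen, int $now, int $windowDays = 14): int
{
    $cutoff = $now - $windowDays * 86400; // 86400 seconds per day
    $active = 0;
    foreach ($lastSeen as $timestamp) {
        if ($timestamp >= $cutoff) {
            $active++;
        }
    }
    return $active;
}

$now = 1339200000; // arbitrary reference time for the example
$lastSeen = [
    'site-a' => $now - 86400,      // reported yesterday
    'site-b' => $now - 30 * 86400, // silent for a month
    'site-c' => $now,              // reported just now
];
$active = countActiveInstalls($lastSeen, $now);
```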

Cheers,

Sam Moffatt
http://pasamio.id.au

Joseph LeBlanc

Jun 9, 2012, 4:07:32 PM
to joomla-...@googlegroups.com
The WordPress community stops just short of telling users to seek out hosts with old versions of PHP. Their attitude is basically "WordPress should work on 99.9% of all PHP-based hosts out there." Of course, since PHP 4 is no longer maintained, they were essentially forced to move to 5.2 as the minimum last year.

Server statistics for Joomla might be similar, or they might be different. But of course we won't know for sure until we attempt to collect them.

-Joe


Chad Windnagle

Jun 9, 2012, 4:28:17 PM
to joomla-...@googlegroups.com
Hey Everyone:

Well I didn't want this to become a topic we all talk about but then not do anything about, so I have spent some time this morning and afternoon working out some code. 

IT IS VERY EARLY

And I by no means think this code is great, perfect, etc... I simply wanted to do some proof of concept type of things, see what I was capable of, and see if by kicking things off some other people may want to get involved. (read: don't take the code you see too seriously, I was just having fun)

Here is a branch that I have pushed some code up to:

I have started out on the installation application because it seemed like a nice place to start with a proof of concept. If you look at the 'preinstall' (the second step in the install wizard) you will find another button at the top marked "report usage". This button is linked to the JSON controller with a cURL library and a sample data post.

Beyond proof of concept there are some things I know I need to do already:
  • convert the cURL submit to use the Joomla cURL class (personally I think the library I'm using is way easier to use than the J! Platform class. If anyone has any good examples of using the platform class, I'd love to see them)
  • Obviously where this code is right now would only be set up to provide a data post of the hosting environment. I've said numerous times that I want to capture a lot more data than just that, but I figured this would be a solid starting place. Future plans would be to add this button in the Global Configuration somewhere (server tab, perhaps?), and set up a model that returns some data from the database (such as number of j! extensions installed and some other things)
The code is pre-pre-pre alpha, and unless you have a receiver php file set up where the controller posts (https://github.com/drmmr763/joomla-cms/blob/jstat/installation/controllers/setup.json.php#L406) you will likely get an error.

(my receiver file is here: https://github.com/drmmr763/curl)

If anyone has any thoughts on the direction I'm going right now I'd certainly appreciate them. 

-Chad

Regards,
Chad Windnagle
Fight SOPA


Sam Moffatt

Jun 9, 2012, 4:49:46 PM
to joomla-...@googlegroups.com
$client = new JHttp;
$response = $client->post($url, $data, $headers, $timeout);

URL is the string URL and data can be an array (which is what you're
using) or a raw string to post. Headers is an array and Timeout is a
timeout in seconds, both of which are optional.

You get back a response object with the result code, headers and
response body. You have to parse the body yourself for whatever
you need.
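
A minimal sketch of that parsing step, assuming the collector replies with JSON. It uses a plain stand-in object rather than Joomla itself, and the property names `code`, `headers` and `body` are an assumption about the response object, as is the helper name `parseStatsResponse`.

```php
<?php
// Hypothetical helper: turn an HTTP response object into an array, or null on failure.
// The response is assumed to expose ->code (int status) and ->body (string).
function parseStatsResponse(object $response): ?array
{
    // Treat anything outside the 2xx range as a failed submission.
    if ($response->code < 200 || $response->code >= 300) {
        return null;
    }

    // json_decode() returns null for malformed input; only accept arrays.
    $decoded = json_decode($response->body, true);
    return is_array($decoded) ? $decoded : null;
}

// Stand-in for the object a real HTTP client would return.
$response = new stdClass();
$response->code = 200;
$response->headers = ['Content-Type' => 'application/json'];
$response->body = '{"client_id":"abc"}';

$result = parseStatsResponse($response);
```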

Cheers,

Sam Moffatt
http://pasamio.id.au


Michael Babker

Jun 9, 2012, 5:00:49 PM
to joomla-...@googlegroups.com
Using the JHttp class would actually have some benefit over using another library; for example, it would enable you to use socket and stream connections when cURL isn't available (based on the code in Platform 12.1, which I think has a tracker item to backport it for CMS 2.5).  The example I sent you on Twitter of how JGithub uses it is probably more complex than what you'd need in this instance, so here's a Gist that should get you up and running: https://gist.github.com/2902581

Chad Windnagle

Jun 9, 2012, 5:34:32 PM
to joomla-...@googlegroups.com
Thanks Sam and Michael.

Some updates:

  • Have removed the lib dependency from my code. Was able to get the cURL posting working using Joomla's class, as suggested by a few folks (yay!)
  • I have data being sent now that is actually relevant. Using the existing model I'm pulling the PHP configuration and sending it as a string. Clearly we will want to build an object out of this for later use, but getting back to my proof of concept, it's perfect to see stuff happening. (https://twitter.com/drmmr763/status/211570614270562304/photo/1)
Over the next few days I plan to:
  • Build an actual joomla component to accept these cURL posts
  • Get all the correct data set up with SQL tables
  • Look at implementing some sort of UID for records
As always, comments and suggestions (and pull requests ;) ) are welcome.
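
For the "build an object out of this" step, one possible shape for the environment payload is sketched below. The field names and the `collectEnvironment` helper are assumptions for illustration, and the database details are passed in because how they are obtained depends on the driver in use.

```php
<?php
// Hypothetical helper: gather hosting-environment details into one array.
function collectEnvironment(string $dbType, string $dbVersion): array
{
    $extensions = get_loaded_extensions();
    sort($extensions); // stable ordering makes payloads easier to compare

    return [
        'php_version'    => PHP_VERSION,
        'server_os'      => PHP_OS,
        // Not set when running from the command line, hence the fallback.
        'web_server'     => $_SERVER['SERVER_SOFTWARE'] ?? 'unknown',
        'db_type'        => $dbType,
        'db_version'     => $dbVersion,
        'php_extensions' => $extensions,
    ];
}

// Example values only; a real install would read these from its database driver.
$environment = collectEnvironment('mysqli', '5.5.24');
$payload = json_encode($environment);
```

Sending structured JSON rather than one long string means the receiving side can store each field in its own column.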

-Chad

Regards,
Chad Windnagle
Fight SOPA


Sam Moffatt

Jun 10, 2012, 4:13:13 AM
to joomla-...@googlegroups.com
I built a quick proof of concept app and sample client which could
exist as an end point for a stats collection process. The advantage of
this over a Joomla! component is that it can be a lot lighter, ignores
stuff like session handling and focuses rather tightly on handling one
job reasonably well (though obviously with room for expansion).

It uses some of the new router stuff that Louis has been working on
plus using a new JInputJSON class that has been added to the Platform.
It also features a directory structure similar to the pull tester if
you've seen that. It also makes use of the new JController class and
I'm using a JTable to handle storing data simply and easily.

One thing of note is that I built a server side client ID generation
algorithm. It picks the present time on the server, the remote port
for the request and a random number. I'm toying with returning this as
a base 32 string, which is a little nicer looking than a lot of
characters. The advantage of this is that it uses time to help ensure
uniqueness, the remote port (a little more entropy) and a random
value. New installs when submitting can ignore the client_id and store
the one generated from the server locally which would provide a way
for the system to update itself. It wouldn't solve host level
deduplication but would provide install level tracking.
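
The ID scheme described above might look something like the sketch below. This is not the actual proof-of-concept code, which isn't shown in this thread; the function names, the fixed field widths, and the range of the random number are assumptions. "Base 32" here means PHP's `base_convert()` digits (0-9, a-v), not the RFC 4648 alphabet.

```php
<?php
// Hypothetical helper: encode a non-negative integer as a fixed-width base-32 field.
function toBase32Field(int $value, int $width): string
{
    return str_pad(base_convert((string) $value, 10, 32), $width, '0', STR_PAD_LEFT);
}

// Hypothetical helper: combine time, remote port and a random number into one ID.
// Widths: 7 digits cover Unix timestamps, 4 cover ports up to 65535,
// and 4 cover random values below 32^4.
function generateClientId(int $time, int $remotePort, int $random): string
{
    return toBase32Field($time, 7)
        . toBase32Field($remotePort, 4)
        . toBase32Field($random, 4);
}

$clientId = generateClientId(
    time(),
    (int) ($_SERVER['REMOTE_PORT'] ?? 0), // absent on the command line
    random_int(0, 32 ** 4 - 1)
);
```

Fixed-width fields keep the three parts separable, so two different combinations cannot collapse into the same string.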

Cheers,

Sam Moffatt
http://pasamio.id.au


Janich

Jun 11, 2012, 3:47:03 AM
to Joomla! CMS Development
I agree with Rouven. More info would be preferred to help developers
best, but we probably need to define the goal a bit more precisely?

On a side note, I believe Radek's example of his own statistics through
Sobi is a good example to look at: http://sigsiu.net/statistics
For example, did you know 12.5% can't use JSON? I surely didn't.

This would be so very cool to have for the Joomla! project!

Best regards,
Janich


On Jun 6, 12:00 am, Rouven Weßling <m...@rouvenwessling.de> wrote:
> On 05.06.2012, at 23:39, Beat wrote:
>
> > Our stats database could have columns: id, created_datetime,
> > last_updated_datetime, one_way_sha_hashed_salted_IP,
> > latest_php_version, latest_sql_type, latest_sql_version
> > I think that's enough and not sensitive storage as it does not contain
> > any private information.
>
> I don't think that is enough data. Thinking back to past discussions, I'd include the server OS version, the web server used and most importantly the available PHP extensions. This would give us the whole stack.
>
> Best regards
> Rouven

Rouven Weßling

Jun 11, 2012, 3:50:55 AM
to joomla-...@googlegroups.com

On 11.06.2012, at 09:47, Janich wrote:

> On a sidenote, I believe Radek's example of his own statistics through
> Sobi, is a good example to look at: http://sigsiu.net/statistics
> Fx. Did you know 12.5% cant use JSON? I surely didnt.

I was really shocked when I looked at a few data points, then I saw this:

Last update: Thu Oct 8 10:00:29 CEST 2009

That puts things back into perspective.

Best regards
Rouven

Donald Gilbert

Jun 11, 2012, 8:37:25 AM
to joomla-...@googlegroups.com
Please explain what you mean by host level deduplication.

My thinking is that even if there are several installs of Joomla on the same server, it wouldn't matter. We're not looking to see what type of infrastructure the host is supporting, we're trying to see what type of environment that Joomla users utilize. 

Chad Windnagle

Jun 11, 2012, 8:47:41 AM
to joomla-...@googlegroups.com
I think he means that if, say, hosting company X has 1000 Joomla installs on their servers, do we need to record every one of those environments when they are all the same? Skipping the duplicates would reduce data storage, improve performance, etc.

Regards,
Chad Windnagle 
s-go Consulting, LLC
Mobile: 607-229-6260

Victor Drover

Jun 11, 2012, 8:54:25 AM
to joomla-...@googlegroups.com
If the goal is to help developers know what systems they are developing for, then the # of installs in a specific environment seems pretty important. Can't you add a field in the DB to simply increment for each identical instance? That way, we get all the info and should still have efficient data storage.

-V

Daniele Rosario

Jun 11, 2012, 8:55:36 AM
to joomla-...@googlegroups.com
Just to point something out: several hosts have different php.ini setups on the same server, because they allow users to change those settings from their control panel.

This may be something to take into account.

Daniele Rosario

Phil Brown

Jun 11, 2012, 8:58:59 AM
to joomla-...@googlegroups.com
I wonder if the number of users on X hosts comes into the equation at all. Wouldn't we simply be more concerned with the environment that is available to each individual install?
Especially as a host may support multiple PHP versions and also upgrade those versions on that server?

The abundance of data could also be processed later into a more aggregated format, grouped by week, month, etc.

Regards,

Phill Brown
M  04 2481 9754
Bathurst Software Solutions

Victor Drover

Jun 11, 2012, 9:02:42 AM
to joomla-...@googlegroups.com
Simple Example:

Let's say we identify 1000 different hosting environments and only 10 are using PHP5.2.

990 are using PHP5.3.

As developers, you might conclude that only 1% of people are still using PHP5.2.

However, if we recorded the # of installs in each server environment, we might find, say, 10,000 sites, with 5000 using PHP5.2 and 5000 using PHP5.3.

IMO without the # of installs, the data is not very informative.
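
The difference between the two readings can be reproduced in a few lines. This is only a sketch in Python: the record layout (`php`, `installs`) and the way installs are split across environments are assumptions, chosen so the totals match the figures in the example above.

```python
from collections import Counter

# Hypothetical data: 1000 distinct hosting environments, 10 on PHP 5.2
# and 990 on PHP 5.3. The per-environment install counts are invented so
# that the totals match the example: 5000 sites per PHP version.
environments = (
    [{"php": "5.2", "installs": 500} for _ in range(10)]
    + [{"php": "5.3", "installs": 5} for _ in range(980)]
    + [{"php": "5.3", "installs": 10} for _ in range(10)]
)


def share_by_environment(envs):
    """Share of each PHP version when every environment counts once."""
    counts = Counter(env["php"] for env in envs)
    total = sum(counts.values())
    return {php: n / total for php, n in counts.items()}


def share_by_install(envs):
    """Share of each PHP version when every install counts once."""
    counts = Counter()
    for env in envs:
        counts[env["php"]] += env["installs"]
    total = sum(counts.values())
    return {php: n / total for php, n in counts.items()}


by_environment = share_by_environment(environments)
by_install = share_by_install(environments)
print(by_environment)  # PHP 5.2 looks like a 1% edge case
print(by_install)      # PHP 5.2 is actually half of all sites
```

The same raw data supports both conclusions, so the install count per environment has to be stored for the second view to be possible at all.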

Sam Moffatt

Jun 11, 2012, 12:16:23 PM
to joomla-...@googlegroups.com
I didn't say we don't need to record each distinct site. However, we
don't need to store 1000 copies of the same data: we can create a
single host record (or a variant of it for those hosts who permit
customisation there) and link each individual install to it. That gives
us both statistics: unique hosting environments with their
configuration, and configuration by number of installs.
Effectively, being able to track hosts allows some minor database
normalisation to reduce redundant data storage and provide deeper
statistics.
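
One way to picture that normalisation is the sketch below, using Python's built-in `sqlite3` module. The table and column names (`host_configs`, `installs`, `php_version`, `db_type`) are assumptions made for illustration, and a host record is identified here purely by its configuration, which is a simplification of real host tracking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per distinct configuration (hypothetical columns).
    CREATE TABLE host_configs (
        id INTEGER PRIMARY KEY,
        php_version TEXT NOT NULL,
        db_type TEXT NOT NULL,
        UNIQUE (php_version, db_type)
    );
    -- One row per install, linked to its configuration.
    CREATE TABLE installs (
        id INTEGER PRIMARY KEY,
        host_config_id INTEGER NOT NULL REFERENCES host_configs(id)
    );
""")


def record_install(php_version, db_type):
    """Store the configuration once and link the install to it."""
    conn.execute(
        "INSERT OR IGNORE INTO host_configs (php_version, db_type) VALUES (?, ?)",
        (php_version, db_type),
    )
    (config_id,) = conn.execute(
        "SELECT id FROM host_configs WHERE php_version = ? AND db_type = ?",
        (php_version, db_type),
    ).fetchone()
    conn.execute("INSERT INTO installs (host_config_id) VALUES (?)", (config_id,))


# 1000 installs on an identical environment, plus one on a different one.
for _ in range(1000):
    record_install("5.3", "mysqli")
record_install("5.2", "mysql")

unique_environments = conn.execute(
    "SELECT COUNT(*) FROM host_configs"
).fetchone()[0]
installs_per_environment = conn.execute("""
    SELECT h.php_version, h.db_type, COUNT(i.id)
    FROM host_configs h JOIN installs i ON i.host_config_id = h.id
    GROUP BY h.id
    ORDER BY h.id
""").fetchall()
print(unique_environments)
print(installs_per_environment)
```

The configuration is stored twice in total rather than 1001 times, and both statistics (unique environments, installs per environment) come from simple queries.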

Cheers,

Sam Moffatt
http://pasamio.id.au


Victor Drover

Jun 11, 2012, 12:19:26 PM
to joomla-...@googlegroups.com
That's what I suggested earlier, Sam. Has anyone suggested recording 1000 identical records? I may have missed that.

Beat

Jun 12, 2012, 4:25:52 AM
to Joomla! CMS Development
Hi Radek,

Re:

> http://sigsiu.net/statistics

Wow! Really nice stats! Well done!

Are they still from 2009?
If yes, it would be great if you have time to update your stats pages
to the latest stats, and maybe also have a second set which only takes
into account Joomla 2.5.x installations for the server settings?

That way older servers can be ignored, and we get a more
forward-looking picture.

But yes, I see the point of collecting more stats. These kinds of
anonymous stats are very useful to developers! :)

Well done!

Best Regards,
Beat
http://www.joomlapolis.com/

Donald Gilbert

Jun 13, 2012, 8:43:13 AM
to joomla-...@googlegroups.com
Effectively being able to track hosts significantly decreases anonymity.

A great unique ID system (as already suggested) could take the IP address and URL (i.e. 127.0.0.1+example.com) and store them only as a salted and stretched hash, which would be untraceable back to the originating site or host.
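
A rough sketch of such an ID, written in Python with the standard library's PBKDF2 implementation as the "salted and stretched" hash. The salt value, the iteration count, and the `install_id` helper are all assumptions for illustration; where the hashing runs and who holds the salt are questions this thread has not settled.

```python
import hashlib

# Assumption: a long random salt that is not stored next to the stats.
SALT = b"replace-with-a-long-random-secret"
# "Stretching": many iterations make each guess slow for an attacker.
ITERATIONS = 100_000


def install_id(ip: str, url: str) -> str:
    """Derive a hex ID from IP + URL using PBKDF2-HMAC-SHA256."""
    material = f"{ip}+{url}".encode("utf-8")
    digest = hashlib.pbkdf2_hmac("sha256", material, SALT, ITERATIONS)
    return digest.hex()


first = install_id("127.0.0.1", "example.com")
again = install_id("127.0.0.1", "example.com")
other = install_id("127.0.0.1", "example.org")
print(first == again)  # the same site always maps to the same ID
print(first == other)  # a different URL gives an unrelated ID
```

Because the IPv4 address space is small, the resistance to guessing would rest mostly on the salt staying secret and on the URL being part of the input, so this should be read as pseudonymous rather than as a guarantee of anonymity.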

The data would need to be periodically merged and purged, maybe once a month. That means taking all the info collected, reducing it to just the relevant info, and then inserting into or updating another table that holds all possible configurations with an incrementing `install_count` field or something like that.
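
As a sketch of what that merge-and-purge step might look like (in Python, with in-memory structures standing in for the two tables), consider the following. The field names, the sample reports, and the `merge_and_purge` helper are hypothetical.

```python
from collections import Counter

# Hypothetical raw submissions collected during one period.
raw_reports = [
    {"install_id": "a1", "php": "5.3", "db": "mysqli", "os": "Linux"},
    {"install_id": "a1", "php": "5.3", "db": "mysqli", "os": "Linux"},  # repeat
    {"install_id": "b2", "php": "5.3", "db": "mysqli", "os": "Linux"},
    {"install_id": "c3", "php": "5.2", "db": "mysql", "os": "Windows"},
]

# The "relevant info" kept after the merge; everything else is dropped.
RELEVANT_FIELDS = ("php", "db", "os")


def merge_and_purge(reports, install_count):
    """Fold raw reports into per-configuration counts, then clear them."""
    latest = {}
    for report in reports:
        # Keep one configuration per install so repeat reports count once.
        latest[report["install_id"]] = tuple(report[f] for f in RELEVANT_FIELDS)
    install_count.update(latest.values())
    reports.clear()  # purge the raw rows once they are aggregated
    return install_count


install_count = merge_and_purge(raw_reports, Counter())
print(install_count)
```

One consequence of purging the raw rows is that an install reporting again in the next period can no longer be recognised, so the counts would probably need to be rebuilt per period rather than accumulated indefinitely.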

Any comments or am I way off base on this?



Federico Rivera

Jun 13, 2012, 9:18:04 AM
to joomla-...@googlegroups.com


Chad Windnagle

Jun 15, 2012, 9:52:05 PM
to joomla-...@googlegroups.com
Hi all:

Thank you so much for your contributions to this topic so far. It's almost 100 replies and I believe all of them have been positive! That's very exciting.

I have put together a simple proposal-style document in which I tried to give a top-down view of the goal and project, along with a detailed list of the information I would like to see stored.

That document is here:

The public is more than welcome to comment. If you would like to be added as a contributor to change some things, please let me know and I'll add you.

Some things in particular I would like to get by generating this list:
  • Settle on a technology to post the data
    • I've been experimenting with cURL. If there is another / preferred / alternative way to do this, I'd like to look into that solution as well. 
  • Confirm list of data to be stored
    • I have a very lengthy list of info, but I don't know if it's all necessary. I tried to take info that I thought would be relevant to the development community. As fellow experts, please read through the list and look for information that is either unimportant or missing so we can refine this list.
Thanks all for your help and comments thus far. I hope we can make this a fruitful prospect.

-Chad

Regards,
Chad Windnagle
Fight SOPA


Matt Lipscomb

Jun 16, 2012, 2:13:06 AM
to joomla-...@googlegroups.com
Chad, 

I like the idea of an API for charting/graphing. There are some open source HTML5 graphing tools out there that I've used in the past and that we are deploying in JED (if you are interested in that when you get to that point). See: https://github.com/HumbleSoftware/Flotr2

Since the data will be open to everyone (either through a static public output or through a registered API call), that alleviates the concerns I had about a conflict between the core having a call-home function and it not being allowed for extension developers. Basically, if the data is public/API available, then there's no good reason a developer would need to have their own call-home doing the same thing. (I feel the same way about the new extension updater, but that's a different discussion.)

One thing you said in the doc could probably be expanded:
  • A tool will be placed in the Joomla global configuration screen that will allow a Super-Admin to agree to send server and Joomla-instance specific information to a central server.

It would be more practical IMO to have this as an opt-in on the installation screen (checked by default). It would of course still be in the global config, but the majority of people wouldn't even notice it was enabled, so they wouldn't disable it, as it doesn't affect anything on their site.

A point of clarification on the data though: when you mention collecting statistics of extensions used, are you suggesting data such as 22 plugins, 5 components, 40 modules, or which specific extensions and versions are used?

~Matt Lipscomb