Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Count unique data in column

1,743 views
Skip to first unread message

kev

unread,
May 25, 2006, 3:16:13 PM5/25/06
to
Hi,

I have an optimization question. Is there a way to count instances of
each unique entry in a column (not unique entries)? Let me explain
using the column count.

count
3
3
4
5
2
3
2

Question:
Is there a way to return an associative array of total instances of
unique entries?
Desired Result: {2}->2, {3}->3, {4}->1, {5}->1

The current script requests all data (SELECT *) and sorts them in a
script. I want to offset the count to the SQL server (good idea?).

Kev

Bill Karwin

unread,
May 25, 2006, 3:33:37 PM5/25/06
to
kev wrote:
> Is there a way to return an associative array of total instances of
> unique entries?
> Desired Result: {2}->2, {3}->3, {4}->1, {5}->1

SELECT `count`, COUNT(`count`) FROM tablename GROUP BY `count`

This is made a little bit confusing by the fact that your column is
named `count`.

Regards,
Bill K.

kev

unread,
May 25, 2006, 6:03:59 PM5/25/06
to

Query Result:

2 - 2
3 - 3
4 - 1
5 - 1

Simple! Thanks Bill.

kev

unread,
May 25, 2006, 6:08:29 PM5/25/06
to
I have a concern about my request on top of that since we are on the
topic of optimization.

What is a good way to approach SQL queries?

Should I connect to the server on every query? (looks like a bad idea
because you will increase server bandwidth assuming not localhost)

Example:
Start session

On every query: Connect => Query => Disconnect

End session

Should I use pconnect (OR connect and save connection handler) and
disconnect after the session has ended? How does the server handle the
event when users are on a pconnect and the browser is closed? (would
pconnect disconnect once session is over?) I do not want to leave
multiple connections open to timeout.

Example:
Start session => Connect

On every query: Query

End session = Disconnect

Feedback would be great. Just want to establish a standard. I am
currently using the first method but I am wondering if it should worry
about scalability. Thanks in advance

Kev

Bill Karwin

unread,
May 25, 2006, 7:03:51 PM5/25/06
to
kev wrote:
> Should I connect to the server on every query?

No, I would not recommend this. Connecting costs a large amount of
overhead. It has to find the host, establish a socket, authenticate
your username and password, and set up other resources. This is
actually pretty expensive to do, and you should avoid doing it for every
query.

> Should I use pconnect (OR connect and save connection handler) and
> disconnect after the session has ended?

I assume you mean mysql_pconnect(), part of the PHP mysql package.
http://www.php.net/function.mysql-pconnect

Yes, this is a good thing to do for greater scalability. This reduces
the number of times that PHP needs to create a new connection to MySQL
The web server (Apache) that runs PHP manages the persisent connections
and lets them be re-used by later PHP requests.

However, if you use PHP in CGI mode, starting a new process for each PHP
request, the persistent connections won't last anyway.

> How does the server handle the
> event when users are on a pconnect and the browser is closed? (would
> pconnect disconnect once session is over?)

According to the manual, mysql_close() does not close connections opened
by mysql_pconnect(). The persistent database connections will be used
by later requests for different web sessions.

Also note that subsequent PHP requests are _not_ likely to use the same
database connection. So if you create connection-based things like
temporary tables or user variables, use them only during the same PHP
request, and then you should drop that table or set that variable to
NULL, or else some other web session will get to see it.

> I do not want to leave multiple connections open to timeout.

You can force MySQL to time out long connections using the
"interactive-timeout" option in the my.cnf (my.ini) configuration file.
The default for this timeout is 28800 seconds. See
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html

Regards,
Bill K.

Jerry Stuckle

unread,
May 25, 2006, 9:42:08 PM5/25/06
to

Kev,

I don't generally use pconnect() - it keeps connections open even if no one is
using the website (or php code). It might be find if you're running a site with
100K hits/hr. But none of mine do that.

When I need MySQL, I just connect at the start of the page and close at the end
of the page. Nice, clean, no muss, no fuss.

Additionally, if you're using shared hosting, persistent connections may be
disabled.


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================

Axel Schwenke

unread,
May 30, 2006, 10:57:42 AM5/30/06
to
Bill Karwin <bi...@karwin.com> wrote:
> kev wrote:
>> Should I connect to the server on every query?
>
> No, I would not recommend this. Connecting costs a large amount of
> overhead. It has to find the host, establish a socket, authenticate
> your username and password, and set up other resources. This is
> actually pretty expensive to do, and you should avoid doing it for every
> query.

I agree that one should not connect for every single query.
OTOH MySQL connects are pretty cheap, probably cheaper than the
underlying TCP connection setup in the network stack.

>> Should I use pconnect (OR connect and save connection handler) and
>> disconnect after the session has ended?
>
> I assume you mean mysql_pconnect(), part of the PHP mysql package.
> http://www.php.net/function.mysql-pconnect
>
> Yes, this is a good thing to do for greater scalability.

No, it is not. Contrary, using mysql_pconnect() is considered harmful,
as it tends to hog the MySQL server with idle connections. A very
detailed (but German) discussion of the topic is here:

<http://kris.koehntopp.de/artikel/webtune/>

In a nutshell:

a) many connections:

This problem arises if you use different connections (that is <host,
user, password> tupels) to connect to the database. If your scripts
(*all* scripts running in one Apache instance) use N different
connections, PHP will use up to N * max_clients connections to the
database - independent from the real webserver load.

Contrary, if each of your scripts uses only M different connections
at a time (in most cases M=1) and refrains from using persistent
connections, you will have at most M * max_clients open connections
to the database. Plus, you will only reach that limit if your web-
server is fully loaded.

Idle connections will eat memory and file descriptors. Not good.


b) MySQL sessions have state

There are lots of objects in MySQL with session scope: locks, user
variables, temporary tables. Currently there is no way to reset a
session, so the best way to destroy those objects is to close the
connection. Additionally there is a hidden attribute - the selected
database (mysql_select_db(), USE <db>). If you re-use a MySQL
connection handle in PHP, you cannot know the selected database.
Therefore many database abstraction layers for PHP call
mysql_select_db() *mandatory* right before each mysql_query().

Forgotten locks may render your database useless. Forgotten temp.
tables will eat memory and/or disk space. User variables may spread
sensitive information between web applications. Very bad.


Conclusion: mysql_pconnect() solves a nonexistent problem. Also it
creates new problems. You should not use it.


XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/

Bill Karwin

unread,
May 30, 2006, 12:32:48 PM5/30/06
to
Awesome! Thanks for the information, Axel.

Regards,
Bill K.

kev

unread,
Jun 2, 2006, 12:49:49 PM6/2/06
to

Good to know. Thanks for the info. Now I have to figure out how to pass
the SQL connection handle across pages/the session.

Jerry Stuckle

unread,
Jun 2, 2006, 2:59:59 PM6/2/06
to

Kev,

You can't pass connections across pages - the connection will automatically
close at the end of the PHP script if you don't close it yourself (it's a good
idea to close it, though).

But that's OK - you really don't want to keep the connection open while the user
goes to lunch. Just connect before your first MySQL and close the connection
after the last one (or at the end of the page).

kev

unread,
Jun 2, 2006, 4:51:15 PM6/2/06
to

I see. So its connection per page request. :) Thanks again.

Kevin

0 new messages