Google Groups Home
Help | Sign in
Run query in background while showing "Please wait"
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  15 messages - Collapse all
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
kdecapite  
View profile
 More options Mar 26, 6:04 pm
From: kdecapite <decap...@creationsite.com>
Date: Wed, 26 Mar 2008 15:04:47 -0700 (PDT)
Local: Wed, Mar 26 2008 6:04 pm
Subject: Run query in background while showing "Please wait"
I have a CakePHP CRM application with 30,000+ contact records. Each
Contact hasMany Address records. One feature of my app is to allow my
client to search for duplicate entries within the contacts table
before importing new contacts. In short, I am querying the entire
Contact table and LEFT JOINing with the Address table. I store the
result set into an array and then use PHP's array functions to filter
potential duplicates from a "pending contacts" table.

What I want to do is somehow run this entire process (the database
query and array comparison) in the background (perhaps with a PHP
exec() call) and show some type of "Loading, please wait..." interface
to the application user. I'm thinking something similar to Expedia's
"Please wait while we search for your flight" interface.

The problem is, how would I do this? Any ideas or suggestions are
welcome! Thanks in advance...

- Kevin


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
bingo  
View profile
 More options Mar 26, 6:49 pm
From: bingo <ragra...@gmail.com>
Date: Wed, 26 Mar 2008 15:49:54 -0700 (PDT)
Local: Wed, Mar 26 2008 6:49 pm
Subject: Re: Run query in background while showing "Please wait"
you will need to use Ajax....
search on ajax + cakePHP

On Mar 26, 6:04 pm, kdecapite <decap...@creationsite.com> wrote:


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
jonknee  
View profile
 More options Mar 26, 7:14 pm
From: jonknee <m...@jongales.com>
Date: Wed, 26 Mar 2008 16:14:01 -0700 (PDT)
Local: Wed, Mar 26 2008 7:14 pm
Subject: Re: Run query in background while showing "Please wait"

> I have a CakePHP CRM application with 30,000+ contact records. Each
> Contact hasMany Address records. One feature of my app is to allow my
> client to search for duplicate entries within the contacts table
> before importing new contacts. In short, I am querying the entire
> Contact table and LEFT JOINing with the Address table. I store the
> result set into an array and then use PHP's array functions to filter
> potential duplicates from a "pending contacts" table.

What exactly are you doing? It can probably be handled pretty quickly
by the database. You shouldn't have to manually fish out dupes.

I always thought the please wait while we search messages were either
a [poor] attempt to seem extra high-tech or a sign that they have a
really inefficient system. Google searches the whole web--not just a
day's commercial flights--and comes back in a flash.

I have used a PHP app with a valid reason for it, it was a mailing
list program and it accomplished sending large amounts of email out by
batching (the page would keep reloading with incrementing GET
variables, such as start=0&num=100). It wouldn't work in your case.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
kdecapite  
View profile
 More options Mar 26, 7:39 pm
From: kdecapite <decap...@creationsite.com>
Date: Wed, 26 Mar 2008 16:39:06 -0700 (PDT)
Local: Wed, Mar 26 2008 7:39 pm
Subject: Re: Run query in background while showing "Please wait"
What I have is an ImportContact model which is simply a table
containing basic information about a person (name, email, address,
etc). This table gets populated either from a standard website
"Contact Us" form or by my client manually importing a CSV file. Now,
once this ImportContact model has new records in it, I want to display
a list of *potential* duplicates by various criteria. The "master"
contacts model is simply called Contact and contains 30,000+ records.
The Contact model does NOT include addresses, there is a separate
Address model for this (again, with 30,000+ records).

The reason I can't have the database do the searching for me is
because the criteria I want to search against to find potential
duplicates doesn't have a direct one-to-one correlation to the
database table fields.

For example this "duplicate search" criteria works great using Cake:

- Show potential duplicates based on field `email1`
- Cake will generate a SELECT...IN query, listing ALL email addresses
from the ImportContact model and returning all records from the
Contact model containing anyone of the listed emails

However, this example does NOT work:

- Show potential duplicates based on fields `fn` and `ln` (a person's
full name)
- Since I have to CONCAT(`fn`, `ln`) AS `full_name`, the SELECT...IN
syntax no longer works

So what I did was query ALL records from both ImportContact and
Contact models and store each of these result sets in their own
arrays. Then I perform the search using PHP array functions and store
the potential duplicates in a third array. This is working great, but
it's just on the slow side (sometimes 3+ minutes). It's not just the
array searching that's taking a while either, it's actually the
database query retrieving 30,000+ records that's slowing things down
as well.

I'm just giving a brief overview here and can provide specific code if
you like. Thanks for the quick replies, maybe I can get this thing to
run more efficiently but I'm not quite sure how :-/

- Kevin

On Mar 26, 4:14 pm, jonknee <m...@jongales.com> wrote:


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
jonknee  
View profile
 More options Mar 26, 8:56 pm
From: jonknee <m...@jongales.com>
Date: Wed, 26 Mar 2008 17:56:36 -0700 (PDT)
Local: Wed, Mar 26 2008 8:56 pm
Subject: Re: Run query in background while showing "Please wait"

> The reason I can't have the database do the searching for me is
> because the criteria I want to search against to find potential
> duplicates doesn't have a direct one-to-one correlation to the
> database table fields.

Hmm. Is there a way you could make them directly correlate? It's a
pretty fast DB query if you can get them to line up. If I'm
understanding it correctly, you just have a full name in the import
model and broken out first and last names in your contact model. This
can work, though you'll probably need some manual SQL. Instead of
CONCAT you want to use CONTACT_WS(' ', 'fn', 'ls) which will result in
a space between the first and last names.

SELECT * FROM contactImport WHERE full_name IN (SELECT CONCAT_WS(' ',
first_name, last_name) FROM contact)

That will match a record in contactImport with a full_name of "Joe
Blow" when there is a record in contact with a first name of "Joe" and
a last name of "Blow".

The beauty of doing the heavy lifting in SQL, even if you can't do it
in the ORM, is that you only have to bring the matches across the
wire. So instead of a minimum of 30,000 records coming out of the DB
(and each getting messed around with Cake's ORM) you get only the
dupes.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
kdecapite  
View profile
 More options Mar 27, 12:59 pm
From: kdecapite <decap...@creationsite.com>
Date: Thu, 27 Mar 2008 09:59:53 -0700 (PDT)
Local: Thurs, Mar 27 2008 12:59 pm
Subject: Re: Run query in background while showing "Please wait"

> If I'm understanding it correctly, you just have a full name in the import
> model and broken out first and last names in your contact model.

I may have not explained the setup properly. I don't actually have a
"full_name" field in either table (import_contacts and contacts are
the actual MySQL table names). Each of these tables have "fn" and "ln"
fields, however. So the problem is that I need to somehow find all
records in the contacts table which have the same values in the "fn"
and "ln" fields as the import_contacts table.

> Hmm. Is there a way you could make them directly correlate? It's a
> pretty fast DB query if you can get them to line up.

The only idea I have at the moment is to create a field in both tables
which simply stores the concatenated value of a record's "fn" and "ln"
fields. This would require me to write a script to retro-fit the
existing 30k contact records, as well as update some logic in my "add
contact" controller. Not to mention this would also create a redundant
field in the table and still doesn't ultimately solve the problem
because what if in the future I want to find duplicates matching "fn",
"ln" and "mi" (middle initial)? I would have to create yet another
"dummy" field in my table.

> The beauty of doing the heavy lifting in SQL, even if you can't do it
> in the ORM, is that you only have to bring the matches across the
> wire. So instead of a minimum of 30,000 records coming out of the DB
> (and each getting messed around with Cake's ORM) you get only the
> dupes.

Oh I hear you on this one :) Once I realized I *couldn't* do it with
SQL I was quite bummed! Of course, maybe there's still a way...

- Kevin


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
b logica  
View profile
 More options Mar 27, 3:33 pm
From: "b logica" <foo.log...@gmail.com>
Date: Thu, 27 Mar 2008 15:33:58 -0400
Local: Thurs, Mar 27 2008 3:33 pm
Subject: Re: Run query in background while showing "Please wait"

On Thu, Mar 27, 2008 at 12:59 PM, kdecapite <decap...@creationsite.com> wrote:

>  I may have not explained the setup properly. I don't actually have a
>  "full_name" field in either table (import_contacts and contacts are
>  the actual MySQL table names). Each of these tables have "fn" and "ln"
>  fields, however. So the problem is that I need to somehow find all
>  records in the contacts table which have the same values in the "fn"
>  and "ln" fields as the import_contacts table.

I could be misinterpreting your needs, but ...

SELECT c.id, c.fn, c.ln FROM contacts AS c
INNER JOIN import_contacts AS ic ON c.fn = ic.fn AND c.ln = ic.ln;

This will select the id, fn, and ln from contacts where it also exists
in import_contacts.

>  The only idea I have at the moment is to create a field in both tables
>  which simply stores the concatenated value of a record's "fn" and "ln"
>  fields. This would require me to write a script to retro-fit the
>  existing 30k contact records, as well as update some logic in my "add
>  contact" controller. Not to mention this would also create a redundant
>  field in the table and still doesn't ultimately solve the problem
>  because what if in the future I want to find duplicates matching "fn",
>  "ln" and "mi" (middle initial)? I would have to create yet another
>  "dummy" field in my table.

UPDATE TABLE contacts ADD COLUMN full_name VARCHAR(128);  -- or whatever size
UPDATE TABLE contacts SET full_name = CONCAT_WS(' ', fn, ln);

BUT, you'd probably be far better off simply creating an index on the tables:

CREATE INDEX contacts_fullname ON contacts (fn, ln);
CREATE INDEX import_contacts_fullname ON import_contacts (fn, ln);

This is all for MySQL, which i'm assuming you're using.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
jonknee  
View profile
 More options Mar 27, 8:15 pm
From: jonknee <m...@jongales.com>
Date: Thu, 27 Mar 2008 17:15:50 -0700 (PDT)
Local: Thurs, Mar 27 2008 8:15 pm
Subject: Re: Run query in background while showing "Please wait"

> I may have not explained the setup properly. I don't actually have a
> "full_name" field in either table (import_contacts and contacts are
> the actual MySQL table names). Each of these tables have "fn" and "ln"
> fields, however. So the problem is that I need to somehow find all
> records in the contacts table which have the same values in the "fn"
> and "ln" fields as the import_contacts table.

That should still be able to be handled pretty easily with SQL. You
got me interested so I created a test DB formatted like you said
(first_name, last_name) and this query works fine. Only spits out the
dupes (when first and last names are the same, if one of the two is
different it doesn't show up):

SELECT * FROM contactImport WHERE first_name IN (SELECT first_name
FROM contact) AND last_name in (SELECT last_name FROM contact)

b logica's way would work too, so you could just check whatever is
faster. I don't have 30,000 records to test that out on :P.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dardo Sordi Bogado  
View profile
 More options Mar 27, 8:44 pm
From: "Dardo Sordi Bogado" <dardoso...@gmail.com>
Date: Thu, 27 Mar 2008 21:44:58 -0300
Local: Thurs, Mar 27 2008 8:44 pm
Subject: Re: Run query in background while showing "Please wait"

>  SELECT * FROM contactImport WHERE first_name IN (SELECT first_name
>  FROM contact) AND last_name in (SELECT last_name FROM contact)

That is a really bad query (for 30k records) and even don't solve the issue.

>  b logica's way would work too, so you could just check whatever is
>  faster. I don't have 30,000 records to test that out on :P.

blogica's one seems to be right.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
kdecapite  
View profile
 More options Mar 27, 9:40 pm
From: kdecapite <decap...@creationsite.com>
Date: Thu, 27 Mar 2008 18:40:57 -0700 (PDT)
Local: Thurs, Mar 27 2008 9:40 pm
Subject: Re: Run query in background while showing "Please wait"
Thanks to everyone who has contributed so far, it's much appreciated!

Now, here's where we stand...

jonknee's query took about 25-27 seconds and returned 154 records.

b logica's query took about 6 seconds and returned 24 records

My original query takes 18 seconds and returns ALL records and is
exactly this:

SELECT CONCAT(`Contact`.`ln`, `Contact`.`fn`) AS `full_name`,
`Contact`.`id`, `Contact`.`created`, `Contact`.`modified`,
`Contact`.`fn`, `Contact`.`ln`, `Contact`.`co`, `Contact`.`email1`,
`Contact`.`phone_home`, `Address`.`line1`, `Address`.`line2`,
`Address`.`city`, `Address`.`st`, `Address`.`zip` FROM `contacts` AS
`Contact` LEFT JOIN `addresses` AS `Address` ON `Contact`.`id` =
`Address`.`contact_id` WHERE `Contact`.`ln` <> '' AND `Contact`.`fn`
<> '' AND `Contact`.`ln` IS NOT NULL AND `Contact`.`fn` IS NOT NULL
ORDER BY `Contact`.`ln` ASC, `Contact`.`fn` ASC

Then I store the results into an array and run this query:

SELECT * FROM `import_contacts` AS `ImportContact` ORDER BY `ln` ASC,
`fn` ASC

I store those results into an array, too, then I loop through the
ImportContact array looking for a matching key (PHP's array_key_exists
function). I can do this because I format the two result set arrays
using a concatenated string as the key of the array. For instance, a
record with a first name of "Mark" and last name of "Smith" gets
stored as:

array("MarkSmith" => array("field1" => "value1", "field2" =>
"value2"));

This is working great, it's just timing out my PHP script because it
can take 3+ minutes to execute the whole thing across the 30k records
I have :-/

On Mar 27, 5:44 pm, "Dardo Sordi Bogado" <dardoso...@gmail.com> wrote:


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
jonknee  
View profile
 More options Mar 27, 11:39 pm
From: jonknee <m...@jongales.com>
Date: Thu, 27 Mar 2008 20:39:36 -0700 (PDT)
Local: Thurs, Mar 27 2008 11:39 pm
Subject: Re: Run query in background while showing "Please wait"

> Now, here's where we stand...

> jonknee's query took about 25-27 seconds and returned 154 records.

> b logica's query took about 6 seconds and returned 24 records

Sounds about right, except for the difference in number of dupes. How
many are there in total?

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
b logica  
View profile
 More options Mar 28, 9:45 pm
From: "b logica" <foo.log...@gmail.com>
Date: Fri, 28 Mar 2008 21:45:17 -0400
Local: Fri, Mar 28 2008 9:45 pm
Subject: Re: Run query in background while showing "Please wait"