full text search

156 views
Skip to first unread message

amber

unread,
Jul 17, 2012, 2:55:04 AM7/17/12
to google-cloud...@googlegroups.com
Hi , 

I imported a DB (~100k rows) on which I am doing full text searches, 
In CloudSQL they are taking more than 30 seconds causing 
Apennine to exceed the request time limit. 

Any suggestions? Or are full text searches just too slow to be done in 
the GCS?

Razvan Musaloiu-E.

unread,
Jul 17, 2012, 3:05:03 AM7/17/12
to google-cloud...@googlegroups.com
Do you mind providing a dataset and some examples of slow queries? Feel fre to contact me off list for this.

-- Razvan ME

Reinaldo Aguiar

unread,
Jul 17, 2012, 10:44:46 AM7/17/12
to google-cloud...@googlegroups.com
Too slow, I tried this and had to do a custom solution.

What I did: 

- Get an amazon EC2 linux instance, install sphinx, create a snapshot image of the machine.
- Create a web service that exposes your indexed data in xml format with a kill-list (the xml saved to blobstore but served trough a password protected app engine handler).
- Create a cronjob on the ami, that fetches de XML every n hours and runs the indexer.
- Create a webservise on the Ami (I used py's SimpleHTTPServer) to expose the search server.  Re-save the ami.
- Query the search service using appengine's url fetch service and get the actual objects from mysql by ID.

Actually, once I was at this point I switched to Datastore completely because it is faster and more reliable. I only was using mysql for the full text searches anyways ...

Joe Faith

unread,
Jul 17, 2012, 11:41:53 AM7/17/12
to google-cloud...@googlegroups.com

Hi Amber

What storage engine were you using? Full text search is currently (MySQL 5.5) only available in myISAM, but the performance of InnoDB is much better in general.

One solution is to keep just the tables that require search in myISAM and move the rest.

J

amber

unread,
Jul 18, 2012, 1:39:19 AM7/18/12
to google-cloud...@googlegroups.com
Hi all,

thanks to ur quick reply,

i m using nested queries to join to two table and  in googel cloud sql, we stored 100k of records and now while searching through google app its taking too much time and page not display anything.

and in log its showing following error,

This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application. 

amber

unread,
Jul 18, 2012, 1:44:43 AM7/18/12
to google-cloud...@googlegroups.com
Hi Joe,

I m using google cloud sql and myISAM engine and searching through google app engine application.

Razvan Musaloiu-E.

unread,
Jul 18, 2012, 1:52:07 AM7/18/12
to google-cloud...@googlegroups.com
Can you run EXPLAIN on one of the queries and examine the query plan? If the query uses temporary tables and the result is too big then mysql will spill the temporary tables to disk and that can cause a significant slowdown.

Some good references on EXPLAIN:

amber

unread,
Jul 18, 2012, 4:09:40 AM7/18/12
to google-cloud...@googlegroups.com
Hi,

tables structure are as follows,

CREATE TABLE table1
(UniqueID VARCHAR(255) NOT NULL PRIMARY KEY, Company VARCHAR(255) NULL, Category1 VARCHAR(255) NULL, Category2 VARCHAR(255) NULL, Category3 
VARCHAR(255) NULL, Category4 VARCHAR(255) NULL, Category5 VARCHAR(255) NULL,
 SubCategory1 VARCHAR(255) NULL, SubCategory2 VARCHAR(255) NULL, SubCategory3 VARCHAR(255) NULL, SubCategory4 VARCHAR(255) NULL,
FULLTEXT (Company,Category1,Category2,Category3,Category4)
) ENGINE=MyISAM DEFAULT CHARSET=utf8; 


CREATE TABLE table2 
(SubID VARCHAR(255) NOT NULL PRIMARY KEY , UniqueID VARCHAR(255) NOT NULL FOREIGN KEY REFERENCES table1(UniqueID), COUNTRYCODE VARCHAR(127) NULL, STATUS VARCHAR(255) NULL, TITLE MEDIUMTEXT NULL,  ABSTRACT LONGTEXT NULL, CLAIMS LONGTEXT NULL, PUBLICATIONDATE DATE NULL, EARLIERPRIOTITYDATE DATE NULL, DESCRIPTION LONGTEXT NULL
FULLTEXT (TITLE,ABSTRACT,CLAIMS,DESCRIPTION)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Sample Query,

select title,SubID from table2 where SubID in(select max(SubID) from table2 where Abstract like '%systemt%' group by UniqueID)

Result:
display the max subid in group of uniqueid records which match 'system' keyword in abstract column 

Razvan Musaloiu-E.

unread,
Jul 18, 2012, 12:49:02 PM7/18/12
to google-cloud...@googlegroups.com
LIKE works with InnoDB tables. Do this tables change often? Could you maintain a copy of them in InnoDB format?

Another idea: why don't you use MATCH() ... AGAINST instead of LIKE?

-- Razvan ME

amber

unread,
Jul 19, 2012, 12:33:57 AM7/19/12
to google-cloud...@googlegroups.com
Hi Razvan,

I used  MATCH() ... AGAINST also but still performance not improved,... its taking time near about 52 seconds for 1300 records but actual size of our database will be more than 100k records so that's time it wont work Please suggest the any solution to increase speed of accessing data through google app engine application  

Joe Faith

unread,
Jul 19, 2012, 11:10:37 AM7/19/12
to google-cloud...@googlegroups.com
Hi Amber

We would strongly suggest you use InnoDB tables wherever possible.


J
--
Joe Faith | Product Manager | Google Cloud

amber

unread,
Jul 20, 2012, 3:37:59 AM7/20/12
to google-cloud...@googlegroups.com
Hi Joe,

First I used Innodb engine only then that time performance for accessing data was very low so later i thought to used myISAM engine and implementd fulltext  due to this little bit performance is increased but still we require good speed to access the data from colud sql through google app within 2 to 3 seconds.

E.g for 1300 rows:-
for Innodb - its taking more than  one minute so request time out occur and page is not displayed in google app (1300 rows)
for myISAM - its taking near to 52 seconds (1300 rows)

but actual size of our data will be more than 1300 rows near about 100k rows
Reply all
Reply to author
Forward
0 new messages