How do you make a MySQL table with frequent reads and frequent writes perform well?


matt thomson

Aug 7, 2013, 11:56:38 PM
to nzp...@googlegroups.com
Hi everybody,

I'm working on a forum-like web app where I record which topics have been read by a logged-in user. Usually for a many-to-many relation like this I would use a third (join) table with a user_id and a topic_id column (rough sketch below). I'm not sure that approach is scalable in this case, though, because the table will meet three conditions:

It will be written to on almost every page view (to mark the current topic as read).
It will be read from on most page views (to check which topics have been read and mark them in the HTML).
It may get very large (e.g. 1 million+ records).
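
For concreteness, the kind of join table I have in mind is roughly this (table and column names are just placeholders, and the read_at column is optional):

CREATE TABLE topic_reads (
    user_id  INT UNSIGNED NOT NULL,   -- FK to users
    topic_id INT UNSIGNED NOT NULL,   -- FK to topics
    read_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, topic_id),  -- answers "which topics has user X read?"
    KEY idx_topic (topic_id)          -- only if I ever need to query by topic alone
) ENGINE=InnoDB;

It's maintaining those keys during constant inserts that I'm worried about.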

Usually, if a table is going to be this big and queried a lot, I will put an index on the columns that are queried. However, as well as being read from frequently, the table is also being written to frequently. Since indexes take time to update on every write, constantly updating the index might slow the table down so much that this approach doesn't scale.

Does anyone know a way around this issue?

Thanks.

ashley etchell

Aug 8, 2013, 12:03:01 AM
to nzp...@googlegroups.com

Would using INSERT DELAYED help with the write process?
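
Something roughly like this, I think; note that DELAYED only works with MyISAM, MEMORY and ARCHIVE tables, and the table name here is just an example:

-- The client gets control back immediately; the server buffers the row
-- and writes it when the table isn't busy (errors are not reported back).
INSERT DELAYED INTO topic_reads (user_id, topic_id) VALUES (42, 1001);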

You could consider storing viewed posts per user in a serialized or CSV text file, or look at using MERGE tables somehow.

Regards
Ash


Ivan Kurnosov

Aug 8, 2013, 12:05:40 AM
to nzp...@googlegroups.com
A serialized solution would behave even worse: on every request you need resources to:

1. deserialize
2. modify
3. store back

And as the user keeps using the forum, every run will take more CPU (serialization), more memory (to hold the unserialized data) and more network IO (to transfer it from the DBMS to the application).
--
With best regards, Ivan Kurnosov

David Neilsen

Aug 8, 2013, 12:06:19 AM
to nzp...@googlegroups.com
An index would be pretty typical even for many millions of rows.

Have you tried it, and is it a bottleneck? If not, don't prematurely optimise. 


David Neilsen | 07 834 3366 | PANmedia ®

Ivan Kurnosov

Aug 8, 2013, 12:09:15 AM
to nzp...@googlegroups.com
My solution:

1. Store it normalized with `user_id | thread_id | last_view_time` columns (with a composite index covering the first two columns).
2. As soon as the user opens a forum section, fetch all the thread states for the page in a single query and put them in the session.
3. Whenever you need to know whether a thread has been read, check the session first.
4. On thread open, update the session and put an update task in a queue. A worker then flushes the changes for all users to the database in bulk (see the sketch below).
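
The bulk flush in step 4 could look roughly like this (table and column names are illustrative; it relies on the composite key from step 1 being the primary key):

-- Table from step 1:
CREATE TABLE thread_views (
    user_id        INT UNSIGNED NOT NULL,
    thread_id      INT UNSIGNED NOT NULL,
    last_view_time DATETIME NOT NULL,
    PRIMARY KEY (user_id, thread_id)
) ENGINE=InnoDB;

-- What the worker runs per batch drained from the queue:
-- one multi-row upsert instead of one INSERT per page view.
INSERT INTO thread_views (user_id, thread_id, last_view_time)
VALUES
    (17, 203, '2013-08-08 12:01:05'),
    (17, 207, '2013-08-08 12:01:40'),
    (58, 203, '2013-08-08 12:02:11')
ON DUPLICATE KEY UPDATE last_view_time = VALUES(last_view_time);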

Jochen Daum

Aug 8, 2013, 12:48:16 AM
to PHPUG
Hi,

I don't know a direct way around it, but depending on the sort of
environment you are working in, you can try a different storage
engine (examples after this list):
- InnoDB locks less data (row-level rather than table-level locks),
so it may perform better with many concurrent processes; however, if
you acquire locks in different orders you may get deadlocks.
- MEMORY updates very fast, but the data is lost on a server reboot
and the whole table has to fit in memory. This could be feasible for
the ~10% subset of your forum that is actually being looked at
(barely anyone looks at old posts).
- (I think) the Falcon engine has fixed write performance; I'm not
sure about reads, but you have more options to optimise read
performance (for example with caching).
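
For reference, the engine is just part of the table definition; a rough sketch (table names are only placeholders):

-- Convert an existing table to InnoDB for row-level locking:
ALTER TABLE topic_reads ENGINE=InnoDB;

-- Or keep the "hot" subset in a MEMORY table
-- (contents disappear on a server restart):
CREATE TABLE topic_reads_hot (
    user_id  INT UNSIGNED NOT NULL,
    topic_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, topic_id)
) ENGINE=MEMORY;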

One other thing to think about is what percentage of your table is
being read regularly. If a query touches too much of it (more than
about 15%), MySQL will stop using your indexes; you can use a
technique like this to get around it:
http://dba.stackexchange.com/questions/4191/mysql-query-optimization-indexing-and-pagination/4192#4192
The user_id part of your index will probably get you around this,
unless your power users read all posts (not unlikely in a forum).
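
A quick way to check whether a query is actually still using the index (using the user_id/topic_id naming from earlier in the thread):

-- If the optimiser considers the index selective enough, the "key"
-- column in the EXPLAIN output names it; "type: ALL" means it has
-- fallen back to a full table scan.
EXPLAIN SELECT topic_id FROM topic_reads WHERE user_id = 42;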

You could also look at nested sets to store your forum topics and
build the "read" table as user_id|topic_id|message_id

HTH, Jochen



--
P.S.: Newsletter - The business case for custom built software -
http://eepurl.com/w0Kcv

Kind Regards,

Jochen Daum

"There is no shortcut to anywhere worth going" - Beverly Sills

Automatem Ltd
Tauranga: +64 7 281 1289
Auckland: +64 9 630 3425
Mobile: +64 21 567 853
Email: j...@automatem.co.nz
Website: www.automatem.co.nz
Skype: jochendaum
http://nz.linkedin.com/in/automatem
http://twitter.com/automatem

Berend de Boer

Aug 8, 2013, 3:09:26 AM
to nzp...@googlegroups.com
>>>>> "matt" == matt thomson <mthom...@gmail.com> writes:

matt> Usually, if a table is going to be this big and queried
matt> a lot, I will put an index on the columns that are
matt> queried. However, as well as being read from frequently,
matt> the table is also being written to frequently. Since
matt> indexes take time to update on every write, constantly
matt> updating the index might slow the table down so much
matt> that this approach doesn't scale.

This is really a non-problem, unless you approach the size of
Facebook.


--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/

Matt Thomson

Aug 8, 2013, 7:09:19 PM
to nzp...@googlegroups.com
Thanks for all the info. I think what I will do is make a test table and write a script to insert lots of rows with a pattern similar to what I would expect in the real world, then benchmark writing a new row at 100,000 rows, 200,000 rows, 500,000 rows, etc., and see how the table performs. I just assumed the index would slow down the frequent writes here, but I should test that assumption. I'll post the benchmark results back next week.
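
Here's a rough sketch of the seeding script I have in mind, as a stored procedure (names and value ranges are just a first guess; it assumes the index_update_test table and its index already exist):

DELIMITER //
CREATE PROCEDURE seed_index_update_test(IN how_many INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < how_many DO
        -- random ids roughly in the ranges I expect in production;
        -- IGNORE skips the odd duplicate (user_id, topic_id) pair
        INSERT IGNORE INTO index_update_test (user_id, topic_id)
        VALUES (FLOOR(1 + RAND() * 100000), FLOOR(1 + RAND() * 1000000));
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;

-- grow the table in steps, timing a single insert at each size
CALL seed_index_update_test(100000);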

Simon Holywell

Aug 9, 2013, 1:59:38 AM
to nzp...@googlegroups.com

In addition, I would look at how phpBB, PunBB, Vanilla and other open-source forums get around this problem. Perhaps even Discourse might be worth looking at.

Matt Thomson

Sep 4, 2013, 10:57:55 PM
to nzp...@googlegroups.com
Just following up with the results of my testing here. It looks like this is actually a non-issue, as suggested: I made a test table with 500,000 rows and profiled an insert query, and the time taken to update the index is very small. I haven't done query profiling before, so can someone tell me whether this is a conclusive way to measure query performance? This is from my MySQL console:

mysql> SET profiling=1;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO index_update_test (user_id, topic_id) VALUES (61368,536857);
Query OK, 1 row affected (0.04 sec)

mysql> SHOW PROFILES;
+----------+------------+--------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                    |
+----------+------------+--------------------------------------------------------------------------+
|        1 | 0.04045325 | INSERT INTO index_update_test (user_id, topic_id) VALUES (61368,536857)  |
+----------+------------+--------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SHOW PROFILE FOR QUERY 1;
+------------------------------+----------+
| Status                       | Duration |
+------------------------------+----------+
| starting                     | 0.000135 |
| checking permissions         | 0.000012 |
| Opening tables               | 0.000034 |
| System lock                  | 0.000013 |
| init                         | 0.000017 |
| update                       | 0.000143 |
| Waiting for query cache lock | 0.000004 |
| update                       | 0.000013 | 
| end                          | 0.000004 |
| query end                    | 0.039789 | <<<<------ Looks like this is the index update.
| closing tables               | 0.000017 |
| freeing items                | 0.000266 |
| logging slow query           | 0.000005 |
| cleaning up                  | 0.000003 |
+------------------------------+----------+
14 rows in set (0.00 sec)


On Fri, Aug 9, 2013 at 5:59 PM, Simon Holywell <si...@holywell.com.au> wrote:

In addition, I would look at how phpBB, PunBB, Vanilla and other open-source forums get around this problem. Perhaps even Discourse might be worth looking at.


Berend de Boer

Sep 4, 2013, 11:07:43 PM
to nzp...@googlegroups.com
>>>>> "Matt" == Matt Thomson <mthom...@gmail.com> writes:

Matt> I haven't done query profiling before, so can someone
Matt> tell me whether this is a conclusive way to measure
Matt> query performance?

That an individual query is very fast doesn't surprise me. But you
would want to test a more real-world setting, i.e. hundreds of users
doing inserts concurrently.

On the other hand, I suggest you don't have to bother. This is not the
80s; relational databases are fine unless you are the size of Google or
so.