mapper performance

72 views
Skip to first unread message

Matthew Pocock

unread,
Jul 5, 2011, 8:56:55 AM7/5/11
to lif...@googlegroups.com
Hi,

I have an application where I'm using the mapper interface to write rows into a single table (scala 2.9.0-1, lift 2.4-M1). It seems to be processing about 30 records a second. I've had a look with top and iotop, and have found the following:

      cpu   blocked in io
java  ~2%   ~0%
mysql ~2%   <1%

So, it looks like neither mysql nor java are anywhere near CPU bound or IO bound. However, the HD light is on fairly constantly. I'm noticing that [jdb2/sda1-8] has an IO of >95% which drops down to <1% when I terminate the upload. This is the journaling for my ext4 file system. So perhaps the mysql process is causing journaling to go into over-drive. According to iotop, mysql process is writing about 1/4M a sec to the disk, with the total write being ~ 1/2M a sec. However, it's not showing up any other processes as making up the remaining 1/2 M. These disks should, I presume, be able to handle speeds several orders of magnitude faster than this.

$ sudo hdparm -tT /dev/sda
/dev/sda:
 Timing cached reads:   2580 MB in  2.00 seconds = 1289.76 MB/sec
 Timing buffered disk reads: 182 MB in  3.03 seconds =  59.99 MB/sec
$ uname -a
Linux tawaif 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Just to underline the degree to which nothing much is happening over a protracted period of time, here's the 'time' result when I last killed the process due to boredom.

real    54m8.864s
user    1m45.130s
sys     0m19.970s

I'm wondering what the problem could be. Either it's something to do with the ext4 journalling getting upset with mysql, or there's a problem with the mapper classes feeding mysql with data to write, or repeatedly getting itself in a suspended state. Perhaps someone has seen something like this before?

Thanks,

Matthew

--
Dr Matthew Pocock
Visitor, School of Computing Science, Newcastle University
(0191) 2566550

TylerWeir

unread,
Jul 5, 2011, 11:25:26 AM7/5/11
to lif...@googlegroups.com
I think you'll need to post an example before we can do anything.

People may be able to make an educated guess, but other than that it will be speculation.

Matthew Pocock

unread,
Jul 5, 2011, 12:06:34 PM7/5/11
to lif...@googlegroups.com
Hi,

I can't paste the program itself as it's covered by an NDA. However, here's a program that follows exactly the same pattern. I've not compiled it, so it may contain typos.

  def main(args: Array[String]) {
    // initialize database stuff
    val boot = new Boot
    boot.boot()

    val file = args(0)
    parseAll(new FileReader(file)) { row =>
      PersonAddress.fromRow(row).save()
    }
  }

  class PersonAddress private () extends LongKeyedMapper[PersonAddress] with CreatedUpdated with IdPK {
    def getSingleton: KeyedMetaMapper[Long, PersonAddress] = PersonAddress

    object firstName extends MappedString(this, 20)
    object lastName extends MappedString(this, 20)
    object fullName extends MappedString(this, 50)
    object address1 extends MappedString(this, 50) 
    object address2 extends MappedString(this, 50)
    object city extends MappedString(this, 30)
    object postcode extends MappedString(this, 10)
  }

  object PersonAddress extends PersonAddress with LongKeyedMetaMapper[PersonAddress] with CRUDify[Long, PersonAddress] {
    def fromRow(row: Row): PersonAddress = {
      val (ftN::ltN::flN::addr1::addr2::cty::pc::Nil) = row.cols
      firstName := ftN
      lastName := ltN
      fullName := flN
      address1 := addr1
      address2 := addr2
      city := cty
      postcode := pc
    }
  }

In the real application, there are several tables and the data model for representing the content of a file is richer. However, none of that has much to do with the key part, which is that in a loop (once per row of a file) I am creating an object and saving it. It looks like I am not getting throughput much above 30 objects a second but the CPU is not maxed out for either Java or Mysql and although my disk is grunching, it is two orders of magnitude below top write speed. Does that help?

Matthew

On 5 July 2011 16:25, TylerWeir <tyler...@gmail.com> wrote:
I think you'll need to post an example before we can do anything.

People may be able to make an educated guess, but other than that it will be speculation.

--
You received this message because you are subscribed to the Google Groups "Lift" group.
To view this discussion on the web visit https://groups.google.com/d/msg/liftweb/-/ufIntGKItHwJ.
To post to this group, send email to lif...@googlegroups.com.
To unsubscribe from this group, send email to liftweb+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/liftweb?hl=en.

MettaDas

unread,
Jul 5, 2011, 1:36:42 PM7/5/11
to lif...@googlegroups.com
On Tue, Jul 5, 2011 at 9:06 AM, Matthew Pocock <turingate...@gmail.com> wrote:
Hi,

I can't paste the program itself as it's covered by an NDA. However, here's a program that follows exactly the same pattern. I've not compiled it, so it may contain typos.


If you haven't compiled it, how do you know it demonstrates the problem?  And if it does not demonstrate the problem, how is it going to help anyone help you?

Frank. 

David Pollak

unread,
Jul 5, 2011, 1:54:32 PM7/5/11
to lif...@googlegroups.com
This code has a separate commit for each record save.  This means that there's a full IO Operation (IOOp) per record.  Given how long it takes for a hard drive to rotate, it means that you're not going to get very many commits per second.

If you put a DB.use around your parseAll call, all the saves will be in a single transaction and you should see much better throughput.

On Tue, Jul 5, 2011 at 9:06 AM, Matthew Pocock <turingate...@gmail.com> wrote:



--
Lift, the simply functional web framework http://liftweb.net

TylerWeir

unread,
Jul 5, 2011, 1:58:17 PM7/5/11
to lif...@googlegroups.com
Just about to say the same thing, saving inside an IO loop will not be close to optimal.

So I think your numbers are skewed.

David Pollak

unread,
Jul 5, 2011, 1:59:36 PM7/5/11
to lif...@googlegroups.com
On Tue, Jul 5, 2011 at 10:58 AM, TylerWeir <tyler...@gmail.com> wrote:
Just about to say the same thing, saving inside an IO loop will not be close to optimal.

Heh... great minds think alike... and so do we. ;-)
 

So I think your numbers are skewed.

--
You received this message because you are subscribed to the Google Groups "Lift" group.
To view this discussion on the web visit https://groups.google.com/d/msg/liftweb/-/RqWiGPGF9cwJ.

To post to this group, send email to lif...@googlegroups.com.
To unsubscribe from this group, send email to liftweb+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/liftweb?hl=en.

TylerWeir

unread,
Jul 5, 2011, 2:25:23 PM7/5/11
to lif...@googlegroups.com


On Tuesday, 5 July 2011 13:59:36 UTC-4, David Pollak wrote:


On Tue, Jul 5, 2011 at 10:58 AM, TylerWeir <tyler...@gmail.com> wrote:
Just about to say the same thing, saving inside an IO loop will not be close to optimal.

Heh... great minds think alike... and so do we. ;-)

:)

Matthew Pocock

unread,
Jul 5, 2011, 3:25:33 PM7/5/11
to lif...@googlegroups.com
I've had a play about with jprofiler. It shows that the main thread is nearly always waiting for Net IO. Approx 66% of the full application time and 98% of the time when blocking in net io is within java.sql.Connection.commit. Connection.commit is directly called iby net.liftweb.db.DB$class.net$liftweb$db$DB$$releaseConnectionNamed which ultimately is called from save() once for every one of my objects.

I don't know why mysql is finding it so hard to efficiently deal with a large volume of commit() messages. Perhaps I should investigate another rdbms. Is there some tweak I can do to lift that will improve this?

Matthew

David Pollak

unread,
Jul 5, 2011, 3:44:39 PM7/5/11
to lif...@googlegroups.com
On Tue, Jul 5, 2011 at 12:25 PM, Matthew Pocock <turingate...@gmail.com> wrote:
I've had a play about with jprofiler. It shows that the main thread is nearly always waiting for Net IO. Approx 66% of the full application time and 98% of the time when blocking in net io is within java.sql.Connection.commit. Connection.commit is directly called iby net.liftweb.db.DB$class.net$liftweb$db$DB$$releaseConnectionNamed which ultimately is called from save() once for every one of my objects.

I don't know why mysql is finding it so hard to efficiently deal with a large volume of commit() messages.

Because of the rotational latency of physical hard drives and the "D" in ACID.
 
Perhaps I should investigate another rdbms. Is there some tweak I can do to lift that will improve this?

I gave you the answer... wrap the parseAll call inside a DB.use call.
 



--

Matthew Pocock

unread,
Jul 6, 2011, 11:28:12 AM7/6/11
to lif...@googlegroups.com
Hi,

Thanks for the suggestions. Moving to postgresql with the identical code gave me something in the order of a 10x speedup. Wrapping parseAll in DB.use() has got me something like a further 10x, and the Java main thread is now running roughly at 50% with postgresql also at 50%. I suppose there are still some issues with Java blocking for postgresql to do its thing, but they are nothing like they where.

It is not immediately obvious that DB.use is relevant. It takes a connection identifier and runs a function that takes a (super) connection, which I can't pass on to .save(). There's no documentation on DB.use or that I can find through google to indicate that this connection will be used for nested operations like .save().

Thanks for your help, my data now loads in minutes rather than hours.

Matthew

David Pollak

unread,
Jul 6, 2011, 2:36:05 PM7/6/11
to lif...@googlegroups.com
On Wed, Jul 6, 2011 at 8:28 AM, Matthew Pocock <turingate...@gmail.com> wrote:
Hi,

Thanks for the suggestions. Moving to postgresql with the identical code gave me something in the order of a 10x speedup. Wrapping parseAll in DB.use() has got me something like a further 10x, and the Java main thread is now running roughly at 50% with postgresql also at 50%. I suppose there are still some issues with Java blocking for postgresql to do its thing, but they are nothing like they where.

It is not immediately obvious that DB.use is relevant.

Lift commits a transaction when the last block that allocates a DB connection is exited.  By wrapping DB.use around your code, there's a single DB transaction rather than one transaction per record.  Transactions require disk writes and that means that each transaction is limited by the speed of the disk media.  If you have one transaction for all the records you're uploading, the RDBMS can buffer the records and only hit disk when the transaction is committed.
 

Matthew Pocock

unread,
Jul 6, 2011, 5:51:54 PM7/6/11
to lif...@googlegroups.com
On 6 July 2011 19:36, David Pollak <feeder.of...@gmail.com> wrote:
On Wed, Jul 6, 2011 at 8:28 AM, Matthew Pocock <turingate...@gmail.com> wrote:
Hi,

It is not immediately obvious that DB.use is relevant.

Lift commits a transaction when the last block that allocates a DB connection is exited.  By wrapping DB.use around your code, there's a single DB transaction rather than one transaction per record.  Transactions require disk writes and that means that each transaction is limited by the speed of the disk media.  If you have one transaction for all the records you're uploading, the RDBMS can buffer the records and only hit disk when the transaction is committed.
 

That makes things clearer. Is this documented anywhere that is google-indexed other than in this thread? Both you and Frank knew this - was this insider knowledge, or is my google foo lacking? If you tell me where, I can update the lift wiki.

Thanks,
Matthew
 

Derek Chen-Becker

unread,
Jul 7, 2011, 8:46:42 AM7/7/11
to lif...@googlegroups.com
I think this has been discussed before, but it wouldn't hurt to put it in the Mapper section of the Wiki as a note.

Derek

Reply all
Reply to author
Forward
0 new messages