Migrating 2+ years of version 7 data takes an hour and counting...
From: "Austen Ito" <austen....@gmail.com>
Date: Fri, 24 Aug 2007 00:23:52 -1000
Local: Fri, Aug 24 2007 6:23 am
Subject: Migrating 2+ years of version 7 data takes an hour and counting...
Hi all,
  I was testing sending massive amounts (over 2 years' worth) of
Hackystat 7 data to the Hackystat 8 sensorbase service and came across
a couple of interesting issues:

1. Loading hundreds of thousands of entries in the sensorshell and
sending the data throws an exception.  I found that the
putSensorDataBatch method in
org.hackystat.sensorbase.client.SensorBaseClient creates a huge XML
string that cannot be sent to the server.  I can send data from 5000
or so entries just fine.

My current approach is a bogus one.  In order to avoid the
SensorBaseClientException thrown by sending mass amounts of data, I am
invoking send() after every 1000 entries.

As Aaron says:  akihisa56: that is bogus.

My approach is especially bogus since the exception is due to the total
size of the entries rather than their number.  If one file has
n entries with a lot of attributes, the same exception may be thrown.
Philip, is there a way to send a large data representation with REST
in batches that are small enough for the server to receive?  I'm not
sure how data is received in REST, or whether the sending client can
"know" the maximum amount of data that can be sent.

org.hackystat.sensorbase.client.SensorBaseClientException: 1001: Unable to complete the HTTP call due to a communication error with the remote server. Error writing request body to server
        at org.hackystat.sensorbase.client.SensorBaseClient.putSensorDataBatch(SensorBaseClient.java:621)
        at org.hackystat.sensorshell.command.SensorDataCommand.send(SensorDataCommand.java:71)
        at org.hackystat.sensorshell.SensorShell.send(SensorShell.java:610)

2. It takes so long to send all of the data that Autosend is invoked
while I'm sending data to the server.  This causes an exception to be
thrown.  An example error case may be if data is being sent to the
server by an ant task in the background and DevEvent data from Eclipse
is sent by Autosend.

Exception in thread "Timer-2" java.util.ConcurrentModificationException
        at java.util.AbstractList$Itr.checkForComodification(Unknown Source)
        at java.util.AbstractList$Itr.next(Unknown Source)
        at org.hackystat.sensorshell.command.SensorDataCommand.send(SensorDataCommand.java:68)
        at org.hackystat.sensorshell.SensorShell.send(SensorShell.java:612)
        at org.hackystat.sensorshell.command.AutoSendCommand$AutoSendCommandTask.run(AutoSendCommand.java:98)
        at java.util.TimerThread.mainLoop(Unknown Source)
        at java.util.TimerThread.run(Unknown Source)
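A ConcurrentModificationException is what Java's fail-fast list iterators throw, on a best-effort basis, when they detect that the list was structurally modified during iteration. Here the AutoSend Timer thread most likely iterates the buffered data in send() while the migration thread is still changing it. One general remedy is sketched below, assuming a hypothetical `SensorDataBuffer` class (not part of the actual SensorShell code): guard `add()` with a lock and have each sender atomically swap out the whole list, so every batch is iterated by exactly one thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical thread-safe buffer; a sketch, not the SensorShell implementation. */
public class SensorDataBuffer<T> {
  private List<T> pending = new ArrayList<>();

  /** Called by any producing thread (migration loop, AutoSend, ...). */
  public synchronized void add(T entry) {
    pending.add(entry);
  }

  /**
   * Atomically hands the current contents to the caller and starts a fresh list.
   * The caller can then iterate and send the batch without holding the lock,
   * because no other thread holds a reference to that list any more.
   */
  public synchronized List<T> drain() {
    List<T> batch = pending;
    pending = new ArrayList<>();
    return batch;
  }

  public synchronized int size() {
    return pending.size();
  }

  public static void main(String[] args) {
    SensorDataBuffer<Integer> buffer = new SensorDataBuffer<>();
    Thread producer = new Thread(() -> {
      for (int i = 0; i < 10_000; i++) {
        buffer.add(i);
      }
    });
    producer.start();
    int sent = 0;
    while (producer.isAlive()) {
      sent += buffer.drain().size(); // stand-in for iterating a batch and sending it
    }
    sent += buffer.drain().size();   // pick up anything added after the last drain
    System.out.println("sent " + sent + " entries");
  }
}
```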

The above cases are rare.  I don't think people will be sending years'
worth of data unless they are migrating over to Version 8.  Migrating
the data will also most likely be a one-shot deal.  I'm going to let
the migration run while I'm sleeping.  Maybe I'll find more
interesting things in the morning. ;)

austen


From: Philip Johnson <john...@hawaii.edu>
Date: Fri, 24 Aug 2007 06:26:51 -1000
Local: Fri, Aug 24 2007 12:26 pm
Subject: Re: [hackystat-dev] Migrating 2+ years of version 7 data takes an hour and counting...
Hi Austen,

I am really interested in this experience, and want to put some
effort into massaging the SensorShell so that it can deal with this
kind of situation gracefully and efficiently.

My initial thoughts are:

* There's nothing bogus about needing to invoke send() periodically
in order to 'clean out' the buffer.  Indeed, that's much more efficient
than somehow loading up an arbitrary amount of data on the client end
and then sending it to the server in one gigantic HTTP PUT.
Instead, by breaking it down into smaller chunks, both the client and
server can be busy at the same time, which should reduce the overall
time required.

* What is bogus, as you and Aaron note, is that the invocation of
send() is based upon the number of entries rather than the size of
the payload.  As you note, this is brittle.   What we want to achieve
is a kind of balance where the client and server are both working
efficiently together.

* I never thought about the AutoSend issue!  The solution is to set
AutoSend to 0 before starting the data migration.
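A minimal sketch of basing the flush on payload size rather than entry count. It assumes each entry has already been serialized to a string (for example, its XML form) and uses a hypothetical `SizeBoundedBatcher` class whose `sender` callback stands in for `send()`. The byte limit is an assumed tuning parameter, not a known SensorBase limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers serialized entries and flushes by payload size, not entry count. */
public class SizeBoundedBatcher {
  private final long maxBytes;                 // assumed tuning knob, not a server limit
  private final Consumer<List<String>> sender; // stand-in for SensorShell.send()
  private final List<String> batch = new ArrayList<>();
  private long batchBytes = 0;

  public SizeBoundedBatcher(long maxBytes, Consumer<List<String>> sender) {
    this.maxBytes = maxBytes;
    this.sender = sender;
  }

  public void add(String serializedEntry) {
    long size = serializedEntry.getBytes(StandardCharsets.UTF_8).length;
    // Flush first if this entry would push the pending payload past the limit.
    // An entry larger than the limit on its own is still sent, alone.
    if (!batch.isEmpty() && batchBytes + size > maxBytes) {
      flush();
    }
    batch.add(serializedEntry);
    batchBytes += size;
  }

  /** Sends whatever is pending; call once more after the last entry. */
  public void flush() {
    if (batch.isEmpty()) {
      return;
    }
    sender.accept(new ArrayList<>(batch)); // hand over a copy, then reuse the buffer
    batch.clear();
    batchBytes = 0;
  }

  public static void main(String[] args) {
    SizeBoundedBatcher batcher = new SizeBoundedBatcher(64 * 1024,
        b -> System.out.println("sending " + b.size() + " entries"));
    for (int i = 0; i < 5000; i++) {
      batcher.add("<entry tstamp=\"" + i + "\" tool=\"Eclipse\"/>");
    }
    batcher.flush();
  }
}
```

Because the limit is measured in bytes, a file whose entries carry many attributes simply produces more, smaller requests instead of one request that is too large.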

What I'd like to do, if you're agreeable, is the following:

- Let me look at the SensorShell/SensorBase code, make some
adjustments, and then let you know of the new version(s).

- You re-try the test migration, and see if my hacks result in any
improvements.

- When we're satisfied with the results, then we write up a Wiki page
on "Version 7 to Version 8 migration", that documents how to do it
and what people can expect.  For example, no one (including me) would
think that setting AutoSend to 0 would be necessary.

Let me know if you come up with other issues when you wake up.

Cheers,
Philip



From: "Austen Ito" <austen....@gmail.com>
Date: Fri, 24 Aug 2007 08:03:53 -1000
Local: Fri, Aug 24 2007 2:03 pm
Subject: Re: [hackystat-dev] Re: Migrating 2+ years of version 7 data takes an hour and counting...
Hi Philip,
  Sounds good.  I'll try again when you have massaged the
sensorshell/sensorbase code a bit.  I checked the migration in the morning
and it sadly failed due to an OOM exception.  I might have to look at
managing the objects that are loaded into the shell.

austen



From: Philip Johnson <philipmjohn...@gmail.com>
Date: Sun, 26 Aug 2007 21:54:01 -0000
Local: Sun, Aug 26 2007 5:54 pm
Subject: Re: Migrating 2+ years of version 7 data takes an hour and counting...
First off, kudos to Austen for an amazingly great posting on how to
reproduce the problem:

http://code.google.com/p/hackystat-sensor-xmldata/wiki/DeveloperRelea...

This is textbook.  I was able to reproduce the error perfectly in
about 5 minutes.

Before I dive into this deeper, I do notice one immediate design
problem from looking at the output:

$ java -jar xmldata-cli.jar -migration ../hackystat-data-test/users testUser foo bar
Hackystat SensorShell Version: 8.0.825
SensorShell started at: Sun Aug 26 11:39:43 HST 2007
Using Sensor Properties in:
Type 'help' for a list of commands.
Host: http://localhost:9876/sensorbase/ is available.
User ad...@hackystat.org is authorized to login at this host.
AutoSend set to 10 minutes
Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-08.xml
Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-09.xml
Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-10.xml
Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-12.xml
Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-13.xml
About to send the following sensor data:
<Timestamp SDT Owner Tool Resource Runtime {Properties}>
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

The question I have is: why did you choose to read in _all_ of the
data before sending _any_ of it?

Could you avoid the problem by simply interleaving the reading and the
sending on a file-by-file basis? So, the output might look like:

Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-08.xml
About to send the following sensor data:
Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-09.xml
About to send the following sensor data:
Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-10.xml
About to send the following sensor data:
Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-12.xml
About to send the following sensor data:
Processing /hackystat-data-test/users/testUser/data/Activity/2006-10-13.xml
About to send the following sensor data:

If you're careful to ensure that you make all of the intermediate data
structures available for GC after each process+send cycle (i.e. set
instance variables to null, clear() collection instances, etc), then
this might be about the simplest way to make the system scalable to an
indefinite number of files.

Austen, could you please try that design change out and let me know
what happens?  I'm happy to look into this further if that change to
the design doesn't fix the problem, but to me it's the necessary first
step.

One other small issue:  I noticed when running this that the system
appears to get the <email> and <password> from the command line, but
the sensorbase <host> from the v8.server.properties file.  That seems
a little weird to me.  If we're going to override the
v8.sensor.properties <email> and <password> values, shouldn't we just
go all the way and override the <host> property as well by supplying
it on the command line?

Cheers,
Philip


From: "Austen Ito" <austen....@gmail.com>
Date: Sun, 26 Aug 2007 21:46:23 -1000
Local: Mon, Aug 27 2007 3:46 am
Subject: Re: [hackystat-dev] Re: Migrating 2+ years of version 7 data takes an hour and counting...
Hi Philip,

  I'm glad you were able to reproduce the error.  Here are my replies
to your questions/comments.

> The question I have is: why did you choose to read in _all_ of the
> data before sending _any_ of it?

> Could you avoid the problem by simply interleaving the reading and the
> sending on a file-by-file basis?

The distribution I posted on the wiki was my previous implementation,
before the "bogus" one where I would send data every 1000 entries.  I
gave you that one so you could see the exception.  The problem with
reading a file and immediately sending its data is that some files have
enough entries to cause the exception on their own.  For
example, I just tested sending data from one Referentia Version 7 data
file that has over 12,000 entries.  That caused an OOM exception unless
I increased the heap, and even with the larger heap the data was not
sent.

> If you're careful to ensure that you make all of the intermediate data
> structures available for GC after each process+send cycle (i.e. set
> instance variables to null, clear() collection instances, etc), then
> this might be about the simplest way to make the system scalable to an
> indefinite number of files.

The test of sending data every 1000 entries eventually caused an OOM
exception to be thrown about 2.5 hours in.  After we get this issue
resolved, I will need to make sure to clean up the data structures
correctly.

> One other small issue:  I noticed when running this that the system
> appears to get the <email> and <password> from the command line, but
> the sensorbase <host> from the v8.server.properties file.  That seems
> a little weird to me.  If we're going to override the
> v8.sensor.properties <email> and <password> values, shouldn't we just
> go all the way and override the <host> property as well by supplying
> it on the command line?

Yes I agree that it is a bit weird.  It also felt a bit weird when I
was coding it.  I'll fix that up.

Let me know what you think the next steps should be.  Thanks for the
quick response ;)

austen



From: Philip Johnson <john...@hawaii.edu>
Date: Sun, 26 Aug 2007 23:01:40 -1000
Local: Mon, Aug 27 2007 5:01 am
Subject: Re: [hackystat-dev] Re: Migrating 2+ years of version 7 data takes an hour and counting...
Hi Austen,

Thanks for the prompt response.  I took a look at the code and here's
what I observe:

First, the issue can be localized to the structure of a very small
amount of code in the MigrationOption.execute() method.  Basically,
this code can be paraphrased as follows:

for (File sensorDataFile : directory.listFiles()) {
  JAXBContext context = JAXBContext.newInstance(Data.class);
  Unmarshaller unmarshaller = context.createUnmarshaller();
  Data data = (Data) unmarshaller.unmarshal(sensorDataFile);
  for (entry : data) {  // paraphrase: each entry in this file
    massageEntry(entry);
    shell.add(entry);
  }
}
shell.send();  // nothing is sent until every file has been loaded

The problem is out of memory errors. There are three basic scaling
issues with this code:

[1] The entire contents of each data file must be read into memory at
once (due to the call to unmarshal());
[2] The entire contents of each file is added to the shell before any
of it is sent;
[3] The entire contents of the directory is added to the shell before
any of the data is sent.

Problem [3] is trivial to fix (I suggested it in my last email):
send the data after each file is loaded. This just
requires moving the send() call into the loop:

for (File sensorDataFile : directory.listFiles()) {
  JAXBContext context = JAXBContext.newInstance(Data.class);
  Unmarshaller unmarshaller = context.createUnmarshaller();
  Data data = (Data) unmarshaller.unmarshal(sensorDataFile);
  for (entry : data) {  // paraphrase: each entry in this file
    massageEntry(entry);
    shell.add(entry);
  }
  shell.send();  // now inside the loop: one send per file
}

This doesn't solve the situation in which a single data file is very
large, which unfortunately occurs in your circumstances.

To solve this, you have to address either or both of issues [1] and
[2].  The first thing I would do is some diagnosis.  Get your very
big data file and run your migration code over it, but comment out
the sensorshell stuff:

for (File sensorDataFile : directory.listFiles()) {
  JAXBContext context = JAXBContext.newInstance(Data.class);
  Unmarshaller unmarshaller = context.createUnmarshaller();
  Data data = (Data) unmarshaller.unmarshal(sensorDataFile);
//  for (entry : data) {
//    massageEntry(entry);
//    shell.add(entry);
//  }
//  shell.send();
}

What this will tell you is whether reading the entire data file into
memory (i.e. Issue [1]) is at least one of the sources of the memory
problem. It might be useful to run your system with a few different
heap values, to see just how much heap you need to allocate to simply
read the big XML file into memory successfully.

If the system blows up with the sensorshell code commented out, then
I can suggest two ways to resolve it:

a.  Buy more RAM for your machine (or move to a machine with more RAM
for the migration), enabling you to increase the heap size to the
point where the entire data file can be unmarshalled into memory at
once. Then your JAXB approach is OK.

b.  Replace the use of JAXB with a custom-written SAX event-driven
parser.  What this effectively allows you to do is intermingle the
XML processing of your file with the sensorshell sending of the data,
somewhat like the following:

for (File sensorDataFile : directory.listFiles()) {
  SAXParserFactory factory = SAXParserFactory.newInstance();
  factory.newSAXParser().parse(sensorDataFile, handler);
}

where 'handler' is a callback to a method that will process a single
data entry:

massageEntry(entry);
shell.add(entry);

The good news with SAX is that the contents of the entire XML file are
never required to be in memory all at once; only a single entry is,
so your code can scale to an arbitrarily large V7 data file without
having to scale your hardware to an arbitrarily large heap size. :-)
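A minimal sketch of such a handler using the JDK's built-in SAX API. It assumes, hypothetically, that each V7 sensor data entry is an element named `entry` whose attributes carry its fields; the `onEntry` callback stands in for `massageEntry(entry); shell.add(entry);`. The class name `EntryStreamer` is also illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntryStreamer {
  /** SAX callback: turns each <entry> element into a map and hands it off immediately. */
  static class EntryHandler extends DefaultHandler {
    private final Consumer<Map<String, String>> onEntry;
    private int count = 0;

    EntryHandler(Consumer<Map<String, String>> onEntry) {
      this.onEntry = onEntry;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
      if ("entry".equals(qName)) {  // assumed element name for one V7 entry
        Map<String, String> fields = new HashMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
          fields.put(attrs.getQName(i), attrs.getValue(i));
        }
        count++;
        onEntry.accept(fields);     // e.g. massageEntry(...) then shell.add(...)
      }
    }
  }

  /** Streams one XML document; only the current entry is held in memory. Returns the entry count. */
  public static int stream(InputStream xml, Consumer<Map<String, String>> onEntry) {
    try {
      EntryHandler handler = new EntryHandler(onEntry);
      SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
      return handler.count;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse sensor data", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<sensor><entry tool=\"Eclipse\"/><entry tool=\"Ant\"/></sensor>";
    int n = stream(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
        entry -> System.out.println("would send: " + entry));
    System.out.println(n + " entries streamed");
  }
}
```

In the migration loop, each file would be opened as a stream and passed to `stream(...)`, with send() still invoked periodically from inside the callback.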

The final issue, [2], potentially still remains.

Let's say that you discover that if you comment out the sensorshell
stuff, then the MigrationOption does not throw an out of memory
exception even with a nominal setting for the heap size when parsing
the largest of your data files. That would be great. Or, that it does
throw the exception but that you've fixed it by moving to a SAX
parser.

When you now uncomment the SensorShell lines, the out of memory
problem starts up again.  What this means is that you've now isolated
the problem as being Issue [2].

I can think of two reasons for this:

a. The SensorShell is implemented correctly, but given the available
heap size, you are exceeding it by adding too many entries before
sending.  In this case, the solution is simple. Either add more heap,
or invoke send() more frequently such that the problem goes away.
This really isn't bogus, it's just reality. If you give Java 50MB of
heap, then you can't store a 51MB string in it no matter how hard you
try.

b. The SensorShell is implemented incorrectly, such that no matter
how frequently you invoke send(), an out of memory error occurs.
This would be due to the SensorShell not releasing resources
appropriately after a send().  I've looked at the code and it appears
to be OK in this regard, but I could be missing something. To test
this, you just run the system a few times with a counter that invokes
send() after N entries are added.  If the system blows up around the
same time regardless of whether the counter is set to 1000, 100, or
10, then there's a problem with the SensorShell.

All right, I hope that gives you some additional ideas to play with.
Let me know how it goes and what I can do to help.

Cheers,
Philip


From: Philip Johnson <philipmjohn...@gmail.com>
Date: Mon, 27 Aug 2007 11:57:58 -0700
Local: Mon, Aug 27 2007 2:57 pm
Subject: Re: Migrating 2+ years of version 7 data takes an hour and counting...
It occurred to me that another possible memory problem in your
situation can be created by the fact that the SensorShell echoes all of
the data it is about to send (in one single string, no less) prior to
sending.

This behavior was simply a bootstrapping/debugging mechanism, which
is no longer needed now that we have other mechanisms for detecting
sensor data transmission (e.g. the sensordataviewer).

So, I've just removed those lines and committed the changes to the
SensorShell.  Austen, you might want to update your local copy and use
this version in your current development.

Cheers,
Philip


From: "Austen Ito" <austen....@gmail.com>
Date: Mon, 27 Aug 2007 13:24:51 -1000
Local: Mon, Aug 27 2007 7:24 pm
Subject: Re: [hackystat-dev] Re: Migrating 2+ years of version 7 data takes an hour and counting...
Hi Philip,

I found a huge error on my part.  Read below for more details...

> [1] The entire contents of each data file must be read into memory at
> once (due to the call to unmarshall();

This turned out not to be a problem at all.  I commented out the
sensorshell code and invoked the sensor with different heap arguments.
The large file, the one with 12,000+ entries, only caused an OOM
exception when I dropped the heap argument to less than 20MB.

> When you now uncomment the SensorShell lines, the out of memory
> problem starts up again.  What this means is that you've now isolated
> the problem as being Issue [2].

While I was looking at the code I found a _huge_ mistake on my part.
I was loading the SensorShell with a new key-value map after each attribute
in an XML entry was parsed.  This caused the original OOM exception
that would occur before I invoked send().  I can now invoke the
sensor, without increasing the JVM heap size, to read through large
data files.

The problems that still exist are:

[1] Figuring out a good estimate of when to clear the sensorshell
"buffer" by invoking send().  After talking with a coworker, I may
investigate how object sizes grow in order to find a good time to
invoke send().  He pointed me to a link:
http://rrandomized.blogspot.com/2005/09/yahoo-sizeof-function-in-java...
This may be a better approach than invoking send() after an arbitrary
number of entries have been parsed.

[2] Finding the reason why an OOM exception still occurs when sending
data for a long period of time.  I decided to send the large amount of
test data using the fixed code and found that an OOM exception occurs
after an hour or so.  I was invoking send after 1000 entries and did
not increase the heap at all.  It may be the case that I can increase
the heap and the OOM will not occur.  In any case,  I'm thinking of
profiling my sensor to see where the problem is.

Given my error, I think that you are right that SensorShell is
implemented correctly.  I'm going to do some investigation on object
sizes and profiling to see if I can get the sensor working.  I'm
curious to see where the problem is that causes the sensor to blow up.
It looks like you can go back to work on the high-level analysis
stuff ;)

Thanks Philip.

austen



From: Philip Johnson <john...@hawaii.edu>
Date: Mon, 27 Aug 2007 15:16:13 -1000
Local: Mon, Aug 27 2007 9:16 pm
Subject: Re: [hackystat-dev] Re: Migrating 2+ years of version 7 data takes an hour and counting...

--On Monday, August 27, 2007 1:24 PM -1000 Austen Ito <austen....@gmail.com> wrote:

> [1] Figuring out a good estimate of when to clear the sensorshell
> "buffer" by invoking send.

I'm not sure whether this is really worth optimizing.  Basically, an "optimal" solution
only saves you a certain number of HTTP request overheads.  HTTP request overhead is
typically very small, so for it to be noticeable, you have to be able to reduce it by at
least 2-3 orders of magnitude.  For example, let's say I have to send 1000 entries to the
server.

If I call send() after each entry is added, then I've generated 1000 HTTP requests.

I can reduce by 1 order of magnitude by calling send() after every 10 entries, which
generates only 100 HTTP requests.

I can reduce by 2 orders of magnitude by calling send() after every 100 entries, which
generates only 10 HTTP requests.

And, finally, by 3 orders of magnitude if I call it only once.

Of course, the space-time tradeoff is that you need 1, 2, or 3 orders of magnitude larger
buffers to hold these entries prior to the transmission.

For a file of 10,000 entries, if I send() after every 100 entries are added, I generate
100 HTTP requests.  If I send() after every 1000 entries are added, I eliminate only 90
requests.  If the request overhead is, say, 0.25 seconds, then I've saved 22.5 seconds,
which just doesn't seem like a lot to me.
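The arithmetic above can be checked with a tiny helper. This is a sketch that assumes a fixed per-request overhead (the 0.25 s is Philip's illustrative figure, not a measurement) and a send() every `batchSize` entries, with a final partial batch counted as one more request.

```java
public class RequestOverhead {
  /** HTTP requests needed to send `entries` entries when send() runs every `batchSize` entries. */
  public static long requests(long entries, long batchSize) {
    return (entries + batchSize - 1) / batchSize; // ceiling division: a partial batch still costs a request
  }

  /** Seconds of request overhead saved by moving from a smaller to a larger batch size. */
  public static double secondsSaved(long entries, long smallBatch, long largeBatch,
                                    double overheadSeconds) {
    return (requests(entries, smallBatch) - requests(entries, largeBatch)) * overheadSeconds;
  }

  public static void main(String[] args) {
    System.out.println(requests(10_000, 100));                  // 100
    System.out.println(requests(10_000, 1_000));                // 10
    System.out.println(secondsSaved(10_000, 100, 1_000, 0.25)); // 22.5
  }
}
```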

What I would propose you do is a send() after a "reasonable but conservative" number of
entry additions--maybe 250 or so.  Unless you've got your heap size set really low and/or
your entries have a gimungous number of attributes, that should not cause an out of
memory exception if the rest of the code is working right.  For a big migration, it might
be that you end up waiting around one additional hour for it to complete, but it might
take you 6 hours to write and debug the code to save you that hour of waiting. :-)

> [2] Finding the reason why an OOM exception still occurs when sending
> data for a long period of time.  I decided to send the large amount of
> test data using the fixed code and found that an OOM exception occurs
> after an hour or so.  I was invoking send after 1000 entries and did
> not increase the heap at all.  It may be the case that I can increase
> the heap and the OOM will not occur.  In any case,  I'm thinking of
> profiling my sensor to see where the problem is.

Given that I'm saving you time on Issue [1] by proposing that you set it to 250 and
forget about it, I am hoping you can spend your newly freed-up time on problem [2], which
seems more interesting.

I can think of a few scenarios to explain what you're seeing:

(a) After about an hour, the migration code encounters some "weird" data (maybe some
sensor data where each entry is, indeed, a few orders of magnitude larger than anything
that came before).  That produces the OOM.  In this case, changing the order in which
the migration mechanism encounters the data changes the time at which it crashes.

(b) It has nothing to do with the data: no matter in what order the data is encountered,
it always crashes after an hour or so.

(c) 1000 entries, in combination with your heap size, puts the system too near the limit.
Reducing the buffer size to 250 fixes things and the system can run indefinitely.

This looks like a very nice situation in which to try out JConsole:
<http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html>

which comes for free in Java 5.

Once you've connected JConsole to your running migration code, you can generate a chart
of heap size over time:

<http://java.sun.com/developer/technicalArticles/J2SE/jconsole/HeapMem...>

Shoots, it looks like you can invoke the garbage collector occasionally if you want!

When doing these tests, I would try to minimize the amount of output from my migration
code. Maybe just generate a timestamp and the name of the XML data file when it is read
in, and another time stamp when the system completes the process of sending all of it to
the V8 server. (perhaps along with the total number of entries that were processed and
sent in that file.)  Then you can correlate the time info in your JConsole chart with the
data that's being sent, and get a better idea of whether heap is being used up slowly but
consistently, or whether there's a sudden spike in heap usage when it encounters a
specific kind of data file that puts it over the edge.
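A minimal sketch of that kind of per-file log line. It assumes, hypothetically, that the migration loop records a start time and an entry count for each file. It also logs current heap use from `java.lang.management`, which exposes the same kind of platform memory data that JConsole charts, so the log can be lined up with the JConsole graph. `MigrationLog` and its methods are illustrative names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.time.Instant;
import java.util.Locale;

public class MigrationLog {
  /** Current heap usage in bytes, from the platform MemoryMXBean. */
  public static long usedHeapBytes() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    return heap.getUsed();
  }

  /** One compact line per file: finish time, file, entries sent, elapsed time, heap in use. */
  public static String formatLine(Instant finished, String file, int entries,
                                  long elapsedMillis, long heapBytes) {
    return String.format(Locale.ROOT, "%s %s entries=%d elapsedMs=%d heapMB=%d",
        finished, file, entries, elapsedMillis, heapBytes / (1024 * 1024));
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    int entries = 0; // hypothetical: incremented as each entry from the file is sent
    // ... parse and send one V7 data file here ...
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(formatLine(Instant.now(), "2006-10-08.xml", entries, elapsed, usedHeapBytes()));
  }
}
```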

This sounds like fun.  Wish I was there. :-)  Let me know what you find out!

Cheers,
Philip


From: Philip Johnson <philipmjohn...@gmail.com>
Date: Tue, 28 Aug 2007 01:36:29 -0000
Local: Mon, Aug 27 2007 9:36 pm
Subject: Re: Migrating 2+ years of version 7 data takes an hour and counting...
Unfortunately, you've now got me thinking about this problem.

I am guessing that the behavior of this application involves the
following cycle:

- spend a few seconds reading in the data and getting ready to send it
- spend a few seconds _waiting_ for the HTTP request to complete.
- go back to reading in the data and getting ready to send it.
- spend a few seconds _waiting_ for the HTTP request to complete.

It occurs to me that it might well be that over the course of a run,
up to half of the time the client is essentially idle, waiting for its
HTTP request to complete.  That's the real cause of inefficiency in
the system.

Now, _after_ you've totally gotten rid of the out of memory errors,
you might want to recreationally think about how to speed this sucker
up.  The goal, I think, is to minimize the time the system spends in
an idle state just waiting for an HTTP request to complete.

There are lots and lots of ways to think about this problem.  One way
would be to have just two threads and kind of alternate between them:
as soon as the first thread emits the send(), the second one goes off
and starts reading in data, and vice versa.
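A minimal sketch of a close cousin of the two-thread idea. It swaps in a standard producer/consumer pipeline rather than strict alternation: the reading thread parses files into batches and hands them to a sending thread through a small bounded `BlockingQueue`, so parsing continues while an HTTP request is in flight. The bounded capacity also caps how many parsed batches sit in memory at once. Batches as `List<String>`, the `send` callback standing in for `SensorShell.send()`, and the class name are all assumptions. Error handling is deliberately omitted: if `send` throws, the sender thread dies and the reader would block once the queue fills.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class ReadSendPipeline {
  /** Sentinel batch meaning "no more data"; compared by identity and never sent. */
  private static final List<String> END = new ArrayList<>();

  /** Iterating parsedBatches is the "read" step; each batch is sent on a second thread. */
  public static void run(Iterable<List<String>> parsedBatches, Consumer<List<String>> send) {
    BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(2); // at most 2 batches waiting
    Thread sender = new Thread(() -> {
      try {
        for (List<String> batch = queue.take(); batch != END; batch = queue.take()) {
          send.accept(batch); // this thread waits on HTTP; the reader keeps parsing
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "sender");
    sender.start();
    try {
      for (List<String> batch : parsedBatches) { // reader: stand-in for parse + massage one file
        queue.put(batch);                        // blocks if the sender falls two batches behind
      }
      queue.put(END);
      sender.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      sender.interrupt();
      throw new IllegalStateException("Migration interrupted", e);
    }
  }
}
```

With a single sender thread, batches reach the server in the order they were read.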

Another way might be to have a pool of N "worker" threads with one
"master" thread, and divide up the work among them. Let's say there
are 10 worker threads and 1000 files to process.  The master thread
divides up the 1000 files into 10 batches of 100 files each, and gives
each thread its own batch of files to process.  In this case, the odds
that all 10 would be waiting at the same time become low.  Of course,
you now want to make sure your server process can handle the
onslaught!
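A minimal sketch of the worker-pool variant using the JDK's `ExecutorService`. Rather than having the master thread pre-split the files into fixed batches, this version submits one task per file to a fixed pool of `workers` threads, which spreads the work in a similar way. `processFile` is a hypothetical callback that would parse one file, send its entries, and return the number sent. Each worker would need its own SensorShell, or a thread-safe one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

public class ParallelMigration {
  /** Runs processFile over every file on a fixed pool; returns the total entries reported. */
  public static int migrateAll(List<String> files, int workers, ToIntFunction<String> processFile) {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    try {
      List<Future<Integer>> results = new ArrayList<>();
      for (String file : files) {
        results.add(pool.submit(() -> processFile.applyAsInt(file))); // one task per file
      }
      int total = 0;
      for (Future<Integer> result : results) {
        total += result.get(); // waits for that file; rethrows any failure below
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Migration interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A file failed to migrate", e.getCause());
    } finally {
      pool.shutdown(); // let queued tasks finish, accept no new ones
    }
  }
}
```

The pool size is the knob that trades client throughput against load on the server.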

There are a bunch of others--nonblocking HTTP, and so forth.

I never thought migration would be so interesting!

Cheers,
Philip


From: "Austen Ito" <austen....@gmail.com>
Date: Sat, 1 Sep 2007 10:29:25 -1000
Local: Sat, Sep 1 2007 4:29 pm
Subject: Re: Migrating 2+ years of version 7 data takes an hour and counting...
Hi Philip,
  The XmlData sensor works by increasing the JVM heap size.  The
details can be found at:
http://austenito.blogspot.com/2007/08/xmldata-results-are-in.html

Also, I noticed that the pMap attributes are included in some entries:

pMap="000600018current-statements000026000015current-methods000021300010class-name00060com.referentia.iahm.ststranslator.filedata.StsReportFileData"

Could you tell me the delimiter between each key-value pair so I can
separate them out when migrating the data?  I think that would be
faster than me looking through data files trying to find a pattern ;)

austen



Philip Johnson  
From: Philip Johnson <john...@hawaii.edu>
Date: Sat, 01 Sep 2007 10:47:36 -1000
Local: Sat, Sep 1 2007 4:47 pm
Subject: Re: [hackystat-dev] Re: Migrating 2+ years of version 7 data takes an hour and counting...

--On September 1, 2007 10:29:25 AM -1000 Austen Ito <austen....@gmail.com> wrote:

> Hi Philip,
>   The XmlData sensor works by increasing the JVM heap size.  The
> details can be found at:
> http://austenito.blogspot.com/2007/08/xmldata-results-are-in.html

That's excellent!  Actually, it looks like increasing the heap size
to 1024M was a bit of overkill: while the default (64M) was not
enough, it looks like 128M would have been sufficient.
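For anyone repeating this, one quick way to confirm which heap limit the JVM actually picked up (for example, after launching with -Xmx128m) is to print Runtime.maxMemory(). HeapCheck is a hypothetical helper, and some JVMs report a value somewhat below the -Xmx setting, so treat the printed number as approximate.

```java
/** Hypothetical helper that reports the heap ceiling of the running JVM. */
class HeapCheck {
  /** Maximum heap the JVM will attempt to use, in megabytes. */
  static long maxHeapMegabytes() {
    return Runtime.getRuntime().maxMemory() / (1024 * 1024);
  }

  public static void main(String[] args) {
    System.out.println("Max heap: " + maxHeapMegabytes() + "M");
  }
}
```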

> Also, I noticed that the pMap attributes are included in some
> entries:

> pMap="000600018current-statements000026000015current-methods000021300010class-name00060com.referentia.iahm.ststranslator.filedata.StsReportFileData"

> Could you tell me the delimiter between each key-value pair so I can
> separate them out when migrating the data?  I think that would be
> faster than me looking through data files trying to find a pattern
> ;)

Actually, it's easier than that.  First, copy the implementation of
the SensorDataPropertyMap class into your migration code. The JavaDoc
for that class is here:

<http://hackydev.ics.hawaii.edu/hackyDevSite/build_log/20070501/Hackys...>

Then, feed the encoded pMap string into the constructor, and you've
now got a SensorDataPropertyMap instance that you can extract the data
from and then add to the properties of the v8 SensorData instance
(the value returned by its getProperties() method).
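If copying SensorDataPropertyMap turns out to be inconvenient, the sample pMap above appears to follow a simple length-prefixed layout. This is an inference from that single sample only, not the documented format: a 4-digit count of strings, then each string as a 5-digit zero-padded length followed by that many characters, alternating key and value. PMapDecoder is a hypothetical name, and SensorDataPropertyMap remains the authority on the real encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pMap decoder; the layout is ASSUMED from one sample string. */
class PMapDecoder {
  static Map<String, String> decode(String pMap) {
    int[] pos = {0};                   // read cursor shared with the helpers
    int count = readInt(pMap, pos, 4); // ASSUMED: 4-digit number of strings
    if (count % 2 != 0) {
      throw new IllegalArgumentException("odd string count " + count + "; not key/value pairs");
    }
    Map<String, String> props = new LinkedHashMap<>(); // keeps the original key order
    for (int i = 0; i < count / 2; i++) {
      String key = readString(pMap, pos);
      String value = readString(pMap, pos);
      props.put(key, value);
    }
    if (pos[0] != pMap.length()) {
      throw new IllegalArgumentException("unexpected trailing data at index " + pos[0]);
    }
    return props;
  }

  /** Reads a fixed-width, zero-padded decimal number and advances the cursor. */
  private static int readInt(String s, int[] pos, int width) {
    int value = Integer.parseInt(s.substring(pos[0], pos[0] + width));
    pos[0] += width;
    return value;
  }

  /** Reads one string: ASSUMED 5-digit length followed by that many characters. */
  private static String readString(String s, int[] pos) {
    int length = readInt(s, pos, 5);
    String value = s.substring(pos[0], pos[0] + length);
    pos[0] += length;
    return value;
  }
}
```

On the sample above this yields current-statements=60, current-methods=13, and class-name=com.referentia.iahm.ststranslator.filedata.StsReportFileData. Because it throws on input that does not fit the assumed layout, a migration run over real data would quickly show whether the inference holds.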

Let me know if you have any questions.

Cheers,
Philip


Austen Ito  
From: "Austen Ito" <austen....@gmail.com>
Date: Sat, 1 Sep 2007 11:49:21 -1000
Local: Sat, Sep 1 2007 5:49 pm
Subject: Re: [hackystat-dev] Re: Migrating 2+ years of version 7 data takes an hour and counting...

> Actually, it's easier than that.  First, copy the implementation of
> the SensorDataPropertyMap class into your migration code. The JavaDoc
> for that class is here:

Oh good.  That is _much_ easier.  Thanks Philip.

austen

On 9/1/07, Philip Johnson <john...@hawaii.edu> wrote:


Philip Johnson  
From: Philip Johnson <philipmjohn...@gmail.com>
Date: Sun, 02 Sep 2007 18:44:40 -0000
Local: Sun, Sep 2 2007 2:44 pm
Subject: Re: Migrating 2+ years of version 7 data takes an hour and counting...
So, Austen, now that you've solved the memory issues and are cleaning
up details (i.e. property maps), it would be very interesting (at
least to me) to insert some timing code to find out what proportion of
time is spent waiting on HTTP requests (I suspect it could be as much
as 50% of the execution time).   It would also be interesting to see
how much time, on average, it takes to migrate a single sensor data
entry.

If you're interested in checking this out, it should be pretty
simple.  First, create a long that you increment each time you migrate
a single sensor data item.  Second, create a Date instance when you
start execution and right before you end execution.  Subtract those
two to get total wall clock time in milliseconds.  Divide that by the
total number of sensor data items to get average milliseconds per
sensor data item migration.

Finally, generate two Date instances on either side of the
SensorShell.send() call, then subtract them immediately after each
send() to find out how long the HTTP request took to process. Keep a
running tally of those milliseconds in some global counter.

Just before exiting, print out the statistics:  how many
milliseconds total, how many sensor data transmissions total, how many
milliseconds per sensor data transmission, how many milliseconds total
spent waiting for send() to complete, and what percentage of the total
time was spent waiting for send().
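The instrumentation described above might look roughly like this in Java. It is a sketch, not Austen's actual benchmarking code: MigrationStats and its methods are hypothetical names, System.currentTimeMillis() replaces the Date instances (subtracting two Date.getTime() values gives the same milliseconds), and the real SensorShell.send() call is assumed to be wrapped in a lambda passed to timeSend(). If send() throws a checked exception, the wrapper would need adjusting.

```java
import java.util.Locale;

/** Hypothetical timing harness for a migration run. */
class MigrationStats {
  private final long startMillis = System.currentTimeMillis();
  private long itemsMigrated = 0; // incremented once per migrated sensor data entry
  private long sendMillis = 0;    // running tally of time spent inside send()

  void recordItem() {
    itemsMigrated++;
  }

  /** Times one send() call and adds the elapsed milliseconds to the tally. */
  void timeSend(Runnable send) {
    long before = System.currentTimeMillis();
    try {
      send.run();
    } finally {
      sendMillis += System.currentTimeMillis() - before;
    }
  }

  /** Statistics for the run so far; call just before the program exits. */
  String report() {
    return format(System.currentTimeMillis() - startMillis, itemsMigrated, sendMillis);
  }

  static String format(long totalMillis, long items, long sendMillis) {
    double perItem = items == 0 ? 0.0 : (double) totalMillis / items;
    double percentSending = totalMillis == 0 ? 0.0 : 100.0 * sendMillis / totalMillis;
    return String.format(Locale.ROOT,
        "Total time: %d ms%nEntries migrated: %d%nAverage per entry: %.1f ms%n"
            + "Time in send(): %d ms (%.1f%% of total)",
        totalMillis, items, perItem, sendMillis, percentSending);
  }
}
```

Calling recordItem() once per migrated entry, wrapping each send in timeSend(), and printing report() at the end covers the statistics listed above. Keeping format() static makes it easy to check the arithmetic against hand-computed numbers.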

It might be interesting to do a couple of test runs with a subset of
the data where you change the number of sensor data entries that you
buffer before sending from 250 to maybe 100 and then maybe 500, just
to see if this changes these runtime characteristics.
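Varying the buffer size is easiest if the buffering sits behind a single knob. This hypothetical BufferedSender (not a SensorShell class) collects entries and hands each full batch to a sender callback, which stands in for one HTTP send. It is the same send-every-N-entries pattern as the earlier workaround, with N as a constructor parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical buffer that sends entries in batches of bufferSize. */
class BufferedSender<E> {
  private final int bufferSize;
  private final Consumer<List<E>> sender; // stands in for one HTTP send of a batch
  private final List<E> buffer = new ArrayList<>();
  private int sendCount = 0;

  BufferedSender(int bufferSize, Consumer<List<E>> sender) {
    if (bufferSize < 1) {
      throw new IllegalArgumentException("bufferSize must be at least 1");
    }
    this.bufferSize = bufferSize;
    this.sender = sender;
  }

  /** Buffers one entry and sends the batch as soon as it is full. */
  void add(E entry) {
    buffer.add(entry);
    if (buffer.size() >= bufferSize) {
      flush();
    }
  }

  /** Sends whatever is buffered; call once more at the end for the partial batch. */
  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    sender.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
    buffer.clear();
    sendCount++;
  }

  int sendCount() {
    return sendCount;
  }
}
```

With 33,417 entries and a buffer size of 250, this makes 134 sends (133 full batches plus a final batch of 167), so the closing flush() call matters.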

I'm hoping this is just 15 minutes of coding, and the benefit is that
it gives us a concrete sense for how much we could speed up
performance of the SensorShell by implementing a multi-threaded
solution to reduce the wait time for the client.   I could see this as
being helpful to us, not only in migration scenarios, but anytime
there are large amounts of data being sent to the server.

If you don't get around to doing this, no worries!

Cheers,
Philip


Austen Ito  
From: "Austen Ito" <austen....@gmail.com>
Date: Mon, 3 Sep 2007 10:27:27 -1000
Local: Mon, Sep 3 2007 4:27 pm
Subject: Re: [hackystat-dev] Re: Migrating 2+ years of version 7 data takes an hour and counting...
Hi Philip,
  I did the test runs per your request.  It seems that the majority of
the time is spent sending the data.  I do believe that a
multi-threaded solution would not only be useful but fun as well.

I ran 6 test cases, each sending 33,417 entries, varying the number of
entries buffered before each send.  Here are the stats (all numbers are
rounded to the nearest tenth):

First run: 1 entry buffer size
Average Time Per Migration: 45.9 milliseconds
Total Time Spent Sending: 1495070 milliseconds
Total Percentage of the Time Spent Sending: 97.4%
Total Execution Time: 1535025 milliseconds

Second run: 100 entry buffer size
Average Time Per Migration: 5.8 milliseconds
Total Time Spent Sending: 175604 milliseconds
Total Percentage of the Time Spent Sending: 90.9%
Total Execution Time: 193101 milliseconds

Third run: 250 entry buffer size
Average Time Per Migration: 5.5 milliseconds
Total Time Spent Sending: 166854 milliseconds
Total Percentage of the Time Spent Sending: 90.7%
Total Execution Time: 183999 milliseconds

Fourth run: 500 entry buffer size
Average Time Per Migration: 5.0 milliseconds
Total Time Spent Sending: 152260 milliseconds
Total Percentage of the Time Spent Sending: 90.7%
Total Execution Time: 167914 milliseconds

Fifth run: 1000 entry buffer size
Average Time Per Migration: 4.9 milliseconds
Total Time Spent Sending: 147598 milliseconds
Total Percentage of the Time Spent Sending: 90.4%
Total Execution Time: 163308 milliseconds

Sixth run: 5000 entry buffer size
Average Time Per Migration: 4.5 milliseconds
Total Time Spent Sending: 136131 milliseconds
Total Percentage of the Time Spent Sending: 90.2%
Total Execution Time: 150883 milliseconds

Assuming my calculations are correct, it seems that the majority of
the time is spent waiting on HTTP requests.  The results are
consistent with each other: reading more entries before each send
reduces the total execution time, while increasing the number of
requests (for example, sending after every entry) makes the execution
time balloon.  Although the execution time keeps dropping as the buffer
grows, the improvement from the 2nd test run onward is much smaller.

Hope that is helpful.  If the results seem bogus, I can go ahead and
commit another developer release with my benchmarking code.  Let me
know if you have any questions.

austen

On 9/2/07, Philip Johnson <philipmjohn...@gmail.com> wrote:


Philip Johnson  
From: Philip Johnson <john...@hawaii.edu>
Date: Mon, 03 Sep 2007 11:43:55 -1000
Local: Mon, Sep 3 2007 5:43 pm
Subject: Re: [hackystat-dev] Re: Migrating 2+ years of version 7 data takes an hour and counting...
Hi Austen,

Fascinating!  Here's what I'm seeing:

* The results seem internally consistent.  For example, the changes
in average time per migration correlate quite well with the buffer
size, which indicates to me that your counters are working correctly.

* I guessed "as much as 50%" of the execution time would be waiting
on send(). I was off by a factor of 2: it spends almost 100% of the
execution time waiting on send(). :-)

* Changing the buffer size from 250 to 5000 has about a 20% impact on
overall throughput.  On the other hand, if we could change the system
so that a thread is always processing data and the client is never
completely paused waiting for a send() to complete, we could
theoretically improve throughput by 900%.  (Under the assumption that
the server would service requests just as fast even though the load
on it would be substantially higher.)

* Our baseline right now (with a 250 buffer size) is 5.5 milliseconds
per migration of a single sensor data entry, or about 11,000 sensor
data instances per minute. That's really not too shabby by itself and
I am sure way better than Version 7.
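The arithmetic behind the 11,000-per-minute and 900% figures can be checked with the third run's numbers. The bound in maxSpeedup() assumes that every millisecond spent waiting on send() could be overlapped with other work and that the server keeps up, the same optimistic assumption stated in the bullet above; ThroughputEstimate is a hypothetical helper.

```java
/** Hypothetical back-of-the-envelope throughput calculations. */
class ThroughputEstimate {
  /** Entries per minute given an average cost per entry in milliseconds. */
  static double entriesPerMinute(double millisPerEntry) {
    return 60_000.0 / millisPerEntry;
  }

  /**
   * Best-case speedup if all time spent waiting on send() were overlapped,
   * leaving only the client's own (non-waiting) work on the critical path.
   */
  static double maxSpeedup(long totalMillis, long sendMillis) {
    return (double) totalMillis / (totalMillis - sendMillis);
  }

  public static void main(String[] args) {
    // Third run from the results above: 250-entry buffer.
    System.out.printf("%.0f entries/minute%n", entriesPerMinute(5.5));
    System.out.printf("%.1fx best-case speedup%n", maxSpeedup(183999, 166854));
  }
}
```

For the third run, entriesPerMinute(5.5) is about 10,909 and maxSpeedup(183999, 166854) is about 10.7, roughly the tenfold (900%) ceiling described above.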

Thus, a non-multithreaded solution that sets the buffer size at 250
is quite reasonable in terms of minimizing the required heap size yet
getting decent performance.  Austen, I would recommend that you
complete the SensorDataProperty stuff, package everything up, and
make a "single threaded" release so that we can start to build on
that version.

Once we've got a stable, functionally complete version of the system
in single thread mode, then the fun can begin on throughput
optimization.  If it were me, I might start by trying a solution that
divides the files into N batches and spawns an individual thread to
process each of them.  I would also make sure to put my SensorBase on
a separate machine so that the experiments are not affected by
context switching back and forth (unless you're running on a quad-core
system or something).  Collect the same data on a per-thread
basis, but also get the overall start and end times.  Then re-run the
system, starting with N=1 (the base condition that should replicate
your current findings), then N=2, 3, 4, 5, 6, 9, 12, 15, and 20.  You
might even want to make 2 or 3 runs at each N just to make sure the
results are consistent for a given N.
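A hypothetical harness for the N-thread experiment that runs without a SensorBase: each entry's send() is simulated with Thread.sleep(), so it models only the ideal case in which the server never slows down under load. ThreadCountExperiment and run() are illustrative names; swapping the sleep for the real migration code (and the item count for the real data) would turn it into the experiment described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical harness measuring wall-clock time for N worker threads. */
class ThreadCountExperiment {

  /** Splits items across n threads; each item "waits" sendMillis. Returns elapsed ms. */
  static long run(int items, int n, long sendMillis) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(n);
    long start = System.nanoTime();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int t = 0; t < n; t++) {
        int from = t * items / n;
        int to = (t + 1) * items / n;
        futures.add(pool.submit(() -> {
          for (int i = from; i < to; i++) {
            Thread.sleep(sendMillis); // stand-in for a blocking send()
          }
          return null;
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // overall end time is when the slowest thread finishes
      }
    } finally {
      pool.shutdown();
    }
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) throws Exception {
    int[] threadCounts = {1, 2, 3, 4, 5, 6, 9, 12, 15, 20};
    for (int n : threadCounts) {
      for (int trial = 1; trial <= 3; trial++) { // repeat runs to check consistency
        System.out.printf("N=%d trial %d: %d ms%n", n, trial, run(200, n, 5));
      }
    }
  }
}
```

Under this simulation, elapsed time should fall roughly as 1/N; with a real server, the point where the curve flattens is what the experiment is meant to find.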

What I hypothesize you'll find is the following:

* For small N, speedup should be almost linear.  The per-thread
average time per migration should not change as N increases
(indicating that the server can handle the load for small N).  Individual
threads will continue to spend almost 90% of their time waiting,
which is fine.  The overall client, however, will complete much more
quickly because while one thread is blocked, another is doing
something.

* As N gets higher, two things should happen at some point: (1) The
server starts to feel the load, which will be evidenced by the
average time per migration for an individual thread getting higher.
(2) The overall client is now working near 100% of the time and thus
further increases in N don't utilize it any more effectively.

If this is true, then spawning just 3 threads, for example, could
result in up to a threefold improvement in throughput, to about 33,000
sensor data instances per minute.

If this is actually what happens, then we should (a) make a new
release of the multi-threaded migration package because it works, and
(b) start thinking about how to make a multi-threaded strategy
available to other applications that need high throughput.

Cheers,
Philip

