Hi,
Extending John’s mail on data for performance testing, I want to add my views to the questions raised by him.
First I will try to define the problem.
Aim:
The aim of this exercise was to generate sufficient data to do load tests for Mifos, in particular for collection sheet entry.
Problem definition:
1. Identify a method to generate sufficient data which can be used as a baseline for the test (750,000 to 1,000,000 records). The data should be heterogeneous
and realistic, i.e. it should represent how production data will look in MFIs.
2. Identify a method to generate the data required for each test. This should take little time, to make sure
multiple tests can be planned in a day (the expected size of the Mifos DB with 1M records is 100 GB).
3. Identify a method to roll back the database to the baseline state after each test. Again, this should be quick enough to allow
multiple tests to be executed.
Approaches for data insert:
A few of the approaches we discussed are:
1. Create Business Objects using Java and persist them
Pros:
Easy maintenance
Cons:
Needs time to develop
Might be slow
Might not be useful to those who can do performance testing but don’t know Java
2. Extend the currently used stored procs to support these requirements
Pros:
Will be the fastest
easy to modify in case any changes are required
Cons:
Maintenance is difficult
Knowledge of MySQL procs needed
Approaches for rollback:
1. Create delete scripts/procs for the data inserted
Pros:
The fastest way to drop the data
Cons:
The tester must know which data is to be deleted
2. Rollback using the backup of the baseline data
Pros:
Will ensure a bug-free rollback
Cons:
Might be the slowest of all methods. Restoring a 100 GB database will take days to complete
3. Shadow copy and restore the folder where MySQL DB files are stored
This is yet to be explored
Let me know your ideas/suggestions on this.
Aravind Deivendran • Project Lead • SunGard • Technology Services • 6th Floor, Embassy Icon, #3, Infantry Road, Bangalore, India
Tel +91 80 30913200 3144 • Mobile +919980962300 • www.sungard.com/sts
Thanks for kicking off this conversation, Aravind. I hope others who
might have ideas will be willing to share their suggestions or previous
experiences which might help direct our efforts.
>Approaches for data insert:
>Few of the approaches we discussed are
>1. Create Business Objects using Java and persist them
Seems like there are dependencies that must be followed for this to work,
e.g. we must create an office before creating a center, etc. Maybe this
aligns with the same sequencing that the current stored procedures
follow? I like the potential of being able to create data on the fly,
which would possibly assist other testing efforts including our
automated acceptance tests.
I also like this option because it helps test our application from the
business layer and below.
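
To make the sequencing concern concrete, here is a minimal Java sketch. It assumes a simple office, center, client hierarchy; the class and method names (OrderedDataGenerator, Office, Center, Client, generate) are hypothetical stand-ins invented for illustration and are not the Mifos business objects. It only shows that each parent is created before the children that refer to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for business objects; these are NOT the real Mifos classes.
class OrderedDataGenerator {
    static class Office {
        final int id;
        Office(int id) { this.id = id; }
    }

    static class Center {
        final int id;
        final Office office; // parent must already exist
        Center(int id, Office office) { this.id = id; this.office = office; }
    }

    static class Client {
        final int id;
        final Center center; // parent must already exist
        Client(int id, Center center) { this.id = id; this.center = center; }
    }

    private int nextId = 1;

    // Creates each parent before its children, so no child ever refers to a missing parent.
    List<Client> generate(int offices, int centersPerOffice, int clientsPerCenter) {
        List<Client> clients = new ArrayList<>();
        for (int o = 0; o < offices; o++) {
            Office office = new Office(nextId++);
            for (int c = 0; c < centersPerOffice; c++) {
                Center center = new Center(nextId++, office);
                for (int k = 0; k < clientsPerCenter; k++) {
                    clients.add(new Client(nextId++, center));
                }
            }
        }
        return clients;
    }
}
```

A real generator would call the application's own creation code at each level instead of these constructors; the nesting order is the part that carries over.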
> 2. Extend the currently used stored procs to support these requirements
Any thoughts on how to incorporate the associated transactions,
scheduled meetings, etc. into stored procs that would come along with 2x
users?
A couple of other approaches just to throw into the list. I wanted to
mention in case others have good experiences with one of these ideas:
3. We discussed taking existing customer data, and trying to externally
manipulate it to grow it in size. This ensures very realistic data,
but I have concerns about whether we could do this accurately, and
manipulation of such large files would be difficult.
4. One used previously by IBM for performance testing - generating test
data by driving the UI using a testing tool like Load Runner or
Selenium. This seems slow when trying to build such a large dataset.
5. Use an external tool that is built for generating data? I don't have
experience with these tools. One example would be "Advanced Data
Generator" - http://www.upscene.com/products.adg.index.php.
>Approaches for rollback:
Another approach to add:
4. If we're using a virtual environment, we don't need to roll back the
data, right? When the test is finished, we just shut down the virtual
image and save the time required to do the rollback.
Jeff
> >Approaches for data insert:
> >Few of the approaches we discussed are
> >1. Create Business Objects using Java and persist them
Way back during the September discussion I put up a patch that
generated test data using a business object creation method used in
some tests (TestObjectFactory). At the time Van said there were
problems with the way that worked. He was right. It is inaccurate
and shouldn't be used in this case.
Keith Woodlock was working on another, easier way to create centers
and customers etc for testing. I've used his method for some tests
but I don't think it has been widely enough implemented to build the
test data you are talking about.
So, as far as I can see, at the moment the business object creation
method wouldn't be available unless coding work was put into it.
On the 'restore'. Because rollback can take so long for large
insertions/deletions/updates, I still can't look past the operating
system 'file copy' type operation Adam M originally mentioned. I
haven't ever done shadow copying but certainly it is commonplace
(for speed) in testing on top of many database systems to just copy
over the folder(s) / file(s) with the base data.
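
As a rough illustration of the 'file copy' restore, a Java sketch follows. It assumes the database server has been shut down before the copy so that no files are being written, and that a clean copy of the data directory was saved earlier. The class name BaselineRestore and both directory arguments are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BaselineRestore {
    // Replaces dataDir with a fresh copy of baselineDir.
    // Assumption: the database server is stopped while this runs.
    static void restore(Path baselineDir, Path dataDir) {
        try {
            deleteTree(dataDir);
            copyTree(baselineDir, dataDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> all;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            all = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) {
            Files.delete(p);
        }
    }

    private static void copyTree(Path from, Path to) throws IOException {
        List<Path> all;
        try (Stream<Path> paths = Files.walk(from)) {
            // Files.walk visits a directory before its contents.
            all = paths.collect(Collectors.toList());
        }
        for (Path src : all) {
            Path dest = to.resolve(from.relativize(src));
            if (Files.isDirectory(src)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

The time taken is governed by raw disk throughput rather than by replaying inserts, which is the attraction of this approach for a database of this size.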
bit more soon.
John
Just wanted to jump in on this discussion and respond to some of the
points made.
This area of load/performance testing is vitally important and when
done on top of proper unit, integration and acceptance/functional
testing, gives us the confidence we need to release to production.
Aravind provided us with a description of what we are trying to
achieve:
> Aim:
>
> the aim of this exercise was to generate sufficient data to do load tests
> for Mifos in particular collection sheet entry
>
but I would change the aim to be something like:
The aim is to be able to do load/performance testing on the mifos
application. By making this easier/cheaper to do it will enable
load/performance testing to be executed more often (and ideally as
often as we run our other test suites).
As Aravind points out, this testing has to be done on top of a large
dataset as this is more reflective of production environments and will
catch problems such as:
1. Database problems like missing indexes etc
2. Hibernate usage or Query/Algorithm problems
Ideally this large dataset closely mimics the data that is seen in production.
So this brings us to Aravind's first problem:
> Problem definition:
>
> 1. Identify a method to generate sufficient data which can be used as a
> baseline for the test(750,000 to 1,000,000 records). The data should be
> heterogenenous
>
> and realistic ie should represent how production data will be in MFIs.
>
I can think of two ways of getting here:
1. We just take a copy of a current large production dataset that
has sufficient size (1M, 2M records or whatever's needed). Of course
there might be data issues here etc.
2. We create all the data needed!
- I am not sure that time is an issue here, even if it takes two
weeks to generate all the data needed (using whatever technique,
leveraging current domain model and services or by using stored
procedures), because at the end of it we have our asset that we can
call our baseline and just keep reusing. We won't need to keep
recreating the large dataset constantly.
Aravind's second problem was:
> 2. Identify a method to generate data required for the test. The time taken
> by this should be less, to make sure
> multiple tests can be planned in a day(The expected size of Mifos DB with 1M
> records will be 100 GB).
>
This is where we run into the question of what approach to take to get
the data we need inserted on top of our 'baseline' data. The
inserted data for the test must also be reflective of how the
application works and reflective of the type of data typically seen in
production. The example being used was the 'Collection sheet entry'
work.
In the end, I believe stored procedures were created to get the data
into a database to perform the load/performance testing. As John
pointed out in the previous mail, we showed an example of using Java
to insert the data. The problem with that approach was the leveraging
of 'TestObjectFactory', which does not correctly duplicate how the
application works. But there was no real problem with speed; it was
pretty quick to get a 'good enough' dataset into the database so tests
could be carried out on it. In reality we should have been doing
exactly what the application code does and leveraging its domain model
and services for creating the 'collection sheet entries'. The code for
doing this did exist in the application code, it just wasn't nicely
put together in a 'servicey' way that was obvious and easy to use. I
believe John has been putting the finishing touches to that.
So in short, the application already inserts data into the database so
the code does exist that can be called. The problem is that much of
the business logic is typically up at the Action/Struts layer and not
presented nicely in any 'service' but you could just call the struts
method if needed.
The alternative approach of using stored procedures, while it too will
work, is very problematic in my view because:
1. It needs to duplicate the domain/business logic that already
exists in the Java code.
2. It will take time to create this procedure and more time to
verify it works as the application does!
When we change/refactor any code in Java land we will have to be
mindful of the performance tests, and we may break them without knowing,
thus introducing another reason to be afraid of refactoring the domain
model.
So basically I am completely against the stored procedure approach but
maybe we should have a more thorough discussion on it before making a
decision on which way to go.
John also said the following:
>
> Keith Woodlock was working on another, easier way to create centers
> and customers etc for testing. I've used his method for some tests
> but I don't think it has been widely enough implemented to build the
> test data you are talking about.
>
> So, as far as I can see, at the moment the business object creation
> method wouldn't be available unless coding work was put into it.
>
The approach John is talking about is a 'Builder' approach to creating
complex domain objects. The purpose of these is purely to clean up the
unit/integration tests and make them more communicative and less
fragile. They are not intended for any other use. As I have stated
earlier, any approach using Java should leverage the production code's
domain/services and not any test code.
For more on these just read Nat Pryce's blog entries:
http://nat.truemesh.com/archives/000714.html
As for rollback/restore I also agree with what John is saying below
about what Adam M was saying.
>
> On the 'restore'. Because rollback can take so long for large
> insertions/deletions/updates, I still can't look past the operating
> system 'file copy' type operation Adam M originally mentioned. I've
> haven't ever done shadow copying but certainly it is common place
> (for speed) in testing on top of many database systems to just copy
> over the folder(s) / file(s) with the base data.
>
Hope this is some bit helpful.
Regards,
Keith W
Folks,
Elaborating on this idea a bit, if we placed the business objects in a
separate jar, we could write programs that used them.
To do this, one simple way would be to place all the classes in the
application/ module into a jar. This seems like a straightforward
change in maven. Another way to do this would be to make a separate
module that held these objects.
An external program could use multiple threads to do concurrent
inserts at high speed.
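
A minimal Java sketch of such an external program is shown here. It assumes the per-record work can be expressed as a callback that is safe to call from several threads; the class ConcurrentInserter and the insertOne hook are hypothetical, and in practice the hook would call whatever business-object code ends up in the jar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

class ConcurrentInserter {
    // Splits [0, total) into one contiguous range per worker and calls insertOne for every index.
    // insertOne is a hypothetical hook; it must be safe to call from several threads at once.
    static int insertAll(int total, int workers, IntConsumer insertOne) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            int chunk = (total + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int start = Math.min(total, w * chunk);
                final int end = Math.min(total, start + chunk);
                results.add(pool.submit(() -> {
                    for (int i = start; i < end; i++) {
                        insertOne.accept(i);
                    }
                    return end - start;
                }));
            }
            int inserted = 0;
            for (Future<Integer> f : results) {
                inserted += f.get(); // waits for the worker and surfaces its failure, if any
            }
            return inserted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while inserting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Giving each worker its own contiguous range keeps the threads from contending over which record to create next; whether the database itself keeps up with the extra threads would still need to be measured.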
-adam
--
Adam Feuer <adamf at pobox dot com>
However, I realize I'm overly biased towards "collection sheet save"
in GK. So, I find it hard, at the moment, to see the bigger
scalability picture. But I'm going to try to.
Having declared my bias, I also see myself as a customer of the great
load testing asset that Sungard has created.
Currently, I want to be able to measure the value (or not) of changes
I've been making and intend to make to the save collection sheet
process. For me, the existing GK data (with 5 to 10 concurrent
users) is optimal (that's my belief... I'm often wrong). Even the
adhoc data from one GK center I got from Raghav pinpointed a few
obvious and significant areas of improvement that I hadn't seen by
looking through the code or from the generated data.
So my initial thought is that my needs (or the needs of anyone who is
interested in GK collection sheet save performance) can be satisfied
by allocating maybe 100 active GK centers to each of the 5 or 10
concurrent users. Whether this meets other needs or future needs I don't know.
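
The allocation described above could be sketched in Java as follows. This is only an illustration: the class CenterAllocator and the idea of identifying centers by integer id are assumptions, not something taken from the existing load test scripts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CenterAllocator {
    // Gives each simulated user its own disjoint block of center ids,
    // so that concurrent users never save the same collection sheet.
    static Map<Integer, List<Integer>> allocate(List<Integer> centerIds, int users, int centersPerUser) {
        if (centerIds.size() < users * centersPerUser) {
            throw new IllegalArgumentException("not enough centers for the requested allocation");
        }
        Map<Integer, List<Integer>> byUser = new LinkedHashMap<>();
        for (int u = 0; u < users; u++) {
            int from = u * centersPerUser;
            byUser.put(u, new ArrayList<>(centerIds.subList(from, from + centersPerUser)));
        }
        return byUser;
    }
}
```

With 10 users and 100 centers each, this needs a list of at least 1,000 active center ids as input.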
I understand that the stored procs can be changed to reflect 'aged
data' but that's merely responding to the lesson learnt from looking
at the GK data so you might as well be using GK data (the more recent
the better).
Maybe it's because of my bias but I can't get my head around why it's
valuable to generate new data on top of the GK baseline dataset?
If there is a requirement that the data generation be more generic
than GK, fair enough, except that so far the data generation has been
tailored to the GK case. e.g. it's basically repayment processing
with no fees or charges or savings accounts that other users may make use of.
John
>Personally, I think you could simultaneously implement a number of
>approaches to performance testing depending on the 'question' under
>test. I'm still not solid on the 'question'. I've imagined that the
>question is to measure how scaleable current and future mifos is...
>but I'm not sure.
>
Hi John,
Good observation! What is the question we're trying to answer? In
short, we are trying to jump ahead of our largest customers to ensure
that when they do arrive at 500,000 or 1 million clients we'll have a good
understanding of how Mifos will scale. Hopefully this draft scalability
test plan will give better context to this discussion -
http://www.mifos.org/developers/testing/scalability-test-strategy.
Sorry for not mentioning this document on the list sooner.
To optimize a specific area, the existing test data sets may be
perfectly fine. But we'd also like to get a grasp for what the Mifos
performance will be for defined scenarios as we grow the database to 2x
or 4x its current size.
Appreciate any feedback on the document mentioned above.
Regards,
Jeff
John,
The two primary "stress points" that I see are:
* the Collection Sheet Entry flow - used by the bank branches to enter
their daily transactions
* the batch jobs - that do various calculations
When we increase the number of clients, we increase the number of
branches that are doing data entry, and we increase the number of
accounts that the batch jobs need to iterate over.
Do you see other primary (business critical and already on the verge
of breaking) stress points?
Some secondary places where increased clients put stress on the system are:
* Slow running time for reports
* Slow running time for various operations (adding a client or
account; adding a custom field; etc.)
We're mainly concerned about the primary stress points, but we do want
to collect data about the secondary ones.
What are your thoughts?
cheers
adam
--
Adam Feuer <adamf at pobox dot com>
I think we should add an additional stress point of entering individual
transactions through the standard loan detail page. Not all MFIs are using
the collection sheet report; some are doing more of a teller model (e.g. ENDA).
Ryan
On 11/23/09 13:40, "Adam Feuer" <ad...@pobox.com> wrote:
> On Wed, Nov 18, 2009 at 9:29 PM, John Woodlock <jo...@nassles.com> wrote:
>> However, I realize I'm overly biased towards "collection sheet save"
>> in GK.
>
> John,
>
> The two primary "stress points" that I see are:
>
> * the Collection Sheet Entry flow - used by the bank branches to enter
> their daily transactions
> * the batch jobs - that do various calculations
>
> When we increase the number of clients, we increase the number of
> branches that are doing data entry, and we increase the number of
> accounts that the batch jobs need to iterate over.
>
> Do you see other primary (business critical and already on the verge
> of breaking) stress points?
>
> Some secondary places increased clients puts stress on the system are:
>
> * Slow running time for reports
> * Slow running time for various operations (adding a client or
> account; adding a custom field; etc.)
>
> We're mainly concerned about the primary stress points, but we do want
> to collect data about the secondary ones.
>
> What are your thoughts?
>
> cheers
> adam
--
Ryan Whitney
Mifos Technical Program Manager
rwhi...@grameenfoundation.org
Mifos - Technology that Empowers Microfinance (www.mifos.org)
Our mission is to enable the poor, especially the poorest, to create a world
without poverty.
<http://grameenfoundation.org/take-action/ingenuity-fund-challenge/>
1. You mention data volume, transaction throughput, and time as 3
factors. I'd like to confirm I see what you mean by those three. For
the data volume question, it's essentially what is the response for
single user with current data size vs. 2X or 4X? Scenarios that might
fall in this category include batch job performance (more records to
process) or actions against large data like the issue Jakub fixed for
Gazelle B - https://mifos.dev.java.net/issues/show_bug.cgi?id=2410.
Thanks Jakub!
2. I'm not sure I quite follow the time factor. When you say compare a
2008 database to a 2009 database, I think you mean the transactional
data (for example) has grown for a given account? How is that different
from the data volume factor?
3. I like the idea of making the data bigger by adding generic test
data and then acting on the actual GK data.
Much to discuss here for sure!
Thanks,
Jeff
Hi!
I was just looking at some open source tools available for data generation and came across this one: http://databene.org/databene-benerator/ (there are also some other generators mentioned on that site). I wanted to know if anyone has used any of these, and how complicated it is to configure one to actually generate data for a relational database such as the Mifos one. The documentation seems pretty extensive.
regards
Chandan
Hi Chandan,
This does look like a nice tool. I too would be interested in hearing if anyone has used this tool or a similar tool on testing projects.
The drawback I see with using a data generator for Mifos is the business rules we have around building loan schedules, meeting schedules, etc. In talking with other test engineers who have done a lot of test data generation, they have recommended to me that we use the business layer (i.e. an API) to generate test data, as it’s a more robust method and also stays in sync with changes to your application over time. For an application that has a more basic data model – e.g. a set of customers and their purchases – a data generator seems like the right answer.
Regards,
Jeff
Hi
In this thread John had briefly mentioned that there was a TestObjectFactory class, but that it had its share of problems. So I got curious and asked him about it. I have quoted his reply in this mail.
He had pointed out one problem regarding the fees, which I was able to fix. I added an extra function that retrieves the fee objects if they already exist instead of creating a new fee every time:
    // feeIds: ids of the existing fees that should be reused
    public static List<FeeView> getFees(List<Short> feeIds) {
        List<FeeView> fees = new ArrayList<FeeView>();
        for (Short feeId : feeIds) {
            // Look up the persisted fee rather than creating a new one
            fees.add(new FeeView(getContext(), testObjectPersistence.getFee(feeId)));
        }
        return fees;
    }
This fixes the problem with the fees. (Maybe getFees() can be made a bit more intelligent and not create new objects all the time; that would be simpler I guess.)
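
One possible way to make getFees() avoid building a new view object for the same fee id on every call is a small cache, sketched below in Java. The FeeViewCache class and its loader function are hypothetical; the loader stands in for the FeeView construction above, and the class is generic so that the sketch does not depend on Mifos types. It is not thread-safe as written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper; V stands in for FeeView.
class FeeViewCache<V> {
    private final Map<Short, V> cache = new HashMap<>();
    private final Function<Short, V> loader;

    // loader builds the view for a fee id; it is called at most once per distinct id.
    FeeViewCache(Function<Short, V> loader) {
        this.loader = loader;
    }

    // Returns one view per requested id, in request order, reusing cached views.
    List<V> getFees(List<Short> feeIds) {
        List<V> fees = new ArrayList<>();
        for (Short feeId : feeIds) {
            fees.add(cache.computeIfAbsent(feeId, loader));
        }
        return fees;
    }
}
```

Whether sharing one view object between several accounts is safe depends on how the calling code uses it, so that would need checking before adopting something like this.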
But as John suggests, there are apparently more problems with using this class? I couldn’t really make any headway there (apart from the fact that there are a few useful functions that could be added if a similar class is used to generate data).
Thank you and Regards
Chandan
---------------------------------------------------------------------------------------------------------------------------------------------------
From: John Woodlock
[mailto:john.w...@gmail.com]
Sent: Monday, January 11, 2010 5:17 PM
To: Rao, Chandan
Subject: Re: regarding testobjectfactory
Chandan,
Yes. A while ago I put up a patch for generating data using the
TestObjectFactory approach.
http://groups.google.com/group/mifosdeveloper/msg/4bfb45e5ef067ada
I think Van mentioned he knew of a few problems with using this
TestObjectFactory approach and I think Keith W felt it was better to use the
production objects (or apis but there's few of those).
Afterwards, I came across a problem or two with the TestObjectFactory whilst
using it in making integration tests. Unfortunately, I can't recall
exactly what they were (date related I think) except I think it was in the loan
area (it only creates 6 schedules but that wasn't my problem). They might
or might not matter in data generation.
The specific problem I had with my own data generator patch was running out of
numbers for fee ids! (The primary key is a Short.) From what I remember, my
use of TestObjectFactory was creating (under the hood) a new fee for each loan
and for each customer account, I think. So I came to a halt far quicker
than I would have liked. I got about 300 nice big centres, which was
ok for me but not for GK size.
Fees are not really present much in GK so maybe there's a use of
TestObjectFactory that doesn't need to create fees. I haven't really
looked into it since.
John
On Mon, Jan 11, 2010 at 5:55 PM, <Chand...@sungard.com> wrote:
Hi John
In the data generator thread you had mentioned that there was a problem with using the TestObjectFactory class. I didn’t really follow you there. Does the problem still exist, and could you clarify what exactly the problem is?
Thank you and Regards
Chandan
Chandan Rao H • Associate Software Engineer • SunGard Technology Services •Embassy Icon, 6th Floor, Infantry Road, Bangalore 560001 India • Tel +91-80-2222-0501 Extn:3240 , Mobile +91-9686601284• http://www.sungard.com/sts
Email: Chand...@sungard.com
Hi Chandan,
TestObjectFactory is used in the Mifos integration tests. Although it constructs objects that can be used to test particular parts of the application, there are some methods which construct objects that are not in a valid state and/or do not follow the business rules. This happens when data has been forced into an object to test some particular scenario.
In some cases, forcing data into an object allows a particular feature to be tested in a valid way, in other cases tests based on data manipulation like this are suspect. When considering using TestObjectFactory for generic data generation, great care would need to be taken in order to avoid using methods that do this kind of forcing data into an object for a particular testing purpose.
In general our plan is to move away from the use of TestObjectFactory and build a cleaner, easier to use and maintain alternative.
Keith Woodlock proposed and started work on using the Builder pattern as a basis for constructing test objects. The original intent of this approach was for generating data for unit tests and integration tests, but it also seems promising as a way of generating generic test data for use in performance testing.
Take a look at the set of classes such as ClientBuilder, FeeBuilder, LoanAccountBuilder and the like (if you do an “open type” in Eclipse and search for “org.mifos.*Builder” you’ll get a list of them). There is still additional work to be done on these, and there will probably be some distinctions that need to be made between data constructed for in-memory unit tests vs. database-based data.
Some code in TestObjectFactory could well be used to help understand what Builder classes will need to do. The nice thing about the Builder pattern is that it provides a clean way to construct both objects using many default settings and objects with very particular settings without having the explosion of similar methods each carrying various defaults that can be seen in TestObjectFactory.
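
For readers who have not seen the pattern, a generic Java sketch follows. SampleClient and SampleClientBuilder are hypothetical names made up for this illustration; they are not the Mifos ClientBuilder and do not reflect its actual methods. The sketch shows only the shape: sensible defaults, with chained overrides for the few settings a caller cares about.

```java
// Hypothetical domain object used only for this illustration.
class SampleClient {
    final String name;
    final String office;
    final boolean active;

    SampleClient(String name, String office, boolean active) {
        this.name = name;
        this.office = office;
        this.active = active;
    }
}

// Hypothetical builder: every field has a default, and each "with" method overrides one of them.
class SampleClientBuilder {
    private String name = "Test Client";
    private String office = "Test Office";
    private boolean active = true;

    SampleClientBuilder withName(String name) {
        this.name = name;
        return this; // returning this allows calls to be chained
    }

    SampleClientBuilder withOffice(String office) {
        this.office = office;
        return this;
    }

    SampleClientBuilder inactive() {
        this.active = false;
        return this;
    }

    SampleClient build() {
        return new SampleClient(name, office, active);
    }
}
```

A caller that needs nothing special writes new SampleClientBuilder().build(), while one that needs a particular case writes new SampleClientBuilder().withOffice("Branch 7").inactive().build(); no extra factory method is needed for each combination.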
--Van
From:
Chand...@sungard.com [mailto:Chand...@sungard.com]
Sent: Tuesday, January 12, 2010 4:33 AM
To: mifos-d...@lists.sourceforge.net
Subject: Re: [Mifos-developer] Data generator for Mifos performance testing
Hi