How to speed up jbackup

jawwi

Jan 20, 2010, 7:03:45 AM
to jBASE
Dear All,

We are using jbackup for backing up data files through the following
commands,

find bnk -print | jbackup -S/home/itops2/stats -f/temenos/backups/jbkp-17112009 -v 2> /home/itops2/Aviion_backup_log

jbackup is taking too long: almost 3 hours to back up the entire set of data
files, which is 72 GB in size.

Is there any method, procedure or switch that speeds up the process
and also compresses the backup file?

Stats:

OS : HP-UX 11i v2
DB : jBASE 5.0.16


Many Thanks in advance.

Jim Idle

Jan 20, 2010, 11:58:48 AM
to jb...@googlegroups.com
Use something other than HPUX? ;-)

Why not just pipe the backup stream through 7zip or bzip2? Bzip2 is pretty good when you can tell it there will be a lot of data for it to compress.
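
Something along these lines, for instance. This is just a sketch: I'm assuming jbackup writes its backup stream to stdout when no -f option is given, and the paths are only illustrative.

# stream the backup through bzip2 instead of writing a raw dump
find bnk -print | jbackup -v 2> /home/itops2/backup_log | bzip2 > /temenos/backups/jbkp.bz2

# restore later by feeding the decompressed stream back to jrestore (check its defaults on your release)
bzip2 -dc /temenos/backups/jbkp.bz2 | jrestore -v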

In terms of speeding up, well, that is as fast as jbackup can do it, to be honest. There are other ways to back up, though. Why not use transaction journaling and back up continuously? If you can bring the system offline or use mirror breaking, then so long as you are assured that there are no writes going on to the database, you can use raw backups such as disk imaging, tar and so on, which are much faster as they are not formatted backups. Also, jbackup isn't very sophisticated.

Jim

Greg Cooper

Jan 21, 2010, 10:17:07 AM
to jb...@googlegroups.com
I think your problem is most likely to be badly sized files .... read on ....

On my Linux machine, I can backup a 1Gb file, which contains over 4 million items (a total of 617 Mb of data) in 2.5 seconds. That is pretty fast in anyone's language. So your problem isn't jbackup as such.

In the above case, the 1Gb file is pre-fetched in memory so no disc I/O to retrieve the data, and the output is to /dev/null , so no disc I/O to save the output. The intention is just to show the processing power of jbackup when disc I/O isn't an issue.
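
Roughly, the test was of this shape (BIGFILE is just a placeholder for my test file, and it relies on jbackup writing its stream to stdout when -f is omitted), timed with the shell's time builtin:

# read the whole pre-cached file through jbackup and throw the stream away
find BIGFILE -print | jbackup -v > /dev/null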

I did further tests on 3 different files like this

file 1. Very large file, but with very little real data in it.

file 2. Medium size file, perfectly sized.

file 3. Medium size file, not well sized, some of the data is out of group (more explanation on this later)

Without going into too much detail, file 1 and file 2 performed extremely well, with disc I/O being the limiting factor. Using jbackup with the output going to /dev/null, so that disc I/O was only needed to retrieve the data, I had a throughput of around 68 Mbytes/second.

file 3 performed very badly. The file wasn't sized too badly, no more than one extra frame per group, but it made a serious impact. Using the same jbackup test, I only achieved around 5.5 Mbytes / second.

Hence, I really think poorly sized files are the issue here. This will impact your live on-line application as well.

Now I'm not sure if you know what I mean by 'badly sized files' or why this should impact your backup so badly, so I'll assume you understand neither and will try to explain both issues. My apologies if you already know all this and it sounds condescending.

First, what is a "good sized file" ?

When you create a jBASE file (we'll ignore the new JR file type for now), you have to provide a modulo. This is the number of 4096-byte frames of data to allocate initially.

So if you have 1 Gb of data, the modulo you use when you create the file would be approximately 1000000000 / 4096. Approximately. As there is overhead in the file and you are trying to avoid linked frames (more on this later), the modulo would be more like 1000000000 / 3000.

If you do this, then as you fill the file you would find it all fits neatly within the space allocated. Which is contiguous disc space.
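
As a rough illustration only (the figures are mine, not a formula, and do check the exact CREATE-FILE syntax and file-type defaults on your release):

# roughly 1 Gb of data, allowing ~3000 usable bytes per 4096-byte frame,
# so 1000000000 / 3000 gives a modulo of approximately 333334
CREATE-FILE BIGFILE 333334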

So far so good.

When you use jbackup for this, it will traverse the file reading each 4096-byte frame consecutively. With a bit of luck, these consecutive frames will be physically consecutive sectors on the disc (usually, but not always). Most OSs and disc hardware have a look-ahead cache, and because you are reading consecutive sectors from the disc, the read will be very, very fast, as much of it comes from the look-ahead cache which the disc maintains on your behalf for precisely these occasions.

Now let's go back to your file and add some more data. Not much more, maybe just 20% more. Well, those 4096-byte frames will be pretty full of data already. Now you are trying to add more data than can fit into a 4096-byte frame. You can't fit a quart into a pint pot, as they say in England. So when the 4096-byte frame fills, jBASE will allocate another 4096-byte frame from the end of the file and link the two together. This means we now have 8192 bytes to fit the data in, of which the first 4096 bytes are in the initial space you allocated with the CREATE-FILE, and the second 4096 bytes are in "secondary" space, which is allocated at the end of the file.

Although the file only has say 20% more data than was intended, the consequences for jbackup are disastrous, and it isn't really the fault of jbackup.

What happens is this: jbackup will look at the first 4096-byte frame and read that in. It will see there is another 4096-byte frame linked to it, so it reads that in too. The problem is that the second frame is on a disc sector that is nowhere physically near the first frame, so the disc heads have to move, and the look-ahead cache is no help.

Whereas in the first case jbackup scanned a "good sized" file, and so the disc I/O mostly came from the look-ahead cache with very little movement of the disc heads, in this second case we have to continually move the disc heads back and forth and can't make much use of the look-ahead cache. Hence the slow jbackup.

End of lecture !

What can you do about it?

First, you could try re-sizing your files. This is always a good option, as badly sized files don't only impact your backup, they impact your on-line users as well.
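
The jBASE utilities for this are along these lines (command names from memory, so do check the switches on your own release; SOMEFILE is just a placeholder):

# show how a file is currently sized, including any out-of-group data
jstat -v bnk/SOMEFILE

# resize the file in place to suit the data it now holds
jrf bnk/SOMEFILE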

Secondly, if you have mirror discs (and if not, why not?), you could pause the database, split the mirror, and use a standard OS backup tool to back up the off-line copy of the files. Whatever you do, if your files are on-line and active, please only use jbackup. OS backup tools should only be used on jBASE files that are not currently being updated. The advantage of OS tools, though, is that they know nothing about linked frames and so will simply back up contiguous frames, making effective use of the look-ahead cache.

Do you have the backup being stored on the same disc as your data? If so, then you have a double problem. First, if your disc crashes, you lose the original data AND you lose your backup. Secondly, when doing the jbackup, the disc has to work twice as hard, as it not only reads the original data but also has to write the backup to the same drive. Given the problems I explained about moving the disc heads, this compounds the problem even more!

One final note. You aren't using the new JR files that were introduced with jBASE release 5, are you? My initial testing with these file types gave extremely poor performance. This might have been fixed in a later release, but you should be wary.

Hope this helps.

Let me know if there is anything else I can clarify.

Regards

Greg

jaro

Jan 22, 2010, 1:00:45 PM
to jBASE
I can't believe the backup of a 1GB file in 2.5 seconds on Linux or any
other system. Possibly in memory only, but usually you can't keep
the whole database in memory.

However, to speed up your backup and compress it, you can use:

tar -cf - bnk | gzip -1 > filename.tar.gz

If you need to make it even faster, you can do it in parallel: several
processes each handling a specific portion of the files within your
bnk directory.
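
Purely as an illustration (the split pattern and output paths are invented):

# back up two halves of the bnk directory in parallel
tar -cf - bnk/[A-M]* | gzip -1 > /backup/bnk_part1.tar.gz &
tar -cf - bnk/[N-Z]* | gzip -1 > /backup/bnk_part2.tar.gz &
wait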

jaro

Jim Idle

Jan 22, 2010, 7:26:06 PM
to jb...@googlegroups.com

> -----Original Message-----
> From: jb...@googlegroups.com [mailto:jb...@googlegroups.com] On Behalf

> Of jaro
> Sent: Friday, January 22, 2010 10:01 AM
> To: jBASE
> Subject: [SPAM] Re: How to speed up jbackup
>
> I can't believe the backup of 1GB file in 2.5 seconds on linux or any
> other system. possibly in the memory only but usually you can't keep
> the whole database in the memory.

I don't think you quite got what Greg was illustrating. You might also contemplate who wrote the original jbackup.



> however, to speed up your backup and compress it you can use:
> tar -cf - bnk | gzip -1 > filename.tar.gz

I don't think you quite got what I was saying, and anyway, tar cvz ... does this if you use GNU tar. But you can only use tar if the files are offline. If they are online, then your tar backup is useless. Bzip2 is a better compression system for large data streams such as tar, or perhaps 7zip.

> if you need to make it even faster then you can do it in parallel,
> several processes will do specific portion of the files within your
> bnk directory.

Except that at some point you will defeat the read-ahead logic by dancing all over the disks.

Jim

Greg Cooper

Jan 22, 2010, 4:18:43 PM
to jb...@googlegroups.com
Sorry you can't believe me, maybe I've made a mistake.

Let me check in case I don't know what I'm talking about.

Oh no, I was right, it was 2.5 seconds.

My Linux system has 8Gb of memory and the 1Gb file fits easily in it. Because the output was to /dev/null there was no disc needed for output. Hence, all memory based. And only 2.5 seconds.

As it is likely the backup time was all down to disc I/O, as I explained in the first email, doing multiple backups in parallel will only work if the original jBASE files are split across 2 or more discs -- if you use parallel backups on a single spindle, it will most likely take longer, as it will cause even more disc head movement.

Simon Verona

Jan 24, 2010, 10:39:24 AM
to jb...@googlegroups.com
Greg...

Anybody would think you wrote jbackup or something!!! :) 

Simon 

---------------------------------
Simon Verona
Director
Dealer Management Services Ltd

Sent from my iPhone

jaro

Feb 5, 2010, 10:23:52 AM
to jBASE
I don't fully understand your reaction to my posting, Jim. I don't think
I said anything wrong.
I'm just trying to advise the initiator of the request.
It was indicated that the database size is about 72GB. I assume it's a
Temenos T24 system. Then I also assume that the data are stored on a
storage array. Usually you build the filesystem from several
physical discs. I think the customer's database is offline during the
backup, so I don't see any issue with running the backup in parallel, and
it's just a matter of a simple script.
If you need to run a backup while the system is being accessed by users,
then we should forget about jBASE and think about something more serious,
like Oracle etc., and forget the tar, gzip, bzip and other commands.

If the backup is so crucial for the customer, then they can look at the
tools provided with the storage systems themselves. For instance,
Symmetrix storage arrays from EMC (and others) offer tools such as data
mirroring: take the system offline for a few seconds, split the mirrored
pairs, and afterwards the backup can be performed on the mirrored pair
without affecting the primary system.

Jim Idle

Feb 5, 2010, 1:44:45 PM
to jb...@googlegroups.com

> -----Original Message-----
> From: jb...@googlegroups.com [mailto:jb...@googlegroups.com] On Behalf
> Of jaro
> Sent: Friday, February 05, 2010 7:24 AM
> To: jBASE
> Subject: [SPAM] Re: How to speed up jbackup
>

> I don't fully understand your reaction to my posting, Jim.

Clearly :-)

> I think I didn't say anything wrong.

You either didn't read, or didn't understand what Greg was saying. That's all I was pointing out.

> I'm just trying to advise the initiator of the request.
> It was indicated that the database size is about 72GB. I assume it's a
> Temenos t24 system. Then I also assume that the data are stored on the
> storage array. Usually you build the filesystem of the several
> physical discs. I think the customer's database is offline during the
> backup. So I don't see any issue to run the backup in parallel. and
> it's just a matter of a simple script.

No, it isn't. However, backups are whatever one believes them to be, I suppose?

> If you need to run backup while the system is accessed by users then
> we should forget about jbase, and think about something more serious,
> like Oracle etc.

Sigh. Why don't you try reading that back to yourself? Done a lot of work on the Oracle DBMS source code have you?

Having known people that have written code for Oracle for many years I can assure you that most of it is a pile of dingo's doings held together by bits of string and mediocre programmers. Buy the marketing hype if you like (after all many do), but Oracle does not get you anything better.

Do you know what database Ceridian were using when they gave out 27,000 bank accounts (including mine) last month? http://solutions.oracle.com/partners/ceridian

It's nothing to do with the database itself, it's the dangerous people that think they know what they are doing that are the problem. I imagine many of them go to tea parties and are offended by immaculate confections.

> and forget the tar, gzip, bzip and other commands.

See - you still don't quite understand :-) but don't let that stop you commenting will you?

>
> If the backup is so crucial for the customer

Well I hope it is.

> then they can search for
> the tools provided with the storage systems itself. Like symmetrix
> storages from EMC ot others offers similar tools like mirroring data.
> then doing short offline for few seconds and split the mirrored pairs.
> after the backup can be performed on that mirrored pair without
> affecting the primary system.

My point was that this subject has been done to death many times on this forum, and a MarkMail search will tell you everything you need to know. Greg posted a lot of useful information in his post, but you didn't read it properly, so you did not see why his comment about memory-to-memory transfer rates was relevant. You can reply to me, or you can read his email again. One is more useful to you.

Jim
