Jbase Response Delay


Alan Metz

Apr 14, 2023, 9:12:54 AM
to jB...@googlegroups.com
All,
   I have recently been experiencing sporadic response delays when accessing Jbase (version 5.6.0.2) from telnet sessions, for all users in my company. At first I thought it was a network issue; however, I have tested this with only one user and the Jbase server plugged into a switch and was able to duplicate the hesitation. I am not logging any errors on my AIX server that would indicate a hardware issue. I am not sure how to troubleshoot this further and am asking for suggestions.  This system has been rock solid since 2018.
Thanks,
Al

Jim Idle

Apr 15, 2023, 8:44:26 AM
to jb...@googlegroups.com
I think more details are needed Alan. 

What version of AIX are you running?
Are you really using telnet and not ssh? Telnet is unlikely to be maintained. 
I assume that you have ruled out network configuration changes?
Any upgrades to AIX lately?
Any change to the network load? New devices?

In the absence of any changes, I would definitely be looking at network problems. When you say you tried with just one user, do you mean literally one device and the server only on the network? If there is a faulty system somewhere, or malware, then that would still eat your network bandwidth. 

Finally, I presume you have done the obvious and rebooted the server and all the network gear? You’ll probably have to start from first principles with no devices on the network and gradually add them in. 


Bruce Decker

Apr 15, 2023, 9:12:34 AM
to jb...@googlegroups.com
Is the delay between the login prompt and the password prompt?  As jimi asked, more details.


Kannan Seshadri

Apr 15, 2023, 11:42:54 AM
to jb...@googlegroups.com
Hi,
Is it possible for you to execute whatever you are executing directly on the AIX console with a telnet session?   This will clearly tell you whether you have a network issue or not.

Thanks and Regards

Alan Metz

Apr 18, 2023, 2:58:26 PM
to jb...@googlegroups.com
Well...
I did some more testing. 
btw AIX 6.1, no changes to AIX, and yes, using telnet
"I assume that you have ruled out network configuration changes?"  (We did increase bandwidth across the entire network recently.)

 
I removed the SD-WAN network from the equation over the weekend.
I attached my laptop to a switch and the server ONLY to the same switch - I did notice the delay.
Tried same setup with a different switch - noticed delay
Tried different Ethernet cable from server to switch - noticed delay
(I wrote a program to track the frequency by executing a LISTPEQS and recording the time it took to render the results. If an iteration took longer than 1 second, I logged it; most iterations take less than 1 second. What I found is that a ~40 second delay occurs approximately every 5 to 6 minutes. There were also a few 2 to 3 second pauses in between, which I excluded.)
Event Date  Start Time    End Time    Delay Seconds
04/18/2023  11:42:10AM  11:42:53AM    43
04/18/2023  11:47:56AM  11:48:39AM    43
04/18/2023  11:53:33AM  11:54:16AM    43
04/18/2023  11:59:28AM  12:00:12PM    44
04/18/2023  12:05:13PM  12:05:54PM    41
04/18/2023  12:11:03PM  12:11:39PM    36
04/18/2023  12:16:53PM  12:17:34PM    41
04/18/2023  12:22:34PM  12:23:17PM    43
04/18/2023  12:28:34PM  12:29:16PM    42
04/18/2023  12:34:05PM  12:34:48PM    43
04/18/2023  12:39:50PM  12:40:34PM    44
04/18/2023  12:45:43PM  12:46:26PM    43
04/18/2023  12:51:26PM  12:52:09PM    43
04/18/2023  12:57:05PM  12:57:49PM    44
04/18/2023  01:02:33PM  01:03:16PM    43
04/18/2023  01:08:26PM  01:09:09PM    43
04/18/2023  01:14:22PM  01:15:05PM    43
04/18/2023  01:19:58PM  01:20:42PM    44
04/18/2023  01:25:42PM  01:26:27PM    45
04/18/2023  01:31:41PM  01:32:25PM    44
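
To check the "5 to 6 minutes apart" estimate against the table, the gap between consecutive delay starts can be computed directly. This is a minimal sketch that assumes the log is saved as text in exactly the column layout above; the names `LOG`, `parse_rows`, and `gaps_between_starts` are made up for illustration, only the first four rows are included, and a log that crosses midnight is not handled.

```python
from datetime import datetime

# First four rows of the table above, copied verbatim.
LOG = """\
04/18/2023  11:42:10AM  11:42:53AM    43
04/18/2023  11:47:56AM  11:48:39AM    43
04/18/2023  11:53:33AM  11:54:16AM    43
04/18/2023  11:59:28AM  12:00:12PM    44
"""

STAMP = "%m/%d/%Y %I:%M:%S%p"


def parse_rows(text):
    """Return (start, end, delay_seconds) tuples for each well-formed row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # header or blank line
        day, start, end, delay = parts
        try:
            t0 = datetime.strptime(f"{day} {start}", STAMP)
            t1 = datetime.strptime(f"{day} {end}", STAMP)
            rows.append((t0, t1, int(delay)))
        except ValueError:
            continue  # not a data row
    return rows


def gaps_between_starts(rows):
    """Seconds from the start of one delay to the start of the next."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(rows, rows[1:])]


if __name__ == "__main__":
    print(gaps_between_starts(parse_rows(LOG)))
```

A steady gap would point toward something running on a timer rather than user activity; an irregular one would point the other way.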

My question is: can I somehow determine if a background process is causing the hangs?  I do have Phantom jobs running in Jbase; however, the code has not changed in years and no new Phantoms have been added.

I have added more users on  the network over time, but removing the network as mentioned above was tested.  Unfortunately, I didn't write the tracking program until Monday, after the "removing the network" test.
(I will say that the delays didn't appear to be as frequent with just me and the server test - I suppose I could test that this weekend...)
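
For anyone wanting to repeat this kind of measurement, a tracking loop along the lines described above might look like the sketch below. It is not the program used for the table; it is a Python approximation, and `timed_run`, `probe`, and the command passed in are all placeholders. The runner and clock are parameters so the loop can be exercised without a real jBASE system.

```python
import subprocess
import time


def timed_run(command, runner=subprocess.run, clock=time.monotonic):
    """Run the command once and return the elapsed seconds."""
    t0 = clock()
    runner(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return clock() - t0


def probe(command, iterations, threshold=1.0, pause=0.0,
          runner=subprocess.run, clock=time.monotonic, sleep=time.sleep):
    """Run the command repeatedly; return (iteration, elapsed) for slow runs only."""
    slow = []
    for i in range(iterations):
        elapsed = timed_run(command, runner, clock)
        if elapsed > threshold:  # same 1-second cutoff as described above
            slow.append((i, elapsed))
        sleep(pause)
    return slow
```

The command would be whatever shell invocation runs LISTPEQS at your site. Running the same probe on the server console and from a client at the same time would also show whether the stall is on the server or in between.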

I wish I could provide more information, but I don't know what else to test??
Thanks,
Al


John Ocampo

Apr 18, 2023, 5:33:57 PM
to jb...@googlegroups.com
I was on an older version of AIX, and I seem to remember a file called wtmp that would grow so large it slowed down the login process. 

Not sure if this can help, but I figured you should look at that file and see how large it is.

Check the following when the /var file system has become full.



Alan Metz

Apr 18, 2023, 6:50:57 PM
to jb...@googlegroups.com
Yes, I am familiar with wtmp, thanks for the suggestion!  I periodically clear it, now would be a good time to re-visit!!
(I also clear /etc/security/failedlogin as it can grow quite large as well)
Thanks John

Dick Thiot

Apr 19, 2023, 8:11:55 AM
to jB...@googlegroups.com
You might also have a file that is accessed by one of the phantom jobs that is poorly sized. 

Dick

Alan Metz

Apr 19, 2023, 10:34:32 PM
to jb...@googlegroups.com
Dick,
 Yeah, I checked all files on every account, nothing over/under sized. (I check this every 2-3 months as normal maintenance)
Thanks for suggesting,
Al

brucew...@comcast.net

Apr 19, 2023, 10:39:48 PM
to jb...@googlegroups.com

Have you checked the /var file system? It’s been several years since I’ve been on jBase, but I seem to recall running into issues with it that were causing severe degradation.
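
A quick way to keep an eye on /var and the log files mentioned earlier in the thread is sketched below. The function names are invented for illustration, and the wtmp path in the usage note is an assumption about where AIX keeps it; `/etc/security/failedlogin` is the path given earlier in the thread.

```python
import os
import shutil


def file_sizes(paths):
    """Map each path to its size in bytes, or None if it cannot be read."""
    sizes = {}
    for path in paths:
        try:
            sizes[path] = os.stat(path).st_size
        except OSError:
            sizes[path] = None  # missing file or no permission
    return sizes


def percent_used(mount_point):
    """Percentage of the filesystem holding mount_point that is in use."""
    usage = shutil.disk_usage(mount_point)
    return 100.0 * usage.used / usage.total
```

For example, `percent_used("/var")` together with `file_sizes(["/var/adm/wtmp", "/etc/security/failedlogin"])`, run from a scheduled job, would show whether either is creeping up between manual cleanups.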

Jim Idle

Apr 19, 2023, 11:25:28 PM
to jb...@googlegroups.com
Ah, right. I think you can rule out the network then, as you are seeing intermittent stutter in actual program response time here. That's kicked out a lot of issues. To be honest, I should have recognized this as an XY problem from the start - my apologies.

Now, I used to be a dab hand at tuning AIX, especially with jBASE of course, but it's been a "number of years" ;)

So, something has changed, but unless you changed some of your application software, then it is something that has happened over time that has now hit a bottleneck. There have been a few good suggestions here already, so I will assume that you have looked at those by now.

So, the main thing I remember was that there is a kernel tuning parameter for memory flushing of dirty memory buffers, which was (and I assume still is) controlled by a flush daemon, which I think used to be syncd or flushd. The parameter controls how often this daemon runs.

Right now, this is my first guess as to what is happening, as this was always the answer back in the day. And, guess what? The default time for this daemon to run is either 30 or 40 seconds, which seems to fit the bill.

The scenario is as follows:
  • Someone thinks that this kernel parameter should be high and changes it such that the system doesn't try to flush dirty memory to disk until it gets a lot of dirty buffers
  • Nothing seems to change right away, but one day your workload changes slightly and...
  • The syncd (or whatever it is these days) wakes up every 30 seconds or so, sees that 70% of your memory is in need of being written and it tries to do that all at once in one massive glob of writes - everything else has to pause and wait. 
  • The actual setting should be that the flush cycle runs more often, not less often so that you get a smooth, averaged out performance.
  • The setting, especially on a write busy system, should be about 5 seconds  
This was performance problem #1 with jBASE on AIX. AIX is generally a great system, but tuning it is a bit of a nightmare sometimes. I used to have a whole instruction set for people in the field to do this, but I don't have access to that and haven't for a long time. I don't know if maybe someone like Bruce Decker has a copy of that email - he might. If not, then we will need to find out what I used to do starting from first principles.

The daemon is either flushd or syncd (it is called different things on different systems). As I say, the default is 30 seconds or something similar. You want this to run MORE often, not LESS often. Also, have a think about whether the system workload has changed in terms of writes. More users? Extra business? Someone changed the background tasks to do more writes?
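
A toy model may make the "more often, not less often" point concrete. It assumes pages are dirtied at a constant rate and that a flush blocks everything until the backlog is written. Both are simplifications, and the rates used are invented numbers, not measurements from any real system.

```python
def stall_seconds(dirty_mb_per_s, flush_interval_s, disk_write_mb_per_s):
    """Seconds needed to write out everything dirtied during one flush interval."""
    return dirty_mb_per_s * flush_interval_s / disk_write_mb_per_s


if __name__ == "__main__":
    # Hypothetical workload: 20 MB/s of dirty data, disk sustains 100 MB/s.
    for interval in (60, 30, 5):
        print(interval, stall_seconds(20, interval, 100))
```

In this model the total time spent writing is the same whatever the interval, but each individual pause shrinks in proportion to it, which is the smoothing effect described above.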

I will try and find my notes etc about this, but while I cannot guarantee that this is your issue, I would be willing to bet a pint on it. We would need to run some vmstat and related commands to put this together, but I bet if you ran that command at the same time as your script that measured above, that you will find that the delay corresponds to a massive spike in disk writes.
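
Once both sets of numbers exist, lining them up is simple. The sketch below assumes the disk write rate has already been collected as (timestamp, writes-per-second) pairs by some means, and that the delay windows come from the timing script; the data shapes and function names are hypothetical, and no particular vmstat output format is assumed.

```python
import statistics


def peak_writes_in_windows(samples, windows):
    """For each (start, end) delay window, the highest write rate sampled inside it."""
    peaks = []
    for start, end in windows:
        inside = [rate for ts, rate in samples if start <= ts <= end]
        peaks.append(max(inside) if inside else None)
    return peaks


def baseline(samples, windows):
    """Median write rate of the samples that fall outside every delay window."""
    outside = [rate for ts, rate in samples
               if not any(start <= ts <= end for start, end in windows)]
    return statistics.median(outside) if outside else None
```

If the peaks inside the delay windows are far above the baseline, that supports the flush theory; if they are not, the cause is more likely elsewhere.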

BTW, your system is quite a bit out of date; AIX 6.1 has been basically end-of-lifed and we are on AIX 7.1 now I think. I would recommend upgrading, and probably moving to AWS rather than physical hardware. Also, upgrade jBASE and switch to the file type that does not need any sizing maintenance. My own tests show those files to be the fastest we ever had. I don't know how many users you have, but even if you wanted local hardware, I think it would be a trivial cost to move to a decent rack-based modern system with Linux. You would probably save the money on power costs!

There is no work out there in the world right now, so if this is a big issue for you, then I am available for hire on a no win no fee basis ;)

Jim




Jim Idle

Apr 21, 2023, 3:41:07 AM
to jb...@googlegroups.com
So, I can no longer find much documentation on AIX 6.1 because it is end-of-lifed. But I think that you can determine if this is the cause of your problem by:

Edit the file /sbin/rc.boot using sudo (it may be /etc/rc.boot on your version of AIX), and find where it starts the syncd daemon. Start out with an interval of 5 seconds - in some cases you can make this shorter. It looks like the -i option is what you need:

start /usr/sbin/syncd -i 5

You can change it for the current system without making it permanent by killing the syncd and restarting it with a new seconds value. That way you can try different values until you get one that suits your system. 

In the AIX 7.1 documentation it also recommends to turn on the random write behind function using the ioo command, but I suspect that that is not there on AIX 6.1. 

I suspect that things hang while this is happening because, by default, the process causes locks to be held against the inodes (jBASE files) that have dirty writes outstanding. In AIX 7.1, you can prevent syncd from locking the inode with ioo -o sync_release_ilock=1. See if that is also an option in 6.1.

Please let us know if that helps. 

However, if it does help, then you need to work out why this is now an issue when it was not before. I can only think that there has been a change to your application software, but that is speculation of course.

Jim

Jim Idle

Apr 26, 2023, 1:42:54 AM
to jb...@googlegroups.com
Any progress on this?

Alan Metz

Apr 26, 2023, 9:53:42 AM
to jb...@googlegroups.com
Jim,
   Well, I have been working with a friend of mine, Bob Wyatt; however, we have not been able to find anything definitive.
The syncd runs every 60 seconds.
    As far as system workload, we have about 300 users, many with multiple Accuterm sessions at once, so yes, there is much activity throughout the day. What bothers me is the fact that I can duplicate the problem with only 1 user attached! (The one thing that I did not try was to eliminate all users, restart AIX and jBASE, and see if the problem occurs without anyone running any jBASE programs - to your point about software code changes.  I have been coding our system for over 30 years now, so yes, some code changes have happened this year... but that is normal operations.)
     At this point, I would be inclined to make use of your services to see if you can find anything!  (I do want you to be aware that my company decided, 2 years ago, to move away from Pick/jBASE in early 2024 - much to my dismay - which is why I haven't upgraded the system.)
Let me know if you are interested.
Thanks,
Al

Jim Idle

Apr 26, 2023, 11:51:41 PM
to jb...@googlegroups.com
Say hello to Bob for me. :)

It is common for companies to try and move away. It’s also common for that to fail if it is a rewrite. But I understand why you've not upgraded. 

I think you do need to try a restart. One user may be a red herring here as if there are background programs running, then it could be one of those programs causing all the writes. Difficult to say without a look at the system itself. 

Alan Metz

Apr 27, 2023, 7:32:59 AM
to jb...@googlegroups.com
Jim,
   I am willing to give you access to the system if you are willing to take a look.
Thanks,
Al
