Queue manager journal file

abrsvc

May 9, 2022, 10:10:19 AM

Just had a case where the journal file exploded to over 3 million blocks with no indication of any problems. Restarting the system (for other reasons) brought this to our attention because the queue manager wouldn't start. Renaming the journal file out of the way allowed the system to start and not hang.

Anyone know if an examination of the journal file contents would yield a reason for the expansion or why the manager would hang on startup?

Thanks,
Dan

Stephen Hoffman

May 9, 2022, 11:08:25 AM

Queue manager hangs arise when storage is insufficient, or when the
cluster is incomplete or misconfigured, or when the local SCSNODE host
name changes, or when you've encountered a queue manager bug.

The SCSNODE hang is probably the most common cause of a startup hang,
but that's unrelated to the journal file size. (It's a hang that should
have been removed decades ago, too. Display an error, and move on.) The
SCSNODE doesn't seem to match this case, though.

Check for patches for whatever OpenVMS version is involved, as
DEC/Compaq/HP/HPE OpenVMS versions had patches. Based on a quick look,
VSI doesn't (yet?) seem to have any queue manager or job control
patches.

Haven't seen a queue manager journal that large in a while, though.

--
Pure Personal Opinion | HoffmanLabs LLC

abrsvc

May 9, 2022, 11:40:06 AM

This is a single system (no cluster, sorry I should have included that).

Space was not the issue as a simple rename of the journal file resolved the problem and things continued with the reboot where it hung before.

Without spending too much time, I was just wondering if there was anything that could be determined by looking at the contents of the journal file. If not, then it will be deleted. If so, what is there that might provide clues to what happened?

Thanks,

Stephen Hoffman

May 9, 2022, 12:08:35 PM

On 2022-05-09 15:40:04 +0000, abrsvc said:

> This is a single system (no cluster, sorry I should have included that).

For the SCSNODE mess, that doesn't matter. The queue manager will
happily wedge the system startup on a standalone node, if SCSNODE
changes. But again, that's seemingly not a factor here.

Presumably no run-away batch job submissions? Batch queues are a poor
solution for process control and process management, though are still
widely used for that.

I've seen a few environments fail when the queue manager job entry
numbers exceeded expectations.

Also, crazy-big journal files can arise with storage or I/O errors
involving the queue manager files or the queue manager storage device,
but those files are usually all on the same volume.

> Without spending too much time, I was just wondering if there was
> anything that could be determined by looking at the contents of the
> journal file. If not, then it will be deleted. If so, what is there
> that might provide clues to what happened?

Not that I'm aware of. The format of that file was never published. VSI
might be interested in looking, but that assumes a VSI OpenVMS version
and VSI support.

Simon Clubley

May 9, 2022, 1:48:11 PM

On 2022-05-09, abrsvc <dansabr...@yahoo.com> wrote:
> Just had a case where the journal file exploded to over 3 million blocks with no indication of any problems. Restarting the system (for other reasons) brought this to our attention because the queue manager wouldn't start. Renaming the journal file out of the way allowed the system to start and not hang.
>
> Anyone know if an examination of the journal file contents would yield a reason for the expansion or why the manager would hang on startup?
>

This is where knowing the exact VMS version is absolutely critical, Dan.

If it's a modern VMS version, I have no answers, but if it's a really
old version (1990s era), then I have personally experienced this.

There was a bug in a VMS version from that era that caused the queue
manager journal to explode in size. You could fix it on a temporary
basis with a workaround command, and it was fixed properly in a patch.

Update: This jogged a few memories about the workaround command I used
and I went searching. I found this:

https://community.hpe.com/t5/Operating-System-OpenVMS/SYS-QUEUE-MANAGER-QMAN-JOURNAL-file/td-p/3655181

I don't recall it ever actually hanging the system startup however
(at least for me).

Also, while that article talked about it happening during an upgrade,
I seem to recall this actually happening to me during normal system
operations (and with a larger resulting journal file), so there might
have been more than one such bug.

However, like I said, it was a long time ago and I have forgotten
the details. :-)

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.

abrsvc

May 9, 2022, 1:58:46 PM

If you read the original posting, there was a "fix" which was to rename the file out of the way so that the startup created a new journal file.
I didn't post the version for a reason. I fully researched the version and any patches associated with it to address the problem. Having a fix in place to allow the system to function normally, I was more interested in exploring why it occurred rather than a "solution". If it is not possible to determine why, that's fine. This is the first time this has occurred throughout many years and many version upgrades.

Stephen Hoffman

May 11, 2022, 10:34:05 AM

On 2022-05-09 17:58:45 +0000, abrsvc said:

> I didn't post the version for a reason. I fully researched the version
> and any patches associated with it to address the problem.

Something to ponder...

What was posted was a "hey, is there any reason the queue manager might
fill the journal over the entire lifetime of OpenVMS and VAX/VMS and
across all versions, and would you mind researching the various patches
across a whole range of OpenVMS versions?"

Please consider an alternative "hey, I'm running in the unsupported
V6.x range and with no patches available and moving to a VSI supported
version is unlikely, and {description}".

The narrower your question around versions or more generally, the less
folks have to dig around, if somebody does want to try to answer.

> Having a fix in place to allow the system to function normally, I
> was more interested in exploring why it occurred rather than a
> "solution". If it is not possible to determine why, that's fine.
> This is the first time this has occurred throughout many years and many
> version upgrades.

Usually one or more submitting or self-submitting batch jobs that
malfunctioned, and that probably massively re-submitted. The queue
manager unfortunately does sometimes malfunction when used (misused?)
as a job and processor manager.

Volker Halle

May 11, 2022, 11:58:47 AM

Dan,

there is (was ?) the internal QMALLET utility, which could be used to look at the QMAN journal file with formatted output.

Regards,

Volker.

abrsvc

May 11, 2022, 1:07:52 PM

If you have a location for that tool, great. If not, no big deal. The problem was solved by renaming it out of the way and no other issues have arisen since. This query was more of a "can I find out what happened" rather than a search for a solution.

Not critical at this point.

Thanks,
Dan

Peter Weaver

May 20, 2022, 4:54:13 PM

If you're running VMS 6.2 then I highly recommend having a batch job that runs once a month (or week, or quarter, depending on your needs) that does a simple:

$ directory/size=all CLUSTER_COMMON:SYS$QUEUE_MANAGER.QMAN$JOURNAL
$ mcr jbc$command diagnostic 7 ! Reduce the size of QMAN$JOURNAL so we don't run out of space again
$ directory/size=all CLUSTER_COMMON:SYS$QUEUE_MANAGER.QMAN$JOURNAL

Modify CLUSTER_COMMON:SYS$QUEUE_MANAGER.QMAN$JOURNAL to point to where your journal lives. I remember that adding these lines to a batch job solved a lot of issues back in those days.
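
If the goal is only to notice abnormal growth early, the size check can be done on its own. The following is a minimal sketch, not a tested procedure. It assumes the journal is in the default SYS$COMMON:[SYSEXE] location, and the 10000-block threshold is an arbitrary illustrative value; both need adjusting for the local system. It only reports the size and does not attempt to shrink the file.

```dcl
$! Sketch: warn when the queue manager journal grows past a threshold.
$! ASSUMPTIONS: the journal path and the block limit below are illustrative.
$ journal = "SYS$COMMON:[SYSEXE]SYS$QUEUE_MANAGER.QMAN$JOURNAL"
$ limit = 10000
$! Nothing to check if the file is not where we expect it.
$ if f$search(journal) .eqs. "" then exit
$! ALQ is the allocated size of the file, in blocks.
$ size = f$file_attributes(journal, "ALQ")
$ if size .gt. limit
$ then
$   write sys$output "Journal ''journal' is ''size' blocks (limit ''limit')"
$ endif
$ exit
```

Run from a periodically resubmitted batch job, this would leave a line in the job log whenever the journal exceeds the limit, which could then be routed to whatever notification mechanism the site already uses.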

abrsvc

May 20, 2022, 5:01:31 PM

Thanks, I will keep this in mind. The version in this case is V8.4-2L2 (VSI version) and the issue was the journal size being in the millions of blocks when normally it is around 200 or so. The file was kept; I just haven't had time to look into it further. The "solution" was to rename it out of the way so a new one was created. No issues since the new file was created, and this customer runs tens of thousands of jobs per day.

Dan