PTGui Pro 12.26 and 13 beta 8 hang consistently


foooobar

Sep 20, 2024, 2:32:43 AM
to PTGui Support
Hi,

PTGui Pro 12.26 (licensed) and 13 beta 8 (trial) hang consistently at the same point in the stitching process on Windows 11. 

System info
OS: Windows 11 Pro
CPU: 2 socket, 24 cores per socket
RAM: 1.5 TB
GPU: NVidia Quadro RTX 5000 (Nvidia Studio driver 32.0.15.6109)
GPU RAM: 16 GB
Disk (OS + Application): 2TB SSD (< 50% full)
Disk (image data, tmp dir, batchlist dir): 4x2TB SSD RAID0 (< 10% full)

Panorama 1 
Input: 864 100MP 16-bit TIFF files (< 700 GB peak RAM use during stitching)
Output: 125,000 x 62,000 px

Panorama 2 
Input: 198 45MP 16-bit TIFF files (< 250 GB peak RAM use during stitching)
Output: 100,000 x 45,000 px
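
For a sense of scale, here is a back-of-the-envelope sketch of the uncompressed data sizes implied by the figures above. It assumes 3 channels (RGB) at 2 bytes per sample, which the post does not state, and it says nothing about how PTGui actually allocates memory; the names `size_gb` and `estimate` are made up for this illustration.

```python
# Rough, uncompressed sizes for the two panoramas described above.
# Assumption (not stated in the post): RGB, 3 channels x 2 bytes per sample.
BYTES_PER_PIXEL = 3 * 2

panoramas = {
    "Panorama 1": {"images": 864, "megapixels": 100, "output": (125_000, 62_000)},
    "Panorama 2": {"images": 198, "megapixels": 45, "output": (100_000, 45_000)},
}


def size_gb(pixels, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return the uncompressed size in decimal gigabytes."""
    return pixels * bytes_per_pixel / 1e9


def estimate(panorama):
    """Return (total input size, output size) in GB for one panorama."""
    input_pixels = panorama["images"] * panorama["megapixels"] * 1_000_000
    width, height = panorama["output"]
    return size_gb(input_pixels), size_gb(width * height)


for name, panorama in panoramas.items():
    inputs_gb, output_gb = estimate(panorama)
    print(f"{name}: inputs ~{inputs_gb:.1f} GB, output ~{output_gb:.1f} GB")
```

Under these assumptions, both sets of inputs and both outputs are a small fraction of the 1.5 TB of installed RAM.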

Both panoramas successfully complete on an Intel Macbook Pro with 64GB RAM, PTGui 12 beta 12, and GPU acceleration enabled. It takes hours of the machine running very hot but it works.

The Problem
On Windows, everything works great until actually creating the panorama. I create a new project, add the images, align the images, and edit the panorama. When I create (stitch) the panorama it gets very far into the process ("Stitching panorama..." progress bar is ~80-90% complete). At this point CPU usage briefly jumps to 100% on all cores and the Windows machine becomes unresponsive. The mouse cursor doesn't move and the clock in the taskbar does not change. PTGui is the only application running. There is plenty of available RAM and disk.

What I've tried
* Different output formats: TIFF/PSB, compressed/uncompressed, 8-bit/16-bit. All combinations hang.
* Using the project file and images that worked on the MacBook. Both projects hang.
* "Create" and "Save and send to batch stitcher". Both approaches hang.
* PTGui Pro 12.26 and PTGui Pro 13 beta 8. Both versions hang.
* GPU acceleration on/off. Both projects hang on both settings.
* GPU Studio and Gaming drivers: Both drivers hang.
* Wait overnight for panorama to complete: Computer is still hung in the morning.
* LDR (only Blended option checked) / HDR (only HDR Panorama checked): Both hang.

Is there any other information I can add? Is there a crash log, debug mode, or anything else to help figure things out? 

I'm running out of ideas. Help?

PTGui Support

Sep 20, 2024, 3:06:37 AM
to pt...@googlegroups.com
Hi,

I'm afraid I don't have many ideas, I'm sorry.

A regular application (like PTGui) shouldn't be able to lock up a
computer completely. The application itself might lock up or crash due
to bugs, but the OS should always keep running. If the entire computer locks
up, that's indicative of a driver, hardware or OS problem.

On less powerful machines, temporary lockups might occur due to the
computer running out of RAM and the OS starting to swap heavily. And
problems may occur if both RAM and swap space are depleted at the same
time. But that can't be the issue with this amount of RAM.

GPU acceleration also sometimes causes problems; OpenCL has not always
been stable in the GPU drivers. But you've already tried disabling this.

Have you done a memtest? Bad RAM is common, I've experienced it on
multiple machines. With this amount of RAM chances of a bad stick are
higher.

Kind regards,

Joost Nieuwenhuijse
www.ptgui.com


foooobar

Sep 22, 2024, 11:01:33 PM
to PTGui Support
Thanks for the reply Joost, much appreciated. Some thoughts ...

Would it be possible to include a debug mode in a future beta? It would be nice to see progress in the batch stitcher console window and be able to report a problem as occurring between step X and step Y. Or avoid console output altogether and regularly flush progress to an encrypted/obfuscated file that could be attached to a bug report.

The system is an HP Z8 workstation so it's (ostensibly) coherently engineered and tested with qualified parts (i.e., not a collection of components purchased separately). The system was burned in for 4 days with no errors. BIOS-level diagnostic tests from HP for CPU, RAM, motherboard, and storage all currently pass. I use Z8s at my day job (compiler engineer) and routinely push memory, CPU, and storage limits for hours with no problems.

Also, I did some more experiments over the weekend.

* Two tests to validate system memory. (1) The HP BIOS memory diagnostic and (2) Windows Memory Diagnostic (basic test). Both of these showed no errors, although I'm not ruling out faulty memory until a full memtest86 run. This will take days (1 week+?) on 1.5TB of memory so I'm trying everything I can to isolate the issue before committing to this.

* Clean install of Windows 11 23H2 with only Windows updates, NVidia Drivers, and PTGui installed. Nothing changed; PTGui still consistently hangs at the same point when stitching.

* Ran with PTGui, input TIFFs, tmp dir, batchlist dir, and output all on the C:\ drive. No effect, still hangs.

* Ran with Task Manager visible so running processes would be visible at hang. When the system locked, PTGui was at 95% CPU on 48 cores; nothing else stood out as an obvious problem.

* Installed ProcMon on Windows and filtered to only PTGui events while PTGui was running. This wasn't very helpful as there's some lag between when events occur and when they're displayed. There were a bunch of reads of input TIFF files shortly before hanging but no smoking gun.

* Stitched 1x2 and 2x2 panoramas with images from the problematic pano. These stitched perfectly into TIFF and PSB files. I'll continue to scale this up until I hit a point where the hang is tripped.

* On the same hardware, I tried to stitch the smaller of the two panoramas from the original message using Linux PTGui. GPU acceleration was turned off and I ran in the following environments: 
  - WSL2
  - Ubuntu 22.04 VM (VMWare Workstation on Windows host)
  - Ubuntu 22.04 Live USB environment

Interestingly, all failed at the same point in stitching with the same symptoms as Windows PTGui. Stitching progress bar at 80-90%, a brief spike to 100% CPU utilization across all cores, and a locked up system. This tells me that we aren't dealing with driver or OS issues. 

I'm a compiler engineer at my day job, but not a particularly good one, so take what I'm about to say with a grain of salt. I wonder if the problem isn't actually faulty memory? I'm not ruling it out but it's not entirely consistent with the symptoms. 

Allocated physical memory is hidden behind virtual addresses. The actual physical memory address is dependent on a lot of things including what else is running on the system, what else has recently run on the system, the algorithm used by the allocator, page sizes, etc. Consistently allocating the same faulty physical memory at the same point in program execution on panoramas with very different sizes and across OS installs seems improbable.

I do appreciate the fact that user space programs generally can't lock up a system (at least not since the Windows 9x days). Perhaps the system is misconfigured somehow at a BIOS level and that is letting PTGui hang it? Or the problem may be some hardware fault that PTGui can consistently reproduce through whatever it is doing around 80%-90% of the way through stitching. I'm at a loss for what the hardware fault could be, especially given that 1) this system has been extensively burned in, 2) all hardware diagnostic tests currently pass, and 3) the system is stable outside of one specific point of PTGui's execution.

When I run out of debugging ideas I'll run memtest86 and will post anything interesting I find.

Do you have any other ideas? Regardless, thanks again for your help.

T

PTGui Support

Sep 23, 2024, 4:14:37 AM
to pt...@googlegroups.com
Hi,

On 23-09-2024 05:01, foooobar wrote:
> Thanks for the reply Joost, much appreciated. Some thoughts ...
>
> Would it be possible to include a debug mode in a future beta? It would
> be nice to see progress in the batch stitcher console window and be able
> to report a problem as occurring between step X and step Y. Or avoid
> console output altogether and regularly flush progress to an
> encrypted/obfuscated file that could be attached to a bug report.

Any logging would probably be too coarse, there's a lot going on while
stitching. Getting a stack trace is the only way to find out what's
happening. But it's impossible to obtain one from a hung OS.

> I do appreciate the fact that user space programs generally can't lock
> up a system (at least not since the Windows 9x days). Perhaps the system
> is misconfigured somehow at a BIOS level and that is letting PTGui hang
> it?  Or the problem may be some hardware fault that PTGui can
> consistently reproduce through whatever it is doing around 80%-90% of
> the way through stitching.

Yes, that would be my guess. I would have no other ideas, especially
given the fact that you were able to run the stitch on your other computer.

It might look suspect if the crashes are only triggered by PTGui. But
hardware bugs can get triggered by specific workloads only, perhaps a
specific sequence of SSE instructions, who knows.

I can think of a few more things to try:

- configure PTGui to use much less memory. See Options - Advanced - Max
RAM. Try setting it to 64000 MB for a start.

- restrict PTGui to a few specific CPU cores. You can set the CPU
affinity for a process in Windows Task Manager. Try restricting it to a
few cores, or to all cores on 1 of the processors. I think with these
machines half of the RAM is connected to 1 CPU and the other half to the
other. Perhaps things go awry when the CPU interconnect gets stressed.
Just a wild guess though.

- On Linux, try running the 'stress' command, something like this:
stress --vm 1 --vm-bytes 1.3T --vm-keep
perhaps combined with a 48 core cpu stress test.
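
As a rough sketch of the last two suggestions, the Python helpers below compute a CPU affinity bitmask for one socket and assemble a combined memory and CPU `stress` invocation. Several things here are assumptions rather than facts from this thread: the function names are made up, logical processors are assumed to be numbered contiguously per socket (0-23 and 24-47), and the memory size is passed with an `M` suffix because `stress` documents the B, K, M and G size suffixes. The script only prints the command; it does not run it.

```python
import shlex


def affinity_mask(first_core, core_count):
    """Bitmask with one bit set per logical processor in the given range."""
    return ((1 << core_count) - 1) << first_core


def stress_command(cpu_workers, vm_workers, total_vm_gib, timeout_s):
    """Build an argument list for `stress` (printed here, not executed)."""
    # --vm-bytes is per worker, so split the total across the vm workers.
    per_worker_mib = total_vm_gib * 1024 // vm_workers
    return [
        "stress",
        "--cpu", str(cpu_workers),
        "--vm", str(vm_workers),
        "--vm-bytes", f"{per_worker_mib}M",
        "--vm-keep",
        "--timeout", f"{timeout_s}s",
    ]


# Assumed layout: socket 0 = logical processors 0-23, socket 1 = 24-47.
print("socket 0 mask:", hex(affinity_mask(0, 24)))
print("socket 1 mask:", hex(affinity_mask(24, 24)))

# Roughly 1.3 TB of memory plus 48 CPU workers, stopping after one hour.
print(shlex.join(stress_command(48, 1, 1300, 3600)))
```

The masks mark the same cores you would tick in Task Manager; on Linux, `taskset` accepts a mask in this hexadecimal form. If hyperthreading is enabled the numbering will differ, and Windows splits systems with more than 64 logical processors into processor groups, so check the actual layout before relying on these values.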

Joost