On March 10, 2017 at 10:33:03 AM, Calloway, Chris (c...@unc.edu) wrote:
Nathan,
➢ I’m thinking you might see if our test server works for you.
It does indeed! It even renders the entire dataset as nc4 when requested with no constraints (and much faster than a subset).
➢ If it does would you consider upgrading your Hyrax instance to the current release?
Considering that the instance is currently installed from RPM, I would imagine your suggestion is to install from RPM again? Is it possible to do that upgrade to the current RPM installation in place? Or would I need to start a new server VM afresh? This was my concern with installing the RPM originally, that I wouldn’t be able to simply git pull to receive updates and bug fixes.
You should be able to ‘yum upgrade libdap*.rpm bes*.rpm’ (where libdap*.rpm is the name of the Hyrax 1.13.x rpm for libdap, …) just fine.
You will almost certainly need to put both RPMs on the same command line.
My apologies for chiming in…
James
--
Sincerely,
Chris Calloway
Applications Analyst
University of North Carolina
Renaissance Computing Institute
(919) 599-3530
Thanks, James.
This is what I did:
[cbc@hyrax01 ~]$ wget https://www.opendap.org/pub/binary/hyrax-1.13.3/centos6.6/libdap-3.18.3-1.el6.x86_64.rpm
…
[cbc@hyrax01 ~]$ wget https://www.opendap.org/pub/binary/hyrax-1.13.3/centos6.6/bes-3.17.4-1.static.el6.x86_64.rpm
…
[cbc@hyrax01 ~]$ sudo besctl stop
Shutting down the BES daemon
There are several different BES processes running: 27154
Successfully shut down the BES
[cbc@hyrax01 ~]$ sudo besctl kill
[cbc@hyrax01 ~]$ sudo yum upgrade libdap-3.18.3-1.el6.x86_64.rpm bes-3.17.4-1.static.el6.x86_64.rpm
…
Complete!
[cbc@hyrax01 ~]$ sudo besctl start
Starting the BES
OK: Successfully started the BES
PID: 28503 UID: 0
[cbc@hyrax01 ~]$
In a browser I went to:
And back to square one with Internal Server Error.
In /var/log/bes/bes.log:
[EST Fri Mar 10 11:12:23 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 request received
[EST Fri Mar 10 11:12:23 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [set context errors to xml;] received
[EST Fri Mar 10 11:12:23 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [show catalog for /ff2a6f87817544a08c82ebcf119bae80/data/contents/NLDAS_NOAH0125_D_002_EVPsfc_CONUS.nc;] received
[EST Fri Mar 10 11:12:30 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 request received
[EST Fri Mar 10 11:12:30 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [set context xdap_accept to 3.2;] received
[EST Fri Mar 10 11:12:30 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [set context dap_explicit_containers to no;] received
[EST Fri Mar 10 11:12:30 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [set context errors to dap2;] received
[EST Fri Mar 10 11:12:30 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [set context max_response_size to 0;] received
[EST Fri Mar 10 11:12:30 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [set container in catalog values catalogContainer,/ff2a6f87817544a08c82ebcf119bae80/data/contents/NLDAS_NOAH0125_D_002_EVPsfc_CONUS.nc;] received
[EST Fri Mar 10 11:12:30 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [define d1 in default as catalogContainer with catalogContainer.constraint="latitude[0:1:194],longitude[0:1:461],time[0:1:365],crs,mean[0:1:365][0:1:194][0:1:461]";] received
[EST Fri Mar 10 11:12:30 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [get dods for d1 return as netcdf-4;] received
[EST Fri Mar 10 11:13:52 2017 id: 28828] Child listener caught SIGPIPE (master listener PID: 28505). Child listener Exiting.
However, after upgrading:
/usr/tomcat7/content/opendap/olfs.xml
is now there.
Looking in that file, the BES timeout is commented out:
<!-- Timeout (in seconds) for this BES, defaults to 300 seconds-->
<!-- <timeOut>300</timeOut> -->
Does this mean there is no timeout, should I uncomment it, or should I change it to something longer? The lag between the request that produced the error and the error itself was 89 seconds in the example above.
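For reference, that lag can be read straight off the bes.log timestamps. This is a minimal Python sketch. It assumes every line opens with the "[EST Fri Mar 10 11:12:23 2017 id: ...]" layout shown above and that all lines share one time zone, because the zone label is simply dropped:

```python
from datetime import datetime


def bes_log_time(line: str) -> datetime:
    """Parse the timestamp that opens a bes.log line.

    Assumes the '[EST Fri Mar 10 11:12:23 2017 id: 28828] ...' layout seen
    in this thread. The time-zone label is dropped, so only compare lines
    written in the same zone.
    """
    stamp = line[1:line.index(" id:")]     # 'EST Fri Mar 10 11:12:23 2017'
    _zone, rest = stamp.split(" ", 1)       # discard the 'EST' label
    return datetime.strptime(rest, "%a %b %d %H:%M:%S %Y")


def lag_seconds(start_line: str, end_line: str) -> float:
    """Seconds elapsed between two bes.log lines."""
    return (bes_log_time(end_line) - bes_log_time(start_line)).total_seconds()


first = "[EST Fri Mar 10 11:12:23 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 request received"
dods = "[EST Fri Mar 10 11:12:30 2017 id: 28828] 28828 from ip 127.0.0.1, port 46056 [get dods for d1 return as netcdf-4;] received"
sigpipe = "[EST Fri Mar 10 11:13:52 2017 id: 28828] Child listener caught SIGPIPE (master listener PID: 28505). Child listener Exiting."

print(lag_seconds(first, sigpipe))   # 89.0
print(lag_seconds(dods, sigpipe))    # 82.0
```

Measured from the first "request received" line, the gap is 89 seconds. Measured from the "get dods" command at 11:12:30, it is 82 seconds. Either way, the gap is well under the 300-second default quoted in the olfs.xml comment. That suggests, but does not prove, that this timeout is not what cut the connection.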
/dev/mapper/VGos-LVslash 8.8G 3.9G 4.5G 47% /
0)
14:19:50.923 [localhost-startStop-1] INFO opendap.coreServlet.ServletUtil - Using config location: /etc/olfs/
1)
Thought it might be because of the sticky bit on /tmp. But when I created bes_debug in /var/opt/hyrax with no sticky bit, debug all still would not start. I might try to run each of the debug services individually later.
2)
Well, this is odd. BES.UncompressCache.dir was already set to /var/cache/bes, not /tmp. And it doesn’t appear writable by tomcat7:
[cbc@hyrax01 tomcat7]$ ls -l /var/cache
total 56
drwxrwxr-x 2 bes bes 4096 Jan 9 18:44 bes
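Reading a mode string by hand is easy to get wrong, so here is a simplified sketch of the owner/group/other rule. It ignores root, ACLs, and SELinux. For a directory, it treats creating a file as needing both write and search (x) permission. The files in the /var/cache/bes listing are owned by bes, so the user that actually needs write access here may be the one the BES runs as, not tomcat7.

```python
def can_write(mode: str, owner: str, group: str,
              user: str, user_groups: set[str]) -> bool:
    """Simplified POSIX check: may `user` write this entry (or, for a
    directory, create files in it)?

    `mode` is the first column of `ls -l`, e.g. 'drwxrwxr-x'. Ignores root,
    ACLs and SELinux. Exactly one class applies: owner, else group, else other.
    """
    perms = mode[1:10]                  # skip the file-type character
    if user == owner:
        triplet = perms[0:3]
    elif group in user_groups:
        triplet = perms[3:6]
    else:
        triplet = perms[6:9]
    writable = triplet[1] == "w"
    if mode[0] == "d":
        # Creating a file inside a directory also needs search permission;
        # 's' and 't' in the x slot include execute, 'S' and 'T' do not.
        return writable and triplet[2] in "xst"
    return writable


# /var/cache/bes as listed above: drwxrwxr-x bes bes
print(can_write("drwxrwxr-x", "bes", "bes", "tomcat7", {"tomcat7"}))  # False
print(can_write("drwxrwxr-x", "bes", "bes", "bes", {"bes"}))          # True
```

The first call is False because tomcat7 is neither the owner nor in group bes, so only the "other" bits (r-x) apply.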
And it doesn’t appear to have been written to in a while:
[cbc@hyrax01 tomcat7]$ ls -l /var/cache/bes
total 452
-rw-r--r-- 1 bes bes 8 Dec 14 2015 uncompress_cache.cache_control
-rw-r--r-- 1 bes bes 204960 Dec 14 2015 uncompress_cache#usr#share#hyrax#data#gdal#Atlantic.wind.grb
-rw-r--r-- 1 bes bes 95160 Dec 14 2015 uncompress_cache#usr#share#hyrax#data#gdal#Caribbean.wind.grb
-rw-r--r-- 1 bes bes 150060 Dec 12 2015 uncompress_cache#usr#share#hyrax#data#gdal#CentralAtlantic.wind.grb
[cbc@hyrax01 tomcat7]$
FONc.Tempdir and FONg.Tempdir, on the other hand, were both already set to /tmp. So maybe there is a misconfiguration between the BES, FONc, and FONg?
I set BES.UncompressCache.dir, FONc.Tempdir, FONg.Tempdir to /var/opt/hyrax:
drwxrwxrwx 2 tomcat7 tomcat7 4096 Mar 14 16:32 hyrax
[cbc@hyrax01 ~]$
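To compare the three settings side by side, a small parser for plain Key=value lines is enough. This is a sketch that assumes that layout. It does not handle any include or append syntax the real BES configuration reader may support. The snippets in the usage are illustrative reconstructions of the values reported above, not copies of the actual files.

```python
from typing import Dict, Optional

KEYS = ("BES.UncompressCache.dir", "FONc.Tempdir", "FONg.Tempdir")


def read_settings(text: str) -> Dict[str, str]:
    """Parse plain 'Key=value' lines, skipping blanks and '#' comments.

    A simplification: include/append syntax the real BES reader may
    support is not handled, and a later duplicate key overwrites an earlier one.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def temp_dirs(*config_texts: str) -> Dict[str, Optional[str]]:
    """Merge several config texts and report the three directories of interest."""
    merged: Dict[str, str] = {}
    for text in config_texts:
        merged.update(read_settings(text))
    return {key: merged.get(key) for key in KEYS}


# Illustrative snippets reflecting the values reported before the change.
before = temp_dirs(
    "# cache for decompressed files\nBES.UncompressCache.dir=/var/cache/bes\n",
    "FONc.Tempdir=/tmp\n",
    "FONg.Tempdir=/tmp\n",
)
print(before)
print("all the same:", len(set(before.values())) == 1)   # all the same: False
```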
Restarted bes and tomcat7, sent the offending request, and got an internal server error. However, there were two cache files in /var/opt/hyrax, which seems odd:
-rw------- 1 bes bes 131914424 Mar 14 16:24 nccj60Va
-rw------- 1 bes bes 131914424 Mar 14 16:24 ncXXBrCF
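Each of these files is 131,914,424 bytes (roughly 126 MiB). Identical sizes are what you would expect if each one is a complete copy of the same response. To track them between test runs, here is a sketch that lists candidate leftovers. The "nc" prefix is only an assumption drawn from the names seen here (nccj60Va, ncXXBrCF), not a documented BES naming rule.

```python
import os
import time
from typing import List, Tuple


def leftover_temp_files(directory: str, prefix: str = "nc") -> List[Tuple[str, int, str]]:
    """List (name, size in bytes, modification time) for regular files whose
    names start with `prefix`, sorted by name.

    The default 'nc' prefix is a guess based on the names seen in this thread.
    """
    found = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.startswith(prefix):
                st = entry.stat()
                mtime = time.strftime("%b %d %H:%M", time.localtime(st.st_mtime))
                found.append((entry.name, st.st_size, mtime))
    return sorted(found)


# Only runs where the directory from this thread exists.
if os.path.isdir("/var/opt/hyrax"):
    for name, size, mtime in leftover_temp_files("/var/opt/hyrax"):
        print(f"{mtime}  {size / 2**20:8.1f} MiB  {name}")
```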
So I turned on debug after a restart:
[cbc@hyrax01 ~]$ ls -l /var/opt/hyrax
-rwxrwxrwx 1 bes tomcat7 0 Mar 14 16:38 bes_debug.log
And then I got the internal server error again from the offending request. Except this time only one new cache file was created:
[cbc@hyrax01 ~]$ ls -l /var/opt/hyrax/
total 386500
-rwxrwxrwx 1 bes tomcat7 20526 Mar 14 16:50 bes_debug.log
-rw------- 1 bes bes 131914424 Mar 14 16:50 nc2oIPEH
-rw------- 1 bes bes 131914424 Mar 14 16:24 nccj60Va
-rw------- 1 bes bes 131914424 Mar 14 16:24 ncXXBrCF
-rw-r--r-- 1 bes bes 8 Mar 14 16:22 uncompress_cache.cache_control
[cbc@hyrax01 ~]$
Here’s the bes_debug.log, which simply shows that the cache was written:
http://people.renci.org/~cbc/bes_debug.log
More tomorrow. Are there other debug options I could run that would be helpful?
At the ready. My number below. Or send me your number and an EST time.
OK, Nathan. I went through the steps we discussed yesterday:
1)
The cached files are indeed valid nc4 with the correct data in them. I uploaded a copy of one here:
http://people.renci.org/~cbc/ncLK24tq
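A quick way to sanity-check a file like this without any netCDF tools is to look at its leading bytes. netCDF-4 files are stored in HDF5, whose 8-byte signature sits at offset 0 (or at 512, 1024, 2048, and so on when a user block is present). Classic netCDF files begin with "CDF" followed by a version byte. This sketch checks only those signatures, so an HDF5 match means "could be netCDF-4", not proof that the contents are valid.

```python
import os

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
CLASSIC_MAGICS = (b"CDF\x01", b"CDF\x02", b"CDF\x05")


def netcdf_flavor(path: str) -> str:
    """Return 'netcdf-classic', 'hdf5' (a netCDF-4 candidate) or 'unknown'.

    Only magic bytes are checked; this says nothing about whether the
    variables inside are correct.
    """
    with open(path, "rb") as f:
        if f.read(4) in CLASSIC_MAGICS:
            return "netcdf-classic"
        offset = 0
        while True:
            # HDF5 may place its superblock at 0, 512, 1024, 2048, ...
            f.seek(offset)
            chunk = f.read(8)
            if len(chunk) < 8:
                return "unknown"
            if chunk == HDF5_SIGNATURE:
                return "hdf5"
            offset = 512 if offset == 0 else offset * 2


# The copy linked above, if it has been downloaded to the current directory.
if os.path.exists("ncLK24tq"):
    print(netcdf_flavor("ncLK24tq"))
```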
2)
Changed olfs.xml to comment out <CatalogCache>. Internal server error on offending request. Debug log here looks the same as always:
http://people.renci.org/~cbc/bes_debug.catalog_cache_commented_out.log
3)
Changed olfs.xml to uncomment <BesManager><timeOut> and set it to 0 for no timeout. Debug log here looks the same as always, but slightly larger:
http://people.renci.org/~cbc/bes_debug.bes_manager_timeout_uncommented_and_set_to_0.log
4)
Changed olfs.xml to ramp down <ClientPool> attributes to maximum=4 and maxcmds=200. Debug log here looks the same as always:
http://people.renci.org/~cbc/bes_debug.client_pool_max_4_maxcmd_200.log
5)
Changed olfs.xml to ramp up <ClientPool> attributes to maximum=2000 and maxcmds=20000. Debug log here looks the same as always:
http://people.renci.org/~cbc/bes_debug.client_pool_max_2000_maxcmd_20000.log
Afterwards, I restarted the BES and OLFS with debug turned off and <ClientPool> returned to the default settings of maximum=200 and maxcmds=2000 (but left <CatalogCache> commented out and <BesManager><timeOut> uncommented and set to 0).
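Each of these experiments depends on an olfs.xml edit actually taking effect, so it can help to check what a parser sees. This sketch uses Python's xml.etree.ElementTree, which drops XML comments by default, so a timeout that is still commented out shows up as None. Tag and attribute names are matched case-insensitively because the thread spells them both timeOut and timeout. The OLFSConfig and BES wrapper elements in the sample are a hypothetical layout, not a copy of the real file, and namespaced tags are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def besmanager_settings(xml_text: str) -> Dict[str, Any]:
    """Report the first timeout element and ClientPool attributes under BesManager.

    Names are compared case-insensitively. ElementTree skips comments by
    default, so a commented-out timeout is reported as None.
    """
    root = ET.fromstring(xml_text)
    manager = root if root.tag == "BesManager" else root.find(".//BesManager")
    result: Dict[str, Any] = {"timeout": None, "client_pool": None}
    if manager is None:
        return result
    for elem in manager.iter():
        tag = elem.tag.lower()
        if tag == "timeout" and result["timeout"] is None:
            result["timeout"] = (elem.text or "").strip()
        elif tag == "clientpool" and result["client_pool"] is None:
            result["client_pool"] = {k.lower(): v for k, v in elem.attrib.items()}
    return result


# Hypothetical layout; only the BesManager, timeOut and ClientPool names come from the thread.
sample = """<OLFSConfig>
  <BesManager>
    <BES>
      <!-- <timeOut>300</timeOut> -->
      <ClientPool maximum="200" maxcmds="2000" />
    </BES>
  </BesManager>
</OLFSConfig>"""

print(besmanager_settings(sample))
print(besmanager_settings(sample.replace("<!-- <timeOut>300</timeOut> -->",
                                         "<timeOut>0</timeOut>")))
```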
So, as you predicted, nothing came of those changes. But hopefully some causes were eliminated. If we use gdb, that will mean compiling from source in develop mode on another VM, correct?