SMTP fails after first attempt


Tom

Jan 18, 2024, 10:22:31 AM
to Milwaukee Linux Users Group
Hey y'all. Got me a weird one.
 
I have several NAS devices. I like to keep track of them with Nagios, an open-source monitoring system. The Thecus NAS has some SMTP hooks, and there are Nagios plugins to check its status via SMTP and some other API methods. So that works pretty good. See photo:
ImageFromClipboard_15
But attempting to monitor my TrueNAS SCALE instance is bothersome. The method used is SMTP, and I can't seem to get the same result twice.
 
# snmpwalk -v1 -c $COMSTR $NASIP -m /usr/share/snmp/mibs/FREENAS-MIB.txt zpoolDescr
FREENAS-MIB::zpoolDescr.1 = STRING: App0
FREENAS-MIB::zpoolDescr.2 = STRING: boot-pool
FREENAS-MIB::zpoolDescr.3 = STRING: p0
FREENAS-MIB::zpoolDescr.4 = STRING: p1
FREENAS-MIB::zpoolDescr.5 = STRING: p2
 
### Well, that worked exactly like it's supposed to.
 
 
# snmpwalk -v1 -c $COMSTR $NASIP -m /usr/share/snmp/mibs/FREENAS-MIB.txt zpoolDescr.3
FREENAS-MIB::zpoolDescr.3 = STRING: p0
 
### That worked too.
 
 
# snmpwalk -v1 -c $COMSTR $NASIP -m /usr/share/snmp/mibs/FREENAS-MIB.txt zpoolHealth.1
FREENAS-MIB::zpoolHealth.1 = INTEGER: online(0)
 
### Also worked as expected.
 
# snmpwalk -v1 -c $COMSTR $NASIP -m /usr/share/snmp/mibs/FREENAS-MIB.txt zpoolHealth.2
Error in packet.
Reason: (genError) A general failure occured
Failed object: FREENAS-MIB::zpoolHealth.2
 
### This is the failure that I get. This same call, or any similar call like those above,
### will work sometimes and not others.
 
# snmpwalk -v2c -c $COMSTR $NASIP -m /usr/share/snmp/mibs/FREENAS-MIB.txt zpoolHealth.2
FREENAS-MIB::zpoolHealth.2 = No Such Object available on this agent at this OID
 
### Change to SNMP version 2c and the failure is reported differently, but this
### same step worked before. Wait a few minutes and it will work again:
 
# snmpwalk -v2c -c $COMSTR $NASIP -m /usr/share/snmp/mibs/FREENAS-MIB.txt zpoolHealth.2
FREENAS-MIB::zpoolHealth.2 = INTEGER: online(0)
 
### Try it again, for zpool 3, and it fails several times, then succeeds:
 
# snmpwalk -v2c -c $COMSTR $NASIP -m /usr/share/snmp/mibs/FREENAS-MIB.txt zpoolHealth.3
Error in packet.
Reason: (genError) A general failure occured
Failed object: FREENAS-MIB::zpoolHealth.3
 
### Minutes later:
 
# snmpwalk -v2c -c $COMSTR $NASIP -m /usr/share/snmp/mibs/FREENAS-MIB.txt zpoolHealth.3
FREENAS-MIB::zpoolHealth.3 = INTEGER: online(0)
 
Any clues?
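
One way to put numbers on "works sometimes and not others" is to classify each result and tally the outcomes over time. The sketch below is only an illustration, not part of the original post: the function names and category labels are made up for this example, and the matching relies on the output strings shown above plus net-snmp's usual "Timeout: No Response" message. A genError is an error status carried in a response, so seeing it means the agent did answer.

```python
from collections import Counter


def classify_snmp_output(text: str) -> str:
    """Sort one snmpwalk/snmpget result into a coarse outcome category."""
    if "Timeout: No Response" in text:
        return "timeout"         # no answer at all: request or reply was lost
    if "(genError)" in text:
        return "gen_error"       # agent answered but reported an internal failure
    if "No Such Object" in text or "No Such Instance" in text:
        return "no_such_object"  # agent answered but says the OID is absent
    if " = " in text:
        return "ok"              # a normal "OID = TYPE: value" line
    return "unknown"


def tally(outputs):
    """Count how often each outcome category occurs in a list of results."""
    return Counter(classify_snmp_output(o) for o in outputs)
```

Feeding it the captured text of repeated runs (for example one per minute) would show whether the failures are mostly genError, mostly "No Such Object", or genuine timeouts.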
 
 

465. War talk by men who have been in a war is always interesting, whereas moon talk by a poet who has not been in the moon is likely to be dull. --Twain


Tom Peters • a50@MHzHam@gmail.com • N9QQB (amateur radio) --... ...-- / -.. . / -. ----. --.- --.- -...
"HEY YOU" (loud shouting) • Second Tops (Set Dancing) • FIND ME ON FACEBOOK
43° 7' 17.2" N by 88° 6' 28.9" W • Elevation 815' • Grid Square EN53wc
Sr. Systems Administrator • Open-source Dude • Musician • Registered Linux User 385531
Every successful person has had failures but repeated failure is no guarantee of eventual success.

Glenn Holmer

Jan 19, 2024, 6:09:32 AM
to MLUG
looks like a ZFS problem, not an SMTP problem

--
Homepage: http://www.milwaukeelug.org/home
This forum online: https://groups.google.com/forum/#!forum/MilwaukeeLUG
---
You received this message because you are subscribed to the Google Groups "Milwaukee Linux User's Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to MilwaukeeLUG...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/MilwaukeeLUG/5701.607122427.1705591337%40gmail.com.

Roger M. Jenson

Jan 19, 2024, 10:19:23 AM
to Milwau...@googlegroups.com

Hello Tom,

I suggest that we start by correcting the message subject and the other places in the message where you describe what you are using as SMTP (Simple Mail Transfer Protocol), while the command-line examples show that you are using SNMP (Simple Network Management Protocol). This will minimize confusion as we move forward.

I am scraping the rust off my SNMP skills, so I need to ask many questions.

You state that you are trying to use SNMP to gather information on a TrueNAS SCALE instance, so I will start there. What version of TrueNAS SCALE are you using?

Reviewing the TrueNAS SCALE Documentation Hub, I see that Net-SNMP is used to provide SNMP.

https://www.truenas.com/docs/scale/scaletutorials/systemsettings/services/snmpservicescale/

I also see that TrueNAS SCALE can support SNMP v3 but I have not found (YET) what the default version of SNMP is. It looks like I need to review my available hardware and spin up a TrueNAS SCALE instance to follow along on this adventure.

A good SNMPwalk resource that I will use is available at the following URL.

https://www.networkmanagementsoftware.com/snmpwalk-examples-for-windows-linux/

Since SNMP uses UDP, reliability of the data transfer is not guaranteed. Is the computer that you are executing the snmpwalk commands on in the same subnet as the TrueNAS SCALE device? Are there latency issues in the network path?

Have you tried snmpwalk on something else like CPU temperature to see if you receive reproducible responses?
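
To help separate UDP loss from an agent-side problem, the timeout and retry count can be set explicitly rather than left at their defaults. The following is a minimal sketch, assuming the net-snmp command-line tools and their common `-t` (timeout in seconds) and `-r` (retries) options; the helper name `build_snmpget_cmd`, the address 192.0.2.10 and the community string "public" are placeholders invented for this example.

```python
def build_snmpget_cmd(host, community, oid, version="2c",
                      timeout_s=2, retries=3, mib_file=None):
    """Build an snmpget argument list with explicit timeout and retries."""
    cmd = ["snmpget", f"-v{version}", "-c", community,
           "-t", str(timeout_s), "-r", str(retries)]
    if mib_file:
        # Load an extra MIB file so symbolic names such as zpoolHealth resolve.
        cmd += ["-m", mib_file]
    cmd += [host, oid]
    return cmd


# Example with placeholder values (not taken from the thread):
example = build_snmpget_cmd(
    "192.0.2.10", "public", "zpoolHealth.2",
    mib_file="/usr/share/snmp/mibs/FREENAS-MIB.txt",
)
```

The list can be passed to `subprocess.run(example, capture_output=True, text=True)`. If packets were being dropped, raising the retries should turn failures into successes; an error that comes back immediately points at the agent instead.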

Have Fun,
Roger M. Jenson

Tom

Jan 19, 2024, 10:54:42 AM
to Milwau...@googlegroups.com
At 09:19 AM 1/19/2024 -0600, Roger M. Jenson wrote:
> Hello Tom,
> I suggest that we start by correcting the message subject and the other places in the message where you describe what you are using as SMTP (Simple Mail Transfer Protocol), while the command-line examples show that you are using SNMP (Simple Network Management Protocol). This will minimize confusion as we move forward.

Oops, absolutely right. Fixed the subject line.

> I am scraping the rust off my SNMP skills, so I need to ask many questions.
> You state that you are trying to use SNMP to gather information on a TrueNAS SCALE instance, so I will start there. What version of TrueNAS SCALE are you using?

TrueNAS-SCALE-22.02.4

> Reviewing the TrueNAS SCALE Documentation Hub, I see that Net-SNMP is used to provide SNMP.
> I also see that TrueNAS SCALE can support SNMP v3 but I have not found (YET) what the default version of SNMP is. It looks like I need to review my available hardware and spin up a TrueNAS SCALE instance to follow along on this adventure.

I have tried v2c and v1, although they are hideously insecure (they send the community string, effectively the password, in clear text!), and tried many iterations with both versions. Either one works, but not very reliably, and I can't tell if there's any statistical difference between them.

> A good SNMPwalk resource that I will use is available at the following URL.

> Since SNMP uses UDP, reliability of the data transfer is not guaranteed. Is the computer that you are executing the snmpwalk commands on in the same subnet as the TrueNAS SCALE device? Are there latency issues in the network path?

They are on the same subnet. TrueNAS is a physical machine, and Overwatch (where Nagios and snmpwalk run) is a VMware virtual machine. I have scoured the logs on the TrueNAS machine and see no evidence that it is overstressed in any way. There are no messages in dmesg or /var/log/messages that have anything to do with SNMP. Nothing pops up in any log when these requests are being received, nor for many minutes after. Anything that does pop up has timing and content that suggest it's unconnected to the SNMP traffic.

> Have you tried snmpwalk on something else like CPU temperature to see if you receive reproducible responses?

That will be a next step.

Tom

Jan 23, 2024, 5:56:01 PM
to MLUG Listserve
I wrote plugins for Nagios that bring back a generic "System Description" and "System Uptime", both using very basic MIBs with SNMP. I also wrote one using common MIBs that gets temperature sensor values by a sensor index number. I created these plugins to test my ability to grab stuff via SNMP.
 
The results are still odd and hard to interpret. I need to watch it for a few hours, but so far what seems to be true is that anything using basic common SNMP MIBs works pretty reliably. Those that use the FREENAS-MIB.txt MIB file seem to be flaky; that would be the pool name and status.
 
ImageFromClipboard_16
The tests for truenas_pool_status and truenas_pool_name often return errors, usually "Error in packet" or something that indicates that I asked for data on a pool that doesn't exist (untrue!).
 
The others use a similar methodology and so far return reliable results.
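
For readers who have not written a Nagios plugin: the core of one is mapping a result to the standard plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is not Tom's plugin, which is not shown in the thread; it is a hedged illustration of how a pool-status check could parse the zpoolHealth output quoted earlier. The function name and message wording are invented, and treating every label other than "online" as CRITICAL is an assumption.

```python
import re

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches lines like "FREENAS-MIB::zpoolHealth.1 = INTEGER: online(0)".
_HEALTH_RE = re.compile(r"zpoolHealth\.(\d+) = INTEGER: (\w+)\((\d+)\)")


def pool_health_check(snmp_output: str):
    """Return (exit_code, message) for one zpoolHealth query result."""
    match = _HEALTH_RE.search(snmp_output)
    if match is None:
        # SNMP itself failed (genError, No Such Object, timeout, ...).
        lines = snmp_output.strip().splitlines()
        detail = lines[0] if lines else "empty response"
        return UNKNOWN, f"UNKNOWN - no usable zpoolHealth value ({detail})"
    index, label, _number = match.groups()
    if label == "online":
        return OK, f"OK - pool index {index} is online"
    # Assumption: any other health label deserves attention.
    return CRITICAL, f"CRITICAL - pool index {index} is {label}"
```

Returning UNKNOWN rather than CRITICAL when the query itself fails keeps a flaky agent from being reported as a failed pool.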
 
-Tom
 
 
At 01:33 PM 1/19/2024 -0600, Roger M. Jenson wrote:
Hello Tom,
Thanks for replying with the requested information.
I am not sure that you will see anything about UDP in the logs without increasing the log level. Even then I do not believe anything will show up in the logs.
The next step I would take is a packet capture filtered to UDP ports 161 (SNMP requests and their responses) and 162 (SNMP traps) to focus on the desired traffic. Then analyze the capture with Wireshark.
Have Fun,
Roger M. Jenson
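
A capture along those lines could be started with tcpdump. The helper below is a small sketch that only assembles the command; the interface name "eth0", the output file name and the address 192.0.2.10 are placeholders, not values from this thread. Port 161 carries the requests and the agent's responses, and port 162 is the trap port.

```python
def build_capture_cmd(nas_ip, iface="eth0", outfile="snmp.pcap"):
    """Build a tcpdump command that records SNMP traffic to or from one host."""
    # BPF filter: UDP only, one host, SNMP agent port or trap port.
    bpf = f"udp and host {nas_ip} and (port 161 or port 162)"
    # -n: do not resolve names; -i: interface; -w: write raw packets to a file.
    return ["tcpdump", "-n", "-i", iface, "-w", outfile, bpf]
```

The resulting file opens directly in Wireshark, where each failing request can be matched to the response that carried the error status.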

Tom

Jan 30, 2024, 10:10:40 AM
to Milwau...@googlegroups.com
I'm monitoring 20 items via snmp on that machine, plus a ping to see if it's up.
 
I think I have the normal check interval set to 30 minutes. If a check comes back with any non-OK status, it switches to a shorter retry interval, which I think is 5 minutes.
 
The MIB provides for reading the lm-sensors for temperature, and this box has something like 16 temperatures I could read, but I'm watching 8 of them. I also read "systemUptime" from the machine, just because I can. All those SNMP GETs work pretty reliably-- I don't seem to get any "can't read" or command errors.
 
Ten of the SNMP reads I do are for zfs pool names and statuses (there are five pools) and all of them are flakey. Right now, all ten of them show "External command error: Error in packet" and Nagios says they've been in that state for 20 minutes. Sometimes the read for the pool name and the one for its status come back with good information. Sometimes a bunch of them do. At the moment, they are all showing an error in the GET.
 
All the SNMP data is gotten the same way. I have to conclude that TrueNAS has a problem with SNMP, or perhaps just its ZFS reporting via SNMP, but that the underlying OS does not.
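
If the agent's pool reporting really is intermittent, one workaround inside a plugin is to retry a few times before reporting a failure. This is a hedged sketch, not a fix for the underlying problem: `fetch` stands for any hypothetical function that runs the SNMP query and returns its text, and the names and defaults here are invented. Since the failures described above can last for minutes, a short retry loop may not be enough on its own; the service's `max_check_attempts` and `retry_interval` settings in Nagios cover the longer time scale.

```python
import time
from typing import Callable, Tuple


def looks_good(output: str) -> bool:
    """True when the output contains a value rather than an SNMP error."""
    return (" = " in output
            and "No Such" not in output
            and "(genError)" not in output)


def fetch_with_retries(fetch: Callable[[], str], attempts: int = 3,
                       delay_s: float = 0.0) -> Tuple[str, int]:
    """Call fetch() until it yields a usable value or attempts run out.

    Returns (last_output, attempts_used).
    """
    output = ""
    for used in range(1, attempts + 1):
        output = fetch()
        if looks_good(output):
            return output, used
        if used < attempts:
            time.sleep(delay_s)  # pause before the next try
    return output, attempts
```

Logging the `attempts_used` value alongside each check would also show how often the first query fails.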
 
Thanks all for your input.
 
 

Glenn Holmer

Jan 30, 2024, 5:49:23 PM
to Milwau...@googlegroups.com
On Tue, Jan 30, 2024 at 9:10 AM Tom <a50m...@gmail.com> wrote:
All the SNMP data is gotten the same way. I have to conclude that TrueNAS has a problem with SNMP, or perhaps just its ZFS reporting via SNMP, but that the underlying OS does not.

I'm still wondering if it's the ZFS pools that are flakey (vs the SNMP reads). What do the standard ZFS utilities tell you about the pools?

--
Glenn Holmer (Linux registered user #16682)
"After the vintage season came the aftermath -- and Cenbe."

Tom

Jan 31, 2024, 11:15:10 AM
to Milwau...@googlegroups.com
Glenn:
 
This is a TrueNAS SCALE box. Its dashboard reports all is well with the pools. It does regular scrubs of all pools. Early on when I had some marginal drives, the scrubs used to return some warnings and/or errors. This led me to swap out some drives (one at a time) in a process that went on for weeks. I haven't had an alert, other than "Dude, can't you freakin' type a password right?" in many months.
 
If there are problems with the pools, I'm hard pressed to find them.
 

 
# zpool list
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
App0        460G  4.07G   456G        -         -   10%   0%  1.00x  ONLINE  /mnt
boot-pool   238G  5.57G   232G        -         -    0%   2%  1.00x  ONLINE  -
p0         29.1T  11.4T  17.7T        -         -    6%  39%  1.00x  ONLINE  /mnt
p1         12.7T  1.07G  12.7T        -         -    1%   0%  1.00x  ONLINE  /mnt
p2         29.1T  8.01T  21.1T        -         -    0%  27%  1.00x  ONLINE  /mnt
 

 
root@gibson:/usr/local/sbin# zpool status -v App0
pool: App0
state: ONLINE
scan: scrub repaired 0B in 00:05:51 with 0 errors on Sun Dec 31 14:56:20 2023
config:
 
NAME        STATE   READ  WRITE  CKSUM
App0        ONLINE     0      0      0
  mirror-0  ONLINE     0      0      0
 
errors: No known data errors
 

root@gibson:/usr/local/sbin# zpool status -v boot-pool
pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:00:17 with 0 errors on Fri Jan 26 03:45:19 2024
config:
 
NAME        STATE   READ  WRITE  CKSUM
boot-pool   ONLINE     0      0      0
  mirror-0  ONLINE     0      0      0
    sdj3    ONLINE     0      0      0
    sdi3    ONLINE     0      0      0
 
errors: No known data errors
 

root@gibson:/usr/local/sbin# zpool status -v p0
pool: p0
state: ONLINE
scan: scrub repaired 0B in 05:38:25 with 0 errors on Sun Dec 31 20:29:58 2023
config:
 
NAME        STATE   READ  WRITE  CKSUM
p0          ONLINE     0      0      0
  raidz2-0  ONLINE     0      0      0
 
errors: No known data errors
 

root@gibson:/usr/local/sbin# zpool status -v p1
pool: p1
state: ONLINE
scan: scrub repaired 0B in 01:52:05 with 0 errors on Sun Dec 31 16:44:49 2023
config:
 
NAME        STATE   READ  WRITE  CKSUM
p1          ONLINE     0      0      0
  raidz2-0  ONLINE     0      0      0
 
errors: No known data errors
 

root@gibson:/usr/local/sbin# zpool status -v p2
pool: p2
state: ONLINE
scan: scrub repaired 0B in 06:00:27 with 0 errors on Sun Dec 31 20:53:21 2023
config:
 
NAME        STATE   READ  WRITE  CKSUM
p2          ONLINE     0      0      0
  raidz2-0  ONLINE     0      0      0
 
errors: No known data errors
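
To compare what ZFS says with what SNMP returns in an automated way, the script-friendly form `zpool list -H -o name,health` (no header, tab-separated) is easy to parse. The sketch below assumes that output format; the function names and the shape of the SNMP-side dictionary are hypothetical choices for this example.

```python
def parse_zpool_health(text: str) -> dict:
    """Parse `zpool list -H -o name,health` output into {pool: health}."""
    pools = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, health = line.split()
        pools[name] = health
    return pools


def disagreements(zfs: dict, snmp: dict) -> list:
    """Pools whose SNMP health is missing or differs from what ZFS reports.

    `snmp` maps pool name to the label taken from zpoolHealth, e.g. "online".
    """
    return sorted(
        name for name, health in zfs.items()
        if snmp.get(name, "").upper() != health.upper()
    )
```

With all five pools ONLINE on the ZFS side, any name this returns is a pool where the SNMP answer was wrong or absent, which points at the agent rather than the pool.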

 
ImageFromClipboard_18