I am using UCDSNMP 4.2.3. Yes, I know it's too old, but I've been mandated
by my client that that is the version I need to use.
I am trying to resurrect some old code that used to work as an AgentX
subagent with UCDSNMP 4.1.2.
If I bring up the master agent alone, then I can walk the UCD part of the
tree. So network connectivity and basic functionality are working.
If I add my AgentX subagent, as soon as the walk hits a request for the
subagent's part of the MIB, the walk halts with a timeout. Subsequent
attempts to restart the walk with the UCD part of the tree time out also, so
the master agent is occupied somehow.
Initialization of the AgentX subagent completes with no error complaints
from the master agent. /var/agentx/master is present and accounted
for. But, my callback in the AgentX subagent for handling SNMP requests is
never called (the tree walk halts somewhere in the master agent without
ever consulting my subagent).
I have tried running the master agent under gdb, and it doesn't
crash. I've tried a -D switch, and I don't see anything obviously bad (the
log file is huge, though).
I tried going back to UCDSNMP 4.1.2 where the code used to work, but it
doesn't work there either. So perhaps I have some dumb configuration problem?
The master agent is configured with AgentX support enabled.
I would appreciate suggestions for debugging the problem.
Thank you,
Howard Spindel
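> I am using UCDSNMP 4.2.3. Yes, I know it's too old, but I've been mandated
> by my client that that is the version I need to use.
Is this for the subagent, or the master agent?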
The first thing I'd suggest is that you try with a more recent version
of the agent (probably 5.0.6 or 5.0.7pre2) - ideally for both master and
subagent - to verify whether things work OK there.
If they do - *then* try with the earlier master agent (but the same
subagent).
If they don't - then it would appear that the problem may lie elsewhere.
> If I add my AgentX subagent, as soon as the walk hits a request for the
> subagent's part of the MIB, the walk halts with a timeout. Subsequent
> attempts to restart the walk with the UCD part of the tree time out also, so
> the master agent is occupied somehow.
I'd strongly suggest that you test the system with *single* requests
rather than a 'walk' - preferably 'get' rather than 'getnext'.
It would also help if you turn off retries, and ratchet up the timeout.
Otherwise the master agent gets bombarded with requests from the client,
and simply can't keep up.
Try
snmpget -r 0 -t 360 .....
That will try *once*, and wait for 6 minutes before failing - should be
plenty of time!
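When the same single request has to be repeated many times while debugging, it may help to wrap the invocation so the retry and timeout flags are never forgotten. A minimal Python sketch: the helper names build_snmpget_argv and run_once are hypothetical, and the host, community and OID arguments elided above are left entirely to the caller.

```python
import subprocess

def build_snmpget_argv(target_args, retries=0, timeout_s=360):
    """Return an snmpget command line with explicit retry/timeout flags.

    target_args is whatever follows the flags on the real command line
    (host, community, OID); nothing is assumed about it here.
    """
    return ["snmpget", "-r", str(retries), "-t", str(timeout_s), *target_args]

def run_once(target_args):
    """Send exactly one request and return (exit status, combined output)."""
    argv = build_snmpget_argv(target_args)
    # Allow the child slightly longer than snmpget's own 360-second timeout.
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=370)
    return proc.returncode, proc.stdout + proc.stderr
```

Keeping the flags in one place means every test sends one PDU and waits, which is the condition the advice above depends on.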
> But, my callback in the AgentX subagent for handling SNMP requests is
> never called (the tree walk halts somewhere in the master agent without
> ever consulting my subagent).
Does the master agent pass anything off to the subagent?
Try running the master agent using '-d', and see whether it generates
any output PDU when the 'snmpget' request arrives.
> I have tried running the master agent under gdb, and it doesn't
> crash. I've tried a -D switch, and I don't see anything obviously bad (the
> log file is huge, though).
There's probably no point in turning on *all* debugging output - either
use '-Dagentx' to look at the AgentX stuff only, or else '-Ddump'
(and/or '-d') to watch the packet traffic.
But it definitely helps to work with *one* PDU at a time. Low timeouts,
multiple retries and 'snmpwalk' commands all serve to complicate matters.
Dave
To answer your first question, it is mandated by my client that the master
agent be UCDSNMP 4.2.3. Perhaps it would be possible for me to develop my
subagent using a new NETSNMP version, but then I'd definitely have to go to
a static link so as not to pick up old libraries from the machine on which
my subagent eventually runs. I've had problems getting a static link to
work - something is not getting picked up right in the SSL stuff. It seems
safer to me to develop the subagent using the same version as mandated for
the master agent anyway.
I have not yet tried the tests with a NETSNMP version as I would also have
to edit the sources to conform to things like new header file names. That
may well be the next step, but I see enough strangeness already that
perhaps I can make progress.
As you suggested, I did try a simple get and a simple getnext after
starting the master agent with -Dagentx and -d. Here is a fairly short
snmpd.log, interlineated with some comments from me:
Turning on AgentX master support.
Note this is still experimental and shouldn't be used on critical systems.
agentx/master: initializing...
agentx/master: initializing... DONE
Sending 44 bytes to 127.0.0.1:162
0000: 30 2A 02 01 00 04 06 70 75 62 6C 69 63 A4 1D 06 0*.....public¤..
0016: 0A 2B 06 01 04 01 8F 65 81 7A 0A 40 04 C0 01 01 .+.....e.z.@.À..
0032: 65 02 01 00 02 01 00 43 01 0C 30 00 e......C..0.
Sending 44 bytes to 192.1.1.6:162
0000: 30 2A 02 01 00 04 06 70 75 62 6C 69 63 A4 1D 06 0*.....public¤..
0016: 0A 2B 06 01 04 01 8F 65 81 7A 0A 40 04 C0 01 01 .+.....e.z.@.À..
0032: 65 02 01 00 02 01 00 43 01 0C 30 00 e......C..0.
Sending 44 bytes to 192.1.1.9:162
0000: 30 2A 02 01 00 04 06 70 75 62 6C 69 63 A4 1D 06 0*.....public¤..
0016: 0A 2B 06 01 04 01 8F 65 81 7A 0A 40 04 C0 01 01 .+.....e.z.@.À..
0032: 65 02 01 00 02 01 00 43 01 0C 30 00 e......C..0.
----> The previous three packets are undoubtedly coldstart traps sent to my
three trapdests.
UCD-SNMP version 4.2.3
Received 68 bytes from 97.114.47.97:12150
0000: 01 01 00 00 00 00 00 00 00 00 00 00 8B 74 0A 1E .............t..
0016: 30 00 00 00 00 00 00 00 04 04 00 00 01 00 00 00 0...............
0032: E5 07 00 00 FA 00 00 00 0A 00 00 00 14 00 00 00 å...ú...........
0048: 55 43 44 20 41 67 65 6E 74 58 20 73 75 62 2D 61 UCD AgentX sub-a
0064: 67 65 6E 74 gent
----> Who the heck is 97.114.47.97? This appears to be a message from my
AgentX subagent, but it should be on localhost or perhaps on 192.1.1.101
which is the IP of the machine that is running snmpd. Is this just an
AgentX funny because it's using a named port instead of an IP addressable port?
agentx:open_agentx_session: open 0x814cfe8
agentx:open_agentx_session: opened 0x814fbf8 = 7
Sending 76 bytes to 97.114.47.97:12150
0000: 01 12 00 00 07 00 00 00 00 00 00 00 8B 74 0A 1E .............t..
0016: 38 00 00 00 92 08 00 00 00 00 00 00 04 00 00 00 8...............
0032: 04 04 00 00 01 00 00 00 E5 07 00 00 FA 00 00 00 ........å...ú...
0048: 0A 00 00 00 14 00 00 00 55 43 44 20 41 67 65 6E ........UCD Agen
0064: 74 58 20 73 75 62 2D 61 67 65 6E 74 tX sub-agent
Received 44 bytes from 97.114.47.97:12150
0000: 01 03 00 00 07 00 00 00 00 00 00 00 8C 74 0A 1E .............t..
0016: 18 00 00 00 FF 7F 00 00 04 04 00 00 01 00 00 00 ....ÿ...........
0032: 1B 03 00 00 0C 00 00 00 02 00 00 00 ............
agentx:register: in register_agentx_list
agentx:register: registered ok
Sending 52 bytes to 97.114.47.97:12150
0000: 01 12 00 00 07 00 00 00 00 00 00 00 8C 74 0A 1E .............t..
0016: 20 00 00 00 93 08 00 00 00 00 00 00 05 00 00 00 ...............
0032: 04 04 00 00 01 00 00 00 1B 03 00 00 0C 00 00 00 ................
0048: 02 00 00 00 ....
----> It appears that the AgentX registration went okay. I supposedly
registered for .1.3.6.1.4.1.795.12.2, which is the base of my MIB. I'm
using the old style register_mib API to register, and I intend to register
just the base OID and have all OIDs underneath the base parsed by code in
my subagent. I've verified with a printf in my subagent code that the base
OID I'm sending is correct. In the packet above, I see the 1.795.12.2
part, but not the 1.3.6.1.4 - is that somehow assumed by the master, or is
this the source of my problems? If it's the source of my problems, what's
going on since I pass the full OID to register_mib()?
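The registration exchange can be checked by hand by decoding the 20-byte AgentX header and the start of the Response payload. The sketch below is in Python and follows the layout in the AgentX specification (RFC 2741); the function names are made up for illustration, only a subset of PDU types is listed, and the bytes are copied from the 52-byte packet the master sent after "registered ok".

```python
import struct

# Subset of the PDU type codes defined by the AgentX specification (RFC 2741).
PDU_TYPES = {1: "Open", 2: "Close", 3: "Register", 5: "Get", 6: "GetNext", 18: "Response"}

def decode_header(data: bytes) -> dict:
    """Decode the fixed 20-byte AgentX header."""
    version, ptype, flags, _reserved = struct.unpack("4B", data[:4])
    # Bit 4 of the flags (0x10) is NETWORK_BYTE_ORDER; if clear, little-endian.
    order = ">" if flags & 0x10 else "<"
    session, transaction, packet, length = struct.unpack(order + "4I", data[4:20])
    return {"version": version, "type": PDU_TYPES.get(ptype, str(ptype)),
            "order": order, "session": session, "packet_id": packet,
            "payload_length": length}

def decode_response_error(data: bytes) -> int:
    """Return the error field of an AgentX Response PDU (0 means no error)."""
    order = decode_header(data)["order"]
    _uptime, error, _index = struct.unpack(order + "IHH", data[20:28])
    return error

register_reply = bytes.fromhex(
    "01 12 00 00 07 00 00 00 00 00 00 00 8C 74 0A 1E "
    "20 00 00 00 93 08 00 00 00 00 00 00 05 00 00 00 "
    "04 04 00 00 01 00 00 00 1B 03 00 00 0C 00 00 00 "
    "02 00 00 00"
)
header = decode_header(register_reply)
print(header["type"], header["session"], decode_response_error(register_reply))
```

For this packet the header decodes as a Response on session 7 with an error field of 0, which agrees with the "registered ok" debug line.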
Received 44 bytes from 192.1.1.9:1130
0000: 30 2A 02 01 00 04 06 70 75 62 6C 69 63 A0 1D 02 0*.....public ..
0016: 02 53 67 02 01 00 02 01 00 30 11 30 0F 06 0B 2B .Sg......0.0...+
0032: 06 01 04 01 86 1B 0C 02 01 00 05 00 ............
Received SNMP packet(s) from 192.1.1.9
GET message
-- enterprises.795.12.2.1.0
Sending 44 bytes to 192.1.1.9:1130
0000: 30 2A 02 01 00 04 06 70 75 62 6C 69 63 A2 1D 02 0*.....public¢..
0016: 02 53 67 02 01 02 02 01 01 30 11 30 0F 06 0B 2B .Sg......0.0...+
0032: 06 01 04 01 86 1B 0C 02 01 00 05 00 ............
----> A simple get for .1.3.6.1.4.1.795.12.2.1.0, which is the very first
scalar in my MIB, is met by a noSuchName response without consulting the
AgentX subagent.
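The noSuchName reading can be confirmed from the hex dump itself. The following Python sketch walks the BER structure of an SNMPv1 message far enough to report the PDU type and error fields; it handles short-form lengths only, the function names are illustrative, and the bytes are copied from the 44-byte reply shown above.

```python
# Context-specific tags used for SNMPv1 PDUs, and the v1 error-status values.
PDU_TAGS = {0xA0: "GET", 0xA1: "GETNEXT", 0xA2: "RESPONSE", 0xA3: "SET", 0xA4: "TRAP"}
V1_ERRORS = {0: "noError", 1: "tooBig", 2: "noSuchName",
             3: "badValue", 4: "readOnly", 5: "genErr"}

def read_tlv(buf: bytes, pos: int):
    """Return (tag, value bytes, position after the value) for one BER element."""
    tag, length = buf[pos], buf[pos + 1]
    if length & 0x80:
        raise ValueError("long-form BER lengths are not handled in this sketch")
    start = pos + 2
    return tag, buf[start:start + length], start + length

def summarize_v1(packet: bytes) -> dict:
    """Report community, PDU type and (for non-trap PDUs) the error fields."""
    _, message, _ = read_tlv(packet, 0)            # outer SEQUENCE
    _, _version, pos = read_tlv(message, 0)        # INTEGER version
    _, community, pos = read_tlv(message, pos)     # OCTET STRING community
    pdu_tag, pdu, _ = read_tlv(message, pos)
    summary = {"community": community.decode("ascii"),
               "pdu": PDU_TAGS.get(pdu_tag, hex(pdu_tag))}
    if pdu_tag != 0xA4:                            # trap PDUs have a different layout
        _, _request_id, p = read_tlv(pdu, 0)
        _, status, p = read_tlv(pdu, p)
        _, index, p = read_tlv(pdu, p)
        summary["error_status"] = V1_ERRORS.get(int.from_bytes(status, "big"), "unknown")
        summary["error_index"] = int.from_bytes(index, "big")
    return summary

reply = bytes.fromhex(
    "30 2A 02 01 00 04 06 70 75 62 6C 69 63 A2 1D 02 "
    "02 53 67 02 01 02 02 01 01 30 11 30 0F 06 0B 2B "
    "06 01 04 01 86 1B 0C 02 01 00 05 00"
)
print(summarize_v1(reply))
```

The reply decodes as a RESPONSE with error-status noSuchName and error-index 1, matching the interpretation above.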
Received 42 bytes from 192.1.1.9:1136
0000: 30 28 02 01 00 04 06 70 75 62 6C 69 63 A1 1B 02 0(.....public¡..
0016: 02 75 3C 02 01 00 02 01 00 30 0F 30 0D 06 09 2B .u<......0.0...+
0032: 06 01 04 01 86 1B 0C 02 05 00 ..........
Received SNMP packet(s) from 192.1.1.9
GETNEXT message
-- enterprises.795.12.2
agentx/master: request to pass to client: enterprises.795.12.2
agentx/master: request to pass to client: enterprises.795.12.2
agentx/master: request to pass to client: enterprises.795.12.2
agentx/master: request to pass to client: enterprises.795.12.2
----> A getnext request for .1.3.6.1.4.1.795.12.2 is apparently recognized
as requiring a handoff to an AgentX subagent. At this point, though, the
master agent goes into a loop repeating the above four lines indefinitely.
-----------
After that, I did a little single stepping starting from where the "request
to pass to client" message originated. I saw that it was calling something
in the mibII module, so I turned on a little more debugging and obtained
the following loop:
agentx/master: request to pass to client: enterprises.795.12.2
snmp_vars: Returned something
mibII/vacm_vars: vacm_in_view: ver=0, source=090101c0, community=public
mibII/vacm_vars: vacm_in_view: sn=notConfigUser, gn=notConfigGroup, vn=systemview
agentx/master: request to pass to client: enterprises.795.12.2
snmp_vars: Returned something
mibII/vacm_vars: vacm_in_view: ver=0, source=090101c0, community=public
mibII/vacm_vars: vacm_in_view: sn=notConfigUser, gn=notConfigGroup, vn=systemview
etc. forever...
I don't know if the additional debugging helps much or not. Looks to me
like the master agent is trying to pass off the request for
enterprises.795.12.2 to something other than my subagent.
Are there any more clues to be gained from these logs? Is there more
debugging I can turn on to help solve this?
Thanks very much,
Howard Spindel
Yes - I understand that.
I'm suggesting that you use a more recent version as part of the debugging
process - not as a final solution. If you can try:
4.2.3 master with 4.2.3 subagent
4.2.3 master with "new" subagent
"new" master with 4.2.3 subagent
"new" master with "new" subagent
then that might give a handle on where the problem actually lies.
> Perhaps it would be possible for me to develop my
> subagent using a new NETSNMP version, but then I'd definitely have to go to
> a static link so as not to pick up old libraries from the machine on which
> my subagent eventually runs.
Don't worry about that just yet - try a new master and a new subagent
on your development machine. Does that exhibit the same problems or not?
If it does, then the fix (whatever it turns out to be) may well be
applicable to the 4.2.3 environment as well.
If the problem goes away, then it's probably a bug that's since been
fixed. (And how you address that will depend on exactly what the
problem turns out to be).
> As you suggested, I did try a simple get and a simple getnext after
> starting the master agent with -Dagentx and -d. Here is a fairly short
> snmpd.log, interlineated with some comments from me:
> Sending 44 bytes to 127.0.0.1:162
> 0000: 30 2A 02 01 00 04 06 70 75 62 6C 69 63 A4 1D 06 0*.....public¤..
                                               ^^
[snip]
> ----> The previous three packets are undoubtedly coldstart traps sent to my
> three trapdests.
Yup - 'A4' is the tag for a v1Trap PDU.
> UCD-SNMP version 4.2.3
> Received 68 bytes from 97.114.47.97:12150
> ----> Who the heck is 97.114.47.97? This appears to be a message from my
> AgentX subagent, but it should be on localhost or perhaps on 192.1.1.101
> which is the IP of the machine that is running snmpd.
No - AgentX (by default) runs over a named socket (a Unix domain socket),
rather than a TCP connection.
(You can run it over TCP, but the default is to use /var/agentx/master.)
The v4 line assumes IP connectivity when dumping PDUs, so it prints the
socket address as if it were an IP address and port - the v5 line handles
this properly regardless. Don't worry about this, it's just a cosmetic
problem.
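The odd address is not random: it is what the first bytes of the socket path look like when they are read as an IPv4 address and port. The Python sketch below shows the arithmetic. It assumes the usual layout in which a 2-byte address family is followed by the port and address (for an IP socket) or by the path (for a Unix domain socket); the function name is hypothetical, and this is an explanation of the numbers rather than a reading of the 4.2.3 source.

```python
import socket
import struct

def misread_unix_path_as_inet(path: str):
    """Interpret the start of a Unix socket path as (IPv4 address, port)."""
    raw = path.encode("ascii")
    # For an IP socket, a 2-byte port and 4-byte address follow the family
    # field; for a Unix socket the same six bytes hold the start of the path.
    (port,) = struct.unpack("!H", raw[0:2])
    address = socket.inet_ntoa(raw[2:6])
    return address, port

print(misread_unix_path_as_inet("/var/agentx/master"))
# prints ('97.114.47.97', 12150)
```

The characters "/v" give port 12150 and "ar/a" give 97.114.47.97, exactly the peer shown in the dump.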
[snip]
> agentx:register: in register_agentx_list
> agentx:register: registered ok
[snip]
> ----> It appears that the AgentX registration went okay.
Good.
> I'm
> using the old style register_mib API to register, and I intend to register
> just the base OID and have all OIDs underneath the base parsed by code in
> my subagent.
That's fine - it's the "normal" situation (at least with the old API).
> I've verified with a printf in my subagent code that the base
> OID I'm sending is correct. In the packet above, I see the 1.795.12.2
> part, but not the 1.3.6.1.4 - is that somehow assumed by the master, or is
> this the source of my problems?
No - the AgentX protocol includes a very simplistic OID compression
algorithm. The first "04" in "04 04 00 00" is the number of sub-identifiers
that follow, and the second "04" is the prefix field, which stands for the
1.3.6.1.4 prefix.
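To make the compression concrete, here is a small Python sketch that expands an AgentX-encoded OID, following the encoding in the AgentX specification (RFC 2741): one byte each for the sub-identifier count, prefix, include flag and a reserved byte, then one 4-byte integer per sub-identifier. The function name is illustrative, and little-endian order is used because the NETWORK_BYTE_ORDER flag is clear in the packets in this thread.

```python
import struct

def decode_agentx_oid(data: bytes, big_endian: bool = False):
    """Expand an AgentX-encoded OID into (list of sub-identifiers, include flag)."""
    n_subid, prefix, include, _reserved = struct.unpack("4B", data[:4])
    fmt = (">" if big_endian else "<") + f"{n_subid}I"
    subids = list(struct.unpack(fmt, data[4:4 + 4 * n_subid]))
    if prefix:
        # A non-zero prefix byte stands for the leading 1.3.6.1.<prefix>.
        subids = [1, 3, 6, 1, prefix] + subids
    return subids, bool(include)

# OID bytes taken from the Register PDU in the log.
raw = bytes.fromhex("04 04 00 00 01 00 00 00 1B 03 00 00 0C 00 00 00 02 00 00 00")
oid, include = decode_agentx_oid(raw)
print(".".join(str(s) for s in oid))
# prints 1.3.6.1.4.1.795.12.2
```

So the packet carries the full registration OID; only the 1.3.6.1.4 part is folded into the single prefix byte.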
> Received 44 bytes from 192.1.1.9:1130
:
> Received SNMP packet(s) from 192.1.1.9
> GET message
> -- enterprises.795.12.2.1.0
> Sending 44 bytes to 192.1.1.9:1130
:
> ----> A simple get for .1.3.6.1.4.1.795.12.2.1.0, which is the very first
> scalar in my MIB, is met by a noSuchName response without consulting the
> AgentX subagent.
Hmmm... I'd be inclined to concentrate on this one to start with.
If you send a USR1 signal to the master agent, it will dump its internal
MIB registry. Have a look through that, for the bits relating to the
1.3.6.1.4.1.795.12.2 registration. What does that look like?
> Received 42 bytes from 192.1.1.9:1136
:
> Received SNMP packet(s) from 192.1.1.9
> GETNEXT message
> -- enterprises.795.12.2
> agentx/master: request to pass to client: enterprises.795.12.2
> agentx/master: request to pass to client: enterprises.795.12.2
> agentx/master: request to pass to client: enterprises.795.12.2
> agentx/master: request to pass to client: enterprises.795.12.2
>
> ----> A getnext request for .1.3.6.1.4.1.795.12.2 is apparently recognized
> as requiring a handoff to an AgentX subagent. At this point, though, the
> master agent goes into a loop repeating the above four lines indefinitely.
Are these messages printed out in rapid succession, or one .... at .... a
.... time ? How frequently? How many?
It looks suspiciously as if the master agent is timing out before it gets
a response, and so keeps retrying. Which isn't surprising if it never
actually sent the request!
With a bit of luck, tracking down the Get problem should fix this
as well.
> After that, I did a little single stepping starting from where the "request
> to pass to client" message originated. I saw that it was calling something
> in the mibII module, so I turned on a little more debugging and obtained
> the following loop:
Ughh! That output looks horrible....
But it raises one other thing to check - how have you configured the
access control settings in the master agent? If the client isn't
authorised to retrieve values in this OID subtree, that would explain
why the subagent isn't getting queried for the Get request
(though not necessarily the GetNext behaviour).
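As a rough illustration of the access-control point, the Python sketch below models a view as a list of included subtrees and tests whether a requested OID falls under any of them. This is a deliberately simplified model, not the agent's actual algorithm: it ignores excluded subtrees and masks, and the subtrees listed for the view are hypothetical, since the thread does not show the real snmpd.conf.

```python
def parse_oid(text: str) -> tuple:
    """Turn '.1.3.6.1' or '1.3.6.1' into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))

def in_view(oid: str, included_subtrees) -> bool:
    """True if oid lies at or below any included subtree (simplified model)."""
    target = parse_oid(oid)
    return any(target[:len(sub)] == sub
               for sub in (parse_oid(s) for s in included_subtrees))

# Hypothetical view contents: MIB-II and the UCD subtree, but not enterprise 795.
systemview = [".1.3.6.1.2.1", ".1.3.6.1.4.1.2021"]
request = ".1.3.6.1.4.1.795.12.2.1.0"

print(in_view(request, systemview))
print(in_view(request, systemview + [".1.3.6.1.4.1.795.12.2"]))
```

Under these assumed contents the first check is False and the second is True: the UCD subtree stays readable while the subagent's subtree is refused until it is added to the view, which is the pattern of symptoms reported in this thread.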
Thank you for all of your help and suggestions.
The problem turned out to be entirely one of setting up the access in
snmpd.conf correctly. I just got thrown off by all of the strange things I
was seeing in the traces and while using gdb.
Looks to me like UCDSNMP 4.2.3 had some severe problems when it received
requests for objects that it was not authorized to provide. I did not see
the same problems in NETSNMP.
Howard