High EIGRP Pending routes

sonic31ss

Jan 10, 2009, 12:49:53 PM

Most of the documentation I can find on "Pending routes" from show ip
eigrp interfaces comes directly from Cisco and is not that helpful.
Their definition is "Number of routes in the packets sitting in the
transmit queue waiting to be sent."
The reason for this question is that we have an EIGRP meltdown on our
core routers (6500s - Sup720) about every three months. We created
scripts to collect more data from the router at 5-minute intervals
and the early indication that EIGRP was in trouble was that there were
52 out of approximately 300 EIGRP interfaces that had as many as 8000
pending routes. The remainder of the interfaces had zero pending
routes. The IP routing table is approximately 4000 routes.
The CPU utilization was 34% at 5 sec with 11% consumed by EIGRP PDM.
Five minutes later the CPU was at 100% and Pending routes were as
high as 100,000 on some of the EIGRP interfaces. All interfaces with
EIGRP neighbors had tens of thousands of pending routes.
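
A minimal sketch of the kind of check such a collection script could run on each sample, assuming the usual column layout of "show ip eigrp interfaces" output. The sample text, interface names and values below are made up for illustration, and the 4000-route threshold is simply the routing table size quoted above.

```python
import re

# Illustrative sample only: the column layout follows typical
# "show ip eigrp interfaces" output, but every value here is invented.
SAMPLE = """\
IP-EIGRP interfaces for process 100
                        Xmit Queue   Mean   Pacing Time   Multicast    Pending
Interface        Peers  Un/Reliable  SRTT   Un/Reliable   Flow Timer   Routes
Tu101              1        0/12       45       0/10         180        8000
Tu102              1        0/0        38       0/10         152           0
Gi1/1              1        0/0         1       0/1           50           0
"""

# One data row: interface, peers, xmit queue, SRTT, pacing, flow timer, pending.
ROW = re.compile(
    r"^(?P<intf>\S+)\s+(?P<peers>\d+)\s+\d+/\d+\s+"
    r"(?P<srtt>\d+)\s+\d+/\d+\s+\d+\s+(?P<pending>\d+)\s*$"
)


def pending_routes(text):
    """Return {interface: pending route count} for every data row."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            result[match.group("intf")] = int(match.group("pending"))
    return result


def suspects(text, table_size):
    """Interfaces whose pending count is at least the routing table size."""
    counts = pending_routes(text)
    return sorted(name for name, count in counts.items() if count >= table_size)


print(suspects(SAMPLE, 4000))
```

Header lines never match the row pattern, so only per-interface rows are counted. Comparing against the routing table size is one possible early-warning rule, not a documented threshold.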
This core router (Core_01) has another core router attached (Core_02)
and it did not record a high number of Pending routes.
There was no indication in the syslog of any significant topology
change leading up to or during this event.
So my questions to the group are:
1. Does anyone have a better definition of Pending routes?
2. How often are the counters from show ip eigrp interfaces updated by
the IOS?
3. Does 8000 Pending routes seem too high considering that there are
only 4000 routes in the IP routing table?

Thanks in advance

bod43

Jan 10, 2009, 3:20:47 PM

Caveat - I am not an EIGRP expert. In fact I don't
know much about it at all and have basically
zero operational experience.

Answers:-
1.
I understand that EIGRP requires that routing
updates are acknowledged by the neighbour.
Most acknowledged protocols send some information
then wait on that being acknowledged before
sending more. I guess that pending routes are
routes waiting on previously sent routes being
acknowledged. This could be caused by
communications problems with the neighbours or
CPU overload on the neighbours.

2.
I don't have a clue about these specific counters.
The bytes in/out etc counters are updated
quite infrequently on some platforms - 20 seconds
is the longest I have seen. From memory it is either
15sec or 20sec on the 6500.

3.
Yes. EIGRP sends *routes*. Perhaps the router has
decided to send one set of updates and then before
they can be sent successfully something triggers
another update. It may be no co-incidence that
8000 = 2 * 4000.

EIGRP sends every route out of every interface
(except for split horizon) so you expect to see
(nearly) all routes.

I would approach this as follows:-
Check out if there is something in common
with the 52 interfaces that get the queues first.

Then I would get back to basics.

Check the whole network for
Interface errors on infrastructure ports.
Zero is good.

sh ip eigrp nei
shows SRTT, RTO and the queue count per neighbour (the detail form
adds a retransmit counter rather than a count of missed hellos).
Worth a look.

I am not really sure of the significance of this
but I would fancy checking that all routers
with more than one link to any destination
have a feasible successor to that destination
or are load sharing.

Remember that with dynamic routing
protocols the key thing is the performance of
your feeblest router. It has to be able to
deal with all of the requests made of it.

The other thing that might go wrong is that
on a slow link updates might not be completed
before another is required. Have you any slow links?
Work out how much data is required to send the 4000
route update and figure out how long
it takes to send it on your slowest link.
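
As a rough sketch of that calculation: the figure of about 28 bytes per route entry is an assumption (the real size depends on prefix length and route type), packet headers and acknowledgements are ignored, and the 50 percent share reflects the IOS default limit on how much of the configured interface bandwidth EIGRP will use. The two link speeds are only examples.

```python
def full_update_seconds(routes, bytes_per_route, link_bps, eigrp_share=0.5):
    """Seconds needed to push `routes` route entries over one link.

    eigrp_share is the fraction of the link bandwidth EIGRP may use
    (0.5 corresponds to the IOS default of 50 percent).
    """
    payload_bits = routes * bytes_per_route * 8
    return payload_bits / (link_bps * eigrp_share)


# ASSUMPTION: roughly 28 bytes per route entry; headers and ACKs ignored.
for name, bps in [("64k", 64_000), ("T1", 1_544_000)]:
    seconds = full_update_seconds(4000, 28, bps)
    print(f"{name}: {seconds:.1f} s for a 4000-route update")
```

Under these assumptions a 64 kbit/s link needs 28 seconds for one full 4000-route update, so a second triggered update arriving inside that window would stack up behind the first.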

Perhaps you could describe the network further.
How many routers are in the EIGRP AS?
How many EIGRP processes are there?
Are you doing summarisation?
Do you manage all of the routers?
Is the network "well designed" or
are new devices and links whacked in
as required? (you don't need to answer that:)

300 EIGRP ports sounds like quite a lot to me.

I'll stop there:)

Stephen

Jan 10, 2009, 4:24:07 PM

On Sat, 10 Jan 2009 09:49:53 -0800 (PST), sonic31ss
<soni...@comcast.net> wrote:

>Most of the documentation I can find on "Pending routes" from show ip
>eigrp interfaces comes directly from Cisco and is not that helpful.
>Their definition is "Number of routes in the packets sitting in the
>transmit queue waiting to be sent."
>The reason for this question is that we have an EIGRP meltdown on our
>core routers (6500s - Sup720) about every three months. We created
>scripts to collect more data from the router on a 5 minute interval
>and the early indication that EIGRP was in trouble was that there were
>52 out of approximately 300 EIGRP interfaces that had as many as 8000
>pending routes. The remainder of the interfaces had zero pending
>routes. The IP routing table is approximately 4000 routes.
>The CPU utilization was 34% at 5 sec with 11% consumed by EIGRP PDM.
>Five minutes later the CPU was at 100% and Pending routes were as
>high as 100,000 on some of the EIGRP interfaces. All interfaces with
>EIGRP neighbors had tens of thousands of pending routes.

It sounds like you have some sort of EIGRP update storm going on.

EIGRP is more flexible than some other state based protocols (whether
you like DUAL is a separate discussion).

>This core router (Core_01) has another core router attached (Core_02)
>and it did not record a high number of Pending routes.

The router with all the pending updates is flooding changes to lots of
other routers, and some of them cannot keep up.

Once you get in this state you seem to get "waves" of updates bouncing
around the network.

The fix is to reduce the ways that updates propagate and multiply
around your network - this usually occurs where you have a lot of
loops, say with a dual centred star type topology.

You can limit which paths are candidates for updates
from the core to edge locations and back, which cuts down on the paths
for routing updates to propagate.

There is some EIGRP best practice in here (read the whole thing, but
p16 on has ways to control scaling effects in EIGRP):
http://www.cisco.com/application/pdf/en/us/guest/netsol/ns432/c649/ccmigration_09186a00805fccbf.pdf

This design guide is for campus networks, but the routing topology issues
are the same in a WAN, just made even worse by the latencies and lower
bandwidths.

>There was no indication in the syslog of any significant topology
>change leading up to or during this event.
>So my questions to the group are;
>1. Does anyone have a better definition of Pending routes?
>2. How often are the counters from show ip eigrp interface updated by
>the IOS?
>3. Does 8000 Pending routes seem too high considering that there are
>only 4000 routes in the IP routing table?

EIGRP lets you build arbitrary topologies (much more so than the rigid
area structures you need with OSPF and IS-IS).

That is both a blessing when it allows exceptions to a hierarchical
design, and a curse when that gets out of control.

a bit about troubleshooting
http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094613.shtml

Once you have some more info, have a hunt around the Cisco site -
most of what you want should be there, but they are good at hiding the
woods behind the trees......

Use stubs and route summarisation around your topology and you should
see some improvement.

Also, this should improve convergence.
>
>Thanks in advance

some good stuff for Cat6k
http://www.cisco.com/en/US/products/hw/switches/ps700/products_white_paper09186a00801b49a4.shtml
--
Regards

stephe...@xyzworld.com - replace xyz with ntl

Thrill5

Jan 11, 2009, 3:17:32 AM

"sonic31ss" <soni...@comcast.net> wrote in message
news:199828b8-905e-4a41...@k36g2000pri.googlegroups.com...

What is your topology that you have 300 interfaces running EIGRP? That
seems very excessive!!! If you have two 6500s for redundancy, each with
the same 300 VLANs on them, running EIGRP on all of the interfaces is very
bad practice. You should only have one or two neighbor relationships between
the same two routers. In a configuration like that on a 6500, the
best practice is to enable "passive-interface default" under the EIGRP
process, and then enable EIGRP on one or two of the VLANs using "no
passive-interface vlan 100", "no passive-interface vlan 101", etc. Each
time a route goes away and there is no feasible successor, the router sends
a query to each of its neighbors, and then has to wait for a reply from
each one. If you have 300 neighbor connections to the same router, it will
send 300 queries to it, and wait for 300 replies. Since all the queries
are going to the same router, it's going to get back the same answer 300
times.

I'm sure your "melt-downs" are due to SIAs (stuck-in-active routes), which
are directly related to the above issue.

sonic31ss

Jan 11, 2009, 10:59:05 AM

On Jan 11, 3:17 am, "Thrill5" <nos...@somewhere.com> wrote:
> "sonic31ss" <sonic3...@comcast.net> wrote in message

Hi Thrill5,

No, the 300 interfaces are mostly GRE tunnels to field routers. And
the SIAs only begin after the CPU reaches 100%. And we all know that
EIGRP does not behave well under CPU load.

Thanks,
Jeff

Stephen

Jan 11, 2009, 2:07:28 PM

FWIW if you manage to have a partially stable EIGRP with 300
adjacencies, you have proved it is "conditionally" stable under high
load......

GRE tunnels can be a pain with any routing protocol, since typically
the CPU has to maintain state for each tunnel as well as EIGRP, and
it gets worse when key interfaces get hit with congestion.

You also need to see if there can be contention for GRE tunnel
bandwidth - since EIGRP will eat up to 50% of bandwidth on an
interface, if lots of GRE tunnels contend with each other, you may be
generating your own congestion storm.......
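
A back-of-envelope sketch of that contention check. The tunnel count comes from this thread, but the per-tunnel bandwidth setting and the physical link speed are hypothetical values chosen only to show the arithmetic, and the 50 percent figure is the default EIGRP share of configured interface bandwidth.

```python
def eigrp_oversubscription(tunnel_count, tunnel_bw_kbps, physical_bw_kbps,
                           percent=50):
    """Ratio of the EIGRP traffic all tunnels may send to the physical link.

    Each tunnel may use `percent` of its own configured bandwidth, so the
    aggregate can exceed what the shared physical interface can carry.
    A result above 1.0 means the tunnels can congest the link by themselves.
    """
    allowed_kbps = tunnel_count * tunnel_bw_kbps * percent / 100
    return allowed_kbps / physical_bw_kbps


# HYPOTHETICAL numbers: 300 tunnels configured as 1544 kbit/s each,
# all riding one 45 Mbit/s physical link.
ratio = eigrp_oversubscription(300, 1544, 45_000)
print(f"EIGRP may offer {ratio:.2f}x the physical link capacity")
```

If the ratio comes out well above 1.0, lowering the configured tunnel bandwidth or the EIGRP bandwidth percentage per tunnel is one way to bring the aggregate back under the physical limit.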

Thrill5 is going the right way - what you need to do is
1. reduce the number of adjacencies if possible,
2. cut down the info that flows across them,
3. cut down the number of peers that get probed for alternate routes
when the topology changes.

I would add:
4. control the EIGRP traffic - reduce EIGRP allowed bandwidth, and
maybe tweak the timers.

You might want to grab one of the cisco books like "large scale IP
network solutions" - this has a good chapter on EIGRP for big hub
routers.
>
>Thanks,
>Jeff

Thrill5

Jan 11, 2009, 11:41:29 PM

"Stephen" <stephe...@xyzworld.com> wrote in message
news:q6gkm4d803r3jbnl3...@4ax.com...

If all of the field offices need to go through this central site then an
easy way to fix this is to send only a summary-default to the remotes. If
all the remotes need to go through this site, then they don't need a full
routing table, only a default route. By doing this, you will solve the
pending routes problem because you will be sending only one route instead of
4000. It will also solve the "melt-downs" because the summary default route
will also prevent the router from sending a query when a route goes away.
(If the route that goes away is within the summary route on the interface,
then a query is not sent.) Basically what is happening is that you have
300 adjacencies, and every time a route changes, every router is queried.
The router must wait for a response from each router. This will drive up
the CPU, causing received queries to be lost. If that happens, the router
will then reset the neighbor and need to send the full routing table to that
router, again driving up the CPU, causing more replies to be lost. You then
have what is commonly referred to as a "cascading failure" and all hell
breaks loose.

To enable the summaries, on the interface to the remote router add the
command:
ip summary-address eigrp <eigrp process number> 0.0.0.0 0.0.0.0 255

For a default summary you must set the admin-distance to 255 or weird things
will happen if your default route goes away for some reason. For summaries
other than a default you should set the admin-distance to 5.

Another thing you could do is change the remotes to a "stub". You can only
do this if the remote is not a transit point for other routing destinations.

sonic31ss

Jan 12, 2009, 12:49:15 PM

On Jan 11, 11:41 pm, "Thrill5" <nos...@somewhere.com> wrote:
> "Stephen" <stephen_h...@xyzworld.com> wrote in message

Hi Thrill5,

All good points. I should have pointed out that we are already using
ip summary-address eigrp on most of the EIGRP interfaces. I have to
admit that we did not have the summaries applied evenly and there were
a few neighbors at the time that were receiving the full routing table
that did not need it.

Best regards,
Jeff

sonic31ss

Jan 12, 2009, 12:57:19 PM

> stephen_h...@xyzworld.com - replace xyz with ntl

Hi Stephen,

I should have pointed out that none of the EIGRP interfaces were under
load before or during this event.

Also, going back to your earlier point about an EIGRP update storm.
Why would there be an update storm if there were no changes recorded
in syslog or on our management station in the AS?

I appreciate the recommendation for the book. I have several that
cover EIGRP, including a Cisco QOS book that has a chapter on EIGRP.
It had a slightly better definition of Pending Routes;

"The number of routes that are affected by the queries or updates that
are in the queue is displayed as Pending Routes."

Best regards,
Jeff

fugettaboutit

Jan 12, 2009, 2:13:24 PM

Hey Jeff,

What version of code are you running on the Cats? I ran into an odd
issue with IOS withdrawing entries from the forwarding table, but the
EIGRP route table seemed OK. My issue stemmed from a default network
advertisement from my perimeter being injected into EIGRP. The Cats then
peered with our WAN aggregation devices (7206VXRs). The 7200 would flag
all internal routes as exterior routes (sh ip eigrp top "network
prefix") instead of just the default network. This is a known bug and
was resolved with an IOS update.

The symptom I saw was that whenever new routes were added or deleted
(turning up a new location, prefix list tuning, new route summarization
advertisements, etc.), I'd lose connectivity to certain or all locations
on the WAN. The route tables *looked* OK and there was nothing in the
logs that indicated a problem. I was able to recover by soft-resetting
the EIGRP process and forcing neighbor adjacencies to be rebuilt. I
don't have nearly as many routes as you're talking about and my failure
scenario was different. Nor do I have the GRE issues to contend with,
either. However, I'd be interested to see if you're seeing this particular
bug. I may be totally off, but thought I'd throw this your way.

I'm on 12.4(11)T2 on the 7206VXRs, and 12.2(18)SXF7 on the Cats.

Stephen

Jan 12, 2009, 5:47:34 PM

I think some events need to be explicitly turned on to be logged, so
you may not see changes to neighbours.

eigrp log-neighbor-changes ???

>
>I appreciate the recommendation for the book. I have several that
>cover EIGRP, including a Cisco QOS book that has a chapter on EIGRP.
>It had a slightly better definition of Pending Routes;
>
>"The number of routes that are affected by the queries or updates that
>are in the queue is displayed as Pending Routes."
>
>Best regards,
>Jeff
>

--
Regards

stephe...@xyzworld.com - replace xyz with ntl

sonic31ss

Jan 13, 2009, 2:01:53 PM

On Jan 12, 2:13 pm, fugettaboutit <n...@mas.com> wrote:
> Hey Jeff,
>
> What version of code are you running on the Cats? I ran into an odd
> issue with IOS withdrawing entries from the forwarding table, but the
> EIGRP route table seemed OK. My issue stemmed from a default network
> advertisement from my perimeter being injected into EIGRP. The Cats then
> peered with our WAN aggregation devices (7206VXRs). The 7200 would flag
> all internal routes as exterior routes (sh ip eigrp top "network
> prefix") instead of just the default network. This is a known bug and
> was resolved with an IOS update.
>
> The symptom I saw was that whenever new routes were added or deleted
> (turning up a new location, prefix list tuning, new route summarization
> advertisements, etc.), I'd lose connectivity to certain or all locations
> on the WAN. The route tables *looked* OK and there was nothing in the
> logs that indicated a problem. I was able to recover by soft-resetting
> the EIGRP process and forcing neighbor adjacencies to be rebuilt. I
> don't have nearly as many routes as you're talking about and my failure
> scenario was different. Nor do I have the GRE issues to contend with,
> either. However, I'd be interested to see if you're seeing this particular
> bug. I may be totally off, but thought I'd throw this your way.
>
> I'm on 12.4(11)T2 on the 7206VXRs, and 12.2(18)SXF7 on the Cats.
>

We are running 12.2(18)SXF7 on the cats.

sonic31ss

Jan 13, 2009, 2:06:39 PM

> i think some events need to be explicitly turned on to be logged so
> you may not see changes to neighbours.
>
> eigrp log-neighbour-changes ???
>

We have eigrp logging neighbor changes.

We are also logging line protocol changes on the 6500, which is not on
by default. ;-)

Thanks,
Jeff

Stephen

Jan 13, 2009, 5:23:17 PM

Have a look at this old Networkers presentation:
http://telekomunikacije.etf.bg.ac.yu/cisco/net99/307.pdf

the comments about F/Relay and reducing the bandwidth also apply to a
VPN system.

Finally, the update speed depends on the reported link speed: below 1.5M
all links are treated as "slow WAN"; everything else is LAN speed and gets
more frequent hellos and so on.

alexd

Jan 15, 2009, 8:13:34 AM

Stephen wrote:

> finally - the update speed depends on reported link speed below 1.5M
> all links are "slow WAN", everything else is LAN speed and gets more
> frequent hellos and so on.

FTR, I assume this is tunable?

--
<http://ale.cx/> (AIM:troffasky) (UnSoEs...@ale.cx)
13:12:52 up 41 days, 15:25, 1 user, load average: 0.36, 0.15, 0.09
Sexy ladies, and nasty boys, all freaky freakin', to the robot noise