I have two brand-new switches, a Netgear GS748TP and a Netgear GS724P. These devices are new out of the box; I haven't gone in and done any configuration on them. I have all my PCs and other devices plugged into the GS748 and everything works fine. I take a standard straight-through cable, plug the GS724 into the GS748, and the network goes down. It's like I've created a loop with two hubs. I've never seen this happen with switches. Any ideas?
Problem has been fixed (no thanks to Netgear support). By default, Spanning Tree State is set to enabled. I turned it off on one switch while leaving it enabled on the main switch, and that fixed the issue. I'm curious whether, if I left it enabled on both switches and let them sit for a few hours, the problem would go away. I'm thinking both switches were trying to take control of the network, causing DoS-type traffic that would have cleared up after a while.
I've not seen this either, and it does sound like a loop. If there is a way to fix the port speed and duplex, do that on the port at each end. I'm guessing these are gigabit switches, so try setting them to 1000/full duplex and see what happens.
Also check whether the same thing happens if you use two of the mid ports, i.e. port 3 on both switches, just in case the port you are using has some strange config/attributes. And lastly, of course, try a different cable :)
It can happen with switches. I don't know these models, but you may want to check the recommended configuration for connecting them; i.e. is there a switch problem, or have you just found ports you shouldn't be using for the uplink?
With non-managed switches, or at least those that don't run some variant of STP (Spanning Tree Protocol), multiple connections between two switches will forward broadcasts around in a loop over the extra links until there is so much traffic being forwarded between the two that legitimate traffic cannot be processed reliably. Though it sounds like this would take hours, in reality you'll notice the impact within seconds.
Like everyone said, it's a loop somewhere. I've had exactly the same problem with the same model switches. If your users can handle the interruption, could you double-check each important cable in the connection?
Researching right now. I do have a layer 2 switch and a couple of dumb switches (little 6-port) plugged into the GS748 with no issues. Is this a layer 3 to layer 3 switching issue? I've never worked with layer 3 switches before. I looked through the GS748 web admin pages and I'm not seeing much that looks like it needs to be changed.
I would get rid of any dumb switches as well. Check STP, and preferably use a cross-over cable. If you are planning on redundant connections, create a link aggregation. If you just plug the two switches into each other on their own, does that freeze the network as well? Maybe something weird is plugged into the first switch. (I've seen it before; I had to go and look for all the dumb switches in three buildings and unplug them.)
I know you said fresh out of the box, but could the new switch already have a layer 3 configuration that is conflicting with your existing network? I know it's unlikely, but two devices claiming to be the default gateway would be bad.
If the dumb switches were causing the problem, wouldn't I have seen the network issues as soon as I plugged them into the GS748? I installed that switch this past Sunday, and the problem only happens when the GS724 is plugged in.
You might try changing the priority in STP. What might be happening is that both switches keep fighting to be the boss. I know my Cisco switches have a way to "negotiate" their pecking order, and I'd bet both Netgears ship with the same default priority number. Change the STP priority on one of them and I'd bet it will work like a champ.
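For what it's worth, on Cisco gear that pecking order is the bridge priority, and I'd expect the Netgear web GUI to have a comparable bridge priority field somewhere in its STP settings. Purely as an illustration of the idea (this is Cisco IOS syntax, not anything Netgear-specific):

    ! Lower priority wins the root election; the default is 32768.
    ! Run this on the switch you want to be the root of the spanning tree:
    Switch(config)# spanning-tree vlan 1 priority 4096
    ! Or let IOS pick a value low enough to win:
    Switch(config)# spanning-tree vlan 1 root primary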
Hey David, I know it looks like you've solved your problem, but I don't think disabling STP is a solution, especially when you decide to expand your network by adding more switches in the future. When I read the title of your question, my first thought was "BPDU guard" (that's the Cisco-speak, at least).
If the switches are set up to automatically configure their ports as access ports on the native VLAN, then it's likely the port(s) would shut down when they receive a BPDU on an access port; this is a common security mechanism to protect against attaching rogue devices to your network. Technically that would mean only the port connecting the two switches should be shut down, but then again this may be a vendor-specific reaction.
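Sticking with Cisco-speak, since that's what I can vouch for, this is roughly what BPDU guard and the resulting shutdown look like; the Netgear firmware may or may not have an equivalent, so treat it only as an illustration of the mechanism:

    ! Enable BPDU guard on all PortFast (edge/access) ports:
    Switch(config)# spanning-tree portfast bpduguard default
    ! ...or on a single interface:
    Switch(config-if)# spanning-tree bpduguard enable
    ! A port shut down by BPDU guard shows up as err-disabled:
    Switch# show interfaces status err-disabled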
Setting up the common ports between the two switches as trunk ports (i.e. tagged ports) and creating a separate VLAN for your client devices other than VLAN 1 may be a good practice. This way VLAN 1 would be your "management" VLAN and VLAN x would be your client/access VLAN, so your inter-switch traffic would be segregated from your client traffic on the network.
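As a sketch of the idea in Cisco IOS terms (the port number and VLAN ID below are just examples; on the Netgears the equivalent would be making the uplink port a tagged member of the client VLAN in the web GUI):

    ! Create a client/access VLAN and tag it on the inter-switch link:
    Switch(config)# vlan 10
    Switch(config-vlan)# name CLIENTS
    Switch(config-vlan)# exit
    Switch(config)# interface GigabitEthernet0/24
    Switch(config-if)# switchport mode trunk
    ! (some models also need: switchport trunk encapsulation dot1q)
    Switch(config-if)# switchport trunk allowed vlan 1,10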
Another option to consider is that you said that you're connecting the switches with a straight-through cable, while (if I'm not mistaken) switches should be connected to one another with cross-over cables since they are devices with the same interface pin-out. This may imply that your switches don't have MDI-X auto-sensing, and so that could be where you're getting that "loop" from.
If you can provide the running config for each switch, that will help a lot. On a whim, since I only work on Cisco Catalyst and HP ProCurve, I'll assume the default configuration of the layer 3 interfaces is causing an IP routing conflict, such that both are using the same IP address. That plays havoc with ARP tables and bridge tables. Another likely culprit is that the default config of the new switch conflicts with the DHCP server and/or relay agents on the existing network.
NOTE - I know this is a bit of a vague, useless, fluffy question at the moment because I have little idea of the inputs that would affect the speed of a context switch. Perhaps statistical answers would help - as an example, I'd guess >=60% of thread switches would take between 100 and 10,000 processor cycles.
Thread switching is done by the OS, so Java has little to do with it. Also, on Linux at least (and I presume many other operating systems), the scheduling cost doesn't grow with the number of threads: the 2.6 kernel introduced an O(1) scheduler, and its successor, CFS, still keeps the per-switch cost essentially independent of how many threads are runnable.
The thread switch overhead on Linux is some 1.2 µs (article from 2018). Unfortunately the article doesn't list the clock speed at which that was measured, but the overhead should be some 1000-2000 clock cycles or thereabout. On a given machine and OS the thread switching overhead should be more or less constant, not a wide range.
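If you want a rough feel for the number on your own box, a crude ping-pong measurement in Java looks something like the sketch below: it forces two threads to hand a token back and forth, so each round trip costs roughly two switches. Treat it as an order-of-magnitude probe only (no warm-up, and SynchronousQueue spins briefly before parking a thread); a real benchmark would use something like JMH.

    import java.util.concurrent.SynchronousQueue;

    public class SwitchPingPong {
        public static void main(String[] args) throws Exception {
            final SynchronousQueue<Integer> ping = new SynchronousQueue<>();
            final SynchronousQueue<Integer> pong = new SynchronousQueue<>();
            final int rounds = 200_000;

            Thread echo = new Thread(() -> {
                try {
                    for (int i = 0; i < rounds; i++) {
                        pong.put(ping.take());   // bounce the token straight back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            echo.start();

            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                ping.put(i);   // wakes the echo thread
                pong.take();   // blocks until it answers
            }
            long elapsed = System.nanoTime() - start;
            echo.join();

            // Each round trip costs roughly two thread switches.
            System.out.printf("~%d ns per switch%n", elapsed / (rounds * 2L));
        }
    }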
Apart from this direct switching cost, there's also the cost of changing workload: the new thread is most likely using a different set of instructions and data, which need to be loaded into the cache. But this cost doesn't differ between a thread switch and an asynchronous programming 'context switch'. And for completeness, switching to an entirely different process has the additional overhead of changing the memory address space, which is also significant.
By comparison, the switching overhead between goroutines in the Go programming language (which uses userspace threads that are very similar to asynchronous programming techniques) was around 170 ns, roughly one seventh of a Linux thread switch.
Whether that is significant for you depends on your use case of course. But for most tasks, the time you spend doing computation will be far more than the context switching overhead. Unless you have many threads that do an absolutely tiny amount of work before switching.
Threading overhead has improved a lot since the early 2000s, and according to the linked article, running 10,000 threads in production shouldn't be a problem on a recent server with plenty of memory. General claims that thread switching is slow are often based on yesteryear's computers, so take them with a grain of salt.
One remaining fundamental advantage of asynchronous programming is that the userspace scheduler has more knowledge about the tasks, and so can in principle make smarter scheduling decisions. It also doesn't have to deal with processes from different users doing wildly different things that still need to be scheduled fairly. But even that can be worked around, and with the right kernel extensions these Google engineers were able to reduce the thread switching overhead to the same range as goroutine switches (200 ns).