Hi Travis,

What error are you actually getting most of the time, and how often does it happen? All of the errors you report are actual issues being reported by the socket itself, and none of them is a big problem if seen only once in a while. Also, have you looked at the server logs to get an idea of what the server reports in these situations?

Then, a few side notes about that code:

- Please never ignore errors without a great reason to do so (see DialWithTimeout). As a rule of thumb, if an error is being ignored, there should be a comment explaining why.
- There are concurrency bugs in that code. A variable cannot be read/written by multiple goroutines without synchronization (see stop).
- Why Eventual? It doesn't look like the application is organized to use this mode correctly. Start with Strong, and work your way from there once you better understand the consequences of that choice.
- The "err != nil &&" part is unnecessary when checking if err == mgo.ErrNotFound or IsDup. These will already imply not nil.
Hi Travis,
On Sep 10, 2015 11:53 PM, "Travis Beauvais" <tbea...@gmail.com> wrote:
>
> When I turn n up high (more workers) to like 5k, after a minute or so I get a bunch of connection errors and then it stops and resumes work. After a few more minutes I get a bunch of connection errors and work resumes.
And one minute is your connection timeout, isn't it? It might just be overloading the server enough that it is unable to handle everyone's requests in the established timeframe.
How are the errors presented? Which method returns them, and after doing what?
What exactly is in the server logs and in the client logs when this happens?
> Keep in mind this is with 5k workers. Should be more than 5k connections (give or take a few). If I turn down the number of workers to like 2k, it all works fine. No connection errors. The number of master conns vs socks alive is always in sync. Only when I start getting the errors do these 2 nums get out of sync.
It's natural for them to not be in sync. It just means that more connections were ever established to the server than are open right now, which is exactly what should happen when there's any issue with an existing connection and a new one is opened in its place.
gustavo @ http://niemeyer.net