Ok, so I've been bashing my head off the desk (a bit) today, trying to
figure out why I can't get the system working. After a considerable
amount of effort, I've resorted to asking here. Here is where I stand:
1) StarQuery is connected to the vxodbc driver (thanks Peter)
2) The driver does connect to dbus via port 5432
3) Versaplexd does connect to dbus on the same port (5432)
4) A LIST TABLES request from StarQuery does get all the way to
Versaplexd. Versaplex shouts that it's getting asked to list tables and
then proceeds to attempt to contact MSSQL
This is where it breaks down. It seems not to get a reply from
MSSQL and does not report back to dbus. So I looked around the code a
bit, and there seems to be an incriminating bit of code along
the lines of:
    else if (entry.Method == "tcp")
    {
        string host = "127.0.0.1";
        string port = "5555";
        Console.WriteLine("Connecting via TCP: {0}:{1}",host,port);
        entry.Properties.TryGetValue("host", out host);
        entry.Properties.TryGetValue("port", out port);
        socket = OpenTcp(host, Int32.Parse(port));
    }
I'm having trouble tracking down what triggers this code being run, but
if I had to guess from the output (I added the writeline line), I would
guess that versaplex first connects to dbus using the abstract method:
    if (entry.Method == "unix")
    {
        ....
    }
And I know what triggers this first call (the main method).
Then "something else" connects to "something else" on tcp, using the
hard-coded values of localhost:5555. I find this bizarre behaviour, but
I am left assuming that whoever set it up was testing on a local
machine.
So I tried to change the values to my test machine, making this look
like (note, that is the ip of my vm and the port is the port MSSQL is
on):
    else if (entry.Method == "tcp")
    {
        string host = "10.65.1.141";
        string port = "1433";
        Console.WriteLine("Connecting via TCP: {0}:{1}",host,port);
        entry.Properties.TryGetValue("host", out host);
        entry.Properties.TryGetValue("port", out port);
        socket = OpenTcp(host, Int32.Parse(port));
    }
However, this seems to yield no results. I'm rather confused why my
system is not carrying through.
Furthermore, Avery and I have encountered a funny and strange error when
using StarQuery's "Test Connection" button when configuring the driver,
etc.
If you click it once, it says "Success!"
If you click it twice, it says "Success!"
If you click it a third time, it says "Catastrophic Failure!"
Then the cycle repeats, making it fail every third time only. When it's
not failing, dbus reports a happy initialization. Sometimes it reports a
happy shutdown too, but sometimes it gets its connection reset by peer.
My world is falling apart.
Thanks :)
-AT
It sounds like your connection to MSSQL is down. That bit of code is
indeed extremely suspicious, but it's actually not related to
connecting to MSSQL. If you check the users of the DodgyTransport
class (very well named, BTW), you'll see that it's used by Versaplex
to create its DBus connection. So if your DBus connection is actually
working, then it's best to not fiddle with this.
As an aside, until this code gets made sensible, this means that it
makes sense to run your DBus daemon on port 5555, on the same machine
as Versaplex. That's what I happened to have been doing when testing,
so we may get divergent results due to that.
> I'm having trouble tracking down what triggers this code being run, but
> if I had to guess from the output (I added the writeline line), I would
> guess that versaplex first connects to dbus using the abstract method:
>
> if (entry.Method == "unix")
> {
> ....
> }
>
> And I know what triggers this first call (the main method).
Aside: if you want to know what's calling what from where, it's best
not to guess too much. If you wanted to be hacky, you could just
throw an exception from the places you're curious about; if you run
the code with "mono --debug" you'll get line numbers in the stack
trace too. If that doesn't work out, you can run mono with a --trace
parameter, which tells Mono to print a trace of all the functions it
enters. This can be a bit of a firehose by default, but you can trim
down the traced namespaces and classes. See
http://www.mono-project.com/Debugging
> Then "something else" connects to "something else" on tcp, using the
> hard-coded values of localhost:5555. I find this bizarre behaviour, but
> I am left assuming that whoever set it up was testing on a local
> machine.
There's a fair bit of that going around. See below.
> So I tried to change the values to my test machine, making this look
> like (note, that is the ip of my vm and the port is the port MSSQL is
> on):
>
> else if (entry.Method == "tcp")
> {
> string host = "10.65.1.141";
> string port = "1433";
> Console.WriteLine("Connecting via TCP: {0}:{1}",host,port);
> entry.Properties.TryGetValue("host", out host);
> entry.Properties.TryGetValue("port", out port);
> socket = OpenTcp(host, Int32.Parse(port));
> }
>
> However, this seems to yield no results. I'm rather confused why my
> system is not carrying through.
As I mentioned, trying to point DBus at the MSSQL port, while
entertaining, isn't going to get you far :)
The problem is likely due to your database user configuration.
There's some lovely hard-coded connection information hiding in
VxSqlPool.cs; this is what I have to extract into a config file sometime
soon. For now, it's probably easiest to create a database user,
password, and db to match that hardcoded stuff; it means you won't
accidentally commit a change that'll break my setup, and also that
information is duplicated in the unit tests for now (*sigh*) so you'd
have to be sure you got it all.
Speaking of which, you'll find life gets a zillion times easier when
you're only trying to debug one of these things at once. To that end,
instead of trying to test Versaplexd by calling it from StarQuery, you
should check it with the unit tests. My setup was to run "wvdbusd
-vvvv tcp:5555" in one window, run
"DBUS_SESSION_BUS_ADDRESS=tcp:host=192.168.207.1,port=5555 mono
--debug ./versaplex.exe" in the next, and in a third, run
"DBUS_SESSION_BUS_ADDRESS=tcp:host=192.168.207.1,port=5555 mono
--debug t/versaplex.test.exe". If that doesn't work, then you've
greatly narrowed down your range of possible problems; in fact, even
if it works, you've still greatly narrowed it down.
There should be a fair bit of debug information you can distill from
all the console output. For reading the unit test output, you can
pipe it to wv/wvstreams/wvtesthelper to get a graphical summary, or
you can pipe it to wv/wvstreams/wvtestcolour.pl to get some nice
colour-coded output (I put that one in my path, it's really helpful).
Once you know which unit test is failing, it's easiest to comment out
all the other tests, to both save time and make the output more
legible. I wish there was a better way of picking which tests to run,
but for now it's all there is.
BTW, 192.168.207.1 was just the IP address that my VMware was using,
so it was where I connected to wvdbusd just out of habit. For the
local connections, it could have been 127.0.0.1.
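A minimal sketch of that three-window setup, assuming a POSIX shell and
that wvdbusd, versaplex.exe and the test binary live at the paths
mentioned above. BUS_HOST, BUS_PORT and bus_address are names made up
for this example; the only point is that Versaplexd and the unit tests
must be handed the same bus address:

```shell
#!/bin/sh
# Hypothetical helper: build a D-Bus TCP address from a host and a port.
bus_address() {
    printf 'tcp:host=%s,port=%s\n' "$1" "$2"
}

# Assumed defaults: daemon on this machine, port 5555 as in the setup above.
BUS_HOST=${BUS_HOST:-127.0.0.1}
BUS_PORT=${BUS_PORT:-5555}

DBUS_SESSION_BUS_ADDRESS=$(bus_address "$BUS_HOST" "$BUS_PORT")
export DBUS_SESSION_BUS_ADDRESS
echo "Bus address: $DBUS_SESSION_BUS_ADDRESS"

# Then, one command per window, with the variable exported in windows 2 and 3:
#   window 1: wvdbusd -vvvv tcp:$BUS_PORT
#   window 2: mono --debug ./versaplex.exe
#   window 3: mono --debug t/versaplex.test.exe
```

Setting BUS_HOST=192.168.207.1 before running it reproduces the address
used in the commands quoted above.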
> Furthermore, Avery and I have encountered a funny and strange error when
> using StarQuery's "Test Connection" button when configuring the driver,
> etc.
>
> If you click it once, it says "Success!"
> If you click it twice, it says "Success!"
> If you click it a third time, it says "Catastrophic Failure!"
>
> Then the cycle repeats, making it fail every third time only. When it's
> not failing, dbus reports a happy initialization. Sometimes it reports a
> happy shutdown too, but sometimes it gets its connection reset by peer.
>
> My world is falling apart.
That sounds insanely... insane. I have no real idea why it would
possibly do that. StarQuery is a bit of an enigma, ODBC-wise. Just
another reason why you shouldn't be using it to debug your Versaplexd
issues. The (very, very) good news is that it's reproducible. Once
you know Versaplexd is working right, you can start looking into this;
I might start by turning on the ODBC tracing in the ODBC
administration dialog, and by looking at the vxodbc logs of course (if
you haven't found those yet: mkdir c:\temp, and they'll start getting
written to there). It's possible the "Test Connection" button is just
completely insane and shouldn't be used, or it's possible vxodbc is
doing something ill-advised with the calls the button makes.
Peter.
One point of interest:
The insanely insane bit at the end actually only behaves that way "once
in a while." I'm not really sure what triggers the once in three error,
but it doesn't happen all the time. The good news is that it happens
often enough during regular testing.
Furthermore, I scanned the code for the word catastrophic.
Interestingly, it's not found in that context. Avery suggests that it
could be vxodbc crashing and that error being caught and passed along. I
don't know.
Finally, the freaky tcp/unix connection code does something else that I
don't like:
"""
Adding interface com.versabanq.versaplex.db
Connecting to 'tcp:host=localhost,port=5432'
Connecting via TCP: 10.65.1.141:1433
"""
What exactly is it doing here? (Note: you only see this output because I
put the aforementioned WriteLine into the DodgyTransport code.)
Regardless, the above shows us that both the unix abstract entry method
and the tcp entry method are used.
-AT
It's not necessarily a crash, but yeah, when StarQuery gets an ODBC
error it wasn't expecting, especially when setting itself up, it gives
you a message like that.
> Finally, the freaky tcp/unix connection code does something else that I
> don't like:
>
> """
> Adding interface com.versabanq.versaplex.db
> Connecting to 'tcp:host=localhost,port=5432'
> Connecting via TCP: 10.65.1.141:1433
> """
>
> What exactly is it doing here? (Note: you only see this output because I
> put the aforementioned WriteLine into the DodgyTransport code.)
> Regardless, the above shows us that both the unix abstract entry method
> and the tcp entry method are used.
How exactly does this output show us that both the unix and tcp
methods are being used? I just see TCP connections. And since
wvdbusd doesn't listen to Unix domain sockets, it's unlikely that any
Unix connections are useful.
Note that Versaplex actually connects to DBus a couple of times:
there's the initial connection to register its name, and then I
believe it makes a second connection to retrieve the Unix UID when
it's trying to validate a query (this is related to that discussion
that Avery and I were having before). It may not be relevant, but
just don't be surprised if you see that.
Peter.
Sorry for the confusion.
-AT
Note that the above is screwy, but only because the WriteLine comes
before the TryGetValue lines. So it basically lies about what it's
going to connect to. WriteLine should come right before OpenTcp.
> I'm having trouble tracking down what triggers this code being run, but
> if I had to guess from the output (I added the writeline line), I would
> guess that versaplex first connects to dbus using the abstract method:
>
> if (entry.Method == "unix")
> {
> ....
> }
>
> And I know what triggers this first call (the main method).
There is no "first" and "second" try. It just does what the
DBUS_SESSION_BUS_ADDRESS says to do:
tcp:host=FOO,port=BLAH
unix:abstract=WHATEVER
entry.Method is either tcp or unix based on the prefix of the string.
Then entry.Properties ends up containing key-value pairs for host and
port, or for abstract, or for anything else passed in the
DBUS_SESSION_BUS_ADDRESS.
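To make that split concrete, here is a rough sketch in POSIX shell of
how an address string breaks into a method and its key-value pairs.
parse_bus_address is a made-up helper, not anything from the Versaplex
or DBus code, and it ignores details of real DBus addresses such as
multiple addresses separated by ';' and escaped values:

```shell
#!/bin/sh
# Hypothetical helper: split a D-Bus address into method and properties.
parse_bus_address() {
    addr=$1
    method=${addr%%:*}    # everything before the first ':'
    props=${addr#*:}      # everything after the first ':'
    echo "method=$method"

    # The properties are comma-separated key=value pairs.
    old_ifs=$IFS
    IFS=,
    for pair in $props; do
        echo "property ${pair%%=*}=${pair#*=}"
    done
    IFS=$old_ifs
}

parse_bus_address 'tcp:host=FOO,port=BLAH'
parse_bus_address 'unix:abstract=WHATEVER'
```

For the first address this prints method=tcp followed by the host and
port properties; for the second it prints method=unix and a single
abstract property. Only one of the two branches can ever run for a
given address.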
> Then "something else" connects to "something else" on tcp, using the
> hard-coded values of localhost:5555.
How are you seeing the connection?
> So I tried to change the values to my test machine, making this look
> like (note, that is the ip of my vm and the port is the port MSSQL is
> on):
>
> else if (entry.Method == "tcp")
> {
> string host = "10.65.1.141";
> string port = "1433";
> Console.WriteLine("Connecting via TCP: {0}:{1}",host,port);
> entry.Properties.TryGetValue("host", out host);
> entry.Properties.TryGetValue("port", out port);
> socket = OpenTcp(host, Int32.Parse(port));
> }
>
> However, this seems to yield no results. I'm rather confused why my
> system is not carrying through.
Again here, the WriteLine is lying if your DBUS_SESSION_BUS_ADDRESS
includes valid host and port values, because they override the
hardcoded defaults.
Have fun,
Avery
This isn't actually true, is it? There's no reason I can imagine that
we'd need more than one connection to the bus. There might be more
than one message in-flight at once, but that's totally different.
Thanks,
Avery
It may not actually be true. I might be thinking of my troubles
trying to send messages; when I was having trouble re-using the
existing connection (and thought it was due to blocking problems), at
one point I tried making a second bus connection. That said, for all
I know the crazy bus method proxying business may make a second
connection.
Peter.
In looking at the hardcoded values, it seems that the original intent is
to get a lot of the data from the dbus.
Values:
conStr.DataSource = "amsdev";
conStr.UserID = "asta";
conStr.Password = "m!ddle-tear";
conStr.InitialCatalog = "adrian_test";
Connection String:
DATABASE=adrian_test;SERVER=10.65.1.133;PORT=1234;UID=asta;ReadOnly=0;
I'm assuming that all of the data should probably be passed over dbus
eventually. I'm not so sure that a config file is the way to go. Sorry
if I misunderstand.
-AT
I have changed the bit of code in VxSqlPool.cs to look like this (to
give me a better idea of how to set up my database and the workings
of .net):
"""
conStr.DataSource = "amsdev";
conStr.UserID = "asta";
conStr.Password = "m!ddle-tear";
conStr.InitialCatalog = "adrian_test";
System.Console.WriteLine("Connection String: " +
conStr.ConnectionString);
"""
So it seems Versaplexd received the message:
Header data:
0000 6c 01 00 01 10 00 00 00 ef 03 00 00 9b 00 00 00 l...............
0010 01 01 6f 00 1b 00 00 00 2f 63 6f 6d 2f 76 65 72 ..o...../com/ver
0020 73 61 62 61 6e 71 2f 76 65 72 73 61 70 6c 65 78 sabanq/versaplex
0030 2f 64 62 00 00 00 00 00 06 01 73 00 17 00 00 00 /db.......s.....
0040 63 6f 6d 2e 76 65 72 73 61 62 61 6e 71 2e 76 65 com.versabanq.ve
...
But it never output the "Connection String" line that I put in. If that
is the case, then I would assume that it never used the connection data.
How come?
-AT