My Logistic Regression test program has blown up, please help

charle...@gmail.com

Nov 23, 2014, 1:42:29 PM
to accor...@googlegroups.com
Hi César:

I'm just discovering your Machine Learning Framework, and my first experiment has given bad results.

What I have done so far:

1. I am running on an Intel I7 Windows 8.1 machine. (64-bit)
2. I installed Accord.MachineLearning 2.13.1 by doing:
PM> Install-Package Accord.MachineLearning
3. I built the enclosed test program.

The program ran with the following results:


Full
Confidence (lower): 0
Confidence (upper): Infinity
Odds ratio: 0
Standard error: 6747666295503.39
Confidence (lower): 0
Confidence (upper): Infinity
Odds ratio: 109.106224189417
Standard error: 25672908706.8628
Confidence (lower): 0
Confidence (upper): Infinity
Odds ratio: 1.74087592478731E-17
Standard error: 197879553602.116
Confidence (lower): 0
Confidence (upper): Infinity
Odds ratio: 1.33208398486837E+67
Standard error: 853778008903.736

Best
Confidence (lower): 0
Confidence (upper): Infinity
Odds ratio: 0
Standard error: 2362690.45450474
Confidence (lower): 0
Confidence (upper): Infinity
Odds ratio: 8.59270339542076E+187
Standard error: 15121218.9088303

ChiSquare best model likehood ratio p-value: 1

Nested
Confidence (lower): 0
Confidence (upper): Infinity
Odds ratio: 0
Likelihood Ratio:
Standard error: 2362690.45450474
Confidence (lower): 0
Confidence (upper): Infinity
Odds ratio: 8.59270339542076E+187
Likelihood Ratio:
Standard error: 15121218.9088303

new patient has died with probability: 1

It appears that the program has blown up.
Any suggestions will be greatly appreciated.

Charles

--------------- test program ------------------------

using System;
using Accord.Statistics.Analysis;

namespace AccordLogisticRegression {
    internal class Program {
        private static void Main() {

            double[][] inputs = {
                // Age Sex Chol
                new[] {48, 1, 4.40 },
                new[] {60, 0, 7.89 },
                new[] {51, 0, 3.48 },
                new[] {66, 0, 8.41 },
                new[] {40, 1, 3.05 },
                new[] {44, 1, 4.56 },
                new[] {80, 0, 6.91 },
                new[] {52, 0, 5.69 },
                new[] {58, 0, 4.01 },
                new[] {58, 0, 4.48 },
                new[] {72, 1, 5.97 },
                new[] {57, 0, 6.71 },
                new[] {55, 1, 5.36 },
                new[] {71, 0, 5.68 },
                new[] {44, 1, 4.61 },
                new[] {65, 1, 4.80 },
                new[] {38, 0, 5.06 },
                new[] {50, 0, 6.40 },
                new[] {80, 0, 6.67 },
                new[] {69, 1, 5.79 },
                new[] {39, 0, 5.42 },
                new[] {68, 0, 7.61 },
                new[] {47, 1, 3.24 },
                new[] {45, 1, 4.29 },
                new[] {79, 1, 7.44 },
                new[] {41, 1, 4.60 },
                new[] {45, 0, 5.91 },
                new[] {54, 0, 4.77 },
                new[] {43, 1, 5.62 },
                new[] {62, 1, 7.92 },
                new[] {72, 1, 7.92 },
                new[] {57, 1, 6.19 },
                new[] {39, 1, 2.37 },
                new[] {51, 0, 5.84 },
                new[] {73, 1, 5.94 },
                new[] {41, 1, 3.82 },
                new[] {35, 0, 2.35 },
                new[] {69, 0, 6.57 },
                new[] {75, 1, 7.96 },
                new[] {51, 1, 3.96 },
                new[] {61, 1, 4.36 },
                new[] {55, 0, 3.84 },
                new[] {45, 1, 3.02 },
                new[] {48, 0, 4.65 },
                new[] {77, 0, 7.93 },
                new[] {40, 1, 2.46 },
                new[] {37, 1, 2.32 },
                new[] {78, 0, 7.88 },
                new[] {39, 1, 4.55 },
                new[] {41, 0, 2.45 },
                new[] {54, 1, 5.62 },
                new[] {59, 1, 5.03 },
                new[] {78, 0, 8.08 },
                new[] {56, 1, 6.96 },
                new[] {49, 1, 3.07 },
                new[] {48, 0, 4.75 },
                new[] {63, 1, 5.64 },
                new[] {50, 0, 3.35 },
                new[] {59, 1, 5.08 },
                new[] {60, 0, 6.58 },
                new[] {64, 0, 5.19 },
                new[] {76, 1, 6.69 },
                new[] {58, 0, 5.18 },
                new[] {48, 1, 4.47 },
                new[] {72, 0, 8.70 },
                new[] {40, 1, 5.14 },
                new[] {53, 0, 3.40 },
                new[] {79, 0, 9.77 },
                new[] {61, 1, 7.79 },
                new[] {59, 0, 7.42 },
                new[] {44, 0, 2.55 },
                new[] {52, 1, 3.71 },
                new[] {80, 1, 7.56 },
                new[] {76, 0, 7.80 },
                new[] {51, 0, 5.94 },
                new[] {46, 1, 5.52 },
                new[] {48, 0, 3.25 },
                new[] {58, 1, 4.71 },
                new[] {44, 1, 2.52 },
                new[] {68, 0, 8.38 }
            };

            double[] output = {
                0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
                0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0,
                0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
                0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1
            };

            var regression = new StepwiseLogisticRegressionAnalysis(
                inputs,
                output,
                new[] { "Age", "Sex", "Chol" }, "Died"
            );

            regression.Compute();

            var full = regression.Complete;
            Console.WriteLine("\nFull");
            foreach (var item in full.Coefficients) {
                Console.WriteLine("Confidence (lower): " + item.ConfidenceLower);
                Console.WriteLine("Confidence (upper): " + item.ConfidenceUpper);
                Console.WriteLine(" Odds ratio: " + item.OddsRatio);
                Console.WriteLine(" Standard error: " + item.StandardError);
            }

            var best = regression.Current;
            Console.WriteLine("\nBest");
            foreach (var item in best.Coefficients) {
                Console.WriteLine("Confidence (lower): " + item.ConfidenceLower);
                Console.WriteLine("Confidence (upper): " + item.ConfidenceUpper);
                Console.WriteLine(" Odds ratio: " + item.OddsRatio);
                Console.WriteLine(" Standard error: " + item.StandardError);
            }

            var test = best.ChiSquare;
            Console.WriteLine("\nChiSquare best model likehood ratio p-value: " + test);
            // If the model is distinguishable from a null model. We can also
            // query the other nested models by checking the Nested property:
            Console.WriteLine("\nNested");
            foreach (var item in best.Coefficients) {
                Console.WriteLine("Confidence (lower): " + item.ConfidenceLower);
                Console.WriteLine("Confidence (upper): " + item.ConfidenceUpper);
                Console.WriteLine(" Odds ratio: " + item.OddsRatio);
                Console.WriteLine(" Likelihood Ratio: " + item.LikelihoodRatio);
                Console.WriteLine(" Standard error: " + item.StandardError);
            }

            // Finally, we can also use the analysis to classify a new patient
            double y = regression.Current.Regression.Compute(new[] { 72, 2, 6.38 });
            Console.WriteLine("\nnew patient has died with probability: " + y);
            Console.ReadKey();
        }
    }
}

César

Nov 26, 2014, 3:54:30 PM
to accor...@googlegroups.com
Hi Charles,

Thank you very much for posting the issue, including the detailed program to reproduce it. I am investigating what might be going on, but my first guess is that the problem is due to multicollinearity in the input variables.

I will open an issue in the issue tracker and investigate. Thanks again!

Best regards,
Cesar

charle...@gmail.com

Nov 27, 2014, 4:00:23 PM
to accor...@googlegroups.com
Hi Cesar,

I got this data from a program by James McCaffrey at:

http://msdn.microsoft.com/en-us/magazine/jj618304.aspx?utm_source=rss&utm_medium=rss&utm_campaign=test-run-coding-logistic-regression-with-newton-raphson

I wanted to compare the results.

Charles

César

Nov 28, 2014, 4:51:53 PM
to accor...@googlegroups.com
Hi Charles,

The issue here is that this data set is perfectly separable, so the fitting problem has many possible solutions, and the one given by the framework is just one of them. I am adding support for regularization in the standard logistic regression learning, which can address those large coefficients. However, the answers given by the framework are still valid, in the sense that the generated classifier still classifies 100% of your data set correctly.
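For anyone wondering why separability leads to such huge odds ratios and standard errors, here is a short sketch of the standard argument. The symbols $w$, $x_i$, $y_i$, $c$ and $\lambda$ are just notation for this note, not names from the framework.

```latex
% Log-likelihood of logistic regression with labels y_i in {0, 1}
\ell(w) = \sum_i \Big[ y_i \log \sigma(w^\top x_i)
        + (1 - y_i) \log\big(1 - \sigma(w^\top x_i)\big) \Big],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% If w separates the data (w^T x_i > 0 when y_i = 1, w^T x_i < 0 when y_i = 0),
% then scaling w by any c > 0 gives
\ell(c\,w) = -\sum_i \log\!\big(1 + e^{-c\,|w^\top x_i|}\big)
\;\longrightarrow\; 0 \quad \text{as } c \to \infty

% With an L2 penalty the objective has a finite maximizer again
\ell_\lambda(w) = \ell(w) - \frac{\lambda}{2}\,\lVert w \rVert^2,
\qquad \lambda > 0
```

Since $\ell(c\,w)$ keeps increasing with $c$ and only approaches its upper bound of 0, there is no finite maximizer: each extra iteration can push the coefficients further out. At the same time the fitted probabilities approach 0 or 1, so the weights $p_i(1 - p_i)$ in the information matrix shrink towards zero and the standard errors computed from its inverse grow very large. None of this changes the predicted classes, only the size of the coefficients.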

In order to verify this, you can, for instance, use the following code:

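// `data` is not defined in this snippet; it is presumably a double[,] holding
// Age, Sex and Chol in columns 0-2 and the Died outcome in column 3.
double[][] input = data.Submatrix(null, 0, 2).ToArray();
double[] output = data.GetColumn(3);

LogisticRegression regression = new LogisticRegression(3);

var teacher = new IterativeReweightedLeastSquares(regression);

// Run 1000 learning iterations
var errors = new List<double>();
for (int i = 0; i < 1000; i++)
    errors.Add(teacher.Run(input, output));

// Count how many training samples are misclassified
double error = 0;
for (int i = 0; i < output.Length; i++)
{
    double expected = output[i];
    double actual = System.Math.Round(regression.Compute(input[i]));

    if (expected != actual)
        error++;
}

// Convert the count into a misclassification rate
error /= output.Length;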

In the end, you should see that error will be equal to zero.
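As a side illustration of why the coefficients can be enormous while the classification error is still zero, here is a minimal, self-contained sketch. It does not use the framework and is not how the framework fits its models: it is plain gradient ascent on a made-up, perfectly separable one-variable data set, with an optional L2 penalty. The names `SeparationDemo`, `Fit`, `Run`, `lambda` and `rate` are hypothetical and exist only for this example.

```csharp
using System;

// Illustrative only: none of these names come from Accord.NET.
public static class SeparationDemo
{
    static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    // Fits y ~ sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood
    // minus (lambda / 2) * (b0^2 + b1^2). lambda = 0 means no penalty.
    public static double[] Fit(double[] x, double[] y, double lambda,
                               int iterations, double rate = 0.1)
    {
        double b0 = 0, b1 = 0;
        for (int it = 0; it < iterations; it++)
        {
            // Gradient of the penalty term
            double g0 = -lambda * b0, g1 = -lambda * b1;

            // Gradient of the log-likelihood
            for (int i = 0; i < x.Length; i++)
            {
                double residual = y[i] - Sigmoid(b0 + b1 * x[i]);
                g0 += residual;
                g1 += residual * x[i];
            }

            b0 += rate * g0;
            b1 += rate * g1;
        }
        return new[] { b0, b1 };
    }

    // Prints the fitted slope for increasing iteration counts,
    // without and with the penalty.
    public static void Run()
    {
        double[] x = { -2, -1, 1, 2 };   // perfectly separable at x = 0
        double[] y = {  0,  0, 1, 1 };

        foreach (int n in new[] { 100, 10000, 1000000 })
        {
            double[] plain = Fit(x, y, 0.0, n);
            double[] ridge = Fit(x, y, 0.1, n);
            Console.WriteLine(
                $"{n,8} iterations: slope = {plain[1]:F3} (no penalty), " +
                $"{ridge[1]:F3} (lambda = 0.1)");
        }
    }
}
```

Calling `SeparationDemo.Run()` should show the unpenalized slope still creeping upwards as the iteration count grows, while the penalized slope settles at a fixed value. Both fits put every point on the correct side of 0.5, which is the same situation as with your data: the classifier is fine, but the unpenalized coefficients have no finite limit.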

Hope it helps!

Best regards,
Cesar