Dr. Hart,
Without additional context or the raw data, it would be tough to say that these fits are necessarily bad fits. The upper two curves (the ones whose fits look like lines of negative slope) look to be the consequence of noisy data, nothing more. I'm not sure whether it's possible or appropriate to filter the incoming data in your application, but even a simple weighted-average filter would smooth the data into something with better fit characteristics.
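For what it's worth, here's a minimal sketch of the kind of weighted-average smoothing I mean (the window weights are an arbitrary choice, and whether smoothing is appropriate at all depends on your application):

import numpy as np

def weighted_moving_average(y, weights=(1, 2, 3, 2, 1)):
    """Smooth a 1-D series with a small symmetric weighted window.

    The (1, 2, 3, 2, 1) window is arbitrary; mode='same' keeps the output
    the same length as the input, at the cost of edge effects where the
    window hangs off the ends of the data."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    return np.convolve(np.asarray(y, dtype=float), w, mode='same')

# e.g. smoothed_counts = weighted_moving_average(raw_counts)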
Although, lacking context, it's hard to say what "better" really means here. What in particular are you unhappy about with these fits? What is your goal? Do you know for certain that they should be modeled, e.g., logistically (because, for example, you know some underlying distribution), or is that a guess based on the data you've seen? If the latter, you may be better off using something like a cubic spline interpolator for short-term predictive power, relying on the assumption that the process you're observing has some underlying continuity.
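If it helps, a minimal sketch of the spline idea with SciPy (the data here is made up, and whether extrapolating a year or two ahead is justified depends entirely on how much continuity you're willing to assume):

import numpy as np
from scipy.interpolate import CubicSpline

# Made-up yearly values standing in for one of your series.
years = np.arange(2000, 2025, dtype=float)
rng = np.random.default_rng(3)
values = 10 + 0.5 * (years - 2000) + rng.normal(0, 1.5, years.size)

# A cubic spline passes through every (noisy) point exactly, so it is an
# interpolator, not a smoother; it extrapolates from the last polynomial piece.
spline = CubicSpline(years, values)
short_term = spline([2025.0, 2026.0])   # short-term prediction only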
Additional detail would help!
Another possibility, if you understand your sources of noise well, is to do a Monte Carlo simulation as part of your fitness function. That way, even if you don't know the exact amount of noise in a given year, you can keep mixing it in as the solver solves. With enough samples, you might converge faster than expected. At the very least, you can cover a huge range of scenario variations this way. If you do this, I'd recommend passing the RNG seed as a parameter; that way, when the solver completes, you can rerun with exactly the same noise to see if the result passes a sniff test.
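To make that concrete, here's a rough sketch under a bunch of assumptions (the model function, the Gaussian noise model, and its size are all placeholders for whatever you actually know about your data): draw several noise realizations inside the residual function, stack the residuals from each, and keep the RNG seed as an explicit argument so the objective stays deterministic and can be replayed with the same noise afterwards.

import numpy as np
from lmfit import Parameters, minimize, fit_report

def model_func(t, A, alpha0, beta, t0=2000):
    # Placeholder model; swap in whatever you are actually fitting.
    dt = t - t0
    return A * np.exp((alpha0 + beta * dt) * dt)

def mc_residual(params, t, y, n_draws=50, noise_sigma=1.0, seed=12345):
    # A fixed seed means the same noise realizations on every call, so the
    # objective is deterministic and the whole fit can be rerun exactly.
    rng = np.random.default_rng(seed)
    v = params.valuesdict()
    model = model_func(t, v['A'], v['alpha0'], v['beta'])
    residuals = [y + rng.normal(0.0, noise_sigma, size=y.shape) - model
                 for _ in range(n_draws)]
    return np.concatenate(residuals)

# Usage sketch, with synthetic data standing in for one region's counts:
t = np.arange(2000, 2025, dtype=float)
y = 2.0 * np.exp(0.12 * (t - 2000)) + np.random.default_rng(7).normal(0, 1, t.size)

params = Parameters()
params.add('A', value=1.0, min=0)
params.add('alpha0', value=0.1)
params.add('beta', value=0.01)
out = minimize(mc_residual, params, args=(t, y),
               kws={'n_draws': 50, 'noise_sigma': 1.0, 'seed': 12345})
print(fit_report(out))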
I'm hoping someone smarter than me at this will chime in with a simpler way, but I've found this method pretty effective in the past.
Hi Roger,
As Jeremy suggested, understanding and characterizing the fluctuations and uncertainties in the data would be helpful. For example, your first CSV file had values with 11 significant digits – almost certainly far below "the noise level". If I understand correctly, this data is "publication year" for articles published on some topic and then selected by "region". In my experience, counts like these could easily be influenced by the timing of related conferences and funding cycles. I might also wonder how clear the meaning of "region" is. It sounds like you are aware of all these and other subtleties.
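To make the uncertainty point a little more concrete: if these are counts per year, one crude first guess (and it is only a guess, not something I know about your data) is Poisson-like noise, sigma ≈ sqrt(N), which lmfit can take as per-point weights. A minimal sketch, with an assumed form for variable_exp and synthetic counts:

import numpy as np
from lmfit import Model

def variable_exp(t, A, alpha0, beta, t0=2000):
    # Assumed form of the variable_exp model; adjust to the real definition.
    dt = t - t0
    return A * np.exp((alpha0 + beta * dt) * dt)

# Synthetic stand-in for one region's publication counts per year.
years = np.arange(2000, 2025, dtype=float)
counts = np.maximum(np.round(2.0 * np.exp(0.12 * (years - 2000))), 1.0)

sigma = np.sqrt(counts)          # Poisson-ish guess at per-point uncertainty
model = Model(variable_exp)
params = model.make_params(A=1.0, alpha0=0.1, beta=0.01)
params['t0'].set(value=2000, vary=False)
result = model.fit(counts, params, t=years, weights=1.0 / sigma)
print(result.fit_report())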
For the fitting itself, having a single model for all categories is vastly preferable to using different models for different categories. If you are trying to show that "Category C" is different from the others, using a different model for that data might not be super-persuasive. I suggest thinking about the Model in terms of its ability to "explain the phenomenon", which is a bit different from "heuristically match the data". That is, the Model should imply some Theory of why (not just how) the data is changing.
To me, Exponential implies compounding growth, which is different from Accelerating growth. I might suggest a Quadratic model, implying “Offset, Velocity, and Acceleration” of change. Exponential growth implies more of an explosion. It may be a popular notion that a field or technique (or App or Meme or whatever) is experiencing Exponential growth when it is “only” Accelerating (and may reach a constant velocity).
An advantage of a Quadratic model is that you can use Regression methods. We’re all in for non-linear least-squares fitting, but some problems are linear (in the parameters) and so can use Regression.
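As a hedged illustration of that (with synthetic data standing in for one region), the same quadratic can be fit with lmfit's built-in QuadraticModel or with plain polynomial regression, and the two agree because the model is linear in its parameters:

import numpy as np
from lmfit.models import QuadraticModel

# Synthetic stand-in for one region's publications per year.
year = np.arange(2000, 2025, dtype=float)
t = year - 2000
rng = np.random.default_rng(0)
count = 5 + 0.8 * t + 0.12 * t**2 + rng.normal(0, 2, t.size)

# lmfit's convention is y = a*x**2 + b*x + c, so c is the offset, b the
# initial rate ("velocity"), and 2*a the constant acceleration.
qmodel = QuadraticModel()
params = qmodel.guess(count, x=t)
result = qmodel.fit(count, params, x=t)
print(result.fit_report())

# The same problem solved as ordinary, linear-in-the-parameters regression:
a, b, c = np.polyfit(t, count, 2)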
I would say that none of the data you show is convincingly Logistic, which would imply a Step-like change (which could be possible: Group X entered the funding of this field, or Discovery Z made certain techniques 10x cheaper/faster. Those things do happen in research sometimes ;) ).
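Purely to illustrate what such a Step-like (Logistic) change would look like if the data ever did call for it, here is a minimal sketch using lmfit's built-in StepModel plus a constant baseline (again with synthetic data, not yours):

import numpy as np
from lmfit.models import StepModel, ConstantModel

# Synthetic data: low plateau, transition around 2012, high plateau.
year = np.arange(2000, 2025, dtype=float)
rng = np.random.default_rng(1)
truth = 5 + 30.0 / (1.0 + np.exp(-(year - 2012) / 1.5))
counts = truth + rng.normal(0, 2, year.size)

# Constant baseline plus a logistic step (amplitude, center, width).
model = ConstantModel() + StepModel(form='logistic')
params = model.make_params(c=5, amplitude=30, center=2012, sigma=2)
result = model.fit(counts, params, x=year)
print(result.fit_report())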
Anyway, good luck with your analysis and findings!
--Matt
China Fit Report:

[[Model]]
    Model(variable_exp)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 35
    # data points      = 25
    # variables        = 3
    chi-square         = 36.6713186
    reduced chi-square = 1.66687812
    Akaike info crit   = 15.5779779
    Bayesian info crit = 19.2346053
    R-squared          = 0.95745519
[[Variables]]
    A:      2.39023918 +/- 0.55894676 (23.38%) (init = 0.3092917)
    alpha0: 0.16630843 +/- 0.05183735 (31.17%) (init = 0.1373607)
    beta:   0.03474102 +/- 0.01687994 (48.59%) (init = 0.01)
    t0:     2000 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
    C(alpha0, beta) = +0.9822
    C(A, alpha0)    = -0.9547
    C(A, beta)      = -0.8846

Europe Fit Report:

[[Model]]
    Model(variable_exp)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 58
    # data points      = 25
    # variables        = 3
    chi-square         = 400.449145
    reduced chi-square = 18.2022339
    Akaike info crit   = 75.3427739
    Bayesian info crit = 78.9994014
    R-squared          = 0.51212748
[[Variables]]
    A:      44.4825774 +/- 2.48562521 (5.59%) (init = 44.11792)
    alpha0: -0.01500366 +/- 0.01171201 (78.06%) (init = 0.02351537)
    beta:   5.1431e-13 +/- 0.00509658 (990953569712.16%) (init = 0.01)
    t0:     2000 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
    C(alpha0, beta) = -0.9621
    C(A, alpha0)    = -0.8282
    C(A, beta)      = +0.6916

India Fit Report:

[[Model]]
    Model(variable_exp)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 111
    # data points      = 25
    # variables        = 3
    chi-square         = 4.12707796
    reduced chi-square = 0.18759445
    Akaike info crit   = -39.0326546
    Bayesian info crit = -35.3760271
    R-squared          = 0.89166040
[[Variables]]
    A:      0.14999824 +/- 0.11314413 (75.43%) (init = 0.1334867)
    alpha0: 0.14241291 +/- 0.08476284 (59.52%) (init = 0.07943498)
    beta:   1.1869e-12 +/- 0.00632034 (532515755435.27%) (init = 0.01)
    t0:     2000 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
    C(alpha0, beta) = -0.9867
    C(A, alpha0)    = -0.9716
    C(A, beta)      = +0.9217

North America Fit Report:

[[Model]]
    Model(variable_exp)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 21
    # data points      = 25
    # variables        = 3
    chi-square         = 434.553944
    reduced chi-square = 19.7524520
    Akaike info crit   = 77.3861066
    Bayesian info crit = 81.0427341
    R-squared          = 0.42938253
[[Variables]]
    A:      43.4547080 +/- 3.20150939 (7.37%) (init = 41.81545)
    alpha0: -0.03129501 +/- 0.02964241 (94.72%) (init = 0.01542437)
    beta:   0.04975881 +/- 0.08146205 (163.71%) (init = 0.01)
    t0:     2000 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
    C(alpha0, beta) = -0.9673
    C(A, alpha0)    = -0.8368
    C(A, beta)      = +0.7006