Hi David,
Thank you for taking the time to comment. I see from your Univ of Birmingham page that you are trained in Physics and I in Statistics. This may explain the differences between your approach to comparing the forecasts and ours. Scott did not mention your earlier communications with him as we prepared this paper, so they must not have made much of an impact on his thinking.
Here are my reactions to your thoughts and analyses. I am being detailed so that I am as clear as possible in my response. My intention is to facilitate a scholarly exchange rather than the conflictual debates that so often occur on lists. HamSCI has fortunately not had that culture!
a. We used well-accepted methods in statistics and econometrics for comparing the relative accuracy of two time series over a given horizon. This design fits the desired comparison over the first half of Cycle 25, rather than picking and choosing features of each forecast as you have done to create your graphs.
b. I do not think our approach is disingenuous at all, as you claim. Disingenuous means "not candid or sincere, typically by pretending that one knows less about something than one really does." It is likely that, because your training is in Physics and not Statistics, you think we are intentionally "ignorant" about the topic. Instead, it may be that you are not aware of the accepted methods in statistics and econometrics. In other words, you are working wholly within your own discipline's paradigm on the topic and cannot see the merits of methods and approaches from other fields ("boundary maintenance"). When I consulted with physicists and engineers over the years, I ran into this frequently. I characterize it as the burden of FFTs: everything looks like that problem! Not really, but you get my point. There can be multiple approaches to the same problem without either one being legitimately judged incorrect as presented.
c. The conventional methods for comparing the forecast accuracy of two time series go back to the 1960s with Henri Theil's classic text, cited in our paper (a rough sketch of one of his best-known measures follows below, for anyone unfamiliar with it). What we did was compare the forecasts over the approximate first half of Cycle 25 for SSN (and SFI). Thus, we chose the starting and ending points shown in Figure 2 of our paper. We put both forecasts on the same "apples to apples" time horizon to compare their respective forecast errors. This favors neither team's forecast.
You have ginned up a different look at it, and that is fine, but it does not demonstrate that our conventional statistical approach is incorrect. You show an entire Cycle, whereas we are focusing only on the first half of Cycle 25. We do plan another study once Scott and I are satisfied (he more than I) that Cycle 25 is complete, so that the "second half" of the competition can be assessed for the full Cycle.
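For anyone on the list less familiar with Theil's work, his "inequality coefficient" (often called Theil's U1) is one of his best-known measures of forecast accuracy. The short Python sketch below is for illustration only: it uses made-up placeholder numbers, not SSN data, and it is not necessarily the statistic we report in the paper; our figures rest on the MAE and MAPE, as I note under point d.

# Illustrative sketch only: Theil's U1 inequality coefficient, computed on
# placeholder numbers (not actual SSN data and not the paper's calculations).
import numpy as np

def theil_u1(forecast, observed):
    """Theil's U1: 0 means a perfect forecast; values near 1 are very poor."""
    f = np.asarray(forecast, dtype=float)
    a = np.asarray(observed, dtype=float)
    rmse = np.sqrt(np.mean((f - a) ** 2))
    return rmse / (np.sqrt(np.mean(f ** 2)) + np.sqrt(np.mean(a ** 2)))

# Hypothetical placeholder values over a common horizon.
observed = [35, 48, 62, 80, 95]
forecast = [30, 40, 52, 65, 78]
print(f"Theil U1 = {theil_u1(forecast, observed):.3f}")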
d. I taught data viz for a number of years and was awarded grants for computational hardware to pioneer some approaches for the US Dept of Agriculture during the 1990s. As I taught the engineering (and a few physics) students who came to my classes back then (no one in their departments taught this subject), I emphasized that data viz is an inductive method (see John Tukey's pioneering work). It can legitimately allow one to see different things in a given visualization. Your graph, which you say is "generally quite good," is an inductive assessment, not one that formally compares the time series per se. A formal comparison is what we have done here, which, frankly, is more rigorous than an ad hoc graphic that is open to interpretation.
As Theil showed as far back as the mid-1960s (cited in our paper), a statistical comparison of the forecast errors of the two series should accompany any data viz of them. Even Tukey would say that. We did so using the MAE and MAPE (mean absolute error and mean absolute percentage error) for the first half of Cycle 25. That is the stated scope of our paper. We stand by that approach. It is not disingenuous, as you claim.
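To make that calculation concrete, here is a minimal Python sketch of how the MAE and MAPE are computed for two forecasts against the observed values over the same horizon. The numbers are placeholders only, not our data or our code; the actual values are in the paper.

# Minimal sketch: MAE and MAPE for two forecasts against observed values
# over the same horizon. Placeholder numbers only, not actual SSN data.
import numpy as np

def mae(forecast, observed):
    """Mean absolute error."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    return np.mean(np.abs(f - o))

def mape(forecast, observed):
    """Mean absolute percentage error, in percent."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    return 100.0 * np.mean(np.abs(f - o) / np.abs(o))

# Hypothetical placeholder values for the same "apples to apples" horizon.
observed   = [35, 48, 62, 80, 95]
forecast_a = [30, 40, 52, 65, 78]   # stands in for one team's forecast
forecast_b = [38, 52, 68, 85, 100]  # stands in for the other team's forecast

for name, f in [("Forecast A", forecast_a), ("Forecast B", forecast_b)]:
    print(f"{name}: MAE = {mae(f, observed):.1f}, MAPE = {mape(f, observed):.1f}%")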
e. We gave NASA/NOAA/ISED's published forecasts (the "official" ones) the benefit of their "false start" and used their revised forecast. They published the revision without comment, as they did their original forecast. Not revealing methods and the like is highly counter to good science, which we note in our RadCom papers. McIntosh's team, by contrast, has published in accordance with sound scientific norms. His team also has a developing theory of why the rise and fall occurs, whereas the NNI offers nothing except a note that they compared some 60+ models sent to them (discussed in our RadCom papers). So our comparisons are for a fixed term, the approximate first half of Cycle 25, for SSN and SFI, with an advantage given to the NNI team through their "false start" adjustment.
f. NNI had no knowledge of the McIntosh team's concept of the Terminator, or of the Terminator's timing as an underlying force shaping their forecast...at either their first edition or the revised six-month starting point version. So there is no basis for your suggestion to place both forecasts on some handicapped time horizon. That would produce a "rigged comparison" that honors neither forecast. Our approximation of the first half is independent of how either team designed their forecast in terms of year and month. You and others may dicker over our specific dates, but they define the same "apples to apples" time horizon for both forecasts.
I want to thank you again for taking the time to comment on our paper. I openly encourage you to conduct your own comparison of the two forecasts, using the approach you expressed in your post to this list, and to publish it with sufficient documentation for all to see. This is what we have done.
We are in agreement about the value of others validating these things. We just use an approach long accepted in statistics and econometrics, as well as the newer citation analysis procedures ("bibliometrics") used to track change in science. I've used these methods to advise both NIH and NSF in identifying promising areas for grant announcements. You prefer more ad hoc methods of your own. Let's see more of them!
Best regards,
Frank
K4FMH