Training is done on the whole dataset rather than X_train, y_train in "Star/Quasar Classification ROC Curves" example

Kyriakos Stylianopoulos

Nov 12, 2019, 12:54:38 PM

to astroML-general

The Python code of the "Star/Quasar Classification ROC Curves" example splits the data set into training and test parts, but inside the `compute_results()` function the whole data set is used to train each classifier. As a result, the test samples are seen during training and the reported ROC curves are too optimistic.

I am attaching the source code file as downloaded from the link on the webpage (12 Nov 2019).

The fix is trivial: change line 90 from

model.fit(X, y)

to

model.fit(X_train, y_train)
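
To illustrate why this matters, here is a minimal sketch of the effect. It is not the astroML example itself: the data are synthetic, `OneNearestNeighbor` and `accuracy` are hypothetical helpers written only for this illustration, and plain accuracy is used in place of a ROC curve to keep it short. A classifier fitted on the whole data set has already seen every test point, so its test score is inflated compared to one fitted on the training part only.

```python
import random


class OneNearestNeighbor:
    """Toy 1-NN classifier with a fit/predict interface (illustration only)."""

    def fit(self, X, y):
        # Memorise the training points and their labels.
        self.X_ = list(X)
        self.y_ = list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Index of the stored point closest to x.
            nearest = min(range(len(self.X_)), key=lambda j: abs(self.X_[j] - x))
            preds.append(self.y_[nearest])
        return preds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)

# Synthetic 1-D data: two strongly overlapping classes.
X = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(1.0, 1.0) for _ in range(200)]
y = [0] * 200 + [1] * 200

# Shuffle the indices and split 75% / 25% into training and test parts.
idx = list(range(len(X)))
rng.shuffle(idx)
train_idx, test_idx = idx[:300], idx[300:]
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# Leaky: fitted on everything, so each test point is its own nearest neighbour.
leaky = OneNearestNeighbor().fit(X, y)
# Honest: fitted on the training part only.
honest = OneNearestNeighbor().fit(X_train, y_train)

acc_leaky = accuracy(y_test, leaky.predict(X_test))
acc_honest = accuracy(y_test, honest.predict(X_test))

print("fit(X, y)             test accuracy:", acc_leaky)
print("fit(X_train, y_train) test accuracy:", acc_honest)
```

The leaky model scores perfectly on the test part simply because it memorised those points, while the honest model gives a realistic estimate for overlapping classes. The same kind of optimism affects ROC curves computed from a model fitted on the full data set.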

Note also that the ROC plot on the webpage needs to be regenerated, since the fix changes the results.

I was not able to find the source code of that file in the GitHub repo, so I could not open an issue there.

fig_star_quasar_ROC.py

Brigitta Sipocz

Nov 12, 2019, 6:03:29 PM

to Kyriakos Stylianopoulos, astroML-general

Hi Kyriakos,

The example scripts are hosted in this repo: https://github.com/astroML/astroML_figures

If you're interested, please open a pull request with your fix. I'll propagate updates from the repo to the webpage.

Brigitta

Kyriakos Stylianopoulos

Nov 13, 2019, 11:46:48 AM

to astroML-general

Hello Brigitta,

I have created a pull request on the GitHub repo with the necessary changes. It is now awaiting approval.

Kyriakos
