Useful scripts


Razvan Marinescu

Oct 31, 2017, 6:37:18 AM
to TADPOLE
Hi all,

I wanted to make available to everyone some useful scripts that we created for the PyConUK Hackathon.

They can be found in this repository. In particular, it contains a useful IPython notebook that visualizes the data and cleans it up, for example by handling missing values. The scripts are all based on the leaderboard datasets, but can be used for the D1/D2/D3 datasets as well.
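As a rough illustration of the kind of missing-value check such a notebook might perform (this is not taken from the notebook itself), here is a minimal pandas sketch. The toy frame, its column names and values, and the 50% threshold are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for a TADPOLE table; columns and values are illustrative only.
df = pd.DataFrame({
    "RID": [1, 1, 2, 2],
    "MMSE": [29.0, np.nan, 27.0, 26.0],
    "Ventricles": [np.nan, np.nan, 31000.0, np.nan],
})

# Fraction of missing entries per column, largest first.
missing_fraction = df.isna().mean().sort_values(ascending=False)
print(missing_fraction)

# Keep only columns that are at most half empty (the threshold is an arbitrary choice).
kept = df.loc[:, df.isna().mean() <= 0.5]
```

Whether to drop, impute, or keep sparse columns depends on the model being fitted; the threshold here is only a placeholder.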

Have a look at instructions.md for installation steps.

Raz

Razvan Marinescu

Nov 1, 2017, 2:12:58 PM
to tadpolec...@googlegroups.com
There are more Python scripts in the repository below that read the TADPOLE data and make a simple prediction. Unlike the scripts we made available previously, these have been software-engineered by the PyConUK attendees. The repository also contains refactored versions of our evaluation scripts.

Many thanks to John Sandall, Frank Kelly, Sebastian Pölster, Alice Harpole, Mark Bell, Tom Viner and Matthew Power for enchanting our scripts with software-engineering magic!

Repo link:

lucas.l...@gmail.com

Nov 2, 2017, 7:18:54 AM
to TADPOLE
Razvan and the software-engineering magic team - thank you for posting and all your work on generating and enchanting this!

In the "Data Cleaning" section, the three lines of code shown below give me the following error. Any ideas what could be missing that is causing it?

CODE

tadpole = strat_train_set.drop(y_num_cols + y_num_cols, axis=1)
tadpole_labels_categorical = strat_train_set[useful_categorical_attribs].copy()
tadpole_labels_continuous = strat_train_set[useful_numerical_attribs].copy()

ERRORS

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-137-7de43013226f> in <module>()
      1 tadpole = strat_train_set.drop(y_num_cols + y_num_cols, axis=1)
      2 tadpole_labels_categorical = strat_train_set[useful_categorical_attribs].copy()
----> 3 tadpole_labels_continuous = strat_train_set[useful_numerical_attribs].copy()

/usr/local/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1956         if isinstance(key, (Series, np.ndarray, Index, list)):
   1957             # either boolean or fancy integer index
-> 1958             return self._getitem_array(key)
   1959         elif isinstance(key, DataFrame):
   1960             return self._getitem_frame(key)

/usr/local/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_array(self, key)
   2000             return self.take(indexer, axis=0, convert=False)
   2001         else:
-> 2002             indexer = self.loc._convert_to_indexer(key, axis=1)
   2003             return self.take(indexer, axis=1, convert=True)
   2004 

/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
   1229                 mask = check == -1
   1230                 if mask.any():
-> 1231                     raise KeyError('%s not in index' % objarr[mask])
   1232 
   1233                 return _values_from_object(indexer)

KeyError: "['AGE_AT_EXAM'] not in index"
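A KeyError of this form means that at least one name in the list used to index the DataFrame is not among its columns. The thread does not say what the underlying cause was here, so the following is only a generic diagnostic sketch: it lists which requested columns are absent before selecting. The helper `missing_columns` is hypothetical, and the toy frame and its column names stand in for `strat_train_set`.

```python
import pandas as pd


def missing_columns(frame, wanted):
    """Return the names in `wanted` that are not columns of `frame`, in order."""
    return [name for name in wanted if name not in frame.columns]


# Toy stand-in for strat_train_set; the column names are illustrative only.
strat_train_set = pd.DataFrame({"AGE": [70.1, 68.4], "MMSE": [29, 27]})
useful_numerical_attribs = ["AGE", "MMSE", "AGE_AT_EXAM"]

# Report which requested columns the frame does not have.
absent = missing_columns(strat_train_set, useful_numerical_attribs)
print(absent)

# Select only the columns that exist, so the lookup cannot raise KeyError.
present = [c for c in useful_numerical_attribs if c not in absent]
tadpole_labels_continuous = strat_train_set[present].copy()
```

Silently skipping absent columns can hide a real problem, such as an earlier notebook cell that was not run, so it is usually better to inspect the reported names first and fix the source of the mismatch.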

Razvan Marinescu

Nov 2, 2017, 9:56:41 AM
to TADPOLE
Hi Lukas,

I just fixed that error, sorry about that.

Try pulling the new changes from the repo.

Raz

lucas.l...@gmail.com

Nov 5, 2017, 3:14:32 PM
to TADPOLE
Got it to work.  Thank you, Razvan!