Fill in Blanks pauses indefinitely

Benjamin Hill

Aug 30, 2023, 8:27:58 PM
to simple-ml-for...@googlegroups.com
I'm filling in the last column with blanks (yay!), and for some reason this column causes the add-on to stall.

I've tried "flattening" the sheet (copy all, paste as values into a new tab, then run it on the new tab), with no luck.
The rest of the columns went fairly quickly (10 minutes at most).
This one has been running for a few hours. I've tried full browser restarts and clearing the Drive folder (simple_m_for_sheets). Is there anything else I can do to debug?

Same behavior with Chrome on Mac and Chrome on Windows.


Redacted log follows:

[INFO 23-08-31 00:07:41.6020 UTC addon_lib.cc:158] Create dataspec
[INFO 23-08-31 00:07:41.6190 UTC csv_example_reader.cc:202] 0 row(s) processed
[INFO 23-08-31 00:07:41.6380 UTC data_spec_inference.cc:410] 17 column(s) found
[INFO 23-08-31 00:07:41.6390 UTC csv_example_reader.cc:290] 0 row(s) processed
[INFO 23-08-31 00:07:41.6850 UTC data_spec_inference.cc:290] 398 item(s) have been pruned (i.e. they are considered out of dictionary) for the column xxxx (2000 item(s) left) because min_value_count=0 and max_number_of_unique_values=2000
[INFO 23-08-31 00:07:41.6870 UTC data_spec_inference.cc:422] Finalizing [12671 row(s) found]
[INFO 23-08-31 00:07:41.6880 UTC addon_lib.cc:163] Created Dataspec successfully
[INFO 23-08-31 00:07:41.6890 UTC addon_lib.cc:165] Dataspec:
Number of records: 12671
Number of columns: 17
Number of columns by type:
CATEGORICAL: 9 (52.9412%)
NUMERICAL: 8 (47.0588%)
Columns:
CATEGORICAL: 9 (52.9412%)
0: "xxx" CATEGORICAL manually-defined num-nas:4177 (32.965%) has-dict vocab-size:3 zero-ood-items most-frequent:"xxx" 4278 (50.365%)
3: "xxx" CATEGORICAL manually-defined has-dict vocab-size:4 zero-ood-items most-frequent:"xxx" 6854 (54.092%)
4: "xxx" CATEGORICAL manually-defined has-dict vocab-size:3 zero-ood-items most-frequent:"xxx" 8090 (63.8466%)
5: "xxx" CATEGORICAL manually-defined has-dict vocab-size:4 zero-ood-items most-frequent:"xxx" 8923 (70.4206%)
7: "xxx" CATEGORICAL manually-defined has-dict vocab-size:3 zero-ood-items most-frequent:"false" 12405 (97.9007%)
13: "xxx" CATEGORICAL manually-defined num-nas:299 (2.35972%) has-dict vocab-size:2001 zero-ood-items most-frequent:"xxx" 18 (0.14549%)
14: "xxx" CATEGORICAL manually-defined has-dict vocab-size:9 zero-ood-items most-frequent:"xxx" 4239 (33.4543%)
15: "xxx" CATEGORICAL manually-defined has-dict vocab-size:3 zero-ood-items most-frequent:"s" xxx(50.3591%)
16: "xxx" CATEGORICAL manually-defined has-dict vocab-size:1895 zero-ood-items most-frequent:"xxx" 34 (0.268329%)
NUMERICAL: 8 (47.0588%)
1: "xx" NUMERICAL manually-defined mean:4638.67 min:1 max:9280 sd:2683.61
2: "xx" NUMERICAL manually-defined mean:1.5114 min:1 max:8 sd:1.04315
6: "xx" NUMERICAL manually-defined mean:28.7813 min:0 max:79 sd:14.2575
8: "xxx" NUMERICAL manually-defined num-nas:259 (2.04404%) mean:221.073 min:0 max:11567 sd:630.611
9: "xx" NUMERICAL manually-defined num-nas:281 (2.21766%) mean:451.903 min:0 max:29813 sd:1588.98
10: "xx" NUMERICAL manually-defined num-nas:301 (2.3755%) mean:175.702 min:0 max:23492 sd:594.256
11: "xx" NUMERICAL manually-defined num-nas:281 (2.21766%) mean:306.081 min:0 max:22408 sd:1120.86
12: "xx" NUMERICAL manually-defined num-nas:264 (2.0835%) mean:308.954 min:0 max:24133 sd:1186.74
Terminology:
nas: Number of non-available (i.e. missing) values.
ood: Out of dictionary.
manually-defined: Attribute which type is manually defined by the user i.e. the type was not automatically inferred.
tokenized: The attribute value is obtained through tokenization.
has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
vocab-size: Number of unique values.
[INFO 23-08-31 00:07:41.6940 UTC abstract_learner.cc:105] No input feature specified. Using all the available input features as input signal.
[INFO 23-08-31 00:07:41.6940 UTC abstract_learner.cc:120] The label "^cabin2org$" was removed from the input feature set.
[INFO 23-08-31 00:07:41.6980 UTC vertical_dataset_io.cc:59] 100 examples scanned.
[INFO 23-08-31 00:07:41.7420 UTC vertical_dataset_io.cc:70] 12671 examples read. Memory: usage:0MB allocated:0MB. 0 (0%) examples have been skipped. - this feels strange at 0.
[INFO 23-08-31 00:07:41.7450 UTC abstract_learner.cc:105] No input feature specified. Using all the available input features as input signal. - I manually excluded a few, not sure if this means "after manual excluding"
[INFO 23-08-31 00:07:41.7450 UTC abstract_learner.cc:120] The label "^cabin2org$" was removed from the input feature set.
[INFO 23-08-31 00:07:41.7450 UTC gradient_boosted_trees.cc:453] Default loss set to MULTINOMIAL_LOG_LIKELIHOOD
[WARNING 23-08-31 00:07:41.7460 UTC gradient_boosted_trees.cc:485] The model configuration specifies 300 trees but computation of the validation loss will only start at iteration 10 with 1894 trees per iteration. No validation loss will be computed, early stopping is not used.
[INFO 23-08-31 00:07:41.7460 UTC gradient_boosted_trees.cc:1079] Training gradient boosted tree on 12671 example(s) and 16 feature(s).
[user_name.cc : 66] RAW: UserName(), OS not supported
[INFO 23-08-31 00:07:41.7570 UTC gradient_boosted_trees.cc:1122] 11415 examples used for training and 1256 examples used for validation
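
A rough sanity check on the numbers in the log above, offered as a sketch and not a diagnosis. All values are copied from the log. Two things are assumptions on my part: that column 16 (vocab-size 1895) is the label column, and that one vocabulary slot is reserved for out-of-dictionary values, which would account for the 1894 trees per iteration in the warning.

```python
# Values copied from the log above.
num_trees_configured = 300        # "specifies 300 trees"
trees_per_iteration = 1894        # "1894 trees per iteration"
first_validation_iteration = 10   # "will only start at iteration 10"
label_vocab_size = 1895           # column 16; ASSUMED to be the label column
num_examples = 12671
num_train, num_valid = 11415, 1256

# Trees that would have to exist before the first validation loss is computed.
trees_needed = first_validation_iteration * trees_per_iteration

# ASSUMPTION: one vocabulary slot is reserved for out-of-dictionary values,
# so the number of label classes is vocab-size minus one.
assumed_num_classes = label_vocab_size - 1

print("trees needed before first validation:", trees_needed)
print("exceeds configured trees:", trees_needed > num_trees_configured)
print("classes match trees per iteration:", assumed_num_classes == trees_per_iteration)
print("train + validation covers all rows:", num_train + num_valid == num_examples)
```

If the assumptions hold, each boosting iteration grows 1894 trees, which is far more work per iteration than the columns with 3 to 9 categories needed. That might be related to the long run time, but it is only a guess from the log.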