Agreement On The Format Of Results


H L

Sep 12, 2018, 9:04:01 AM
to mlbench
Hi,

Since we will eventually cache the results in a GitHub repository, it would be good to agree on the format as early as possible so that we don't have to rerun the experiments in the future.

Here are some points I can think of:
  • data posted to the dashboard should be the same as the data we checkpoint, so that we can extract exactly the same results from both master and workers.
  • each post contains a single metric or loss value. The format will be JSON (see the sketch after this list), with the following fields:
    • run_id: identifies the experiment we are running in a cluster
    • name: name of the metric or loss, e.g. "Training Loss", "Top5Accuracy", etc.
    • value: value of the metric
    • epoch: global epoch of training
    • type: "train" or "validation", indicating which phase the message belongs to
    • global_iteration: global iteration of training in the synchronous setting
    • local_iteration: in the asynchronous setting, the local iteration count can differ across processes; in the synchronous setting it is the same as global_iteration
    • rank: rank of the process
    • timestamp: 
    • ?
  • if an experiment is paused/resumed several times, it is possible to have several log entries with the same epoch & iteration. Users should take the entry with the largest timestamp as the official one. (This is not a problem for checkpoints, but for the master node's endpoint we need to identify the correct one.)
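For concreteness, here is a rough Python sketch of what a single post could look like. The field names follow the list above; the run_id and all values are made up for illustration:

import json
import time

# One metric post under the proposed fields; all concrete values are placeholders.
metric_post = {
    "run_id": "resnet20-cifar10-2018-09-12",
    "name": "Top5Accuracy",
    "value": 0.87,
    "epoch": 10,
    "type": "validation",            # "train" or "validation"
    "global_iteration": 3910,
    "local_iteration": 3910,         # equals global_iteration in the synchronous setting
    "rank": 0,
    "timestamp": time.time(),        # wall-clock time for now
}

payload = json.dumps(metric_post)
# `payload` would be both POSTed to the dashboard and written alongside the checkpoint.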
Do you think this is enough to cover future use cases? Any suggestions about the fields/names, etc.?

Best regards,

Martin Jaggi

Sep 13, 2018, 9:34:15 PM
to H L, mlbench
this is a good point, and useful for all benchmarks

i'd suggest keeping only some minimal mandatory fields and making the others optional (see the sketch after this list):
  • benchmark task id
  • run_id
  • list of metric timepoints for this run, each having
    • timestamp
    • link to model checkpoint (ideally, but can be optional)
    • list of (optional) metric key/value pairs for that timepoint, as mentioned by lie below. that is, for example, iteration, test/train loss, etc.
      note that some of these will be 'mandatory' per the official benchmark task description
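a rough sketch of that nested layout in Python (all ids, paths and values here are placeholders, not a fixed schema):

import json

# Placeholder sketch: minimal mandatory fields, everything else optional.
run_record = {
    "benchmark_task_id": "1a-image-classification",   # mandatory
    "run_id": "run-42",                                # mandatory
    "metric_timepoints": [
        {
            "timestamp": 1536866055.0,
            "checkpoint": "nfs://mlbench-results/run-42/epoch_10.ckpt",  # optional link
            "metrics": {                               # optional key/value pairs
                "iteration": 3910,
                "train_loss": 0.41,
                "val_top5_accuracy": 0.87,
            },
        },
    ],
}

print(json.dumps(run_record, indent=2))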



H L

Sep 14, 2018, 4:56:37 AM
to mlbench
We could use public persistent storage with an NFS setup to save the checkpoints. Then blog posts and documentation can refer to the results there.

Ralf G.

Sep 14, 2018, 10:25:26 AM
to mlbench
@martin: I don't think we need to save checkpoints of the runs for future reference; just having text files (JSON/CSV) of the results in the results repo is enough. I mean, maybe the final, trained checkpoint could be useful for something, but keeping all intermediate checkpoints (one after every x epochs) around seems like overkill. Ideally, checkpoints are a transient thing and they get converted (evaluated) into their metrics after a run.
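Roughly, what I have in mind (the evaluate_checkpoint helper and the paths below are hypothetical, just to illustrate the idea):

import csv

# Sketch only: evaluate a checkpoint once, append the resulting metrics to a
# CSV in the results repo, then the checkpoint itself can be discarded.
def append_results(results_csv, run_id, epoch, metrics):
    with open(results_csv, "a", newline="") as f:
        writer = csv.writer(f)
        for name, value in metrics.items():
            writer.writerow([run_id, epoch, name, value])

# metrics = evaluate_checkpoint("checkpoints/epoch_10.ckpt")   # hypothetical helper
# append_results("results/run-42.csv", "run-42", 10, metrics)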

A run is already a separate table in the DB; that's what run_id refers to. So at least in the DB the benchmark definition etc. will be there, and metrics refer to the run they belong to via the run_id field.


@lie: Are global_iteration and local_iteration ever used at the same time? In async there isn't really a global_iteration, or am I mistaken? Because then we could just call it "iteration". What's the difference between an "epoch" and an "iteration"?

For the timestamp, right now we just save wall time (it was the easiest to implement). In the future it might make sense to switch to time since start. What do you think?
Technically, we could have a run in the DB hold an optional pointer to a previous run, to keep track of restarted runs. That would allow us to just combine them in the end, or at least then we wouldn't have a problem with separating the metrics of the different runs.
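As a sketch, combining a chain of restarted runs would then just mean following the previous-run pointer and keeping the entry with the largest timestamp per metric/epoch/iteration (the data layout below is assumed, not our actual DB schema):

# Sketch only: walk the chain of restarted runs via an assumed previous_run_id
# pointer; for each (name, epoch, iteration) key keep the entry with the
# largest timestamp, as proposed earlier in the thread.
def combine_restarted_runs(runs_by_id, last_run_id):
    combined = {}
    run_id = last_run_id
    while run_id is not None:
        run = runs_by_id[run_id]
        for m in run["metrics"]:
            key = (m["name"], m["epoch"], m["global_iteration"])
            if key not in combined or m["timestamp"] > combined[key]["timestamp"]:
                combined[key] = m
        run_id = run.get("previous_run_id")
    return sorted(combined.values(), key=lambda m: m["timestamp"])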

Martin Jaggi

Sep 14, 2018, 11:03:52 AM
to mlbench
yea, storing all checkpoints will not always be necessary. i brought this idea up basically because the trained model (call it a checkpoint or whatever) is the main output of ML training. so practitioners will care about getting this one out in the end, and about being sure that what they get out indeed has the quality properties we've displayed in the benchmark. that's why i thought that in the long term we should offer functionality to associate the models at given timepoints/runs with the metrics at those same timepoints/runs. (not saying every benchmark we run always has to use that functionality)

about global / local iterations, we can for simplicity only track global iterations for now. any more fine-grained stuff we can treat as an implementation detail or configuration of the implementation.