Time Travel Analysis or Undo in Jupyter


Jayme Bird

Feb 22, 2018, 5:39:38 AM
to jup...@googlegroups.com
Dear all,

I would appreciate your feedback on a potential research area, specifically within Jupyter and perhaps more generally in Python.

Interactive data analysis in frameworks like Jupyter notebooks has a common issue: the modification of potentially large datasets within an interactive session. Unintentional modification is frequent, and the common solution is to re-run the steps that were required to get from a data file to the point in question. This reduces the usability of the analysis tools, makes “what-if” exploration difficult, and creates a lot of unnecessary overhead for either manually saving state or re-running scripts to recreate it.

I'm investigating a proposed project focused on the use of relational Multi-Version Concurrency Control (MVCC) techniques from database systems for these interactive workloads. In essence, this would allow a Ctrl+Z-style undo that returns to the previous state after running a particular step of an interactive script.
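To make the idea concrete, here is a minimal sketch of version-chain storage in the spirit of MVCC, applied to named variables. The class `VersionedStore` and its methods are hypothetical names chosen for illustration, not part of Jupyter or any database. It assumes values are immutable and stored whole; a real MVCC system would also handle concurrent transactions and garbage collection of old versions.

```python
import bisect


class VersionedStore:
    """Toy version-chain store: every write creates a new numbered version."""

    def __init__(self):
        self._chains = {}  # name -> (sorted version numbers, matching values)
        self._log = []     # name written at each version, oldest first
        self.version = 0   # number of the most recent write

    def write(self, name, value):
        """Record a new version of `name` instead of overwriting it."""
        self.version += 1
        versions, values = self._chains.setdefault(name, ([], []))
        versions.append(self.version)
        values.append(value)
        self._log.append(name)

    def read(self, name, at=None):
        """Return the newest value of `name` written at or before version `at`."""
        at = self.version if at is None else at
        versions, values = self._chains[name]
        i = bisect.bisect_right(versions, at)
        if i == 0:
            raise KeyError(name)
        return values[i - 1]

    def undo(self):
        """Discard the most recent write, exposing the previous version."""
        if self.version == 0:
            return
        versions, values = self._chains[self._log.pop()]
        versions.pop()
        values.pop()
        self.version -= 1


store = VersionedStore()
store.write("rows", (1, 2, 3))   # version 1
store.write("rows", (1, 2))      # version 2: an accidental truncation
latest = store.read("rows")
as_of_v1 = store.read("rows", at=1)
store.undo()
after_undo = store.read("rows")
```

Reading with `at=1` shows the snapshot-style read that MVCC provides, while `undo` is the Ctrl+Z case: the older version is still in the chain, so nothing has to be recomputed.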

I would appreciate knowing whether any development of this has been discussed, as well as any ideas or feedback in general.

Kind regards
Jayme Bird

Paul Ivanov

Feb 22, 2018, 7:08:40 PM
to jup...@googlegroups.com
Hi Jayme,

Two things come to mind immediately. bpython has a linear, single-version form of this capability called "rewind": https://bpython-interpreter.org/

The more general case sounds related to some of Philip Guo's PhD thesis work around IncPy: http://pgbovine.net/PhD-dissertation.htm

Both of those address the general Python case. As for Jupyter, we are not tightly coupled to the namespace (the ability to execute cells out of order is seen as a bug by some): the Jupyter kernel does not know about a notebook document; it only receives code to execute and sends back the results of that execution. It would be feasible to build in the sort of coupling that you are talking about, but I don't think such functionality would be welcomed back into the mainline of Jupyter notebook user interfaces, because the tighter coupling would be a limitation for other kinds of use cases.

best,
pi

--
You received this message because you are subscribed to the Google Groups "Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jupyter+unsubscribe@googlegroups.com.
To post to this group, send email to jup...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jupyter/1410954636.3371205.1519295968702.JavaMail.zimbra%40cwi.nl.
For more options, visit https://groups.google.com/d/optout.



--
                   _
                  / \
                A*   \^   -
             ,./   _.`\\ / \
            / ,--.S    \/   \
           /  `"~,_     \    \
     __o           ?
   _ \<,_         /:\
--(_)/-(_)----.../ | \
--------------.......J
Paul Ivanov
http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7

Jason Grout

Feb 22, 2018, 9:18:29 PM
to jup...@googlegroups.com
It sounds like you are asking about reverting the kernel computation state to some previous state. With arbitrary side effects possible, I think this is a difficult problem in general. However, if you constrained your computation to pure functions, or essentially checkpointed your project state at every step, you could do something like reverting your computation state.
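As a rough illustration of the pure-function case, the sketch below keeps every intermediate state in a list, so reverting is just dropping the last entry. The `History` class and the two step functions are hypothetical, and the sketch assumes each state is an immutable value (a tuple here) that steps never modify in place.

```python
class History:
    """Keeps every state produced by a sequence of pure steps."""

    def __init__(self, initial):
        self._states = [initial]

    @property
    def current(self):
        return self._states[-1]

    def apply(self, step):
        """Run a pure function on the current state and record its result."""
        new_state = step(self.current)
        self._states.append(new_state)
        return new_state

    def undo(self):
        """Revert to the state before the last step (no-op at the start)."""
        if len(self._states) > 1:
            self._states.pop()
        return self.current


def drop_negatives(data):
    return tuple(x for x in data if x >= 0)


def double(data):
    return tuple(2 * x for x in data)


history = History((3, -1, 4))
history.apply(drop_negatives)
history.apply(double)
after_double = history.current
restored = history.undo()
```

Because the steps never touch their input, old states stay valid and no copying is needed. Any step with side effects (in-place mutation, file writes) breaks that guarantee.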

Jason


Robert Schroll

Feb 23, 2018, 3:13:38 PM
to jup...@googlegroups.com, jup...@googlegroups.com
Hi Jayme,

A while back, I was involved in the Reinteract project [1], which had the same goal as you with a different mechanism. Reinteract provided a notebook-like interface, but it recorded the state in which each expression was executed. When you went back to a previous line, it would rewind the state and execute in the appropriate historical state.

That was the goal at least. To avoid copying the whole state every step, it relied on heuristics at the syntax level to guess whether code would modify objects. Reinteract would then only copy things that changed. This worked 98% of the time, which was enough for me, but it still produced a few odd corner cases where its behavior was incomprehensible to those who didn't know its internals.
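For a sense of what a syntax-level heuristic can look like, here is a small sketch using Python's standard `ast` module. It is not Reinteract's actual algorithm; `root_name` and `possibly_modified_names` are hypothetical helpers, and the rule is an assumption made for illustration: a name is flagged if it is assigned to, deleted, has an attribute or item assigned, or has a method called on it.

```python
import ast


def root_name(node):
    """Follow x.a[0].b down to the leading name 'x', if there is one."""
    while isinstance(node, (ast.Attribute, ast.Subscript)):
        node = node.value
    return node.id if isinstance(node, ast.Name) else None


def possibly_modified_names(source):
    """Guess which names a piece of code might rebind or mutate."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        target = None
        if isinstance(node, (ast.Name, ast.Attribute, ast.Subscript)):
            # x = ..., x += ..., x.attr = ..., x[i] = ..., del x
            if isinstance(node.ctx, (ast.Store, ast.Del)):
                target = root_name(node)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # x.method(...) may mutate x, so flag it conservatively
            target = root_name(node.func.value)
        if target is not None:
            names.add(target)
    return names


flagged = possibly_modified_names("df.sort(); y = x + 1; z[0] = 2")
missed = possibly_modified_names("helper(data)")
```

Here `flagged` contains `df`, `y` and `z`, so only those would need copying. `missed` is empty even though `helper` could mutate `data` internally, which is the kind of corner case a purely syntactic guess cannot see.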

I'm happy to talk more about Reinteract if you're interested, but I've already bored this mailing list once about it.

Robert

Brian Granger

Feb 23, 2018, 11:46:07 PM
to Project Jupyter
I built something similar to Reinteract as well last year. Doing the
simple case isn't very hard: I was just deep copying the namespace
before exec'ing each cell and storing those copies in a Python list. As
Robert mentions, doing this type of thing well and efficiently gets
to be much more difficult.
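A minimal sketch of that simple case might look like the following. `CheckpointingRunner` is a hypothetical name, not the code described above, and the sketch assumes every value in the namespace can be deep-copied, which is not true of imported modules, open files and similar objects.

```python
import copy


class CheckpointingRunner:
    """Runs code cells and snapshots the namespace before each one."""

    def __init__(self):
        self.namespace = {}
        self.snapshots = []  # one deep copy per executed cell

    def run_cell(self, source):
        # Skip the __builtins__ entry that exec() inserts; copy the rest in
        # one deepcopy call so aliasing between variables is preserved.
        user_vars = {k: v for k, v in self.namespace.items()
                     if k != "__builtins__"}
        self.snapshots.append(copy.deepcopy(user_vars))
        exec(source, self.namespace)

    def undo(self):
        """Restore the namespace as it was before the last cell ran."""
        if self.snapshots:
            snapshot = self.snapshots.pop()
            # Restore in place so the namespace dict keeps its identity.
            self.namespace.clear()
            self.namespace.update(snapshot)


runner = CheckpointingRunner()
runner.run_cell("data = [1, 2, 3]")
runner.run_cell("data.append(4)")
before_undo = list(runner.namespace["data"])
runner.undo()
after_undo = runner.namespace["data"]
```

The cost is visible in `run_cell`: every cell pays for a full copy of everything in the namespace, which is what makes this approach impractical for large datasets.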

However, it is important that Jupyter doesn't make any assumptions
about these things or place limitations on the model of the kernel. As
an example, here is a Python kernel for Jupyter that implements a
dataflow model:

https://github.com/dataflownb/dfkernel



--
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgra...@calpoly.edu and elli...@gmail.com