* Porting the notebook over to use Flask on top of Twisted. What is the
status of this? It seemed like it was pretty much ready when Sage Days
finished last week. Do we just need testers?
* Redesigning the entire communication architecture to be
database-centric. I've consolidated some information, answered some
frequently asked questions, and put up a diagram of the architecture
here: http://wiki.sagemath.org/Notebook%20design
What do people think? I guess one big question is how it compares with
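the other main proposal for scalable notebook design:
https://docs.google.com/document/d/1uYJXPAWypGgb92QStJ19cW-29y4-hn5hi8oXMR-11TU/edit?hl=en&authkey=CISp9cQB&pli=1#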
Should we decide on one or the other, or should both go forward, or
should there be some merging of ideas? I plan to work on the
database-centric design, as it seems to be:
1. easier to manage with smaller more self-contained tasks
2. competitive in scalability
3. Websockets (which I think is a main component of Alex's design) are
probably not going to make it into Firefox 4.0 due to security concerns
[1], so I think that technology may need to mature just a bit more. I
can easily understand what is needed for the database-centric design and
it uses a well-proven architecture and standard existing capabilities.
Also, Flask will not support websockets (i.e., wsgi doesn't support
websockets) and probably won't in the near future. After seeing the
readability improvement when moving to flask, I'm willing to give up
websockets right now to get a more developer-friendly notebook.
I just want to make sure that we don't run out of steam before we get
some steady progress going.
Jason
[1] http://hacks.mozilla.org/2010/12/websockets-disabled-in-firefox-4/
I was talking with Greg Bard yesterday, and an idea came up for
another related project that would be much easier, yet still very
helpful to a _lot_ of users. It's to make a very highly scalable
database-oriented secure website that does nothing but "evaluate a
block of Sage code in a clean namespace". The key thing is that we
make it incredibly scalable, e.g., to thousands of users. And somehow
test this scalability.
This could be fairly easy to build starting with the demo here:
http://code.google.com/p/simple-python-db-compute/. It will have to
display graphics, etc., so the javascript is still nontrivial. But
doing this is vastly less daunting than what is mentioned below, and I
can't see how we could do our ultimate goal without easily doing what
I'm proposing here.
I imagine that this could be something that specifically uses apache +
mongodb.
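A minimal sketch of what "evaluate a block of code in a clean namespace" could look like, assuming plain Python rather than Sage (no preparser, no Sage globals). The function name `evaluate_block` and the shape of the returned dictionary are made up for illustration. Note that `exec` inside the server process gives no isolation at all; it only shows the namespace behaviour.

```python
import contextlib
import io
import traceback


def evaluate_block(code):
    """Run a block of code in a fresh namespace and capture what it prints."""
    # A new dict per call, so nothing leaks from one request to the next.
    namespace = {"__name__": "__main__"}
    stdout = io.StringIO()
    try:
        with contextlib.redirect_stdout(stdout):
            exec(code, namespace)
        error = None
    except Exception:
        error = traceback.format_exc()
    return {"output": stdout.getvalue(), "error": error}


first = evaluate_block("x = 2\nprint(x ** 10)")
# x from the first call is not visible here, so this reports a NameError.
second = evaluate_block("print(x)")
```

Each call starts from an empty namespace, which is the property that makes the service stateless and therefore easy to spread over many workers.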
Once it works, we could try to modify it to also serve individual
public interacts.
Does anybody want to help with this?
> What do people think? I guess one big question is how it compares with the
> other main proposal for scalable notebook design:
>
> https://docs.google.com/document/d/1uYJXPAWypGgb92QStJ19cW-29y4-hn5hi8oXMR-11TU/edit?hl=en&authkey=CISp9cQB&pli=1#
>
> Should we decide on one or the other, or should both go forward, or should
> there be some merging of ideas?
> I plan to work on the database-centric
> design, as it seems to be:
The two designs are in a sense orthogonal, due to their levels of detail.
The "database-centric design" has I think far more details thought
through. That said, Alex's design above is also database-centric.
There's not much real difference between them, except we thought
through every detail of messages going back and forth from the
perspective of making sure it's possible to implement a range of
specific mathematical commands that matter to us (graph editing,
showing plots, calculating). If you look at the top of Alex's design
it has the exact same big three components as the other design (but
the internals are all much more complicated with Alex's design)
> 1. easier to manage with smaller more self-contained tasks
> 2. competitive in scalability
> 3. Websockets (which I think is a main component of Alex's design) are
> probably not going to make it into Firefox 4.0 due to security concerns [1],
> so I think that technology may need to mature just a bit more. I can easily
> understand what is needed for the database-centric design and it uses a
> well-proven architecture and standard existing capabilities. Also, Flask
> will not support websockets (i.e., wsgi doesn't support websockets) and
> probably won't in the near future. After seeing the readability improvement
> when moving to flask, I'm willing to give up websockets right now to get a
> more developer-friendly notebook.
>
>
> I just want to make sure that we don't run out of steam before we get some
> steady progress going.
I propose we do the warm-up I mentioned above. It will quickly and
definitively bring the ability for anybody to compute with Sage over
the web robustly, though without all the session, etc., stuff. And
since it will be very simple, it will be easier to make it cell-phone
friendly.
Also, what I'm proposing is a lot like Wolfram Alpha or
http://magma.maths.usyd.edu.au/calc/.
-- William
--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org
Hi all,
As a highly interested observer of the web notebook process, I think this
could be enormously beneficial to computational scientists as well.
For instance, in Astrophysics the question in the past has been how to
provide an interface to running analysis at remote locations while
avoiding the hassle and overhead of SSH, X11 tunneling, etc etc, and
having a "single cell" mode that dispatched jobs transparently from a
frontend would be extremely useful.
Just for a simple example, imagine the case where you have a group
that runs large suites of Galaxy Cluster simulations. The datasets
produced by these calculations may be many gigabytes or even terabytes
in size, but it may be useful to provide the ability for remote
collaborators to interact with them. (This is one of the things the
NSF Teragrid Science Gateway program was designed to address.)
Typically this is done by either providing login information or copies
of the data, which then get shipped around, people log in via SSH,
tunnel images back and forth, etc etc.
However, with this environment, where you have the separation between:
* UI (including script / block-of-code and image display)
* Job dispatch and initiation
* Job execution
you would have the ability to address each component individually.
The dispatch could dispatch to an MPI-enabled cluster, single
processor, etc. This means rather than shipping around data or
providing manual logins, analysis (exploratory or otherwise) could be
conducted in the browser, without the overhead typically endured.
By providing the "single-block-of-code" approach, it becomes a very
general solution -- applicable as well to environments of
data-intensive analysis and exploration! In particular, this would
meet a need that's not necessarily well-met by stateful, notebook
exploration.
Anyway, I think this is a very exciting direction for the sage-notebook!
Best,
Matt
>
> The two designs are in a sense orthogonal, due to their levels of detail.
>
> The "database-centric design" has I think far more details thought
> through. That said, Alex's design above is also database-centric.
> There's not much real difference between them, except we thought
> through every detail of messages going back and forth from the
> perspective of making sure it's possible to implement a range of
> specific mathematical commands that matter to us (graph editing,
> showing plots, calculating). If you look at the top of Alex's design
> it has the exact same big three components as the other design (but
> the internals are all much more complicated with Alex's design)
I see there being rather fundamental differences between the two
designs. By database-centric, I meant that *everything* goes through
the database and nothing really lives outside of the database (i.e.,
everyone goes through the database for any information, and immediately
puts things back in the database). In Alex's design, the database just
stores worksheets and is updated when a worksheet is saved. The primary
work is done outside of the database with a server-side process that
maintains state and communicates with the workers.
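To make the "everything goes through the database" idea concrete, here is a rough sketch. It assumes an in-memory SQLite table as a stand-in for the real database (the thread suggests MongoDB), and the table, column, and function names are invented for illustration. `eval` stands in for handing the code to a Sage worker.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE computations ("
    "id INTEGER PRIMARY KEY, code TEXT, status TEXT, output TEXT)"
)


def submit(code):
    """Web server side: record the request and hand back its id."""
    cur = db.execute(
        "INSERT INTO computations (code, status) VALUES (?, 'pending')", (code,)
    )
    db.commit()
    return cur.lastrowid


def work_once():
    """Worker side: take one pending row, evaluate it, store the result."""
    row = db.execute(
        "SELECT id, code FROM computations "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    comp_id, code = row
    result = str(eval(code, {}))  # stand-in for a Sage evaluation
    db.execute(
        "UPDATE computations SET status = 'done', output = ? WHERE id = ?",
        (result, comp_id),
    )
    db.commit()
    return True


def poll(comp_id):
    """Web server side: report whatever the database currently says."""
    return db.execute(
        "SELECT status, output FROM computations WHERE id = ?", (comp_id,)
    ).fetchone()


comp_id = submit("2 + 3")
before = poll(comp_id)
work_once()
after = poll(comp_id)
```

The web server and the worker never talk to each other directly; the only shared state is the row in the table.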
> I propose we do the warm-up I mentioned above. It will quickly and
> definitively bring the ability for anybody to compute with Sage over
> the web robustly, though without all the session, etc., stuff. And
> since it will be very simple, it will be easier to make it cell-phone
> friendly.
>
> Also, what I'm proposing is a lot like Wolfram Alpha or
> http://magma.maths.usyd.edu.au/calc/.
That sounds like a great idea. Like you said, it's a good warm-up to
shake out any bugs or unforeseen complications with using the database or
the protocol we thought out, as well as a way to easily test scalability
(it would be trivial to script something that would hit the page with
1000 simultaneous computations). Right now, it's been a little daunting
(no time...) to completely rewrite the notebook for the new
architecture. Your idea is certainly more doable in what time I have.
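A sketch of the kind of load-test script meant here, under the assumption that the real version would replace `fake_submit` with an HTTP POST to the evaluation page. Both `fake_submit` and `hammer` are hypothetical names, and the thread pool size is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_submit(code):
    """Hypothetical stand-in for an HTTP POST to the evaluation page."""
    time.sleep(0.01)  # pretend network and compute latency
    return eval(code, {})


def hammer(submit, code, n_requests, n_threads=100):
    """Issue n_requests calls to submit concurrently; return results and wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(submit, [code] * n_requests))
    return results, time.perf_counter() - start


results, elapsed = hammer(fake_submit, "1 + 1", n_requests=1000)
print(f"{len(results)} requests in {elapsed:.2f}s")
```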
Having the Sage process start by forking will be critical to making this
webpage fast enough to be really useful, I think, as well as making the
load on the server bearable. What security implications are there in
forking a Sage process several times for different users?
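For reference, a minimal sketch of the forking pattern using only the Python standard library on a Unix system. It is not Sage's own mechanism, and `run_in_fork` is a hypothetical helper. The child starts with a copy of the parent's memory and open file descriptors and runs as the same user, so forking by itself isolates state between computations but not privileges.

```python
import os
import pickle


def run_in_fork(func, *args):
    """Run func(*args) in a forked child and return (status, value) to the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: everything the parent already imported is available at once.
        os.close(read_fd)
        try:
            try:
                payload = pickle.dumps(("ok", func(*args)))
            except Exception as exc:
                payload = pickle.dumps(("error", repr(exc)))
            with os.fdopen(write_fd, "wb") as pipe:
                pipe.write(payload)
        finally:
            os._exit(0)  # never fall through into the parent's code
    # Parent: read until the child closes its end, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    if not data:
        return ("error", "child exited without a result")
    return pickle.loads(data)


counter = {"value": 0}


def bump_and_square(n):
    counter["value"] += 1  # changes only the child's copy of this dict
    return n * n


status, value = run_in_fork(bump_and_square, 7)
```

After the call, `counter["value"]` in the parent is still 0, which is the "clean namespace for every computation" property; a crash in the child surfaces in the parent as an `("error", ...)` result instead of taking the parent down.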
Jason
I think this is a GREAT idea. If I recall correctly, didn't we
implement almost exactly this in about 30 minutes during Bug Days? It
would need some polish and the addition of sessions, but it's almost
there, right?
I think it would need (maybe grouped in version numbers):
0.1:
* implement a computation id assigned to each post request
* javascript to receive a computation id after request and continue
polling for that result
* make workers run in a virtual machine for security
* implement forking Sage to start up a new worker process to lower
load and latency
0.2:
* the "stream" protocol we discussed for transmitting generated files
or other information
* javascript to handle the different streams (e.g., an image stream, a
jmol stream, etc.)
0.3:
* public interacts
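The first two items of 0.1 (a computation id per POST, then polling for the result) could be sketched roughly as follows. This assumes an in-process dictionary and a thread in place of the database and the Sage workers; `handle_post` and `handle_poll` are hypothetical names, and `eval` stands in for the actual computation.

```python
import threading
import uuid

results = {}  # computation id -> result, filled in by workers
results_lock = threading.Lock()


def handle_post(code):
    """Server side of the POST: assign an id, start the work, return the id at once."""
    comp_id = uuid.uuid4().hex

    def worker():
        value = str(eval(code, {}))  # stand-in for a Sage worker
        with results_lock:
            results[comp_id] = value

    thread = threading.Thread(target=worker)
    thread.start()
    return comp_id, thread


def handle_poll(comp_id):
    """Server side of the poll: report 'pending' until the result exists."""
    with results_lock:
        if comp_id in results:
            return {"status": "done", "output": results[comp_id]}
    return {"status": "pending"}


comp_id, thread = handle_post("sum(range(10))")
thread.join()  # a real client would keep polling instead of joining
reply = handle_poll(comp_id)
```

The client-side javascript would call the polling endpoint with the returned id until the status changes from "pending".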
Thanks,
Jason
+1
> * javascript to receive a computation id after request and continue polling
> for that result
+1
> * make workers run in a virtual machine for security
Could be in 0.2. For now could use the same worksheet user on boxen
that is used by sagenb.org.
> * implement forking Sage to start up a new worker process to lower load and
> latency
Could be easy using @fork:
http://trac.sagemath.org/sage_trac/ticket/9631
Also need:
* configure a mod_wsgi setup to serve Flask.
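For the mod_wsgi item, a sketch of the script file Apache would load, assuming a hypothetical file name `single_cell.wsgi`. By default mod_wsgi looks for a callable named `application` in that file. A bare WSGI callable is used here to keep the example self-contained; a Flask app object is itself a WSGI callable, so with Flask the file would only need to import the app under the name `application`.

```python
# single_cell.wsgi (hypothetical file name)


def application(environ, start_response):
    """Bare WSGI callable; mod_wsgi looks for the name `application` by default."""
    body = b"single-cell server placeholder\n"
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

On the Apache side the file is mapped to a URL with the `WSGIScriptAlias` directive; the path given there is site-specific.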
> 0.2:
> * the "stream" protocol we discussed for transmitting generated files or
> other information
> * javascript to handle the different streams (e.g., an image stream, a jmol
> stream, etc.)
>
> 0.3:
> * public interacts
>
> Thanks,
>
> Jason
>
>
--
There's also (d) an online version of SageTeX that uses this system
instead of the current "simple server API".
Dan
--
--- Dan Drake
----- http://mathsci.kaist.ac.kr/~drake
-------
Since I will be working on an online library of interacts this summer, I
will definitely get something like this up by then if it's not already
up. I'm also going to announce this project to our CS students and see
if anyone is interested in forming a small working group here at Drake
for helping with this.
Jason
Certainly not.
> Is there a chance that there is
> an improvement from the flask rewrite?
It's possible, but I doubt it.
> What if we run the WSGI flask
> app with apache (I don't know how to do that, but it's feasible) or even
> upgrade twisted (since we don't really need twisted web 2 anymore)?
I doubt any of that would help at all. I don't think WSGI flask +
apache is even possible.
> For the other projects, I can help with the client-side (javascript).
Awesome!
> The one shot eval sounds like a good start to iron out the
> specifications and the communication protocol before starting with the
> whole notebook. As a final feature, the new flask notebook supports
> openid, which I think will be more attractive to new users; all it
> takes is a gmail account.
That is so awesome!!!
>
> Rado
Fantastic! What is needed to write tests using flask's native test
suite? Is that a small task or a big task? Are you referring to this:
http://flask.pocoo.org/docs/testing/ ?
Jason
> I see there being rather fundamental differences between the two
> designs. By database-centric, I meant that *everything* goes through
> the database and nothing really lives outside of the database (i.e.,
> everyone goes through the database for any information, and immediately
> puts things back in the database). In Alex's design, the database just
> stores worksheets and is updated when a worksheet is saved. The primary
> work is done outside of the database with a server-side process that
> maintains state and communicates with the workers.
One of the nice things about MongoDB is sharding. This means you can
define a cluster of machines with a "master", with the data
distributed among them. As a simple example, the lowest bit of the hash
of a designated "shard key" for an object decides whether the data is on
machine a or b. I think, after looking at
http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key, that a good
shard key is the user-id. Then a user's worksheets are always on
one machine and concurrent writes for different users are distributed;
at the same time, queries/updates for one user only hit one machine
(see the paragraph "query isolation").
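A toy model of the lowest-bit example above, only to illustrate why keying on the user-id keeps one user's worksheets together. This is not how MongoDB itself assigns data to shards; the host names, the choice of SHA-256, and the function name `shard_for` are assumptions for the sketch.

```python
import hashlib

SHARDS = ["machine-a", "machine-b"]  # hypothetical host names


def shard_for(user_id):
    """Pick a shard from the lowest bit of a hash of the shard key (the user id)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    lowest_bit = digest[-1] & 1
    return SHARDS[lowest_bit]


worksheets = [("alice", "ws1"), ("bob", "ws1"), ("alice", "ws2"), ("carol", "ws1")]
placement = {}
for user_id, worksheet in worksheets:
    placement.setdefault(shard_for(user_id), []).append((user_id, worksheet))
```

Because the shard depends only on the user-id, every worksheet of a given user lands in the same list in `placement`, and a query for that user needs to visit only one machine.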
H
Again, are there any security issues we'd need to be aware of? For
example, what happens if a process crashes? Is it easier for someone to
elevate their permissions if the process was forked?
My guess is that we don't have to worry, but it's always safe to ask
about security issues.
>
>
> On Thursday, January 20, 2011 12:33:40 PM UTC-8, Jason Grout wrote:
>
> I see there being rather fundamental differences between the two
> designs. By database-centric, I meant that *everything* goes through
> the database and nothing really lives outside of the database (i.e.,
> everyone goes through the database for any information, and immediately
> puts things back in the database). In Alex's design, the database just
> stores worksheets and is updated when a worksheet is saved. The primary
> work is done outside of the database with a server-side process that
> maintains state and communicates with the workers.
>
>
> Yup. Having everything go through the DB will make things simpler
> process-wise, but eventually all the state updates for temporary
> sessions will start thrashing the DB if too many things are happening at
> once. If there was some set of processes that each served some simple
> purpose (ala the unix philosophy of doing one thing well) and talked to
> each other, then it would make everything much more responsive than
> having to poll the database for state updates.
It sounds like this single-cell public interface would serve as a good
testbed for both ideas. If it's sufficiently easy to build such a thing
with both designs, then we can throw a huge amount of activity at it and
more objectively evaluate which we'd like to see grow into the full
notebook.
Jason