Microdatabase project thoughts


David Bau

Mar 19, 2015, 10:20:10 AM
to pencilcode-development, meet...@gmail.com, Rhythm Gupta, Rishabh Chhabra, Vidhun k, Caroline Meeks, Anand Ramakrishna, Nikhil Thorat
Hello GSoC students,

I'm pleased that there's quite a bit of interest in the Microdatabase project on Pencil Code for Google Summer of Code!  Several have started to draft proposals.

I'd like to brainstorm here a bit more about how to make the project amazing for teachers and students.  The base idea is to design a scalable microdatabase, together with a simple client library for students to use.  But here are some questions about how to make the database really useful for teaching:

- How will students get going when starting from zero?  Should we build a visual tool that lets students explore and build a database manually?
- How will students learn queries?  Would students benefit from an interactive "query builder" that lets students understand how their query will work before they put it in their programs?
- Is it important to support a widely used query syntax like SQL?  (E.g., https://github.com/dsferruzza/simpleSqlParser/tree/v2) - I'd be interested in what Caroline Meeks thinks.
- What shape should the client library take?  What JavaScript functions should students use to interact with their database?
- Should we build a database transaction logger, so that students can go to their database to debug "what happened to my data?" after their program had a bug?
- Will classrooms benefit if they can import data from external sources like Google Spreadsheets or CSV files, or other sources?
- Should the databases be able to support "stored queries"?  What about "stored HTML templates?"
- What should we do for access control to databases?  Can a student put a password on their database?

(I've cc'd a couple of mentors and teachers on the list who might chime in if they have suggestions.)

We can't build solutions for all of these in a summer, but as you put your proposal together, think about whether there are features like this that excite you, that you think will make a big difference in helping students.  The purpose of Pencil Code is not just to have something that "works", but to make a beautiful instructive tool that gives students a quick start at difficult concepts.  We want to get all students to be able to use computer science (and databases, in this case) as an everyday tool.

Also, remember that students are starting from zero, and there are a lot of concepts for them to see and learn.

Here is a starting point to get us in the right frame of mind - here is the CS Unplugged Databases curriculum, for introducing K12 students to the idea of databases.


With your microdatabase, a teacher will have another tool, and will be able to teach different things in different ways.  If you were a teacher or a student, what would you want the database to do for you?

David

Vidhun k

Mar 19, 2015, 3:02:41 PM
to pencilcode-...@googlegroups.com, meet...@gmail.com, rhythm....@gmail.com, rishabh.c...@gmail.com, vidh...@gmail.com, caro...@meekshome.com, theanandr...@gmail.com, nsth...@google.com
Here is what I had in mind for this project.

I like a lot of the ideas raised here, and I'll wait for other suggestions from mentors and teachers.

Initially my plan was to build a database query API together with a visual tinkering tool. JSONql, I found, wouldn't be too welcoming for entry-level students. So for the client-side library I was thinking of queries using objects (like ORMs).
Something like:

bird = Animal.kind("bird")

Or for RDBMS.

vidhun = User.create("name": "vidhun")
pet = vidhun.pet.create("cat")

This is much simpler, especially when learning advanced concepts like primary keys and foreign keys.

The same approach can also be used for creating an interactive visual learning tool. The book has some good points that can be made into a level-based learning tool.

I'd like to hear more thoughts on using SQL query syntax.

Best,
Vidhun

David Bau

Mar 20, 2015, 7:57:52 AM
to Rishabh Chhabra, Anand Ramakrishna, Nikhil Thorat, Caroline Meeks, pencilcode-development
Hi Rishabh -

This sounds like a great way to think about the steps for learning about databases.

Two thoughts
(1) I would recommend, for the GSoC project, focusing on building a tool that can be used by a teacher to teach a learning sequence like this rather than building online tutorials that deliver the activities.
(2) It's important that, after learning about databases, students be able to incorporate the microdatabase into projects, so it's important to think about the right JavaScript API to use from within their Pencil Code/HTML programs.

The reason to recommend focusing on the tool rather than building tutorials is just to limit the scope.  We have found that it is a lot of effort to build good online tutorials (e.g., recording video is time-consuming), and then tutorials often don't end up giving teachers the sort of flexibility they may need for their particular class.

So I would recommend scoping your project to building a tool that a teacher can use as part of a lesson.  Students might need a way to easily access a prepopulated database, and then interact with it according to a set of exercises or projects created by the teacher.

If you're interested in this path, I recommend mocking up a bit of UI for how your proposed system might work (or how students might use it) as part of the project proposal (of course it will evolve over the summer - it doesn't need to be perfect at the start).

David



On Fri, Mar 20, 2015 at 7:09 AM, Rishabh Chhabra <rishabh.c...@gmail.com> wrote:

Hello Mr. Bau


The questions you raised in the previous email have helped me form a rough outline of how I would go about doing the project. Here is what I have in mind after the brainstorming.


Taking into account that Pencil Code is targeted at a young audience from a learning point of view, I’d like to implement the project as a series of short fun challenges....


(rest of the email clipped out)

David Bau

Mar 20, 2015, 10:32:24 AM
to pencilcode-...@googlegroups.com, meet...@gmail.com, rhythm....@gmail.com, rishabh.c...@gmail.com, vidh...@gmail.com, caro...@meekshome.com, theanandr...@gmail.com, nsth...@google.com, Anthony Bau
Hi Vidhun.

I'll answer with more questions.  I've always wondered if ORMs are more confusing than helpful, because rather than tearing down obstacles between the student and what's going on, they introduce a "third" new concept between two other concepts.  (I'm author of the old schema-mapping system XMLBeans, which is also "guilty" of this.)

JavaScript is a dynamic environment, so wouldn't it be natural to just return things in a natural form, without generated object classes?

answer = queryall('mydata', 'select age, lastname, firstname from person where age > 14')

>[{age: 18, firstname: 'harrison', lastname: 'ford'},
  {age: 16, firstname: 'john', lastname: 'travolta'}]

What are going to be the stumbling blocks?  How will students know what to do with "answer" once they have it?
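
As a rough sketch of one possible answer, here is what a student might do with "answer" if it really is a plain array of row objects as shown above. The queryall function does not exist yet, so the array is written out by hand, and the helper names describe and average are made up for illustration.

```javascript
// Hand-written stand-in for what queryall might return (an assumption;
// no real queryall exists yet).
const answer = [
  {age: 18, firstname: 'harrison', lastname: 'ford'},
  {age: 16, firstname: 'john', lastname: 'travolta'}
];

// Hypothetical helper: turn each row into a line of text a student could print.
function describe(rows) {
  return rows.map(row => row.firstname + ' ' + row.lastname + ' is ' + row.age);
}

// Hypothetical helper: average of one numeric column (0 for an empty result).
function average(rows, column) {
  const total = rows.reduce((sum, row) => sum + row[column], 0);
  return rows.length === 0 ? 0 : total / rows.length;
}

describe(answer).forEach(line => console.log(line));
console.log('average age: ' + average(answer, 'age'));
```

Everything here uses ordinary array and object operations, which is the appeal of returning results without generated classes: there is nothing database-specific left to learn once the query has run.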

Is it important that students know what a primary or secondary key is?

What do we do about the fact that a key in a relational database is expressed differently from a key in the key-value maps you would see in JavaScript, Python, or JSON?  Students are just barely learning how to use dictionaries/maps/objects in memory.  Is it helpful or unhelpful to have the same concept done differently in their database?
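
To make the difference concrete, here is a small sketch of the same two people stored both ways. The id column and the helper names (findRow, findValue, indexBy) are invented for illustration and are not part of any proposed API.

```javascript
// Relational style: the key is just another column inside each row.
const personRows = [
  {id: 1, firstname: 'harrison', lastname: 'ford'},
  {id: 2, firstname: 'john', lastname: 'travolta'}
];

// Key-value style: the key sits outside the value, as in a JavaScript object.
const personById = {
  1: {firstname: 'harrison', lastname: 'ford'},
  2: {firstname: 'john', lastname: 'travolta'}
};

// Relational lookup: scan the rows for a matching key column.
function findRow(rows, id) {
  return rows.find(row => row.id === id);
}

// Key-value lookup: ask the map directly.
function findValue(map, id) {
  return map[id];
}

// Convert rows into a map keyed by one column, dropping that column
// from the stored value.
function indexBy(rows, keyColumn) {
  const map = {};
  for (const row of rows) {
    const {[keyColumn]: key, ...rest} = row;
    map[key] = rest;
  }
  return map;
}

console.log(findRow(personRows, 2).lastname);
console.log(findValue(personById, 2).lastname);
```

The indexBy conversion shows that the two forms carry the same information, which may be one way to bridge what students already know about objects and what a table calls a key.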

Ideally we'd find a way to
* Let students get going using their own databases while learning a bare minimum of concepts.  Should they need to learn about schemas, indexes, authorization, object mapping, query languages, etc?  The ideal is that they should need to learn as few concepts as possible to get going.
* But as students *do* learn new concepts, they should be as "useful for the future" as possible.  That's why SQL syntax is interesting; because relational databases *are* everywhere.  (Anthony suggested that if we really want to teach relational data, maybe the backing files should be sqlite instead of json.)  On the other hand, the relational model is not everything.

David

Rhythm Gupta

Mar 20, 2015, 1:23:33 PM
to David Bau, pencilcode-...@googlegroups.com, meet...@gmail.com, rishabh.c...@gmail.com, vidh...@gmail.com, caro...@meekshome.com, theanandr...@gmail.com, nsth...@google.com, Anthony Bau
Hello David,

These are all very interesting questions. I thought about some of them while drafting my proposal, and I wanted to talk to you about them. Glad you started this thread.

When I was learning databases and SQL, I was doing it without a teacher and outside of class. During that experience, I remember that the most interesting part was checking the table I had modified after every query. It helps a lot in learning. Thus, the project needs a visual component where students can look at the entries in their tables. If I were a student, I'd want this feature no matter what. Seeing the result my query has caused gives me immense satisfaction.

A few ideas that I had in mind were:

Basic
1. Students will be able to log in, so each can have their own separate databases. If we do this, we wouldn't have to password-protect the databases.
2. We'll provide some basic databases when a student creates an account (or comes to the website), so that she has some data to play around with.

Now, to move to a more advanced level of discussion, we first need to clarify what we want to teach students.
I don't like the idea of introducing ORMs to teach database concepts. As David rightly pointed out, it wouldn't be more effective; rather, it would be redundant.

I think teaching database concepts via SQL (CRUD operations) could be a starting point for this project, so let's assume this for the sake of discussion.
The best way to teach SQL concepts would be to divide the basic operations into a few modules, like these:
1. Basic module: this will have pre-written, half-baked queries for students to complete. They'll be in SQL syntax (and also explained in English). An example could be:

Search through the table for any column where a condition is satisfied.

SELECT _________ FROM _________ WHERE __________. A few examples would be provided so that students can run them and confirm from the database that the result they received is correct.

Concepts like updating a database, inserting entries, etc. would come under this basic module.

2. Advanced module: More complex database concepts like ordering and sorting, joining two tables, etc. can be taught in the way mentioned above.

3. Another high-level option is to let the student write her own queries when she thinks she's pro enough.

[We could have a JavaScript approach to writing queries as well, but after going through two modules students will have a rough idea of SQL, so we could use SQL directly too.]
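
A minimal sketch of how the fill-in-the-blank SELECT exercise from the basic module might be assembled in JavaScript. The function fillSelect and the shape of its condition argument are assumptions made for illustration; the blanks are restricted to simple names, a short list of operators, and numeric values so that a blank cannot carry extra SQL.

```javascript
// Hypothetical helper for the "SELECT ___ FROM ___ WHERE ___" exercise.
// Blanks must be plain names; the condition is limited to
// "column operator number" to keep the sketch small and safe.
const NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;
const OPERATORS = ['=', '<', '>', '<=', '>=', '<>'];

function fillSelect(columns, table, condition) {
  for (const name of [...columns, table, condition.column]) {
    if (!NAME.test(name)) throw new Error('not a valid name: ' + name);
  }
  if (!OPERATORS.includes(condition.operator)) {
    throw new Error('unknown operator: ' + condition.operator);
  }
  if (typeof condition.value !== 'number' || !Number.isFinite(condition.value)) {
    throw new Error('value must be a number');
  }
  return 'SELECT ' + columns.join(', ') + ' FROM ' + table +
         ' WHERE ' + condition.column + ' ' + condition.operator +
         ' ' + condition.value;
}

console.log(fillSelect(['firstname', 'age'], 'person',
                       {column: 'age', operator: '>', value: 14}));
```

A student only chooses the pieces that go in the blanks, and the completed query text can then be shown next to the table so she can check the result against the data.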


Apart from all this, if I were a teacher, I'd also want features like indexes, primary keys, etc. But I have little idea how we would go about teaching them. Should we teach students the concept of B-trees before telling them what indexes are?
Indexes are primarily used for faster searching, right? So if we use small 1 MB databases, how can we teach students that?

These are some of the thoughts I had when I thought about taking on this project this summer. Please let me know your thoughts and feedback.

Looking forward to hearing from you

Cheers
Rhythm Gupta
--
Rhythm Gupta
Senior Undergraduate
Computer Science, IIT Delhi

m: (+91) 98738 03932
t: @irhythmgupta

Vidhun k

Mar 20, 2015, 3:52:44 PM
to pencilcode-...@googlegroups.com, meet...@gmail.com, rhythm....@gmail.com, rishabh.c...@gmail.com, vidh...@gmail.com, caro...@meekshome.com, theanandr...@gmail.com, nsth...@google.com, dab...@gmail.com
Hi David,

To go with the point regarding "student experience": would it be easier for a student to learn SQL with a "block"-style visual query builder?

The visual interface could be made of interactive blocks/selections. This would involve tokenising SQL, and it would be difficult to support complete SQL; for beginners we could instead use something like the table below, adding more relevant tokens. The blocks, like those in any other visual tool, can also have relevant descriptors. This way we can keep learners from struggling with syntax and also make it more engaging.

Step1 | Step2 | Step3 | Step4 | Step5

Step1: Choose 1 or more tables, or if the database has few tables, get all tables.

SELECT | * or getColumnList(step1) | FROM | getColumnList(Step3) | OPTIONAL: WHERE, FULL JOIN, INNER JOIN, GROUP BY
UPDATE | getTableList(step1) | SET | getColumnList(Step3) | WHERE
DELETE | FROM | getTableList(step1) | WHERE | getColumnList(Step4)
INSERT | INTO | getTableList(step1) | (*) | VALUES
ALTER | TABLE | getTableList(step1) | ADD, DROP, ALTER, MODIFY
CREATE | TABLE | table_name | (*)



I'm more interested in the teaching flow for teachers; this project should help teachers follow that flow, so the easier things would become more important. The report on databases for K-12 students has a good learning flow which can be followed for this project.
It would be easier to set up a pre-populated database for the students to play with and to do the end-of-lesson exercises. Implementing this would be much easier than setting up a whole set of tutorials.

It could be like:
Exercise 1:
Get all data from table called Items -> execute input and match output with correct answer -> move to next Exercise if correct, otherwise show hints.
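
A rough sketch of that check-and-hint loop, assuming the student's query has already been run and produced an array of row objects. The exercise record, its sample rows, and the function names sameRows and check are all invented for illustration.

```javascript
// Hypothetical exercise record: the expected result plus hints to show
// when the student's result does not match. The Items rows are made up.
const exercise = {
  prompt: 'Get all data from table called Items',
  expected: [{id: 1, name: 'pencil'}, {id: 2, name: 'eraser'}],
  hints: ['Which table holds the data?', 'Try selecting every column with *.']
};

// Compare two result sets, ignoring row order and key order within a row.
function sameRows(a, b) {
  const normalize = rows => rows.map(row => JSON.stringify(
      Object.keys(row).sort().map(key => [key, row[key]]))).sort();
  return JSON.stringify(normalize(a)) === JSON.stringify(normalize(b));
}

// attempt counts failed tries from 0; later attempts get later hints,
// and the last hint repeats once the list runs out.
function check(exercise, studentRows, attempt) {
  if (sameRows(studentRows, exercise.expected)) {
    return {correct: true, message: 'Correct! Moving to the next exercise.'};
  }
  const hint = exercise.hints[Math.min(attempt, exercise.hints.length - 1)];
  return {correct: false, message: 'Hint: ' + hint};
}

console.log(check(exercise, [], 0).message);
console.log(check(exercise, exercise.expected, 1).message);
```

Comparing results rather than query text means any query that returns the right rows passes, which leaves room for students to solve an exercise in their own way.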

When it comes to relational databases, it might be difficult for a beginner to grasp the concept of "relational". Assuming we go ahead with relational databases, it would probably be better to abstract away primary keys and foreign keys, which are the mechanics of the concept rather than the concept itself. Presenting them up front would be an "overload" for learners; they could be introduced in the later stages of learning, with an explanation of what is happening "behind the scenes".
ORMs abstract the primary key and foreign key in their own way, but as you mentioned, that would be just another layer. For a key-value database, on the other hand, it would be better to go without generated classes.

It might be easier to implement a visual working concept of an RDBMS using blocks.

user_vidhun = {BLOCK} {BLOCK} {BLOCK}
school = {BLOCK} {BLOCK} {BLOCK}
user_vidhun {LINK_BLOCK} school


Best,
Vidhun

Rhythm Gupta

Mar 20, 2015, 4:02:40 PM
to Vidhun k, Rishabh Chhabra, Anthony Bau, nsth...@google.com, theanandr...@gmail.com, meet...@gmail.com, caro...@meekshome.com, pencilcode-...@googlegroups.com

Hey Vidhun,

I like this idea of having exercises.

This could be integrated with the three modules (basic, advanced, query-based) to make the whole learning process complete.

Basically, after a student has gained enough confidence in a particular module, he can choose to do exercises along the lines you mentioned.

Also, we'll have two options: a set of ready-made exercises, and a feature that lets teachers create their own exercises. Your thoughts, David?

Rhythm

Sent from my phone, sorry for typos

Amulya Sahoo

Mar 21, 2015, 2:13:52 PM
to pencilcode-...@googlegroups.com, meet...@gmail.com, rhythm....@gmail.com, rishabh.c...@gmail.com, vidh...@gmail.com, caro...@meekshome.com, theanandr...@gmail.com, nsth...@google.com
I learned SQL on an RDBMS in school. It's easy because the statements are very similar to English sentences. But I had no idea how to use the database from a programming language.
My idea is to teach students how queries work in a NoSQL database like MongoDB, as it's easy for them to pick up and easy to integrate with programming: it returns results as JSON objects, which look like normal objects. Since students get normal objects (say, JSON), they can easily use the data in their programs without needing any additional methods.
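
Here is a small sketch of the idea, using an in-memory array instead of a real MongoDB server. The find and matches functions below are my own simplified stand-ins that imitate the query-by-example style (only equality and the $gt operator); they are not the MongoDB driver, and the sample people are made up.

```javascript
// In-memory stand-in for a collection of documents (invented sample data).
const people = [
  {firstname: 'harrison', lastname: 'ford', age: 18},
  {firstname: 'john', lastname: 'travolta', age: 16},
  {firstname: 'carrie', lastname: 'fisher', age: 12}
];

// Does one document satisfy the query object? Each field in the query
// must match; a nested object is treated as an operator, and only $gt
// is sketched here.
function matches(doc, query) {
  return Object.keys(query).every(field => {
    const wanted = query[field];
    if (wanted !== null && typeof wanted === 'object') {
      return '$gt' in wanted && doc[field] > wanted.$gt;
    }
    return doc[field] === wanted;
  });
}

// Simplified imitation of a find call: returns plain objects.
function find(collection, query) {
  return collection.filter(doc => matches(doc, query));
}

const older = find(people, {age: {$gt: 14}});
older.forEach(person => console.log(person.firstname + ' is ' + person.age));
```

The query and the results are both ordinary objects, so a student who knows objects and arrays does not need a separate query language to get started.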

This will help them learn about databases and get going with DBs as well.


The whole system will be divided into sections (beginner, intermediate, and advanced), and each section will contain subsections like insertion, deletion, comparison operators, etc., each of which will teach a concept (by describing the steps with a little detail). After that, there will be an exercise containing incomplete code, which the student has to complete to pass the exercise (as in codecademy.com's exercises). If he needs a hint, one will be available.

What are your views about this?

Amulya

Caroline Meeks

Mar 25, 2015, 10:04:17 AM
to pencilcode-...@googlegroups.com, meet...@gmail.com, rhythm....@gmail.com, rishabh.c...@gmail.com, vidh...@gmail.com, caro...@meekshome.com, theanandr...@gmail.com, nsth...@google.com
Hi, I'm a CS teacher at a high school near Google Cambridge that is just starting to teach CS to a diverse group of students. Previously I was a Product Manager for Amplify and before that I ran a company using OpenACS to create database backed websites.

I would like to add some context about comparable products, users and use cases and user problems.

Teachers widely recognize that data is important in today's world and that we are not preparing our students for it. Just last week I had a conversation in the teacher lunch room about the fact that people in the real world use spreadsheets every day and that we are not teaching spreadsheets.

We are also not teaching students to understand data sets or to collect and analyze data sets bigger than they can collect in a lab class. This includes not just CS teachers but science, math and social studies teachers who all know that data is important to their subjects.

Teachers understand the importance of these things and are ready to embrace tools and curriculum that helps them solve these problems.

An important point to remember is that your tools will be used for teaching not just future computer scientists but lots of other students who will become scientists or executives, or work in health care, not to mention voters who need to understand the issues that affect all of our futures. We should have high expectations that in the future many more people will be able to deal with data in much more sophisticated ways than most non-programmers do now. Our job is to create that future.


Some things that teachers might want to do with a database tool, for example:

Data collection and analysis, e.g., in a science class. Check out isense (linked below) - they have activities which are good - like collecting data over time (maybe collaboratively with your class), and then having tools that automatically show and graph the data.

But then the issue with isense is that it's a pretty closed system. You'd want students to be able to program against the data, which is where the starting point of the discussion is here so far. (But some teachers may actually teach databases without any programming at all!)

Another activity teachers might do is bring in an outside data source, for example from a spreadsheet, and have students work with it.

Another activity might be part of a programming class - there students might want to persist their data, for example, a high score list in a game, using a simple database.

As we get more complex, we might want to combine a tool that helps students collect data and join it with data from the web. For example, students could collect data using their phones into a database, and join it with weather or climate data from a public source.
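
To illustrate that last activity, here is a minimal sketch of joining student-collected data with outside data on a shared date field. The bird counts and temperatures are invented sample values, and joinOn is a hypothetical helper, not part of any existing tool.

```javascript
// Invented sample data: counts collected by students on their phones...
const readings = [
  {date: '2015-03-19', birdsSeen: 4},
  {date: '2015-03-20', birdsSeen: 7},
  {date: '2015-03-21', birdsSeen: 2}
];

// ...and invented weather data standing in for a public source.
const weather = [
  {date: '2015-03-19', highTempC: 5},
  {date: '2015-03-20', highTempC: 11}
];

// Inner join: keep only left rows that have a partner in the right
// table with the same key, and merge the two rows into one.
function joinOn(left, right, key) {
  const lookup = new Map(right.map(row => [row[key], row]));
  return left
    .filter(row => lookup.has(row[key]))
    .map(row => ({...row, ...lookup.get(row[key])}));
}

joinOn(readings, weather, 'date').forEach(row =>
  console.log(row.date + ': ' + row.birdsSeen + ' birds at ' + row.highTempC + ' C'));
```

The reading for the day with no weather record drops out of the result, which is itself a useful discussion point with students about missing data.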

This is a step back to look at the big picture. It's important to think about the preprogramming experience. What can you do with a database before and as you just barely start to program?






Some links and comparable landscape to think about:

Scratch has a data capability, but it was abused and is now only available to some users: http://wiki.scratch.mit.edu/wiki/Cloud_Data
For instance, as a teacher I am still not enabled for Cloud Data.

App Inventor has 3 ways of storing data. TinyDB, TinyWebDB and Google Fusion Tables.

iSense is working on a different part of the problem. It is a tool that lets teachers create a simple database schema and have their students all contribute data, then analyze the data together.

We should also consider the many corporate tools for helping non-programmers in the workforce work with data: Crystal Reports, Tableau, Many Eyes and others.

Vidhun k

Mar 25, 2015, 11:29:40 AM
to pencilcode-...@googlegroups.com, meet...@gmail.com, rhythm....@gmail.com, rishabh.c...@gmail.com, vidh...@gmail.com, caro...@meekshome.com, theanandr...@gmail.com, nsth...@google.com
Hi Caroline,
Thank you for your ideas!

I was wondering whether it would be beneficial, from a student's perspective, if basic analytics were part of the project. Since this tool is going to see a wide range of age groups, would it be difficult for K12 students to interpret these kinds of charts/graphs?

Since data analytics is an important process in any industry, I was thinking it could benefit students from all disciplines of study. 

Best,
Vidhun 

Rhythm Gupta

Mar 25, 2015, 12:25:17 PM
to Caroline Meeks, pencilcode-...@googlegroups.com, meet...@gmail.com, Rishabh Chhabra, Vidhun k, Anand Ramakrishna, Nikhil Thorat
Hello Caroline,

Thanks for such an informative post!


Some things that teachers might want to do with a database tool, for example:

Thanks for providing a teacher's perspective. This will help us a lot in giving shape to our project.

Data collection and analysis, e.g., in a science class.  Check out isense (linked below) - they have activities which are good - like collecting data over time (maybe collaboratively with your class), and then having tools that automatically show and graph the data.

For data collection, I was thinking of letting people use spreadsheets, with Pencil Code providing an option to upload them.
 
But then the issue with isense is that it's a pretty closed system.  You'd want students to be able to program against the data, which is where the starting point of the discussion is here so far.  (But some teachers may actually teach databases without any programming at all!)
For analysis, the approach could be that students do operations on the data and we create basic visualizations using Google Charts, so students won't have to worry about that part.
But this raises the same point: would students be able to program against the data? To resolve this, in my opinion, we can do something similar to Scratch (maybe not visually, but keeping the essence intact): students can use half-baked queries (written in English) and just fill in the column names they want to select or project.
 
Another activity teachers might do is bring in an outside data source, for example from a spreadsheet, and have students work with it.

We can also have an option for importing data from Google Spreadsheets.
 
Another activity might be part of a programming class - there students might want to persist their data, for example, a high score list in a game, using a simple database.

As we get more complex, we might want to combine a tool that helps students collect data and join it with data from the web.  For example, students could collect data using their phones into a database, and join it with weather or climate data from a public source.

This is a step back to look at the big picture. It's important to think about the preprogramming experience. What can you do with a database before and as you just barely start to program?
In the pre-programming experience, do you think we should do something like what Codecademy does: help students learn database concepts through modules and chapters of increasing difficulty? What do you think about exercises like that?

Looking forward to hearing from you.

 
Cheers
Rhythm Gupta