Codechecker Redesign: Why and how

krishnan parthasarathi

Oct 11, 2010, 4:10:49 PM

to codec...@googlegroups.com

Hi all,
The principle below is the core rationale for the design/implementation
change.

Principle: Each distinct part of the codechecker must be decoupled from the
rest, so that its components can be reused elsewhere (e.g. in other
software).

The core codechecker should be a lib/script that does just two things:
(1) runs a submission (using setuid-helper) against a specific input file
and generates the output file; (2) verifies the output by (a) checking it
against a reference output file or (b) running a separate checker program
that outputs PASS/FAIL. The separate checker program can itself be run as a
"submission", with its output checked against a canonical reference file
that just says PASS or FAIL; it may optionally output a score.
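
A minimal sketch of the two responsibilities above, assuming Python 3. The names run_submission and verify_against_reference are hypothetical, and so is the policy of ignoring trailing whitespace when comparing. The sketch does no sandboxing at all; in the design described here the command would be wrapped by setuid-helper.

```python
import subprocess
from pathlib import Path


def run_submission(cmd, input_path, output_path, timeout_s=5.0):
    """Run cmd with stdin read from input_path and stdout written to output_path.

    Returns True if the program exited with status 0 within the time limit.
    NOTE: no sandboxing here; the real design would wrap cmd with setuid-helper.
    """
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:
        try:
            result = subprocess.run(cmd, stdin=fin, stdout=fout, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return False
    return result.returncode == 0


def verify_against_reference(output_path, reference_path):
    """Option (a): compare the output with a reference file.

    Trailing whitespace and trailing blank lines are ignored (an assumption).
    """
    def normalized(path):
        text = Path(path).read_text().rstrip()
        return [line.rstrip() for line in text.splitlines()]

    if normalized(output_path) == normalized(reference_path):
        return "PASS"
    return "FAIL"
```

Option (b) could reuse run_submission to execute the checker program itself, and then compare its output with a one-line reference file containing PASS.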

Another important decoupled component should be the scoring module. It
just takes the scores for each test and computes the score for the
submission. It could perhaps also provide the logic for the rank list.
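
As a rough illustration of how small such a scoring module could be, here is a sketch. The CaseResult type, the "sum of passing tests" rule and the tie-handling in rank_list are all assumptions made for the example, not something decided in this thread.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of one test case (hypothetical structure)."""
    test_id: str
    passed: bool
    score: float = 0.0


def submission_score(results):
    """Sum the scores of the passing tests (the aggregation rule is an assumption)."""
    return sum(r.score for r in results if r.passed)


def rank_list(scores_by_user):
    """Return (rank, user, score) tuples, highest score first.

    Users with equal scores share a rank; names break ties in the ordering only.
    """
    ordered = sorted(scores_by_user.items(), key=lambda kv: (-kv[1], kv[0]))
    ranked = []
    for position, (user, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous entry: share its rank
        else:
            rank = position
        ranked.append((rank, user, score))
    return ranked
```

Neither function touches the database, so the module could be reused by anything that can hand it a list of per-test results.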

Other parts of the software worry about updating the state of the world.
For example, a separate DB update module makes changes to the db and keeps
persistent track of the pass/fail status of submissions, their scores,
etc.

Here is a short introduction to how the modules are laid out in the
existing design. All the modules referred to below are available under the
"codechecker/backend" directory. Currently, the SubmissionManager module
takes every submission through the workflow, i.e. compile, execute and
compare with the reference output.

The TestRunner module does the bulk of the work of executing the compiled
submission program in a 'controlled' environment using setuid_helper. As of
today, it also does the evaluation, i.e. comparing the submission's output
with the reference output supplied by the problem setter.

Other smaller tasks, like scoring and compiling submissions based on the
submission language, are done by the Score and Compile modules respectively.

I hope this gives a fair idea of why we would like to redesign codechecker.
Let the discussions begin.

cheers,
Krishnan

krishnan parthasarathi

Oct 12, 2010, 11:37:10 PM

to codec...@googlegroups.com

Some thoughts on what we could redesign in the codechecker's architecture.
The codechecker
- fetches a submission for a problem
- compiles the code based on the language
- executes for the given testset for the problem
- evaluates a score for it.

Only one submission goes through the pipeline at a time. Can the stages in the
pipeline be made such that the order in which submissions enter and exit the
pipeline doesn't matter?

Do we make the codechecker multi-threaded or run multiple instances of the
SubmissionManager and TestRunner?

The above two questions can be summarised as whether we want to design a
single-threaded, pipeline-based architecture or a traditional multithreaded
model where each thread handles a submission. Let me explain what I mean by a
pipeline-based architecture. In this approach, we can imagine 'execution units'
(extending the hardware pipeline analogy), each picking up a submission or a
'task', working on it and placing the result in the next unit's queue. The
execution units can run independently (to an extent), thereby reducing the
wait times.
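
To make the idea concrete, here is a small sketch of execution units connected by queues. Running each unit in its own thread is one possible reading of "independently", and every name here (stage, run_pipeline, the three placeholder stage functions) is hypothetical. The stage bodies only stand in for real compiling, executing and evaluating.

```python
import queue
import threading

STOP = object()  # sentinel telling a unit that no more tasks will arrive


def stage(work, inbox, outbox):
    """Execution unit: take a task from inbox, process it, pass it to the next unit."""
    while True:
        task = inbox.get()
        if task is STOP:
            outbox.put(STOP)  # propagate shutdown down the pipeline
            break
        outbox.put(work(task))


def run_pipeline(submissions, stages):
    """Push submissions through the stages and collect what leaves the last queue."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for submission in submissions:
        queues[0].put(submission)
    queues[0].put(STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Placeholder stages; the real ones would compile, run and judge the code.
def compile_stage(task):
    return {**task, "compiled": True}


def execute_stage(task):
    return {**task, "output": task["source"].upper()}


def evaluate_stage(task):
    verdict = "PASS" if task["output"] == task["expected"] else "FAIL"
    return {**task, "verdict": verdict}
```

With one worker per stage the queues keep submissions in arrival order. If a stage were given several workers, results could leave out of order, which is where the ordering question above starts to matter.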

cheers,

Krishnan

suren...@gmail.com

Oct 13, 2010, 6:15:37 AM

to codec...@googlegroups.com

Top posting as I am only summarizing the response.

The loose coupling that is suggested is a very good step, considering
the utility of setuid helper as a standalone tool. In my opinion, the
following could be the major components.

* Setuid Helper :
takes an executable, runs it against an input in a controlled
environment, and produces an output or fails if it exceeds the limits.
Doesn't have to know anything about the checker's data models or
logic.

* Evaluation logic :
Takes the output that the setuid helper produced by running a
submission and the desired output (either a direct output file or an
executable that can check the accuracy of the output produced), and
returns a score. This also does not know about the checker's data
model/logic.

* Checker Main thread :
This is closely bound to the data model. It can be a
daemon/cron job which keeps polling the database for new submissions,
compiles each one depending upon its language, makes a call to setuid
helper and follows it up with a call to the evaluation logic. Updates the
db accordingly.

* Ranklist generator : takes a contest ID and generates a ranklist
based upon the rules decided for that contest. Closely tied to our data models.

* Frontend: Django handles it pretty well; again closely tied to our data model.

In this approach, I see that anyone can use the setuid helper and
evaluation logic to develop their own checker.
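
One polling pass of the checker main thread listed above might look roughly like this. It is only a sketch: process_pending and its five parameters are hypothetical names, and the steps are passed in as callables so that the loop itself knows nothing about the setuid helper, the evaluation logic or the database. The status strings are made up for the example.

```python
def process_pending(fetch_pending, compile_fn, run_fn, evaluate_fn, save_result):
    """Handle every new submission once and return how many were handled."""
    handled = 0
    for submission in fetch_pending():
        executable = compile_fn(submission)
        if executable is None:
            save_result(submission, "COMPILE_ERROR", 0)
        else:
            output = run_fn(executable)   # setuid helper call in the real system
            score = evaluate_fn(output)   # evaluation logic
            save_result(submission, "DONE", score)
        handled += 1
    return handled


# Stand-ins so the sketch runs without a database or a sandbox.
pending = [{"id": 1, "source": "ok"}, {"id": 2, "source": ""}]
results = {}

handled = process_pending(
    fetch_pending=lambda: list(pending),
    compile_fn=lambda s: s["source"] or None,  # empty source "fails to compile"
    run_fn=lambda exe: exe.upper(),
    evaluate_fn=lambda out: 100 if out == "OK" else 0,
    save_result=lambda s, status, score: results.update({s["id"]: (status, score)}),
)
```

A daemon would call process_pending in a loop with a sleep between passes; a cron job would call it once per run.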

On Wed, Oct 13, 2010 at 9:07 AM, krishnan parthasarathi

--
regards
Suren

http://twitter.com/suren

Aditya Manthramurthy

Oct 13, 2010, 1:29:50 PM

to codec...@googlegroups.com

I am going to propose a somewhat different design than what you guys
have been talking about.

Secure Execution Module (SE Module) - (It's basically a new name for
setuid_helper, which sounds ugly IMO) - This module simply runs an
executable program against a file input and generates a file output,
applying various limits on memory, cpu time, disk access, etc.

Evaluation Module - This module checks a generated output against the
reference output file or with an output-checker program. The output-checker
program (which checks the output of the submission for correctness) is a
program submitted by a problem-setter-level user. This program should
also be treated as potentially harmful, and should IMO be run via the SE
Module.

Codechecker Core - This module takes source code, compiles it,
generates an executable and passes it on to the SE and Evaluation
modules. This module talks to other separate modules that handle the db
changes. This makes it possible to plug in a different persistence
module, say one backed by CouchDB, in another application that uses the
Codechecker Core.

Persistent Storage Module - This module abstracts db access away from the
Codechecker Core. It should be possible to derive from this module
and use a new one (that interfaces, say, with a NoSQL db) in the CC Core.
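
A rough sketch of what such a storage abstraction could look like. The Storage interface, its two methods, InMemoryStorage and judge_all are all hypothetical names invented for this example; the default implementation in practice would presumably sit on top of the existing database layer rather than a Python list.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """What the Codechecker Core needs from persistence, and nothing more."""

    @abstractmethod
    def next_submission(self):
        """Return the next queued submission, or None if there is none."""

    @abstractmethod
    def save_result(self, submission_id, status, score):
        """Record the outcome of judging a submission."""


class InMemoryStorage(Storage):
    """Toy backend; another subclass could talk to a SQL or NoSQL store."""

    def __init__(self, submissions):
        self._queue = list(submissions)
        self.results = {}

    def next_submission(self):
        return self._queue.pop(0) if self._queue else None

    def save_result(self, submission_id, status, score):
        self.results[submission_id] = (status, score)


def judge_all(storage, judge):
    """Core loop: depends only on the Storage interface, not on any database."""
    while True:
        submission = storage.next_submission()
        if submission is None:
            break
        status, score = judge(submission)
        storage.save_result(submission["id"], status, score)
```

Swapping the backend then means writing another Storage subclass; judge_all does not change.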

Other modules are also required, like a scoring module to score
submissions using the results of all the tests, a ranklist module to
generate a ranklist, etc. Also, the frontend needs some changes. The
users should have user levels configured. A normal user should only be
able to submit solutions to problems. A psetter-level user should be
able to set problems for a site. A contest-setter-level user should be
able to set up a contest. An admin user should be able to (obviously) do
anything!
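
The user levels could be modelled along these lines. This is only a sketch: the names are hypothetical, and treating the levels as a strict hierarchy (a contest setter can also set problems, and so on) is an assumption that the description above does not settle.

```python
from enum import IntEnum


class UserLevel(IntEnum):
    NORMAL = 1
    PROBLEM_SETTER = 2
    CONTEST_SETTER = 3
    ADMIN = 4


# Minimum level needed for each action (action names are made up).
REQUIRED_LEVEL = {
    "submit_solution": UserLevel.NORMAL,
    "set_problem": UserLevel.PROBLEM_SETTER,
    "set_up_contest": UserLevel.CONTEST_SETTER,
}


def is_allowed(level, action):
    """Admins may do anything; others need at least the level the action requires."""
    if level is UserLevel.ADMIN:
        return True
    required = REQUIRED_LEVEL.get(action)
    return required is not None and level >= required
```

For example, is_allowed(UserLevel.NORMAL, "set_problem") is False, while is_allowed(UserLevel.PROBLEM_SETTER, "submit_solution") is True under the hierarchy assumption.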

How does this sound?

--
Aditya.

suren...@gmail.com

Oct 13, 2010, 2:40:36 PM

to codec...@googlegroups.com

Hi,

> Codechecker Core - This module takes a source code, compiles it,
> generates an executable and passes it on to the SE and Evaluation
> modules. This module talks to other separate modules that handle the db
> changes. This way, it makes it possible to plug in a different persistent
> access module like say CouchDB in another application that uses the
> Codechecker Core.
>
> Persistent Storage Module - This module abstracts db access away from the
> Codechecker Core. It should be possible to perhaps derive this module
> and use a new one (that interfaces say with a NOSQL db) in the CC Core.

> --
> Aditya.
>

Though I have not stressed this point in the past, I think it's
important to know that Django also plays the role of abstracting the
DBMS.
When we write code, say, to fetch all contests, we write something like,
<snip>
Contests.objects.all()
</snip>

Now this itself is a well-defined abstraction for all our purposes, and
it should work out of the box with every DBMS supported by Django, for
both the backend and frontend. I am not sure if CouchDB
or similar NoSQL DBs are supported by Django yet, but I am
certain that we don't have to go to that level as part of codechecker
yet. In fact I am not entirely sure that we should even try to make our
checker capable of using NoSQL, as all our data models are very
relational and NoSQL might not suit them.

Also, as far as the frontend goes, the power of Django comes from
the ORM support it provides. Going outside it and writing our own ORM
support seems to defeat the purpose of using Django for FE development.

---
regards
Suren

http://twitter.com/suren

Aditya Manthramurthy

Oct 14, 2010, 10:29:57 AM

to codec...@googlegroups.com

On Thursday 14 October 2010 12:10 AM, suren...@gmail.com wrote:
> Though I have not stressed this point in the past, I think it's
> important to know that Django also plays the role of abstracting the
> DBMS.
> When we write code, say, to fetch all contests, we write something like,
> <snip>
> Contests.objects.all()
> </snip>
>
> Now this itself is a well-defined abstraction for all our purposes, and
> it should work out of the box with every DBMS supported by Django, for
> both the backend and frontend. I am not sure if CouchDB
> or similar NoSQL DBs are supported by Django yet, but I am
> certain that we don't have to go to that level as part of codechecker
> yet. In fact I am not entirely sure that we should even try to make our
> checker capable of using NoSQL, as all our data models are very
> relational and NoSQL might not suit them.

I agree that it is a well-defined abstraction, and I understand it much
better now. But it is an abstraction designed for a website. The
codechecker could be a more general-purpose application that can be used
with, say, a standalone application like Mooshak. The idea I am
proposing decouples it from our abstraction of a particular website. The
system could also be used with a different kind of website, say
istreet's codechecker.

Anyway, the abstraction is not so difficult to achieve. I've just pushed
a new folder to our repo called cc_backend. Have a look at it. It has
stub code and a little documentation for the top level module as well
as for the storage layer abstraction. Let me know your thoughts. I don't
care if it sucks, so there's no need to be polite if you feel like
criticizing :P. But have a look!

>
> Also, as far as the frontend goes, the power of Django comes from
> the ORM support it provides. Going outside it and writing our own ORM
> support seems to defeat the purpose of using Django for FE development.
>

I am not at all suggesting a new ORM model. The default is going to use
Django's abstraction only. The thing is that it will be possible to
easily implement a different storage engine if required. That is all.