On Fri, Mar 3, 2017 at 5:19 AM, Marek Marczykowski-Górecki
<marm...@invisiblethingslab.com> wrote:
> On Thu, Mar 02, 2017 at 04:26:28AM -0800, Paras Chetal wrote:
>> Hi,
>>
>> I am a computer science undergraduate studying at IIT Roorkee. I wish to
>> contribute to QubesOS under the Google Summer of Code and look forward to
>> continue contributing to the community as an active developer beyond the
>> GSoC period too. I was quite excited when I got to know that QubesOS had
>> been selected as a mentoring organization. I have heard a lot of praise
>> about Qubes regarding the security and privacy it offers, and now I'd like
>> to use GSoC as an initiative to contribute my bit to it.
>
> Welcome :)
>
>> I plan to work on analyzing the Qubes code base. The project would involve
>> working on leveraging modern static and dynamic analysis, and formally
>> analyzing how untrusted user inputs propagate through the code base, which
>> have been mentioned in the ideas page <https://www.qubes-os.org/gsoc/>. I
>> have experience in reverse engineering, static and dynamic analysis, from
>> my frequent participation in CTFs. I'm quite excited to work on analyzing a
>> full fledged OS.
Awesome! This is the project I am personally most interested in as well :)
>> I believe Jean-Philippe Ouellet will be my mentor for the project. Could
>> you please guide me on how I set up my system to get started? Maybe assign
>> me some micro-tasks related to the project for a start? Also, could you
>> please elaborate on what all is included in the "untrusted user input" part?
>> I have already installed QubesOS on my system and am transitioning to start
>> using it as my default OS.
>
> Recently we have extended https://www.qubes-os.org/gsoc/ page with some
> additional information on where to start.
>
> I recommend starting with setting up qubes-builder, which, among other
> things, will download all relevant source code. You can look around
> there, especially qubes-src directory.
>
> There are a lot of simple issues to use as practice, for example this
> one:
>
> https://github.com/QubesOS/qubes-issues/issues/2660
> The fix itself is trivial, but should get you familiar with
> qubes-builder, code signing, pull requests and other related things.
Yep. Perhaps not that issue in particular, but I second Marmarek's advice.
The first thing to do is to become fully comfortable with using Qubes
as your main day-to-day OS and with our workflow for building,
signing, submitting, and reviewing code, and to gain general
familiarity with the various components of a Qubes system and where
the code for each resides.
Reading the Qubes arch spec [1] is probably a useful start for
understanding some of the design rationale, even though it is somewhat
outdated at this point.
Then I'd say just start diving into the code. Any (reasonably
informed) questions that arise along the way are certainly welcome,
and are probably best asked here on qubes-devel.
[1]: https://www.qubes-os.org/attachment/wiki/QubesArchitecture/arch-spec-0.3.pdf
Exactly.
There are actually many approaches that can be taken here, and
definitely some flexibility to pursue whichever you are most
interested in.
The important point here is that producing only a model of the current
state of the Qubes codebase is not very useful. Rather, we should aim
to automate analysis that can help detect issues in the future and
generally help improve and maintain the quality of our code in the
long run.
In particular, some kind of document of "here is how untrusted inputs
propagate" has a very short shelf life and therefore limited utility,
and is not what we should seek to accomplish. Rather, the goal should
be to stand up infrastructure and tooling to automate or at least aid
continual analysis into the future.
I describe several possible approaches below:
1. Fuzzing
I think it could be a very useful thing to stand up some automated
fuzzing of individual qubes components. The first step would be to
identify and enumerate all interfaces which cross trust-boundaries. It
would be awesome if this identification could also be automated (e.g.
via cross-process taint tracking of data originating from vchan
buffers). There are obvious ones like the qrexec services, the gui
protocol, and basically everything else that is directly one domain
communicating with another over vchan. Then there are also the
slightly less obvious and protocol-specific things, like qfilecopy
unpacking, where exactly what counts as "dangerous" behavior isn't
always clear. For example, we are unlikely to find memory corruption
issues leading to traditional code execution there; what we are
interested in instead is filesystem modifications outside a particular
hierarchy. As you can see, simply applying fuzzing tools to look for
arbitrary code execution in localized contexts would not be sufficient
to catch everything we are interested in.
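To make the qfilecopy example concrete, here is a rough sketch (Python,
standard library only) of the kind of oracle a fuzzer would need in
order to flag "filesystem modification outside a particular hierarchy"
rather than a crash. The function names and the idea of feeding it a
list of paths are my own assumptions for illustration; this is not how
the existing unpacker is structured.

```python
import os


def escapes_root(root, member):
    """Return True if writing `member` under `root` would land outside `root`.

    `root` is the directory the unpacker is supposed to stay inside;
    `member` is a path taken from the (untrusted) incoming stream.
    """
    root_real = os.path.realpath(root)
    # os.path.join discards root_real when `member` is absolute, which is
    # exactly the behavior we want to detect.
    target = os.path.realpath(os.path.join(root_real, member))
    return os.path.commonpath([root_real, target]) != root_real


def offending_paths(root, members):
    """Return the members that would be written outside `root`, in order."""
    return [m for m in members if escapes_root(root, m)]
```

A fuzzing harness could call offending_paths() on every path the code
under test tried to touch and treat a non-empty result as a finding,
the same way a sanitizer report is treated for memory errors.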
One thing that could have good long-term value to the project is
getting all the qubes components running under oss-fuzz [2]. This
would require writing a bunch of small harnesses for the code under
test.
[2]: https://github.com/google/oss-fuzz
2. Static analysis
There are several ways to approach static analysis projects in Qubes.
I think the first step would be (not so difficult) modifications to
qubes-builder such that we have a convenient way to run e.g. clang's
scan-build across all qubes components and collect the results.
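As a very rough sketch of what "run scan-build across all components
and collect the results" could look like, the following Python helper
builds one scan-build invocation per component directory and sends
each report to its own output directory via scan-build's -o option.
It assumes each component can be built with a plain "make" from its
top-level directory, which is almost certainly an oversimplification
of what qubes-builder really does; the function names and directory
layout are hypothetical.

```python
import os
import subprocess


def scan_build_commands(src_root, report_root, build_cmd=("make",)):
    """Return a list of (component_dir, argv) pairs, one per subdirectory.

    Assumption: every subdirectory of `src_root` is one component and
    can be built by running `build_cmd` inside it.
    """
    commands = []
    for name in sorted(os.listdir(src_root)):
        component = os.path.join(src_root, name)
        if not os.path.isdir(component):
            continue
        report_dir = os.path.join(report_root, name)
        commands.append((component, ["scan-build", "-o", report_dir, *build_cmd]))
    return commands


def run_all(src_root, report_root):
    """Run every command and return {component_dir: exit status}."""
    results = {}
    for component, argv in scan_build_commands(src_root, report_root):
        results[component] = subprocess.run(argv, cwd=component).returncode
    return results
```

Keeping command construction separate from execution makes it easy to
dry-run the list first, and to swap the build command per component
later if a plain make turns out not to be enough.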
Then, I think there is certainly a case to be made for writing custom
analysis passes e.g. to enforce we follow our proclaimed pattern [3]
for handling untrusted input (which admittedly we could do a better
job of). There is plenty of prior work to build on for doing this in
C; for Python, as far as I know, less so.
[3]: https://www.qubes-os.org/doc/coding-style/#security-coding-guidelines
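For the Python side, a custom check does not need much machinery to
get started. The sketch below uses the standard ast module and
assumes the convention is that unverified values live in variables
prefixed with untrusted_ until they have been sanitized. It flags any
such variable passed directly to a call that is not a known sanitizer.
The sanitizer names are placeholders I made up; a real pass would need
the project's actual list and would also have to follow attribute
access, assignments, and so on.

```python
import ast

# Hypothetical names of functions that are allowed to receive untrusted data.
SANITIZERS = {"sanitize", "validate"}


def find_unsanitized_uses(source):
    """Return sorted (lineno, name) pairs for untrusted_* variables that are
    passed directly as an argument to a non-sanitizer call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both plain calls (f(x)) and attribute calls (mod.f(x)).
        if isinstance(func, ast.Attribute):
            callee = func.attr
        else:
            callee = getattr(func, "id", None)
        if callee in SANITIZERS:
            continue
        arguments = list(node.args) + [kw.value for kw in node.keywords]
        for arg in arguments:
            if isinstance(arg, ast.Name) and arg.id.startswith("untrusted_"):
                findings.append((arg.lineno, arg.id))
    return sorted(findings)
```

This only looks at direct uses, so it will miss anything laundered
through an intermediate variable, but it is cheap enough to run on
every commit and gives a baseline to improve on.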
Another useful thing would be somehow using symbolic analysis or taint
tracking to determine which data is attacker controlled, and how it is
constrained (and where in the source that happens) to aid manual
auditing. This is made difficult by the fact that we pass data between
different processes and languages rather often. This requires
significantly more thought and research.
It may also prove useful to have a mechanism to e.g. build the entire
codebase with clang sanitizers, or run everything under valgrind tools
and centrally collect any warnings, etc.
There are really many directions this could go. I encourage you to
think about what you are most interested in, and we can then work
together to determine what satisfactory progress for GSoC means for
your official proposal. We are also allowed to revise that if things
turn out to be too easy or too ambitious, so don't be afraid :)
If you prefer IRC, I can be found as jpo on Freenode & OFTC (and
idling in #qubes on both) and currently live in UTC-5.
Cheers (and apologies for the delayed reply),
Jean-Philippe