GSoC project: Analyzing the Qubes code base

Paras Chetal

Mar 2, 2017, 7:26:28 AM

to qubes-devel

Hi,

I am a computer science undergraduate studying at IIT Roorkee. I wish to contribute to QubesOS under the Google Summer of Code and look forward to continuing to contribute to the community as an active developer beyond the GSoC period too. I was quite excited when I learned that QubesOS had been selected as a mentoring organization. I have heard a lot of praise about Qubes regarding the security and privacy it offers, and now I'd like to use GSoC as an initiative to contribute my bit to it.

I plan to work on analyzing the Qubes code base. The project would involve leveraging modern static and dynamic analysis and formally analyzing how untrusted user input propagates through the code base, as mentioned on the ideas page. I have experience in reverse engineering and in static and dynamic analysis from my frequent participation in CTFs. I'm quite excited to work on analyzing a full-fledged OS.

I believe Jean-Philippe Ouellet will be my mentor for the project. Could you please guide me on how to set up my system to get started? Maybe assign me some micro-tasks related to the project for a start? Also, could you please elaborate on what exactly is included in the "untrusted user input" part?
I have already installed QubesOS on my system and am transitioning to using it as my default OS.

Cheers!

Regards,
Paras Chetal

Marek Marczykowski-Górecki

Mar 3, 2017, 5:19:25 AM

to Paras Chetal, qubes-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Thu, Mar 02, 2017 at 04:26:28AM -0800, Paras Chetal wrote:
> Hi,
>
> I am a computer science undergraduate studying at IIT Roorkee. I wish to
> contribute to QubesOS under the Google Summer of Code and look forward to
> continuing to contribute to the community as an active developer beyond
> the GSoC period too. I was quite excited when I learned that QubesOS had
> been selected as a mentoring organization. I have heard a lot of praise
> about Qubes regarding the security and privacy it offers, and now I'd like
> to use GSoC as an initiative to contribute my bit to it.

Welcome :)

> I plan to work on analyzing the Qubes code base. The project would involve
> leveraging modern static and dynamic analysis and formally analyzing how
> untrusted user input propagates through the code base, as mentioned on
> the ideas page <https://www.qubes-os.org/gsoc/>. I have experience in
> reverse engineering and in static and dynamic analysis from my frequent
> participation in CTFs. I'm quite excited to work on analyzing a
> full-fledged OS.
>
> I believe Jean-Philippe Ouellet will be my mentor for the project. Could
> you please guide me on how to set up my system to get started? Maybe assign
> me some micro-tasks related to the project for a start? Also, could you
> please elaborate on what exactly is included in the "untrusted user input"
> part? I have already installed QubesOS on my system and am transitioning
> to using it as my default OS.

We have recently extended the https://www.qubes-os.org/gsoc/ page with some
additional information on where to start.

I recommend starting with setting up qubes-builder, which, among other
things, will download all relevant source code. You can look around
there, especially in the qubes-src directory.

There are a lot of simple issues to use as practice, for example this
one:
https://github.com/QubesOS/qubes-issues/issues/2660
The fix itself is trivial, but should get you familiar with
qubes-builder, code signing, pull requests and other related things.

As for untrusted input - it's about everything transferred between trust
zones - basically between any VMs, or from a VM to dom0.
For the latter category, see here:
https://www.qubes-os.org/doc/security-critical-code/
For the former (transferring data between VMs), it's about qrexec
services, like file-copy. Most of them are here:
https://github.com/QubesOS/qubes-core-agent-linux/tree/master/qubes-rpc
And also some (most?) of qubes-app-* repositories. Split GPG (ticket
#2660 mentioned above) is one of them.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJYuUMlAAoJENuP0xzK19cs4QQH/2uFTicnf/vJCbrERNLp26/v
Hlhe4WjSPqd0jkhS7qmuVfE2Hl82GhdycEVnQfNi6Px9fqQYFri7kJvneQxZ2DQo
FsGpG/qNYNlVxlvhVZvbYEO9pMGpo7tjVpIPX5cn4yeg5+L9FykJXBNf2lrCZPLS
/kjCpruCYZKEdIpxzgwwJawkESH3gil5QL0u+n3M8UCn3z+6N/iKS7vb9caKGIQC
5VRazwLOUX2C7B9uO8bDo9UtRAnJ4lbTtgHZ1FJxjuY9sVl1lnPXnE4HNrvYEDSO
C1JqDbtYsBT4P78WBDhCdoNmfbanokd/+wIGLIY7G+Jw1wiDyeAMSq1XDiTV7zk=
=4hnk
-----END PGP SIGNATURE-----

Jean-Philippe Ouellet

Mar 9, 2017, 4:58:04 PM

to Marek Marczykowski-Górecki, Paras Chetal, qubes-devel

On Fri, Mar 3, 2017 at 5:19 AM, Marek Marczykowski-Górecki
<marm...@invisiblethingslab.com> wrote:
> On Thu, Mar 02, 2017 at 04:26:28AM -0800, Paras Chetal wrote:
>> Hi,
>>
>> I am a computer science undergraduate studying at IIT Roorkee. I wish to
>> contribute to QubesOS under the Google Summer of Code and look forward to
>> continuing to contribute to the community as an active developer beyond
>> the GSoC period too. I was quite excited when I learned that QubesOS had
>> been selected as a mentoring organization. I have heard a lot of praise
>> about Qubes regarding the security and privacy it offers, and now I'd like
>> to use GSoC as an initiative to contribute my bit to it.
>
> Welcome :)
>
>> I plan to work on analyzing the Qubes code base. The project would involve
>> leveraging modern static and dynamic analysis and formally analyzing how
>> untrusted user input propagates through the code base, as mentioned on
>> the ideas page <https://www.qubes-os.org/gsoc/>. I have experience in
>> reverse engineering and in static and dynamic analysis from my frequent
>> participation in CTFs. I'm quite excited to work on analyzing a
>> full-fledged OS.

Awesome! This is the project I am personally most interested in as well :)

>> I believe Jean-Philippe Ouellet will be my mentor for the project. Could
>> you please guide me on how to set up my system to get started? Maybe assign
>> me some micro-tasks related to the project for a start? Also, could you
>> please elaborate on what exactly is included in the "untrusted user input"
>> part? I have already installed QubesOS on my system and am transitioning
>> to using it as my default OS.
>
> We have recently extended the https://www.qubes-os.org/gsoc/ page with some
> additional information on where to start.
>
> I recommend starting with setting up qubes-builder, which, among other
> things, will download all relevant source code. You can look around
> there, especially in the qubes-src directory.
>
> There are a lot of simple issues to use as practice, for example this
> one:
> https://github.com/QubesOS/qubes-issues/issues/2660
> The fix itself is trivial, but should get you familiar with
> qubes-builder, code signing, pull requests and other related things.

Yep. Perhaps not that issue in particular, but I second Marmarek's advice.

The first thing to do is to become fully comfortable with using Qubes
as your main day-to-day OS and with our workflow for building,
signing, submitting, & reviewing code, and to gain general
familiarity with the various components of a Qubes system and where
the code for each resides.

Reading the Qubes arch spec [1] is probably a useful start for
understanding some of the design rationale, even though it is somewhat
outdated at this point.

Then I'd say just start diving into the code. Asking any
(reasonably informed) questions that arise while doing so is
certainly encouraged; they are probably best asked here on qubes-devel.

[1]: https://www.qubes-os.org/attachment/wiki/QubesArchitecture/arch-spec-0.3.pdf

> As for untrusted input - it's about everything transferred between trust
> zones - basically between any VMs, or from a VM to dom0.
> For the latter category, see here:
> https://www.qubes-os.org/doc/security-critical-code/
> For the former (transferring data between VMs), it's about qrexec
> services, like file-copy. Most of them are here:
> https://github.com/QubesOS/qubes-core-agent-linux/tree/master/qubes-rpc
> And also some (most?) of qubes-app-* repositories. Split GPG (ticket
> #2660 mentioned above) is one of them.

Exactly.

There are actually many approaches that can be taken here, and
definitely some flexibility to pursue whichever you are most
interested in.

The important thing to note here is that it is not very useful to
merely produce a model of the current state of the Qubes codebase;
rather, we should aim to automate analysis that can help detect issues
in the future and generally help improve and maintain the quality of
our code in the long run.

In particular, some kind of document of "here is how untrusted inputs
propagate" has a very short shelf life and therefore limited utility,
and is not what we should seek to accomplish. Rather, the goal should
be to stand up infrastructure and tooling to automate or at least aid
continual analysis into the future.

I describe several possible approaches below:

1. Fuzzing

I think it could be a very useful thing to stand up some automated
fuzzing of individual qubes components. The first step would be to
identify and enumerate all interfaces which cross trust-boundaries. It
would be awesome if this identification could also be automated (e.g.
via cross-process taint tracking of data originating from vchan
buffers). There are obvious ones like the qrexec services, the gui
protocol, and basically everything else that is directly one domain
communicating with another over vchan. Then there are also the
slightly less obvious and protocol-specific things, like qfilecopy
unpacking, where exactly what counts as "dangerous" behavior isn't always
clear. For example, it is unlikely we would find memory corruption
issues leading to traditional code execution, but rather we are quite
interested in filesystem modifications outside a particular hierarchy.
As you can see, simply applying fuzzing tools to look for arbitrary code
execution in localized contexts would not be sufficient to catch
everything we are interested in.
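For the qfilecopy case above, the oracle has to encode "stayed inside the destination hierarchy" rather than "did not crash". Below is a minimal Python sketch. It assumes the unpacker receives POSIX-style relative names from the peer VM. `is_safe_relative_path` is a hypothetical helper for illustration, not actual Qubes code.

```python
import posixpath


def is_safe_relative_path(untrusted_name: str) -> bool:
    """Return True if a peer-supplied name stays inside the destination dir.

    Hypothetical fuzzing oracle for illustration. The check is purely
    lexical: it does not look at symlinks that already exist on disk.
    """
    if not untrusted_name or "\x00" in untrusted_name:
        return False
    if posixpath.isabs(untrusted_name):
        return False
    normalized = posixpath.normpath(untrusted_name)
    # normpath collapses "a/../../b" to "../b", so any escape via parent
    # references shows up as a leading ".." component. A name resolving
    # to the destination directory itself (".") is rejected as well.
    first_component = normalized.split("/", 1)[0]
    return normalized != "." and first_component != ".."
```

Because the check is purely lexical, it cannot see symlinks that already exist on disk. That is one reason an oracle that watches the filesystem itself is still needed.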

One thing that could have good long-term value to the project is
getting all the qubes components running under oss-fuzz [2]. This
would require writing a bunch of small harnesses for the code under
test.

[2]: https://github.com/google/oss-fuzz
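Real oss-fuzz targets are C/C++ functions named `LLVMFuzzerTestOneInput` that receive one byte buffer per call. The Python sketch below only mimics that shape. It shows how a harness separates expected rejections from genuine bugs. `parse_message` is a hypothetical toy parser with an invented message format. The naive mutator has none of the coverage guidance that libFuzzer or AFL provide.

```python
import random


def parse_message(data: bytes) -> bytes:
    """Hypothetical toy parser for a length-prefixed message.

    Format (invented for this sketch): 1 length byte, then that many
    payload bytes. Malformed input is rejected with ValueError.
    """
    if len(data) < 1:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


def fuzz_one_input(data: bytes) -> None:
    """Harness entry point: one call per generated input.

    Clean rejections (ValueError) are expected; any other exception
    propagates and counts as a finding.
    """
    try:
        parse_message(data)
    except ValueError:
        pass


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Naive mutator: flip a bit, insert a byte, or delete a byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif choice == 2 and buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def run_campaign(seed_input: bytes, iterations: int, seed: int = 0):
    """Run the harness on successively mutated inputs; return crashing inputs."""
    rng = random.Random(seed)
    crashes = []
    current = seed_input
    for _ in range(iterations):
        current = mutate(current, rng)
        try:
            fuzz_one_input(current)
        except Exception:  # the harness swallows ValueError, so this is a bug
            crashes.append(current)
    return crashes
```

In a real harness, the `except ValueError` line corresponds to whatever "clean rejection" means for the component under test. Deciding that per protocol is part of writing each harness.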

2. Static analysis

There are several ways to approach static analysis projects in Qubes.
I think the first step would be (not so difficult) modifications to
qubes-builder such that we have a convenient way to run e.g. clang's
scan-build across all qubes components and collect the results.
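Below is a rough Python sketch of such a helper. It rests on a hypothetical assumption: each directory under qubes-src is one component buildable with a plain `make` at its top level. The real qubes-builder drives builds through its own Makefiles, so this would need adapting. `scan-build -o DIR <build command>` is the usual invocation that writes HTML reports under DIR. The function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def scan_build_commands(src_root, report_root):
    """Yield (component, argv) pairs, one scan-build run per component.

    Hypothetical assumption: every subdirectory of src_root that has a
    top-level Makefile is a component buildable with plain `make`.
    """
    for entry in sorted(Path(src_root).iterdir()):
        if not entry.is_dir() or not (entry / "Makefile").is_file():
            continue
        out_dir = Path(report_root) / entry.name
        # scan-build wraps the build command; reports land under -o.
        argv = ["scan-build", "-o", str(out_dir), "make", "-C", str(entry)]
        yield entry.name, argv


def run_all(src_root, report_root):
    """Run scan-build for every component; return {component: exit code}."""
    results = {}
    for name, argv in scan_build_commands(src_root, report_root):
        # check=False: one failing component should not stop the sweep.
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
        results[name] = proc.returncode
    return results
```

Keeping one report directory per component makes it easy to collect and compare results across runs.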

Then, I think there is certainly a case to be made for writing custom
analysis passes e.g. to enforce we follow our proclaimed pattern [3]
for handling untrusted input (which admittedly we could do a better
job of). There is plenty of work to build on for doing this in C; for
Python, AFAIK, less so.

[3]: https://www.qubes-os.org/doc/coding-style/#security-coding-guidelines
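The guideline in [3] asks that variables holding untrusted data carry an `untrusted_` prefix and only be touched by sanitizing code. On the Python side there is less existing tooling, but a minimal custom check can be built on the standard `ast` module. This sketch only flags calls that receive a bare `untrusted_*` name as an argument when the callee is not on an allowlist. The allowlist entries are hypothetical. Attribute access, subscripts, f-strings and returns are deliberately not covered.

```python
import ast

# Hypothetical allowlist: functions reviewed as sanitizers/validators.
SANITIZERS = {"validate_name", "sanitize_path"}


def _callee_name(call: ast.Call):
    """Return the called function's simple name, or None if not a name/attr."""
    func = call.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def find_untrusted_uses(source: str, filename: str = "<input>"):
    """Return (lineno, callee, arg) for calls passing untrusted_* to non-sanitizers."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        callee = _callee_name(node)
        if callee in SANITIZERS:
            continue
        args = list(node.args) + [kw.value for kw in node.keywords]
        for arg in args:
            if isinstance(arg, ast.Name) and arg.id.startswith("untrusted_"):
                findings.append((node.lineno, callee, arg.id))
    return findings
```

Plain assignment such as `name = untrusted_name` after a successful check is intentionally not flagged, since that is the hand-off the guideline expects.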

Another useful thing would be somehow using symbolic analysis or taint
tracking to determine which data is attacker controlled, and how it is
constrained (and where in the source that happens) to aid manual
auditing. This is made difficult by the fact that we pass data between
different processes and languages rather often. This requires
significantly more thought and research.
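As a toy illustration of the taint-tracking idea (dynamic, not symbolic), the Python sketch below marks attacker-controlled strings with a `str` subclass. It propagates the mark through concatenation and slicing and refuses tainted values at a sink. `Tainted`, `sensitive_sink` and `sanitize_name` are hypothetical names. The sketch shares exactly the limitation mentioned above: the mark is lost as soon as data crosses a process or language boundary.

```python
import re


class Tainted(str):
    """A str that remembers it came from an untrusted source.

    Only concatenation and indexing/slicing propagate taint here; other
    str methods (e.g. .upper(), .split()) silently return plain str.
    """

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # Called for "plain" + Tainted, since str has no __radd__ of its own.
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(other, self))

    def __getitem__(self, key):
        return Tainted(str.__getitem__(self, key))


class TaintError(Exception):
    pass


def sensitive_sink(path: str) -> str:
    """Hypothetical sink that must never receive attacker-controlled data."""
    if isinstance(path, Tainted):
        raise TaintError(f"tainted value reached sink: {path!r}")
    return path


_SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def sanitize_name(value: str) -> str:
    """Validate against an allowlist pattern; return an untainted copy."""
    if not _SAFE_NAME.fullmatch(value) or value in (".", ".."):
        raise ValueError("rejected name")
    return str(value)  # str() of a str subclass yields a plain str
```

Overriding `__radd__` matters: without it, `"prefix/" + tainted` would quietly produce an untainted result.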

It may also prove useful to have a mechanism to e.g. build the entire
codebase with clang sanitizers, or run everything under valgrind tools
and centrally collect any warnings, etc.
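For the "centrally collect any warnings" part, here is a sketch that buckets report headlines per component. It assumes the components were built with clang's `-fsanitize=address,undefined` and/or run under valgrind's default Memcheck tool, with output captured to text. The regular expressions target the usual headline formats of those tools, but they are assumptions to check against real logs.

```python
import re
from collections import Counter

# Headline patterns (assumed formats; verify against your toolchain's output).
_PATTERNS = [
    ("asan", re.compile(r"==\d+==ERROR: AddressSanitizer: ([\w-]+)")),
    ("ubsan", re.compile(r"runtime error: ([^:\n]+)")),
    ("valgrind", re.compile(
        r"==\d+== (Invalid (?:read|write) of size \d+"
        r"|Conditional jump or move depends on uninitialised value\(s\))")),
]


def summarize(component_logs):
    """Map component -> Counter of (tool, finding kind) found in its log text."""
    summary = {}
    for component, text in component_logs.items():
        counts = Counter()
        for line in text.splitlines():
            for tool, pattern in _PATTERNS:
                match = pattern.search(line)
                if match:
                    counts[(tool, match.group(1))] += 1
        summary[component] = counts
    return summary
```

Counting by finding kind rather than keeping raw logs makes it easy to spot regressions between builds.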

There are really many directions this could go. I encourage you to
think about what you are most interested in, and we can then work
together to determine what satisfactory progress for GSoC means for
your official proposal. We are also allowed to revise that if things
turn out to be too easy or too ambitious, so don't be afraid :)

If you prefer IRC, I can be found as jpo on Freenode & OFTC (and
idling in #qubes on both) and currently live in UTC-5.

Cheers (and apologies for the delayed reply),
Jean-Philippe

Jean-Philippe Ouellet

Mar 18, 2017, 4:15:50 AM

to Paras Chetal, qubes-devel

If you are still interested in this project, we really need to work
together before the proposals are due to determine a more concrete
plan (with milestones, etc.) for what specifically you plan to work
on.

GSoC proposals need to be clear, and "I plan to investigate ${vague
thing} using ${generic category of approaches}" is not sufficient.
This means having a clear plan of "what" and "how", which your
progress can be compared against for GSoC's midterm & final
evaluations.

Paras Chetal

Mar 18, 2017, 10:32:25 AM

to qubes-devel, paras....@gmail.com

Hi Jean-Philippe,

Really sorry for the delayed response, but yes, I am still very much interested in the project.
I have been going through the Qubes arch spec and studying the qubes-builder code base, trying to understand how everything is stitched together.

Regarding static analysis, I ran scan-build across the various components in qubes-builder, but that did not really reveal anything interesting at first glance (as expected). I will still send a PR by the end of this weekend that adds a convenient bash script to the qubes-builder/scripts directory, which runs scan-build on the various components under qubes-builder/qubes-src. Custom static analysis of untrusted_ variables will obviously be much more useful, and I'll add it later.

I'm more interested in taking up extensive fuzzing (using afl) of the various qubes components. It seems to me that this would be much more useful for Qubes than static analysis alone. The two major tasks, as you said, would be identifying the interfaces that cross trust boundaries and then fuzzing them in a guided way. Since I'm relatively new to this, I'm not sure how to divide them into subtasks. I would appreciate it if you could help me with that. Once we have the subtasks figured out, I'll think about the milestones and a proposed timeline for the project.

I think I'll be able to work on both static and dynamic analysis over the summer. How to implement symbolic execution is still unclear to me, and I'll work on it later if time permits.
I'll stay in touch with you on the #qubes channel on freenode IRC. (nick: feignix)

Regards,
Paras Chetal

Paras Chetal

Mar 21, 2017, 3:59:55 AM

to qubes-devel, paras....@gmail.com, j...@vt.edu

Hi Jean-Philippe,

Could you please explain how the identification of the interfaces that exchange data via vchan buffers could be automated? Designing an intermediate layer through which all data passes seems to me to be the ideal way. I don't have much knowledge of how cross-process taint tracking could be implemented here. Could you please point me to some resources where I can read up on it? Also, I have some questions about how I should set up my system to get started with fuzzing. For instance, let's say I start with fuzzing libvchan. I would have to provide some dummy input data (controlled by afl-fuzz) and then store the results, right? How would I detect whether the input actually led to unintended behavior? I'd be fuzzing individual components, but I need to analyze the effect these components have on the whole system (like filesystem modifications outside of the hierarchy).
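One way to approach the "how would I detect it" question for filesystem effects is this. Run the target (for example an unpacker fed fuzzer-generated input) against a scratch directory. Snapshot the tree before and after, and report any change outside the subtree the target is allowed to write. Below is a minimal Python sketch with hypothetical helper names. It assumes POSIX paths and a disposable environment such as a throwaway VM. An in-place rewrite that keeps the same size within the filesystem's timestamp granularity would be missed.

```python
import os


def snapshot(root: str):
    """Map every file under root (relative path) to (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except FileNotFoundError:
                continue  # file vanished during the walk
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def escaped_changes(before, after, allowed_subdir: str):
    """Return sorted paths created, modified or deleted outside allowed_subdir."""
    changed = {
        p for p in before.keys() | after.keys() if before.get(p) != after.get(p)
    }
    prefix = allowed_subdir.rstrip("/") + "/"
    return sorted(p for p in changed if not p.startswith(prefix))
```

A non-empty result from `escaped_changes` after a run is the kind of finding a crash-only fuzzer would never report.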

I have started writing the proposal for GSoC. I'll share it in a week or so. I plan to lay the groundwork for both static and dynamic analysis. Please let me know what you think a satisfactory outcome for the project would be.

Regards,
Paras Chetal
