GSoC Student Introduction 2017

Alisa Matsak

Mar 12, 2017, 6:00:27 AM
to qubes-devel
Hi everyone!

Hope it’s not too late to introduce myself!
My name is Alisa Matsak and I live in Moscow, Russia. I’m in my third year at Moscow State University (Faculty of Computational Mathematics and Cybernetics). My laboratory works directly on security and on developing methods of protection.
So I would be happy to join the Qubes project under GSoC. I think it would be an excellent experience for me and a chance to improve my knowledge and skills. It’s important that Google helps students like me to begin working on real projects and promotes open source software.
I can’t call myself an experienced developer, but I feel pretty confident about my skills in C/C++, I have some experience with Python, and I like Linux and its internals.
I have already installed Qubes (it wasn’t easy, but I’ve dealt with all the difficulties). I have also read your GSoC page and begun reading the documentation.
I am interested in working on the LogVM(s) project from your Idea List.
Maybe someone can help me find an easy task (like a test task) to get started with working on Qubes? I have already looked at the GitHub issues page, but it’s not trivial to find an issue that isn’t long-term.

I look forward to a reply.
Best wishes,
Alisa Matsak.

Jean-Philippe Ouellet

Mar 12, 2017, 2:16:57 PM
to Alisa Matsak, shubham dubey, qubes-devel
On Sun, Mar 12, 2017 at 6:00 AM, Alisa Matsak <arisu...@gmail.com> wrote:
> Hi everyone!

Hi! Welcome!

> Hope it’s not too late to introduce myself!

Nope. Not at all.

> I am interested in working on the LogVM(s) project from your Idea List.

Shubham (CC'd here) has also expressed interest in working on LogVM integration.

I would encourage a discussion between you two to see if maybe one of
you is equally (or more) interested in another project as well,
because we cannot accept two students for the same (relatively small)
project.

Shubham has also expressed interest in reviving the Live USB project.
While definitely more work, I think that would likely be more
beneficial to the project.

If both of you apply for the LogVMs, then it will come down to who has
the best application, and who we believe would be more likely to
succeed at completing the project (part of which is indicated by level
of interaction with the community prior to that decision being made).

shubham dubey

Mar 12, 2017, 2:25:29 PM
to qubes-devel, arisu...@gmail.com, sdub...@gmail.com



Well, I have already shifted all my interest to the Qubes Live USB project, so she can apply for that project :).
 

Alisa Matsak

Mar 12, 2017, 2:56:22 PM
to qubes-devel, sdub...@gmail.com


On Sunday, 12 March 2017 21:25:29 UTC+3, shubham dubey wrote:

Thank you, it's really very nice of you to leave this project to me :)
 

Jean-Philippe Ouellet

Mar 12, 2017, 3:20:33 PM
to Alisa Matsak, qubes-devel
On Sun, Mar 12, 2017 at 6:00 AM, Alisa Matsak <arisu...@gmail.com> wrote:
> I am interested in working on the LogVM(s) project from your Idea List.
> Maybe someone can help me find an easy task (like a test task) to get
> started with working on Qubes? I have already looked at the GitHub issues
> page, but it’s not trivial to find an issue that isn’t long-term.

As for tasks to get started, there are two categories: things to get
you familiar with the workflow for proposing changes to the qubes
codebase, and tasks to familiarize yourself with the specifics of what
is required for your chosen task.

For the former, review the pages linked from the gsoc page [1],
particularly those on qubes-builder and code-signing, and definitely
feel free to pick up any tasks in qubes-issues [2] that look easy to
you.

For tasks related directly to LogVMs as a GSoC project:
1. Familiarize yourself with the Qubes RPC framework [3],
because this is the mechanism you will use to send and receive logs
across domains.
2. Take a look at the prior work done for build logs [4], as this may
be largely reusable.
3. Read about syslog and journald, specifically with respect to
sending logs off to remote machines. Keep track of the things you read
(links and such), as it would be a good idea to include details and
rationale about why you select your particular method of hooking into
the logging subsystems. Specifically we are interested in avoiding
complex processing of the produced logs, unless that processing
happens in a DispVM.

That should be plenty to get you started.

Please feel free to ask any questions you may have!

Regards,
Jean-Philippe

[1]: https://www.qubes-os.org/gsoc/
[2]: https://github.com/QubesOS/qubes-issues/issues
[3]: https://www.qubes-os.org/doc/qrexec3/
[4]: https://github.com/QubesOS/qubes-issues/issues/2023
Message has been deleted

Jean-Philippe Ouellet

Mar 12, 2017, 4:05:46 PM
to Alisa Matsak, qubes-devel
On Sun, Mar 12, 2017 at 3:28 PM, Alisa Matsak <arisu...@gmail.com> wrote:
> Okay, I got it)
> But let me clarify, do you mean working directly on project or just trying
> to fix bugs and asking questions?

I don't expect you to do your GSoC project before the summer. It's
Google *Summer* of Code for a reason ;) I fully understand that many
of us are quite busy with other things at the moment (myself
included).

The goal is to bootstrap your familiarity with the community, the
code, and the systems you will be interfacing with so that you can be
optimally productive later on.

That said, if you would like to begin working on things in earnest,
there's certainly nothing stopping you from doing so either!

> On Sunday, 12 March 2017 21:16:57 UTC+3, Jean-Philippe Ouellet wrote:
>>
>> > level of interaction with the community prior to that decision being
>> > made

All I mean here is that if given more applicants than we can accept,
I'd prefer the candidate who is an active member of the community over
one who dropped a single email saying "Hi, I'd like to work on ____".

Alisa Matsak

Mar 12, 2017, 4:42:51 PM
to qubes-devel, arisu...@gmail.com
Oh, sorry for this message:
On Sun, Mar 12, 2017 at 3:28 PM, Alisa Matsak <arisu...@gmail.com> wrote:
> Okay, I got it)
> But let me clarify, do you mean working directly on project or just trying
> to fix bugs and asking questions?
I received your "to get started" message as soon as I sent it.

I am very thankful to you for such a detailed explanation and I'll try to do my best! :)

Alisa Matsak

Mar 23, 2017, 5:28:59 PM
to qubes-devel, arisu...@gmail.com
On Sunday, 12 March 2017 22:20:33 UTC+3, Jean-Philippe Ouellet wrote:
> Please feel free to ask any questions you may have!


Hi again!

Since last time I've studied some material related to the LogVM task. Now I'd like to take you up on your offer to ask questions. :)

I discovered that the majority of the required functionality has already been implemented as part of journald (which is part of the systemd project). Journald saves all the logs it knows about (such as kernel messages generated with printk(), userspace messages generated with syslog(3), userspace entries using the native API, coredumps via /proc/sys/kernel/core_pattern and more; I took this from here: [1]). It also takes care of security (undetected manipulation is impossible because each entry cryptographically verifies all previous ones) and of journal file rotation for more efficient disk usage. At the same time it provides tools for conveniently viewing logs and even searching them. Since all our VMs run Fedora, we can use all these features to our advantage. (All of this sounds like a journald advertisement. xD)

Journald developers advise not to change journal files because of basic principles of journald implementation. They describe its on-disk format and note that it is "not what you want to use as base of your project". I think that we can parse journal export format (reasoning for why this is necessary below) to delete meta-information, but I'm afraid journald won't work with our modified file later. So this way an attempt to write some tool for processing such files is similar to reinventing the wheel (or reimplementing journald).

My idea for the project is the following. Among other functionality, journald contains functions for sending and receiving journal messages over the network. For our goals we need its systemd-journal-remote [2] and systemd-journal-upload [3]. For transmitting entries journald uses a special format [4]. The problem is that those tools only support transmitting logs in HTTP/HTTPS over TCP/IP, while we only support VMs communicating via qrexec. I think a simple proxy server (maybe even a self-written one) would solve the problem. Journald on each VM would send its logs to the proxy (which runs on the same VM in the background), and the proxy in turn would open a qrexec connection to pass them to the LogVM. The LogVM here would be an ordinary VM running Fedora. There would also be a proxy server running on the LogVM in the background. It would receive data via qrexec and make it look to journald on the LogVM as if the data had been received through TCP/IP. So this approach can be suitable for collecting logs from other VMs.

To make the process easier to understand, I attach a diagram of it [5]. Hope it will be useful.

Please let me know what you think about this idea. Is it suitable for this project? Can I write a proposal based on it?

Best wishes, Alisa.

[1] https://goo.gl/BaCCko
[2] https://www.freedesktop.org/software/systemd/man/systemd-journal-remote.html
[3] https://www.freedesktop.org/software/systemd/man/systemd-journal-upload.html
[4] https://www.freedesktop.org/wiki/Software/systemd/export/
[5] https://goo.gl/8euAAM

Marek Marczykowski-Górecki

Mar 23, 2017, 7:30:57 PM
to Alisa Matsak, qubes-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Thu, Mar 23, 2017 at 02:28:59PM -0700, Alisa Matsak wrote:
> On Sunday, 12 March 2017 22:20:33 UTC+3, Jean-Philippe Ouellet wrote:
>
> > > Please feel free to ask any questions you may have!
> >
>
> Hi again!
>
> Since last time I've studied some material related to the LogVM task. Now
> I'd like to take you up on your offer to ask questions. :)
>
> I discovered that the majority of the required functionality has already
> been implemented as part of journald (which is part of the systemd project).
> Journald saves all the logs it knows about (such as kernel messages
> generated with printk(), userspace messages generated with syslog(3),
> userspace entries using the native API, coredumps via
> /proc/sys/kernel/core_pattern and more; I took this from here: [1]). It
> also takes care of security (undetected manipulation is impossible because
> each entry cryptographically verifies all previous ones) and of journal
> file rotation for more efficient disk usage. At the same time it provides
> tools for conveniently viewing logs and even searching them. Since all our
> VMs run Fedora, we can use all these features to our advantage. (All of
> this sounds like a journald advertisement. xD)

Not all the VMs are running Fedora; there are also Debian-based ones, and
some people also use Arch. Not to mention Windows, but I think we can
ignore it for now ;)
But currently all the Linux VMs do have journald (hmm, not sure about
Arch?). Using journald as a log collecting tool looks like a good idea,
but mostly because it is already there, not because we depend on some
specific feature of it. The question is how exactly (including the data
format) to transfer entries from one journald instance to another.

Also, if using journald for managing log storage, we need to make sure
it's reasonably configured. For example, to prevent a VM from producing a
lot of log entries, causing log rotation and the removal of very recent
logs (possibly some evidence of compromise).

Even worse: removing not-so-old entries related to other VMs. Example
attack scenario:
1. The 'work' VM gets successfully attacked, but the LogVM receives
evidence of the attack
2. The compromised VM starts a new DispVM
3. That DispVM produces a lot of rubbish log entries, causing all the
recent logs to be rotated and removed
4. The compromised 'work' VM also cleans its local logs (if any)

Now, the only thing you have in the LogVM is some garbage sent by a random
DispVM and you don't even know which VM started that DispVM (because
those log entries were rotated too). While you may suspect that it isn't
only the DispVM that got compromised, you have no idea which VM it is and
what exactly has happened.

Probably some rate-limiting (maybe connected with alerting) should solve
this problem, but we need to think about such scenarios.

> Journald developers advise not to change journal files because of basic
> principles of journald implementation. They describe its on-disk format and
> note that it is "not what you want to use as base of your project". I think
> that we can parse journal export format (reasoning for why this is
> necessary below) to delete meta-information, but I'm afraid journald won't
> work with our modified file later. So this way an attempt to write some
> tool for processing such files is similar to reinventing the wheel (or
> reimplementing journald).

Using the full journald export format isn't a good idea, at least for these
reasons:
- many fields should be out of the sending VM's control - for example
hostname, timestamp, but probably more
- many fields are unnecessary (for example all __*), so let's keep the
attack surface as small as possible; even Lennart Poettering can't
write bug-free code ;)
- for the same reason, I'd filter out binary entries (replace
non-printable characters with a dot, underscore or something like that) -
even if journald itself handles them well, some log-viewing software
may not; even the simple 'less' command throws a bunch of parsers at its
input...

If using this format, I'd use some simplified version - filter out
unneeded fields (most of them?) when sending, and synthesize those
required after receiving entry in LogVM. And of course reject entries
not conforming to this simplified specification.
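
A minimal sketch of the character replacement suggested in the list above, assuming it runs as a plain filter on standard input (on either side of the qrexec connection). The name sanitize_log is hypothetical, and treating everything outside printable ASCII as suspect is an assumption: it also replaces tabs and every byte of a multi-byte UTF-8 character.

```sh
#!/bin/sh
# sanitize_log (hypothetical name): copy stdin to stdout, replacing every
# byte that is neither printable ASCII nor a newline with an underscore.
# LC_ALL=C makes tr work on single bytes, so the result does not depend
# on the locale configured in the VM.
sanitize_log() {
    LC_ALL=C tr -c '[:print:]\n' '_'
}
```

It could be used as `journalctl --output short | sanitize_log` before anything is written to disk or shown in a terminal.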

To be honest, I think the "short" format (`journalctl --output short`),
with timestamp and hostname stripped off, is enough. So, basically just
the MESSAGE field from the "export" format.
If that means we need to synthesise all the other fields (which I
doubt), so be it.

> My idea for the project is the following. Among other functionality,
> journald contains functions for sending and receiving journal messages over
> the network. For our goals we need its systemd-journal-remote [2]

Looks like this tool can accept input not only from the network, but
also from a local socket :)

> and
> systemd-journal-upload [3]. For transmitting entries journald uses a
> special format [4]. The problem is that those tools only support
> transmitting logs in HTTP/HTTPS over TCP/IP

The receiving part supports a local socket, without HTTP(S) wrapping; see
the --listen-raw option. But the sending part indeed looks like it
supports only HTTP(S).

> , while we only support VMs communicating via
> qrexec. I think a simple proxy server (maybe even a self-written one)
> would solve the problem. Journald on each VM would send its logs to the
> proxy (which runs on the same VM in the background), and the proxy in turn
> would open a qrexec connection to pass them to the LogVM. The LogVM here
> would be an ordinary VM running Fedora. There would also be a proxy server
> running on the LogVM in the background. It would receive data via qrexec
> and make it look to journald on the LogVM as if the data had been received
> through TCP/IP. So this approach can be suitable for collecting logs from
> other VMs.

Such proxies on both sides seem like a reasonable solution. Keep in
mind that the proxy on receiving side has a very important job: make
sure the entries conform to required format, whatever the format will
be. The simpler the format will be, the simpler the tool will be.
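
The receiving-side check could start out as small as the following sketch. It assumes the simplified format is "one entry per line, printable ASCII only, at most max bytes"; that format, the name validate_lines and the default limit of 1024 are all assumptions for illustration, not something agreed in this thread. Entries that do not conform are dropped and counted rather than repaired.

```sh
#!/bin/sh
# validate_lines [max] (hypothetical): pass through only the lines that
# conform to the assumed format; drop everything else.
validate_lines() {
    max=${1:-1024}
    LC_ALL=C awk -v max="$max" '
        length($0) > max { rejected++; next }   # over-long entry
        /[^ -~]/         { rejected++; next }   # byte outside printable ASCII
                         { print }
        END {
            # Report on stderr so the count never ends up in the stored log.
            if (rejected) printf "rejected %d line(s)\n", rejected > "/dev/stderr"
        }
    '
}
```

Because the rule is a single length check plus a single character class, it is easy to audit, which matches the point that a simpler format gives a simpler tool.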

> To make the process easier to understand, I attach a diagram of it [5].
> Hope it will be useful.
>
> Please let me know what you think about this idea. Is it suitable for
> this project? Can I write a proposal based on it?
>
> Best wishes, Alisa.
>
> [1] https://goo.gl/BaCCko
> [2]
> https://www.freedesktop.org/software/systemd/man/systemd-journal-remote.html
> [3]
> https://www.freedesktop.org/software/systemd/man/systemd-journal-upload.html
> [4] https://www.freedesktop.org/wiki/Software/systemd/export/
> [5] https://goo.gl/8euAAM
>


- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJY1FqrAAoJENuP0xzK19csuccH/1UkNtYO340BtK0anqIDxWpu
Dpz5BrvEVdGvHNa6WPJWRo3nz3yLsWYXNZ40O/J/dyFEQBSQQA2UGfcd9IAF2VYZ
OGQmyd0rGlzzI/DVh//yxxtDKXU2MZphPusHBD+pK/b2PVi4vrCH+oe5gKBQgpN1
Lqo0K7WR8VCEdll1N53NNvmiejNgYONA+p3ZbYUsUIcc+s9DELP75MC73TtVM/IB
c8UQO7bhgieVzZeAa6sFoFqj/qGf1BMUpfAwmZI9DwLEasstrOaMsdC99wOD1lc/
eYGxNfRRg5gL4VbLq/JrQvYwlh09fj5D1FKYOfw5dHZcHNaIocOsD1TVjGKHQyA=
=uE+K
-----END PGP SIGNATURE-----

Alisa Matsak

Mar 26, 2017, 5:06:34 PM
to qubes-devel, arisu...@gmail.com
Hi! It's me again!

I tried to take all your recent comments into account. So, here's what I think about the project in light of them.


> Not all the VMs are running Fedora; there are also Debian-based ones, and
> some people also use Arch. Not to mention Windows, but I think we can
> ignore it for now ;)
> But currently all the Linux VMs do have journald (hmm, not sure about
> Arch?).


You're absolutely right, currently all the Linux VMs do have journald (as a part of systemd).


> Using the full journald export format isn't a good idea, at least for these
> reasons:
>  - many fields should be out of the sending VM's control - for example
>    hostname, timestamp, but probably more
>  - many fields are unnecessary (for example all __*), so let's keep the
>    attack surface as small as possible; even Lennart Poettering can't
>    write bug-free code ;)
>  - for the same reason, I'd filter out binary entries (replace
>    non-printable characters with a dot, underscore or something like that) -
>    even if journald itself handles them well, some log-viewing software
>    may not; even the simple 'less' command throws a bunch of parsers at its
>    input...

> If using this format, I'd use some simplified version - filter out
> unneeded fields (most of them?) when sending, and synthesize those
> required after receiving entry in LogVM. And of course reject entries
> not conforming to this simplified specification.


If we don't use the full version of the journald export format, I can't see the point of using it at all. Don't get me wrong, I only suggested using this format for full compatibility with journald (which we would get in the LogVM). I thought it would be very useful because of the already implemented log rotation, the very handy search in journal entries and the security feature of journald (which lets us know when a journal entry has been tampered with). But if we change the structure of journald entries and leave only the "necessary" fields, we should be ready to lose all the advantages of using journald for log storage mentioned above. I also don't think that synthesised entries can make friends with journald; this approach prevents us from using all the cool features of journald, and at the same time makes us process the transmitted data twice.


> To be honest, I think the "short" format (`journalctl --output short`),
> with timestamp and hostname stripped off is enough. So, basically just
> MESSAGE field from "export" format.


Okay, we can use the "short" format in its pure form, if I understood you correctly.

Let's take the output of this command, put it in a file and send it to the LogVM via qrexec. In this case we don't even need a proxy server on the ordinary VM's side. Of course, we would track which entries have already been sent, and the next time we would send only the newly generated ones (there should also be some sort of timer for establishing connections). On the LogVM side we would use a simple proxy server, which would be responsible for listening for connections from different VMs and receiving information from them. We can keep the logs received from different VMs in different directories and rotate them independently (for example, delete older files, archive moderately old ones and leave very recent logs untouched). In this case, we can only search logs as text files (grep), and we don't get the security features of journald (I'm talking about sequential hashing).
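
One way to track which entries have already been sent is journalctl's own cursor support: `--show-cursor` prints a trailing `-- cursor: ...` line after the last entry, and `--after-cursor` resumes after that position. The sketch below assumes the cursor is kept in a small state file; the function names and the state file are hypothetical, and error handling (for example a connection that fails after the cursor was saved) is left out.

```sh
#!/bin/sh
# save_cursor STATE_FILE (hypothetical helper): copy journalctl output from
# stdin to stdout, but divert the trailing "-- cursor: ..." line that
# --show-cursor appends into STATE_FILE instead of forwarding it.
save_cursor() {
    state=$1
    while IFS= read -r line; do
        case $line in
            "-- cursor: "*) printf '%s\n' "${line#"-- cursor: "}" > "$state" ;;
            *)              printf '%s\n' "$line" ;;
        esac
    done
}

# export_new_entries STATE_FILE: print the entries logged since the last run.
export_new_entries() {
    state=$1
    if [ -s "$state" ]; then
        journalctl --no-pager --output=short --show-cursor \
            --after-cursor="$(cat "$state")"
    else
        journalctl --no-pager --output=short --show-cursor
    fi | save_cursor "$state"
}
```

A timer could then run `export_new_entries` periodically and pipe its output into the qrexec call, so each run sends only what was logged since the previous one.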

So, based on your remarks, if I understood you right, here's what I suggest. Please let me know what you think about it.

Love,
Alisa.

Jean-Philippe Ouellet

Mar 26, 2017, 11:08:14 PM
to Alisa Matsak, qubes-devel
I didn't have time to write a short email... so I wrote a long one ;)


I think it is important to keep in mind the value we are adding to
Qubes by implementing a log subsystem:
1. integrity guarantee by immediate sending to and storage in separate VM
- inability to modify previous logs
- inability to fake timestamps
2. persistence of logs from ephemeral VMs (such as DispVMs)
3. allowing retention policies on per-origin-vm basis
4. safe viewing of logs with arbitrarily-complex log-viewing/analysis tools

#3 is important to ensure one VM does not DoS available log storage to
hide another VM's compromise. We need per-origin quotas, and user
alerting when they are reached.
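
A minimal sketch of such a per-origin quota check, assuming the LogVM keeps one directory per origin VM. The function names and the byte-based limit are hypothetical; how the user is alerted is left open.

```sh
#!/bin/sh
# origin_usage_bytes DIR: total size in bytes of all files stored for one
# origin VM. Reading every file is slow for large logs; it is used here
# only because it gives an exact, filesystem-independent byte count.
origin_usage_bytes() {
    find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# quota_exceeded DIR LIMIT_BYTES: succeed if the origin has used up its quota.
quota_exceeded() {
    [ "$(origin_usage_bytes "$1")" -ge "$2" ]
}
```

A receiving service could call `quota_exceeded` on the origin's directory before appending, and refuse the connection and raise an alert when it succeeds.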

#4 can be implemented by only ever handling the log contents in
DispVMs. I imagine some GUI in the LogVM which allows one to view a
list of all logs collected (time, origin, size, etc.) and easily open
them with a suitable program in a DispVM, or manage them (sort,
delete, copy to a different VM to share or archive it, etc.).

Note that these features are inherently log-format-agnostic. They are
most definitely still useful for unstructured text logs. It is also
not difficult to imagine someone wanting the same features for other
non-journald audit formats, such as Windows' EVTX format, or FreeBSD's
/ Mac OS X's OpenBSM format. With a well-designed format-agnostic
log-collection subsystem, supporting these would likely be as simple
as adding a mapping for "open this log type with this log-viewing
application" and perhaps some trivial format-specific timestamp
injection / substitution happening in the LogVM as logs are received.

Yes, adding timestamps does mean some *simple* log parsing /
processing happening in the LogVM. If it cannot be done simply in an
obviously-correct manner (such as prepending a timestamp to the
beginning of each line), then IMO it should either not be done at all,
or should be done in a per-origin DispVM.

On Sun, Mar 26, 2017 at 5:06 PM, Alisa Matsak <arisu...@gmail.com> wrote:
> You're absolutely right, currently all the Linux VMs do have journald (as a
> part of systemd).

Journald may serve as a convenient building block in that there may be
a substantial ecosystem which facilitates easy integration, and it
makes sense to want to take advantage of that, but not at the cost of
excluding systems without it from also being able to take advantage of
the Qubes-encouraged logging facility.

I believe it is important to also support accepting plain unstructured
text logs, and I strongly encourage you to do so. Perhaps via
separate qubes.logs.JournaldExport and qubes.logs.PlainTextLines (or
similar) services.

Qubes is not really a Linux distribution, and systemd (while perhaps a
good initial target) is not ubiquitous among all guest systems we are
interested in, so IMO we should avoid designing systems that
inherently rely on functionality exclusive to it where possible.

>> If using this format, I'd use some simplified version - filter out
>> unneeded fields (most of them?) when sending, and synthesize those
>> required after receiving entry in LogVM. And of course reject entries
>> not conforming to this simplified specification.
>
> If we don't use the full version of the journald export format, I can't see
> the point of using it at all. Don't get me wrong, I only suggested using
> this format for full compatibility with journald (which we would get in
> the LogVM). I thought it would be very useful because of the already
> implemented log rotation, the very handy search in journal entries and the
> security feature of journald (which lets us know when a journal entry has
> been tampered with).

I do not think the hash chain is a meaningful security feature in this
context. If a VM is compromised, the adversary who wishes to modify a
log could still do so and recalculate all hashes afterwards (and
accordingly modify state of the process creating more logs to make
future logs appear fine too). The security comes from having the logs
stored in a separate VM, to which the only interface is sending logs,
and of which we should avoid any complex parsing or analysis in order
to avoid the log-collection VM getting compromised.

In this manner, we can rely on the isolation between Qubes domains to
guarantee authenticity and integrity without increasing the complexity
of the log format or requiring special tools to view them (which was
one of the major complaints against systemd-journald in its early
days).

If you want log hash chains to be secure, you need the hashes to be
regularly incorporated into the hash chains of other machines (see
sections 4.4 & 4.5 of Schneier & Kelsey's 1999 paper [1] on this),
which... if we need cross-domain communication to guarantee resistance
to undetected retroactive modification in the first place, we may as
well just send the logs themselves over an existing secure channel
(vchan/qrexec) to a secure destination (the LogVM) immediately as
they're created and store them there.

[1]: https://www.schneier.com/academic/paperfiles/paper-auditlogs.pdf

> But if we change the structure of journald entries and leave only
> the "necessary" fields, we should be ready to lose all the advantages of
> using journald for log storage mentioned above.

> I also don't think that synthesised entries can make friends with journald;

I don't understand what you mean by this. Care to elaborate?

> this approach prevents us
> from using all the cool features of journald, and at the same time makes us
> process the transmitted data twice.
>
>> To be honest, I think the "short" format (`journalctl --output short`),
>> with timestamp and hostname stripped off is enough. So, basically just
>> MESSAGE field from "export" format.
>
> Okay, we can use the "short" format in its pure form, if I understood you
> correctly.

Only MESSAGE is definitely too minimal IMO. It is extremely useful to
filter by what the message came from, for which _PID, _UID, _COMM,
_CMDLINE, etc. fields are commonly used (even if just by simple grep
of standard text format).

The fact that these can be provided in a somewhat-harder-to-spoof
manner than traditional syslog is nice, but does not somehow make them
trustworthy. However, just because they are not trustworthy does not
mean they are not still useful.
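
Keeping a small allow-list of fields from the journal export format can be sketched as below. The list of fields is only an example built from the ones named above (plus PRIORITY), and filter_fields is a hypothetical name. The sketch assumes the text serialization only (one FIELD=value per line, a blank line between entries); fields that journald serializes in its binary form are not parsed properly here, which a real implementation would have to do.

```sh
#!/bin/sh
# filter_fields (hypothetical): read journal export format on stdin and keep
# only allow-listed FIELD=value lines plus the blank lines between entries.
filter_fields() {
    LC_ALL=C awk -F= '
        $0 == ""        { print ""; next }   # entry separator
        !index($0, "=") { next }             # no "=": not a plain text field
        $1 ~ /^(MESSAGE|PRIORITY|_PID|_UID|_COMM|_CMDLINE)$/ { print }
    '
}
```

With `journalctl --output=export | filter_fields`, fields such as __CURSOR or _HOSTNAME never leave the sending VM, while the result stays greppable by _PID or _COMM.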

> Let's take the output of this command, put it in a file and send it to the
> LogVM via qrexec. In this case we don't even need a proxy server on the
> ordinary VM's side. Of course, we would track which entries have already
> been sent, and the next time we would send only the newly generated ones
> (there should also be some sort of timer for establishing connections). On
> the LogVM side we would use a simple proxy server, which would be
> responsible for listening for connections from different VMs and receiving
> information from them. We can keep the logs received from different VMs in
> different directories and rotate them independently (for example, delete
> older files, archive moderately old ones and leave very recent logs
> untouched). In this case, we can only search logs as text files (grep), and
> we don't get the security features of journald (I'm talking about
> sequential hashing).

If by "proxy server" you mean anything involving a network stack... I
really doubt you need that. There shouldn't even need to be any
sockets at all involved on the receiving-side code. The simplest case
(plain text format) would be a qrexec service consisting of something
like:
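#!/bin/sh
# Runs in the LogVM, once per incoming qrexec connection.
set -e
# qrexec sets QREXEC_REMOTE_DOMAIN to the name of the calling VM.
d=$HOME/QubesLogs/$QREXEC_REMOTE_DOMAIN
mkdir -p "$d"
# Prefix each received line with the LogVM's own clock.
while IFS= read -r line; do
    printf '%s\t%s\n' "$(date)" "$line"
done >> "$d/$(date +%s).log"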

You can almost kind of think of qrexec services as CGI scripts. The
listening and multiplexing and such normally handled by the webserver
is handled by qrexec-agent.
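
For completeness, the sending side can be equally small. The sketch below assumes that qrexec-client-vm, when called with only a target and a service name, connects its own stdin and stdout to the remote service, and that the service is installed in the LogVM under the qubes.logs.PlainTextLines name floated earlier. The QREXEC_CLIENT override is hypothetical and exists only so the function can be exercised without Qubes.

```sh
#!/bin/sh
# stream_logs TARGET [SERVICE]: follow the local journal and pipe every new
# line to a qrexec service in the TARGET VM.
stream_logs() {
    target=$1
    service=${2:-qubes.logs.PlainTextLines}
    journalctl --follow --no-pager --output=short |
        "${QREXEC_CLIENT:-qrexec-client-vm}" "$target" "$service"
}
```

Calling `stream_logs logvm` (with "logvm" standing in for whatever the LogVM is named) would keep one connection open and push lines as they are logged, which matches the "immediate sending" goal better than periodic batches.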

Regards,
Jean-Philippe

Alisa Matsak

Mar 30, 2017, 12:22:04 PM
to qubes-devel, arisu...@gmail.com

Hello again!


> I didn't have time to write a short email... so I wrote a long one ;)


First of all I want to thank you for your last mail. It helped me better understand the project’s problems.


> I don't understand what you mean by this. Care to elaborate?


I meant that data transmitted this way can’t be processed by journald as its own entries. But that isn’t a requirement (it is even undesirable), so I think it doesn’t matter.



Based on the requirements for the LogVM project (thanks again for your letter!), I wrote my draft proposal. Could you please read it before I apply and point out my mistakes?


___



# Introduction

Qubes OS is a reasonably secure operating system. Qubes takes an approach called security by compartmentalization, which allows you to compartmentalize the various parts of your digital life into securely isolated compartments. This approach ensures that one compartment getting compromised won’t affect the others.


It is an amazing idea with a nice implementation, but Qubes currently lacks a way to securely store and retrieve logs. There is no way to conveniently inspect logs from apps and services running across several virtual machines. This project aims to create an effective, robust and security-focused log system for Qubes OS.

# Project goals


The priority goals of the project are the following:

  • Implement a log collection system that is itself working in its own separate VM.

  • That system can receive logs from multiple logging systems, such as journald and syslog. Additional bindings can be implemented if needed, including support for non-Unix guest operating systems.

  • The log collection system is designed with security in mind.

  • The system guarantees log integrity, including inability to modify previous logs and fake timestamps.

  • The logs are persistent, despite coming from possibly ephemeral VMs such as DispVM.

  • The system supports automatic log rotation with per-VM quotas to prevent DoS attacks on log storage.

  • The system is extensively documented and tested from top to bottom.


If time permits, some GUI can be implemented in the LogVM designed specially for viewing a list of all collected logs and easily opening them with a suitable program in a DispVM or otherwise managing them.


# Implementation

The aforementioned system is implemented in two parts. One of them runs on AppVMs and the other one on the LogVM (which runs a Unix-like operating system). Communication between them goes through the vchan/qrexec channel, which already exists as a part of Qubes OS and is well protected. Let’s discuss the parts of the system separately in more detail.


The part on the AppVM side.


This part includes a daemon (named log-exporter) that retrieves logs in real time from a logging system (such as journald for Linux guest systems). The logs are in a simple text format with the necessary fields included, such as the log message, hostname, timestamp, PID and so on; the exact format can be defined later. Data parsing is also implemented here (to define the format of log entries and to separate one entry from another in a specific way). The daemon is started when the guest system boots. It is also responsible for creating a connection to the LogVM via vchan/qrexec and sending collected logs over it. It can be written as a bash script or a C program.


The part on the LogVM side.


The remaining part of the system is not tied to any specific log format (so it can be said to be log-format-agnostic). The only thing it knows about the transmitted data is that it is text (successive lines of text).


The part on the LogVM side consists of the following:

  • A program (named log-collector) that receives logs sent via the vchan/qrexec connection and saves them to a text file (with a .log extension, for example). An instance of this program is spawned automatically each time an AppVM connects to the LogVM to transmit its logs. Note that log files received from different VMs are saved to separate directories by this tool (every VM gets its own directory, even a DispVM). This tool is also responsible for prepending timestamps to log entries (this can be achieved with only very simple parsing to split lines). Can be implemented as a bash script.

  • A daemon (named log-compressor) that tracks the size of those directories and is responsible for intelligent and secure log rotation. This tool compresses medium-aged files to .zip, .gz, etc., deletes the old ones and doesn't touch recent ones. The daemon is started during LogVM boot-up and works until it is shut down. Can be implemented as a C program (because there is a very useful C library called libzip which is well suited to this daemon’s implementation).
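The log-collector behaviour described above could be sketched roughly as follows, in Python for illustration. The log root `/var/log/qubes-logvm`, the file name `messages.log` and the function names are hypothetical. The sketch also assumes that the program runs as a qrexec service, where the name of the calling VM is available in the `QREXEC_REMOTE_DOMAIN` environment variable.

```python
import os
import re
import sys
from datetime import datetime, timezone

LOG_ROOT = "/var/log/qubes-logvm"  # hypothetical location


def vm_log_path(vm_name, root=LOG_ROOT):
    """Return the log file path for a VM, rejecting names that could escape root."""
    if not re.fullmatch(r"[A-Za-z0-9_-][A-Za-z0-9_.-]*", vm_name):
        raise ValueError("unexpected VM name: %r" % vm_name)
    return os.path.join(root, vm_name, "messages.log")


def collect(lines, vm_name, root=LOG_ROOT, now=None):
    """Append each received line to the VM's own log file.

    The timestamp comes from the LogVM's clock, not from the sender,
    so a compromised AppVM cannot choose it.
    """
    now = now or (lambda: datetime.now(timezone.utc))
    path = vm_log_path(vm_name, root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as out:
        for line in lines:
            stamp = now().strftime("%Y-%m-%dT%H:%M:%SZ")
            out.write("%s %s\n" % (stamp, line.rstrip("\n")))
            out.flush()


def main():
    # Assumption: invoked by qrexec with the sender's data on stdin.
    collect(sys.stdin, os.environ["QREXEC_REMOTE_DOMAIN"])
```

The collector only splits on line boundaries, which keeps it independent of whatever format the exporter chooses.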


# Timeline

Frankly speaking, I have exams at my University until the end of June. I hope to pass most of them ahead of time, so I think I'll have enough free time to work on the project. But keeping this in mind, I find it more reasonable not to plan any time-consuming tasks for June. So, a timeline would be similar to something like this:


June - working with the Linux AppVM side:


  • May 30th - June 13th (two weeks)

Reading more about daemon programs in Linux to upgrade my knowledge.

Determining fields of log entries which reasonably should be collected and the exact way of getting them from log-collecting system.

Working with my knowledge in the data parsing.


  • June 14th - June 27th (two weeks)

Applying the new knowledge and writing the log-exporter daemon for the Linux AppVM side.


July - working with the LogVM side:


  • June 28th - July 8th (one and a half weeks)

Determining requirements to the LogVM as to a system and solving related problems.

Upgrading my knowledge in bash scripting and writing the log-collector program in the right way.


  • July 10th - July 31st (three weeks)

Writing the log-compressor daemon for the LogVM side.

   

August - working with a documentation, unforeseen circumstances and final evaluations:


  • August 1st - August 20th (two weeks)

Documenting the written project and dealing with unforeseen circumstances.

   

  • August 21st - August 29th (one week)

Final code submission and final evaluations.


I plan to test the written components separately and the system in general all the time during the work, that’s why I don’t allocate any special time for testing.


I think a weekly formal post to the qubes-devel mailing list is suitable for me. It will include information about my current progress and the difficulties I am facing.


I don’t plan any full-time or part-time jobs during the summer. Maybe there will be some part-time work on weekends, but I’m not yet sure about it. In July or August there may be a short (about a week) family trip. I’ll have access to the Internet there in any case and will be available for communication the whole time.


Qubes is the only project I am submitting a proposal for.


# About me

I’m finishing up my third year at Moscow State University (Faculty of Computational Mathematics and Cybernetics). I’m in the Laboratory of Information Systems Security on the basis of Information Systems in Education and Research Laboratory. So my education is directly related to security and developing protection methods.


I am not an experienced developer and have never worked on such a large project. But everybody takes their first steps someday, right? Working with Qubes would be an excellent experience for me and a chance to upgrade my knowledge and skills. It will also arguably be the most useful thing I have ever spent my time on.


# Contact information


Email: arisu...@gmail.com

Tel: +7 (916) 414-62-66

Timezone: UTC+03 MSK

___


I am looking forward to your answer. Thank you!


Regards,

Alisa.

Jean-Philippe Ouellet

Mar 31, 2017, 12:32:26 AM
to Alisa Matsak, qubes-devel
On Thu, Mar 30, 2017 at 12:22 PM, Alisa Matsak <arisu...@gmail.com> wrote:
> The system supports automatic log rotation with per-VM quotas to prevent
> DDoS attacks.

Not DDoS. Just DoS.

Also, some careful thought needs to be put into log rotation. It is
desirable to prevent the case where a VM gets attacked in some manner
which produces logs showing what happened and how, and then proceeds
to fill up the logs with garbage. It is not desirable for rotation to
be implemented in such a manner that the attacker can reliably cause
the logs detailing their attack to be discarded.

There is of course the fundamental problem of logging generating data
and storage being finite, so something must be discarded. So in some
way the above may be unavoidable (keep-oldest policy results in
attackers filling logs before attacking, keep-newest policy results in
attackers filling logs after attacking). Some user interaction here
may perhaps be the best option. This requires more thought.
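
To make the trade-off concrete, here is a small Python sketch of the two policies. The function name, the segment names and the sizes are invented for illustration only; this is not a proposed rotation design.

```python
def files_to_discard(segments, quota, policy):
    """Choose which log segments to drop so that the rest fit within quota.

    segments: list of (name, size_in_bytes) tuples, ordered oldest first.
    policy:   "keep-oldest" or "keep-newest".
    """
    if policy not in ("keep-oldest", "keep-newest"):
        raise ValueError("unknown policy: %r" % policy)
    # Walk the segments starting from the end we want to preserve.
    order = segments if policy == "keep-oldest" else list(reversed(segments))
    kept, used = set(), 0
    for name, size in order:
        if used + size > quota:
            break
        kept.add(name)
        used += size
    return [name for name, _ in segments if name not in kept]


# Attacker floods the log AFTER the attack: keep-newest loses the evidence.
after = [("boot", 10), ("attack", 10), ("junk1", 10), ("junk2", 10)]
print(files_to_discard(after, 20, "keep-newest"))

# Attacker floods the log BEFORE the attack: keep-oldest loses the evidence.
before = [("junk1", 10), ("junk2", 10), ("attack", 10)]
print(files_to_discard(before, 20, "keep-oldest"))
```

In both cases the segment containing the attack ends up in the discard list, which is why neither simple policy is sufficient on its own.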

Regardless, there is still plenty to do without getting hung up on this issue.

> If time permits, some GUI can be implemented in the LogVM designed specially
> for viewing a list of all collected logs and easily opening them with a
> suitable program in a DispVM or otherwise managing them.

I think there should be sufficient time. Just sending & receiving logs
is really not a multi-month task, and the GUI itself can be really
quite simple.

> This part includes a daemon (named log-exporter) that retrieves logs in real
> time ... It can be written as a bash script or a C program.

The majority of the Qubes code base is in Python. We prefer Python
over C for safety reasons, and Python over bash for portability
reasons. I would recommend Python be used for this as well.

How familiar are you with Python?

> The part on the LogVM side.
>
> [...]
>
> A program (named log-collector) that receives logs sent via the vchan/qrexec
> ... Can be implemented as a bash script.

Technically it certainly could, but Python is preferred. See note above, and
https://www.qubes-os.org/doc/coding-style/#bash-specific-guidelines

> A daemon (named log-compressor) that tracks the size of those directories
> and is responsible for intelligent and secure log rotation. This tool
> compresses medium-aged files to .zip, .gz, etc, deletes the old ones and
> doesn't touch recent ones. The daemon is started during the process of the
> LogVM boot-up and works till it's shut down. Can be implemented as a C
> program (because there is a very useful C library called libzip which is so
> suitable for this daemon’s implementation).

I'd recommend against zip as it is effectively a 2nd-class citizen on
unix systems. gzip (/zlib) and xz (/lzma) are much more common,
especially for logs.

Python has libraries for everything too.
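
For example, compressing a log file with the standard-library `gzip` module takes only a few lines. This is a minimal sketch; the helper name `compress_log` and the choice to delete the original afterwards are assumptions for illustration.

```python
import gzip
import os
import shutil


def compress_log(path):
    """Compress `path` to `path + ".gz"`, remove the original, return the new path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Stream the data in chunks so large logs are not loaded into memory.
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path
```

The standard-library `lzma` module offers a similar `lzma.open()` interface if xz output is preferred.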


> # Timeline

Your proposed timeline looks somewhat sparse to be honest. I encourage
you to be more ambitious ;)

> June - working with the Linux AppVM side:
> May 30th - June 13th (two weeks)
> Reading more about daemon programs in Linux to upgrade my knowledge.

Daemons are not really any different from regular programs.
Traditionally you'd fork twice & kill the middle process (known as
backgrounding the process), but these days it's more common to not
bother with that and let systemd handle all process management for
you.

> Determining fields of log entries which reasonably should be collected and
> the exact way of getting them from log-collecting system.

Sounds good.

> Working with my knowledge in the data parsing.

Not sure what this means.

> June 14th - June 27th (two weeks)
> Applying the new knowledge and writing the log-exporter daemon for the Linux
> AppVM side.

> July - working with the LogVM side:
> June 28th - July 8th (one and a half weeks)
> Determining requirements to the LogVM as to a system and solving related
> problems.

I think the way we'd want to implement the LogVM itself (as opposed to
the services inside it) would be to describe it via Salt. [1][2][3]

[1]: https://www.qubes-os.org/news/2015/12/14/mgmt-stack/
[2]: https://www.qubes-os.org/doc/salt/
[3]: https://docs.saltstack.com/en/latest/topics/tutorials/walkthrough.html

Salt has a non-negligible learning curve though, so if you'd prefer to
just work on the actual log handling and let someone else (most likely
me) integrate it and do automatic creation of an actual LogVM by the
installer, etc. I think that'd be fine.

> Upgrading my knowledge in bash scripting and writing the log-collector
> program in the right way.

> July 10th - July 31st (three weeks)
> Writing the log-compressor daemon for the LogVM side.

I think you over-estimate the complexity & difficulty here. I'm not
sure exactly why we need an independent daemon for compressing logs as
opposed to just compressing them as they are received. We should also
be careful to avoid reinventing the wheel [4].

[4]: http://www.linuxcommand.org/man_pages/logrotate8.html

> August - working with a documentation, unforeseen circumstances and final
> evaluations:
>
> August 1st - August 20th (two weeks)
> Documenting the written project and dealing with unforeseen circumstances.

Er... 1st to 20th is closer to 3 weeks. That's 3 weeks to write
documentation? I think even two weeks is way more time than necessary.

Dealing with unforeseen circumstances is implicitly assumed. I think
you should aim to do more in this period (GUI perhaps?). If prior work
turns out to be more difficult and things get pushed back, so be it.
As long as you are making reasonable progress and maintain good
communication, then you should pass the evaluations. Your plan as-is
looks like you won't have much to do during the last month.

> August 21st - August 29th (one week)
> Final code submission and final evaluations.

> I plan to test the written components separately and the system in general
> all the time during the work, that’s why I don’t allocate any special time
> for testing.

Does this mean you intend to write unit tests, etc. as you go? Or just
manual testing?

> Frankly speaking, I have exams at my University until the end of June. I
> hope to pass most of them ahead of time, so I think I'll have enough free
> time to work on the project. But keeping this in mind, I find it more
> reasonable not to plan any time-consuming tasks for June. So, a timeline
> would be similar to something like this:

All of June potentially gone sounds like it would make it difficult to
make sufficient progress by the first midterm evaluation. You need to
have a clear plan for how you will make it work.

To quote the GSoC FAQ: [5]

How much time does GSoC participation take?

You are expected to spend around 30+ hours a week working on your project
during the 3 month coding period. If you already have an internship,
another summer job, or plan to be gone on vacation for more than a week
during that time, GSoC is not the right program for you this year.

and the GSoC mentor manual: [6]

# Warning Signs

Not enough hours in the day: If your student has a full-time job or is
attempting to defend a graduate thesis during the summer, that is probably
going to not work. Even though your student thinks they will have enough
extra time, don't believe them. They won't. If your student has every
single minute of every day completely booked, any unexpected event,
such as getting sick or a family emergency, derails this plan beyond
repair. If your student cannot commit to a specified time schedule, this
is an immediate red flag that they need serious help with time management.

I'm not bringing this up to discourage you or to say it can't be done,
but to be honest it is somewhat concerning. Some more information
about how you plan to handle both would be most welcome.

[5]: https://developers.google.com/open-source/gsoc/faq#how_much_time_does_gsoc_participation_take
[6]: http://write.flossmanuals.net/gsoc-mentoring/warning-signs/


Regards,
Jean-Philippe

Alisa Matsak

Apr 1, 2017, 2:15:02 PM
to qubes-devel, arisu...@gmail.com

Hello!


Thanks for your feedback! I really appreciate your taking the time to answer me.


Following your notes and recommendations, I changed the timeline part of my proposal. It now looks like this:

___


# Timeline

Frankly speaking, I have exams at my University until the end of June. I hope to pass most of them ahead of time; at best, I'll have only one exam (or two, in the worse case), so I think I'll have enough free time to work on the project. So, the timeline would look something like this:


June - working with the Linux AppVM side:


  • May 30th - June 11th (one and a half weeks)

Reading more about systemd.

Determining which fields of log entries should reasonably be collected and the exact way of getting them from the log-collecting system.

Finding the proper way to extract the chosen fields from the log-collecting system and format them into the desired output.


  • June 12th - June 25th (two weeks)

Applying the new knowledge and writing the log-exporter daemon for the Linux AppVM side in the right way.


The rest of June, July and the beginning of August - working on the LogVM side:


  • June 26th - July 9th (two weeks)

Finding the proper way to prepend timestamps and writing the log-collector program.


  • July 10th - July 23rd (two weeks)

Writing the log-compressor daemon (or configuring already existing tools) for the LogVM side (I hope I am overestimating the complexity here and can start working on the GUI earlier).


  • July 24th - August 6th (two weeks)

Learning toolkits like Qt or GTK and working on the GUI implementation.

   

The rest of August - working on documentation, unforeseen circumstances (for example, if I haven’t finished the GUI by this time) and final evaluations:


  • August 7th - August 20th (two weeks)

Documenting the written project and dealing with unforeseen circumstances.

   

  • August 21st - August 29th (one week)

Final code submission and final evaluations.


I plan to test the written components separately and the system in general all the time during the work, that’s why I don’t allocate any special time for testing.


I think a weekly formal post to the qubes-devel mailing list is suitable for me. It will include information about my current progress and the difficulties I am facing.


I don’t plan any full-time or part-time jobs during the summer. Maybe there will be some part-time work on weekends, but I’m not yet sure about it. In July or August there may be a short (about a week) family trip. I’ll have access to the Internet there in any case and will be available for communication the whole time.


Qubes is the only project I am submitting a proposal for.

___


> I think there should be sufficient time. Just sending & receiving logs
> is really not a multi-month task, and the GUI itself can be really
> quite simple.


I added the GUI point to the Project Goals part and to the Implementation part of my proposal as follows:

___


# Project goals


[...]


  • Implement some convenient GUI designed specially for the log collecting system.

___


# Implementation


[...]


Also, there is a GUI for the log collecting system on the LogVM side. It allows the user to view a list of all collected logs sorted by the time they were received and to see some log attributes (such as their origin, size, etc.). It also allows opening them with a suitable program in a DispVM. It can be implemented in Python using toolkits like Qt or GTK.


Once this minimal functionality is implemented, more capabilities can be added, for example managing logs: sorting, deleting, copying them and so on.
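
The data behind such a list view could be gathered roughly like this (a minimal sketch without any GUI code). It assumes one subdirectory per VM under a common root, and uses the file's modification time as the time the log was received; the `LogInfo` type and `list_logs` name are hypothetical.

```python
import os
from collections import namedtuple

# Hypothetical record type for one row of the list view.
LogInfo = namedtuple("LogInfo", "origin name size received")


def list_logs(root):
    """Return one LogInfo per file under root/<vm-name>/, newest first."""
    entries = []
    for origin in sorted(os.listdir(root)):
        vm_dir = os.path.join(root, origin)
        if not os.path.isdir(vm_dir):
            continue
        for name in os.listdir(vm_dir):
            st = os.stat(os.path.join(vm_dir, name))
            # Assumption: mtime approximates the time the log was received.
            entries.append(LogInfo(origin, name, st.st_size, st.st_mtime))
    return sorted(entries, key=lambda e: e.received, reverse=True)
```

A Qt or GTK front end would then only need to render this list and react to a selection.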

___


> Salt has a non-negligible learning curve though, so if you'd prefer to
> just work on the actual log handling and let someone else (most likely
> me) integrate it and do automatic creation of an actual LogVM by the
> installer, etc. I think that'd be fine.


Thank you, that would make it a lot easier. To be honest, that was the point that frightened and concerned me.


> The majority of the Qubes code base is in Python. We prefer Python
> over C for safety reasons, and Python over bash for portability
> reasons. I would recommend Python be used for this as well.
>
> How familiar are you with Python?


I have some experience with Python. It was mostly easy tasks for my University courses, nothing that the code quality of an OS depended on. :) I understand that it isn’t the best recommendation for me, but I promise to get more familiar with Python before the summer begins.


So, given the benefits of Python you mentioned (and because Python has libraries for everything), I can work on the project using Python.


> Your proposed timeline looks somewhat sparse to be honest. I encourage
> you to be more ambitious ;)


It looked this way because I tried to be pessimistic when I thought about it. I believe that’s better than overestimating myself and missing deadlines. I hope it’s more ambitious now (but not entirely; I still want to be as productive as I promise, or more :) ).


> Does this mean you intend to write unit tests, etc. as you go? Or just
> manual testing?


Yes, I intend to write unit tests for everything as I go.


> All of June potentially gone sounds like it would make it difficult to
> make sufficient progress by the first midterm evaluation. You need to
> have a clear plan for how you will make it work.
>
> To quote the GSoC FAQ: [5]
>
> [...]
>
> and the GSoC mentor manual: [6]
>
> [...]
>
> I'm not bringing this up to discourage you or to say it can't be done,
> but to be honest it is somewhat concerning. Some more information
> about how you plan to handle both would be most welcome.


I know other countries have a different examination system, and students there are completely free by June, but my country is not among them. This is the reality I have to deal with.


I had read all of these documents before I decided to participate in GSoC. I am also fully aware of the responsibility that GSoC involves.


My situation is not as severe as the one in the mentor manual. At best, I'll have only one exam (or two, in the worse case). In any case, I can handle those and GSoC at the same time without either of them suffering (and if it comes to that, GSoC will take priority).


So, you shouldn’t be concerned about it.



Here is my entire proposal in Google Docs. I decided not to post it here again because this mail is already too long. You are welcome to comment if it still has places that need to be edited.


Thank you again.


Best wishes,

Alisa.
