On Sun, Mar 12, 2017 at 3:28 PM, Alisa Matsak <arisu...@gmail.com> wrote:
> Okay, I got it)
> But let me clarify, do you mean working directly on project or just trying
> to fix bugs and asking questions?
> Please feel free to ask any questions you may have!
Hello again!
> I didn't have time to write a short email... so I wrote a long one ;)
First of all I want to thank you for your last mail. It helped me better understand the project’s problems.
> I don't understand what you mean by this. Care to elaborate?
I meant that data transmitted this way can’t be processed by journald as its own entries. But that isn’t a requirement (it is even undesirable), so I think it doesn’t matter.
Based on the requirements for the LogVM project (thanks again for your letter!), I wrote a draft of my proposal. Could you please read it before I apply and point out my mistakes?
___
# Introduction
Qubes OS is a reasonably secure operating system. Qubes takes an approach called security by compartmentalization, which allows you to compartmentalize the various parts of your digital life into securely isolated compartments. This approach ensures that one compartment getting compromised won’t affect the others.
It is an amazing idea with a neat implementation, but Qubes currently lacks a way to securely store and retrieve logs. There is no convenient way to inspect logs from apps and services running across several virtual machines. This project aims to create an effective, robust and security-focused log system for Qubes OS.
# Project goals
The priority goals of the project are the following:
Implement a log collection system that is itself working in its own separate VM.
That system can receive logs from multiple logging systems, such as journald and syslog. Additional bindings can be implemented if needed, including support for non-Unix guest operating systems.
The log collection system is designed with security in mind.
The system guarantees log integrity: a VM cannot modify previously stored logs or fake timestamps.
The logs are persistent, despite coming from possibly ephemeral VMs such as DispVM.
The system supports automatic log rotation with per-VM quotas to prevent denial-of-service (DoS) attacks.
The system is extensively documented and tested from top to bottom.
If time permits, some GUI can be implemented in the LogVM designed specially for viewing a list of all collected logs and easily opening them with a suitable program in a DispVM or otherwise managing them.
# Implementation
The aforementioned system is implemented in two parts. One of them runs on the AppVMs and the other on the LogVM (which runs a Unix-like operating system). Communication between them goes through a vchan/qrexec channel, which already exists as a part of Qubes OS and is well-protected. Let’s discuss the parts of the system separately in more detail.
The part on the AppVM side.
This part includes a daemon (named log-exporter) that retrieves logs in real time from a logging system (such as journald for Linux guest systems). The logs are retrieved in a simple text format with the necessary fields included, such as the log message, hostname, timestamp, PID and so on; the exact format can be defined later. Data parsing is also implemented here, to define the format of log entries and to separate one entry from another. The daemon is started during guest system boot-up. It is also responsible for creating a connection to the LogVM via vchan/qrexec and sending the collected logs over it. It can be written as a bash script or a C program.
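To make the exporter's formatting step more concrete, here is a minimal Python sketch. It assumes the entries come from `journalctl -o json` (one JSON object per line) and uses a placeholder one-line output layout, since the exact format is still to be defined. The function name `format_entry` is only illustrative.

```python
import json
from datetime import datetime, timezone


def format_entry(json_line: str) -> str:
    """Turn one journald JSON entry into a single line of text.

    The output layout is a placeholder; the real format is still undecided.
    """
    entry = json.loads(json_line)

    # journald reports wall-clock time as microseconds since the Unix epoch.
    usec = int(entry.get("__REALTIME_TIMESTAMP", 0))
    stamp = datetime.fromtimestamp(usec // 1_000_000, tz=timezone.utc)
    stamp = stamp.replace(microsecond=usec % 1_000_000)

    host = entry.get("_HOSTNAME", "-")
    ident = entry.get("SYSLOG_IDENTIFIER", "-")
    pid = entry.get("_PID", "-")

    message = entry.get("MESSAGE")
    if not isinstance(message, str):
        # Binary or oversized messages are not exported as plain strings.
        message = "<non-text message>"
    # Escape backslashes and newlines so that one entry is always one line.
    message = message.replace("\\", "\\\\").replace("\n", "\\n")

    return f"{stamp.isoformat(timespec='microseconds')} {host} {ident}[{pid}]: {message}"
```

In the daemon, a function like this would be applied to every line read from a `journalctl -f -o json` subprocess, and the result written to the qrexec connection. Escaping newlines matters because the LogVM side only splits the incoming data into lines.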
The part on the LogVM side.
The remaining part of the system is not tied to any specific log format in any way (so it can be said to be log-format-agnostic). The only thing it knows about transmitted data is its text nature (successive lines of text).
The part on the LogVM side consists of the following:
A program (named log-collector) that receives logs sent via the vchan/qrexec connection and saves them to a text file (with a .log extension, for example). An instance of this program is spawned automatically each time an AppVM connects to the LogVM to transmit its logs. It’s important to note that log files received from different VMs are saved to separate directories by this tool (every VM, even a DispVM, gets its own directory). This tool is also responsible for prepending timestamps to log entries (this can be achieved with only very simple parsing to split lines). It can be implemented as a bash script.
A daemon (named log-compressor) that tracks the size of those directories and is responsible for intelligent and secure log rotation. This tool compresses medium-aged files (to .zip, .gz, etc.), deletes the old ones and doesn't touch recent ones. The daemon is started during LogVM boot-up and works until the LogVM is shut down. It can be implemented as a C program (there is a very useful C library called libzip which is well suited to this daemon’s implementation).
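The two LogVM-side tools can be sketched in Python as follows. This is only an illustration under several assumptions: the function names, the `current.log` file name and the age thresholds are hypothetical, and Python's standard `gzip` module stands in for libzip. The collector takes the source VM name as a parameter; in a qrexec service it would typically come from the `QREXEC_REMOTE_DOMAIN` environment variable, with the log lines arriving on standard input.

```python
import gzip
import os
import re
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical whitelist for VM names, so a name can never escape base_dir.
SAFE_VM_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")


def collect(stream, vm_name, base_dir, clock=lambda: datetime.now(timezone.utc)):
    """Append each received line, prefixed with the LogVM's own time, to the VM's log."""
    if not SAFE_VM_NAME.fullmatch(vm_name):
        raise ValueError(f"unsafe VM name: {vm_name!r}")
    vm_dir = Path(base_dir) / vm_name
    vm_dir.mkdir(parents=True, exist_ok=True)
    log_path = vm_dir / "current.log"
    with open(log_path, "a", encoding="utf-8") as out:
        for line in stream:
            text = line.rstrip("\n")
            out.write(f"{clock().isoformat(timespec='microseconds')} {text}\n")
    return log_path


def rotation_action(age, compress_after, delete_after):
    """Classify a file by age in seconds: 'keep', 'compress' or 'delete'."""
    if age >= delete_after:
        return "delete"
    if age >= compress_after:
        return "compress"
    return "keep"


def rotate_directory(vm_dir, now=None, compress_after=7 * 86400, delete_after=30 * 86400):
    """Apply the age policy to every file in one VM's directory."""
    now = time.time() if now is None else now
    actions = {}
    for path in sorted(Path(vm_dir).iterdir()):
        if not path.is_file():
            continue
        stat = path.stat()
        action = rotation_action(now - stat.st_mtime, compress_after, delete_after)
        if action == "compress" and path.suffix == ".gz":
            action = "keep"  # already compressed
        if action == "compress":
            gz_path = path.with_name(path.name + ".gz")
            with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original timestamps so the archive still ages out later.
            os.utime(gz_path, (stat.st_atime, stat.st_mtime))
            path.unlink()
        elif action == "delete":
            path.unlink()
        actions[path.name] = action
    return actions
```

The timestamp is taken from the LogVM's clock rather than from the received text, so a compromised AppVM cannot choose it. A size-based quota per directory could be layered on top of the age policy in the same loop.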
# Timeline
Frankly speaking, I have exams at my University until the end of June. I hope to pass most of them ahead of time, so I think I'll have enough free time to work on the project. But keeping this in mind, I find it more reasonable not to plan any time-consuming tasks for June. So, the timeline would look something like this:
June - working on the Linux AppVM side:
May 30th - June 13th (two weeks)
Reading more about daemon programs in Linux to improve my knowledge.
Determining which fields of log entries should reasonably be collected and the exact way of getting them from the logging system.
Improving my knowledge of data parsing.
June 14th - June 27th (two weeks)
Applying the new knowledge and writing the log-exporter daemon for the Linux AppVM side.
July - working on the LogVM side:
June 28th - July 8th (one and a half weeks)
Determining the system requirements for the LogVM and solving related problems.
Improving my knowledge of bash scripting and writing the log-collector program in the right way.
July 10th - July 31st (three weeks)
Writing the log-compressor daemon for the LogVM side.
August - working on documentation, unforeseen circumstances and final evaluations:
August 1st - August 20th (two weeks)
Documenting the written project and dealing with unforeseen circumstances.
August 21st - August 29th (one week)
Final code submission and final evaluations.
I plan to test the written components separately, and the system as a whole, continuously during the work; that’s why I don’t allocate any special time for testing.
I think a weekly formal posting to the qubes-devel mailing list is suitable for me. It will include information about my current progress and the difficulties I am facing.
I don’t plan to take any full-time or part-time job during the summer. There may be some part-time work on weekends, but I’m not yet sure about it. In July or August there may be a short (about a week) family trip. I’ll have Internet access there in any case and will be available for communication all the time.
Qubes is the only project I am submitting a proposal for.
# About me
I’m finishing up my third year at Moscow State University (Faculty of Computational Mathematics and Cybernetics). I work in the Laboratory of Information Systems Security, which is based at the Information Systems in Education and Research Laboratory. So my education is directly related to security and developing protection methods.
I am not an experienced developer and have never worked on such a large project. But everybody has to take their first steps someday, right? Working with Qubes would be an excellent experience for me and a chance to improve my knowledge and skills. It would also arguably be the most useful thing I have ever spent my time on.
# Contact information
Email: arisu...@gmail.com
Tel: +7 (916) 414-62-66
Timezone: UTC+03 MSK
I am looking forward to your answer. Thank you!
Regards,
Alisa.
Hello!
Thanks for your feedback! I really appreciate your taking time to answer me.
Following your notes and recommendations, I changed the timeline part of my proposal. It now looks like this:
___
# Timeline
Frankly speaking, I have exams at my University until the end of June. I hope to pass most of them ahead of time; at best, I'll have only one exam left (or two, in the worse case), so I think I'll have enough free time to work on the project. So, the timeline would look something like this:
June - working on the Linux AppVM side:
May 30th - June 11th (one and a half weeks)
Reading more about systemd.
Determining which fields of log entries should reasonably be collected and the exact way of getting them from the logging system.
Finding the proper way to extract the chosen fields from the logging system and format them into the desired output.
June 12th - June 25th (two weeks)
Applying the new knowledge and writing the log-exporter daemon for the Linux AppVM side in the right way.
The rest of June, July and the beginning of August - working on the LogVM side:
June 26th - July 9th (two weeks)
Finding the proper way to prepend timestamps and writing the log-collector program.
July 10th - July 23rd (two weeks)
Writing the log-compressor daemon (or configuring already existing tools) for the LogVM side (I hope I am overestimating the complexity here and can start working on the GUI earlier).
July 24th - August 6th (two weeks)
Learning toolkits like Qt or GTK and working on the GUI implementation.
The rest of August - working on documentation, unforeseen circumstances (for example, if I haven’t finished the GUI by this time) and final evaluations:
August 7th - August 20th (two weeks)
Documenting the written project and dealing with unforeseen circumstances.
August 21st - August 29th (one week)
Final code submission and final evaluations.
I plan to test the written components separately, and the system as a whole, continuously during the work; that’s why I don’t allocate any special time for testing.
I think a weekly formal posting to the qubes-devel mailing list is suitable for me. It will include information about my current progress and the difficulties I am facing.
I don’t plan to take any full-time or part-time job during the summer. There may be some part-time work on weekends, but I’m not yet sure about it. In July or August there may be a short (about a week) family trip. I’ll have Internet access there in any case and will be available for communication all the time.
Qubes is the only project I am submitting a proposal for.
___
> I think there should be sufficient time. Just sending & receiving logs
> is really not a multi-month task, and the GUI itself can be really
> quite simple.
I added the GUI point to the Project Goals part and to the Implementation part of my proposal as:
___
# Project goals
[...]
Implement some convenient GUI designed specially for the log collecting system.
___
# Implementation
[...]
Also, there is a GUI for the log collecting system on the LogVM side. It allows the user to view a list of all collected logs sorted by the time they were received and to see some log attributes (such as their origin, size, etc.). It also allows opening them with a suitable program in a DispVM. It can be implemented in Python using toolkits like Qt or GTK.
As soon as this minimal functionality is implemented, more capabilities can be added, for example managing logs: sorting, deleting, copying them and so on.
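Independently of the toolkit, the list view needs a small data model behind it. The following Python sketch collects the origin, size and receiving time of each stored log, newest first. It assumes one directory per VM under a common base directory and treats a file's modification time as its receiving time; `LogInfo` and `list_logs` are hypothetical names.

```python
from pathlib import Path
from typing import List, NamedTuple


class LogInfo(NamedTuple):
    origin: str      # name of the VM the log came from (its directory name)
    name: str        # file name of the log
    size: int        # size in bytes
    received: float  # modification time, used here as the receiving time


def list_logs(base_dir) -> List[LogInfo]:
    """Return all stored logs, sorted from most to least recently received."""
    logs = []
    for vm_dir in Path(base_dir).iterdir():
        if not vm_dir.is_dir():
            continue  # only per-VM directories are expected to hold logs
        for path in vm_dir.iterdir():
            if path.is_file():
                stat = path.stat()
                logs.append(LogInfo(vm_dir.name, path.name, stat.st_size, stat.st_mtime))
    logs.sort(key=lambda info: info.received, reverse=True)
    return logs
```

A Qt or GTK table widget could then be filled directly from the returned list, one row per `LogInfo`.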
___
> Salt has a non-negligible learning curve though, so if you'd prefer to
> just work on the actual log handling and let someone else (most likely
> me) integrate it and do automatic creation of an actual LogVM by the
> installer, etc. I think that'd be fine.
Thank you, that would make it a lot easier. To be honest, that was the point that frightened and worried me.
> The majority of the Qubes code base is in Python. We prefer Python
> over C for safety reasons, and Python over bash for portability
> reasons. I would recommend Python be used for this as well.
> How familiar are you with Python?
I have some experience with Python. It was mostly easy tasks for my University courses, nothing that the code quality of an OS depended on. :) I understand that this isn’t the best recommendation for me, but I promise to get more familiar with Python before the summer begins.
So, given the benefits of Python you mentioned (and because Python has libraries for everything), I can work on the project using Python.
> Your proposed timeline looks somewhat sparse to be honest. I encourage
> you to be more ambitious ;)
It looked that way because I tried to be pessimistic when I thought about it. I believe that’s better than overestimating myself and missing deadlines. I hope it’s more ambitious now (but not excessively so; I still want to be as productive as I promise, or more :) ).
> Does this mean you intend to write unit tests, etc. as you go? Or just
> manual testing?
Yes, I intend to write unit tests for everything as I go.
> All of June potentially gone sounds like it would make it difficult to
> make sufficient progress by the first midterm evaluation. You need to
> have a clear plan for how you will make it work.
> To quote the GSoC FAQ: [5]
> [...]
> and the GSoC mentor manual: [6]
> [...]
> I'm not bringing this up to discourage you or to say it can't be done,
> but to be honest it is somewhat concerning. Some more information
> about how you plan to handle both would be most welcome.
I know other countries have a different examination system and students there are completely free by June, but my country is not among them. This is the reality I have to deal with.
I had read all of these documents before I decided to participate in GSoC. I am fully aware of the responsibility that GSoC involves, too.
My situation is not as severe as the one in the mentor manual. At best, I'll have only one exam (or two, in the worse case). In any case, I can handle those and GSoC at the same time without either of them suffering (and if it comes to that, GSoC will take priority).
So, you shouldn’t be concerned about it.
Here is my entire proposal in Google Docs. I decided not to post it here again because this mail is already too long. You are welcome to comment if it still has places that need to be edited.
Thank you again.
Best wishes,
Alisa.