Added:
wiki/Workshop20081114.wiki
Log:
Copied from Workshop20081017
Added: wiki/Workshop20081114.wiki
==============================================================================
--- (empty file)
+++ wiki/Workshop20081114.wiki Tue Nov 11 15:48:49 2008
@@ -0,0 +1,1259 @@
+#summary ITEC IT-SOFT 2008 Workshop Paper (2008.11.14)
+#labels Research,Paper
+
+|| *DOWNLOADS* || [http://lemona.googlecode.com/svn/docs/publications/papers/IT-SOFT%202008/%5bITEC810%5d%20-%20Lemona%20Workshop%20Paper%20-%20IT-SOFT%202008.pdf PDF] || [http://lemona.googlecode.com/svn/docs/publications/papers/IT-SOFT%202008/%5bITEC810%5d%20-%20Lemona%20Workshop%20Paper%20-%20IT-SOFT%202008.doc DOC] || [http://lemona.googlecode.com/svn/docs/publications/papers/IT-SOFT%202008/%5bITEC810%5d%20-%20Lemona%20Workshop%20Paper%20-%20IT-SOFT%202008.docx DOCX] ||
+
+= ITEC IT-SOFT 2008 Workshop Paper =
+
+_Lemona: Towards an Open Architecture for Decentralized Forensics Analysis_
+
+---------------------------------------------------------------------------
+
+Computer forensics tools provide capabilities to identify the
+circumstances, causes and effects of criminal activities or accidental
+damage to computer systems. We present here a brief overview of the
+general problems of computer forensics as well as the existing
+research in this field. We then introduce Lemona, our proposal for a
+forensics architecture relying on open standards and implementing
+fine-grained monitoring facilities.
+
+---------------------------------------------------------------------------
+
+
+== Table of Contents ==
+
+ * *Abstract*
+ * *Introduction*
+   * *Scope*
+   * *Background*
+     * *Cyber-Criminality*
+     * *Computer Forensics*
+     * *Evidence*
+     * *Threats and Countermeasures*
+   * *Research Problem*
+   * *Outline*
+ * *Related Work*
+   * *Monitoring Specifics*
+     * *Logging*
+     * *System Call Interception*
+     * *Hooks and Checkpoints*
+     * *Software Probes*
+   * *Information Reporting Specifics*
+     * *Reporting Architecture*
+     * *Reporting Modes*
+     * *Response Times*
+   * *Forensics Analysis Specifics*
+   * *Reconstruction and Recovery Specifics*
+     * *Imaging*
+     * *Revisions and Callbacks*
+     * *Virtualization / Sandboxing*
+     * *Reconstruction*
+ * *Lemona*
+   * *Background*
+     * *System Calls*
+     * *Scheduler*
+     * *Memory Mapping*
+   * *Architecture*
+     * *Monitoring Components*
+     * *Logging Components*
+     * *Forensics Components*
+ * *Experimentation*
+ * *Results*
+   * *Coverage*
+   * *Performance*
+ * *Conclusion*
+   * *Limitations*
+   * *Future Work*
+ * *References*
+ * *Appendices*
+
+---------------------------------------------------------------------------
+
+
+= Abstract =
+
+When incidents occur within information systems, or when attackers
+penetrate them, preventive actions are no longer an appropriate
+resort. It is then vital for the targeted organizations to review and
+understand the what, why and how of the threats, as well as their
+execution and outcome.
+
+This is usually the role of a team of computer forensics experts, who
+go through the tedious task of investigating the digital crime
+scene. Because of the very nature of the crime, investigators need to
+collect indisputable evidence to build a valid case.
+
+However, it is not always possible to conduct such an investigation on
+the target machine, as it may have been physically destroyed, or its
+memory units may have been wiped clean by skillful malevolent
+attackers. To address this problem, we turn towards post-mortem
+forensics analysis tools.
+
+
+---------------------------------------------------------------------------
+
+
+= Introduction =
+
+== Scope ==
+
+The *Lemona* project aims at providing a secure, complete and efficient
+means of monitoring a computer system. The related papers and projects
+referenced in this document are mostly based on Linux and UNIX
+systems, although most of the concepts are applicable to other
+operating systems. Thus, references to the "_kernel_" or the "system"
+should be read as referring to Linux unless otherwise specified.
+
+
+== Background ==
+
+=== Cyber-Criminality ===
+
+Over the last decades, companies and individuals have undergone a
+shift from legacy information processing systems to computerized
+information systems. This trend largely impacts the usage and access
+policies of valuable assets. Unfortunately, the modern enterprise and
+private models of information processing are as exposed as their
+ancestors, and we observe a matching rise in cyber-robbery and
+malevolent misuse.
+
+
+=== Computer Forensics ===
+
+Forensic science (commonly referred to simply as "forensics") aims to
+answer questions asked by the legal system to establish the liability
+of individuals in relation to criminal acts. It naturally encompasses
+what we call computer forensics, a specific branch concerned with the
+search for and examination of legal evidence in digital systems.
+
+Computer forensic science mostly aims at collecting digital evidence
+to prove the ownership and retransmission of illegal information by an
+individual, or to recreate and prove the logical course of action
+followed by an attacker while providing proof of the attacker's
+wrongdoing. In doing so, a forensics analysis determines the course of
+actions which led to the state of a digital artifact. Such an artifact
+can be either a storage medium or a static or mobile piece of
+electronic information (for instance an electronic document or a
+sequence of network packets).
+
+Examples of typical case studies and illegal activities are diverse [13]:
+
+ * network intrusions;
+ * corruption of private or corporate data;
+ * theft of private or corporate data;
+ * the publication and storage of illegal content.
+
+The role of the forensics expert, in a legal context, is to provide [2]:
+
+ * the technical information bound to the criminal act;
+ * the skills and technological background to understand its surroundings;
+ * evidence of the absence or presence of the crime.
+
+In this setting, computer forensics analysis has to be conducted
+following a very strict, professional and documented protocol. In
+effect, every action of the forensics team has to be recorded, so
+that the validity of its methods and results can be demonstrated.
+
+
+=== Evidence ===
+
+When a team of computer forensics experts conducts the analysis of a
+digital crime scene, it faces several challenges, which constitute a
+critical path to the acceptance of any potential evidence produced as
+an outcome [17]:
+
+ * Admissible
+   * Must be able to be used in court or elsewhere;
+ * Authentic
+   * Evidence relates to the incident in a relevant way;
+ * Complete (no tunnel vision)
+   * Includes exculpatory evidence for alternative suspects;
+ * Reliable
+   * No question about authenticity & veracity;
+ * Believable
+   * Clear, easy to understand, and believable by a jury.
+
+While computer forensics is not an entirely new field per se, there
+are still gray areas in its legal definition and application.
+Therefore, it is crucial to pay close attention to the details of the
+forensics analysis setting and its execution.
+
+In addition, a company that wants to protect its digital assets might
+want to reuse various components of an Information Security Management
+System or a similar framework, designed to prevent the destruction
+and/or theft of its corporate information, and to facilitate its
+recovery, should the worst happen.
+
+Such concerns, though not addressed in this document, are closely
+related to its content, as some of the elements described in the
+existing specifications of such frameworks are part of the general
+computer forensics concepts we describe, or are applied by the
+solutions we review.
+
+In the context of a company seeking to build a legally admissible
+case, the questions we aim to ask and answer in this paper are the
+following:
+
+ * What information is available on a common system (default installation)?
+ * What information should be collected?
+ * How can this information be collected?
+ * How can this information be stored?
+ * How can this information be transferred?
+ * What impact will gathering this information have on system performance?
+
+This leads us to consider the various elements of a forensics analysis system:
+
+ * The data source/host system,
+ * The data samples themselves,
+ * The data storage system.
+
+These questions have several aspects pertaining to the collected
+information (the "data"), regarding its validity and value in court:
+
+ * Exhaustiveness,
+ * Integrity,
+ * Confidentiality,
+ * Reliability.
+
+These values match the CIA elements (see Figure 1: [http://lemona.googlecode.com/svn/docs/images/security/lemona-security-concepts-cia.png CIA Elements]).
+
+http://lemona.googlecode.com/svn/docs/images/security/lemona-security-concepts-cia.png
+
+=== Threats and Countermeasures ===
+
+==== Intrusions ====
+
+Before going further, let us briefly explain what types of intrusions
+and attacks a computer system can be exposed to, and what we can do,
+from a forensics point of view, to detect these attacks.
+
+William Stallings defined three distinct classes of intruders [21]:
+
+===== Masquerader =====
+
+An individual who is not authorized to use the computer and who
+penetrates a system's access controls to exploit a legitimate user
+account.
+
+===== Misfeasor =====
+
+A legitimate user who accesses data, programs, or resources for
+which such access is not authorized, or who is authorized for such
+access but misuses his or her privileges.
+
+===== Clandestine User =====
+
+An individual who seizes supervisory control of the system and uses
+this control to evade auditing and access controls or to suppress
+audit collection.
+
+==== Vulnerabilities ====
+
+In 2000, Eric Knight gave an exhaustive list of computer
+vulnerabilities [13]; although time has passed, these categories still
+apply today:
+
+===== Logic Errors =====
+
+Mistakes in the design of the software that allow a security breach.
+
+===== Social Engineering =====
+
+ * "Art of personal manipulation"
+ * "Human Element of Security" [39]
+
+===== Computer Weakness =====
+
+Very similar to vulnerabilities, the difference being that weaknesses
+might never have a resolution (e.g. encryption key size).
+
+===== Policy Oversights =====
+
+Does not necessarily involve people; can be a simple "act of God" such
+as a fire or a hardware failure.
+
+===== Faults =====
+
+Commonly known as bugs, or human errors.
+
+http://lemona.googlecode.com/svn/docs/images/security/lemona-security-concepts-threats.png
+
+==== Aftermath ====
+
+When one of the above vulnerabilities is successfully applied to
+perform an intrusion, computer forensics is used to [22, 2, 16]:
+
+ * establish whether an intrusion has happened;
+ * determine what has been affected;
+ * examine the circumstances and causes.
+
+In order to do so, evidence can be collected from various sources
+[22, 2]:
+
+ * Memory Units:
+   * Hard Drives,
+   * Random Access Memory,
+   * External Memory Units,
+ * System Logs,
+ * Network Traffic,
+ * IDS (both NIDS, HIDS),
+ * Physical Security.
+
+However, on a default system, only system logs would be
+available. Unfortunately, they can easily be modified by an attacker,
+are not configured by default to log as much data as possible, and are
+often neither kept long enough nor stored in a secure location. On the
+other hand, an _IDS_ and physical security measures, when installed
+and correctly configured, can cut forensics investigation time in half
+[22].
+
+
+== Research Problem ==
+
+A post-mortem forensics system shall allow investigators to examine
+targeted systems (or a legally valid reproduction) after their
+decommissioning or destruction. Therefore, it should not matter
+whether the system and its data can still be accessed, or whether the
+attacker covered the tracks of the intrusion.
+
+To obtain these capabilities, a system must be permanently
+scrutinized, with the lowest level of monitoring possible, so that
+each and every one of a user's activities is exhaustively logged. Such
+a level of granularity allows the forensics team to reproduce step by
+step specific time periods of the life of the now defunct system, and
+to determine how it was eventually taken down. While such a solution
+looks ideal from a theoretical standpoint, there are nonetheless some
+problems to take into consideration:
+
+ * negative performance impact on the target operating system;
+ * high network bandwidth consumption;
+ * high storage memory requirements;
+ * privacy concerns;
+ * broad-to-fine granularity adjustments;
+ * practical usability of the logs and traces.
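+
+To give an order of magnitude for the bandwidth and storage items above, consider a purely hypothetical host producing R = 10,000 monitored events per second, each serialized into b = 200 bytes. Both figures are illustrative assumptions, not measurements:
+
```latex
\begin{aligned}
B &= R \cdot b = 10^{4}\,\mathrm{s^{-1}} \times 200\,\mathrm{B}
   = 2 \times 10^{6}\,\mathrm{B/s} = 2\,\mathrm{MB/s}\ (16\,\mathrm{Mbit/s}) \\
S_{\text{day}} &= B \cdot 86\,400\,\mathrm{s}
   = 1.728 \times 10^{11}\,\mathrm{B} = 172.8\,\mathrm{GB\ per\ day}
\end{aligned}
```
+
+Both quantities scale linearly with R and b, which is why the broad-to-fine granularity adjustments listed above directly govern bandwidth and storage costs.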
+
+Therefore, a post-mortem forensics analysis system shall be designed with
+performance and security concerns in mind. It ought to provide an
+acceptable mix of fine-grained monitoring and flexibility to produce a
+viable system, both from the users' point of view (in terms of speed
+and responsiveness) and from the forensics analysts' point of view (in
+terms of legal acceptance and recognition of the collected data as
+potential evidence).
+
+Those are the reasons for the creation of *Lemona*, which is intended to
+draft an architecture relying on open communication, encryption and
+logging standards. Following these architecture specifications, *Lemona*
+designs and implements a decentralized post-mortem forensics analysis
+system. This platform should be capable of providing fine-grained
+logging capabilities to all parts of a system and reporting it to
+secure storage points for further examination, while maintaining the
+system's usability, building upon the empirical research provided by
+similar projects.
+
+http://lemona.googlecode.com/svn/docs/images/architecture/lemona-architecture-2.png
+
+
+== Outline ==
+
+Our goal in the first part of this article is to present the current
+state of the forensics software market regarding post-mortem analysis
+tools. In the "Related Work" section, we briefly review the latest
+literature and research projects on which we base our research, to
+confirm or refute our results and design decisions.
+
+We then present this design, detailing in the "Lemona" section our
+software solution and its architecture, as well as the current state
+of its implementation, before presenting and explaining the results we
+have obtained so far.
+
+Finally, we discuss the possible improvements that could be brought
+to *Lemona*, and other alternative directions we are in the process of
+experimenting with.
+
+
+= Related Work =
+
+There is a pressing need for ever more dynamic and autonomous
+intrusion detection systems and forensics analysis tools, capable of
+reacting in _real-time_ to attempted break-ins by securing the system or
+recording suspicious activity.
+
+Accordingly, a large number of research projects have been published
+describing various approaches to the problems we outlined earlier, and
+aiming to fill in the gaps left open by the security architectures
+already available on standard operating systems.
+
+We will attempt to review these approaches from a conceptual
+standpoint and provide an extensive review of their capabilities and
+efficiency, to determine how they qualify (or not) for *Lemona*.
+
+Though forensics analysis and system surveillance encompass numerous
+fields, as outlined above, we only focus on the elements for which we
+intend to conduct (at least at first) thorough research leading to the
+design and implementation of *Lemona*'s framework. Therefore, we
+describe below research projects and their approaches along four axes
+of study, namely:
+
+ * Monitoring Specifics
+ * Reporting Specifics
+ * Forensics Analysis Specifics
+ * Recovery / Reconstruction Specifics
+
+== Monitoring Specifics ==
+
+Monitoring is the act of gathering information about a target system
+and its current active state(s). There are various ways to monitor a
+system, which we review below.
+
+=== Logging ===
+
+We refer to logging as the basic activity of recording a piece of
+software's activity by duplicating and storing its obvious
+Input/Output channels and actions. Such a monitoring system is a
+common and well-known feature, available on almost every mainstream
+system nowadays.
+
+Its full range of capabilities can be quite large, as it embodies the
+simple act of an application logging its own I/O and actions to buffer
+files, either voluntarily (via the use of standard log files), or
+indirectly (via the use of standard capture frameworks, such as the
+classic syslog protocol and its syslogd daemon on UNIX systems
+[3]). The gathering and centralization of different logging
+information can in some cases lead to the realization of an _IDS_ [18].
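+
+As an illustration of what such a capture framework transports, the following minimal Python sketch formats one record in the RFC 5424 syslog layout. This is an assumption for illustration: RFC 5424 postdates this paper, but the priority computation (facility × 8 + severity) is the same as in the older RFC 3164. The function names are hypothetical; on a real host the resulting line would simply be handed to the local syslogd or sent to a remote collector.
+
```python
from datetime import datetime, timezone
from typing import Optional

# Numeric codes from the syslog RFCs (subset)
FACILITY_AUTH = 4      # security/authorization messages
SEVERITY_WARNING = 4   # warning conditions


def syslog_pri(facility: int, severity: int) -> int:
    """Priority value carried in angle brackets: facility * 8 + severity."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be in 0..23 and severity in 0..7")
    return facility * 8 + severity


def format_rfc5424(facility: int, severity: int, hostname: str,
                   app_name: str, procid: str, msg: str,
                   ts: Optional[datetime] = None) -> str:
    """Build one RFC 5424 line; '-' is the NILVALUE for MSGID and STRUCTURED-DATA."""
    if ts is None:
        ts = datetime.now(timezone.utc)
    pri = syslog_pri(facility, severity)
    return f"<{pri}>1 {ts.isoformat()} {hostname} {app_name} {procid} - - {msg}"
```
+
+For instance, an authentication warning from `sshd` produces a line starting with `<36>1`, since 4 × 8 + 4 = 36.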
+
+Logging features are of little interest in the context of our
+research: while they are in all respects an existing and well-defined
+solution to higher-level concerns, they cannot be used to record a
+system's lifelong activity in a consistent, usable and exhaustive
+manner.
+
+They are, however, a standard and usable complement to other
+solutions, and can be used as complementary evidence. They are
+sometimes enough to find the beginning of an intrusion scheme, as
+shown by Spitzner for his HoneyPot Project [20].
+
+=== System Call Interception ===
+
+A very popular monitoring technique, which is still being heavily
+researched is the one referred to as "_System Call Interposition_" or
+"_System Call Interception_".
+
+This technique relies on the fact that during the lifetime of any
+given process or system activity of any kind, the _kernel_'s core system
+functions (the "_system calls_"), will be called upon to execute
+specific actions.
+
+It is possible to "intercept" calls to these core functions at
+runtime. This is a very common method, used for instance by debuggers
+and tracers. It can be achieved in various ways; however, the level of
+granularity and reliability varies.
+
+One could for instance use standard tools built on the ptrace() system
+call, such as the strace utility [11]. ptrace() allows a tracer to
+spawn a new process and follow its execution from start to finish,
+monitoring each call the child process makes to the system's functions
+and retrieving its memory state. Similarly, one can attach to an
+already running process to perform the same examination.
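+
+In practice, strace can write a timestamped trace of a command and its children, for example with `strace -f -tt -o trace.txt <command>`, where `-f` follows forks and `-tt` adds microsecond timestamps. The Python sketch below, whose names are hypothetical, turns such lines into records that could then be stored and queried. It assumes the default strace line layout and deliberately skips signal, exit and `<unfinished ...>`/`resumed` lines.
+
```python
import re
from dataclasses import dataclass
from typing import Optional

# One completed system call as printed by `strace -f -tt`, e.g.
# 4242 10:00:00.123456 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
TRACE_RE = re.compile(
    r"^(?:(?P<pid>\d+)\s+)?"                    # PID prefix added by -f
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+"        # -tt wall-clock timestamp
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+"     # syscall name and raw arguments
    r"(?P<ret>0x[0-9a-f]+|-?\d+|\?)"            # return value (hex for addresses)
    r"(?:\s+(?P<errno>E[A-Z0-9]+))?"            # symbolic errno on failure
)


@dataclass
class SyscallRecord:
    pid: Optional[int]
    timestamp: str
    name: str
    args: str
    ret: Optional[int]
    errno: Optional[str]


def _to_int(token: str) -> Optional[int]:
    if token == "?":          # e.g. exit_group() never returns
        return None
    return int(token, 16) if token.startswith("0x") else int(token)


def parse_strace_line(line: str) -> Optional[SyscallRecord]:
    """Return a record for a completed call, or None for lines we skip."""
    m = TRACE_RE.match(line.strip())
    if m is None:
        return None
    pid = int(m["pid"]) if m["pid"] else None
    return SyscallRecord(pid, m["ts"], m["name"], m["args"],
                         _to_int(m["ret"]), m["errno"])
```
+
+Such a parser only sees what the userland tracer reports, so it inherits the reliability limits of ptrace() discussed next.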
+
+However, ptrace() is not a reliable and viable solution in the case of
+highly-critical systems, and for strict forensics examinations.
+
+First of all, a ptrace()-based tracer executes in userland and may be
+tampered with relatively easily, allowing one to circumvent its
+action. Following a similar approach, an attacker could also use
+ptrace() to his own advantage to divert a ptrace()-based tracing
+system, for instance by tracing the tracer itself and redefining its
+behavior at runtime.
+
+Finally, ptrace()'s performance might not be satisfactory. This stems
+not only from the fact that a ptrace()-based tracer resides in
+userland, and thus suffers from costly context switches, but also from
+its timing granularity, which may not be accurate enough to record
+valid timestamped system traces.
+
+The other solution, which is employed by several projects [9, 12, 23],
+is to patch the _kernel_ source or to design it with tracing in mind
+from the beginning (the differences lying mostly in the chosen
+architecture and goals, as well as in the post-processing tools).
+However, one needs to be careful about the problems inherent to this
+style of instrumentation, as described in the "Traps and Pitfalls"
+publication by Garfinkel [8].
+
+=== Hooks and Checkpoints ===
+
+We call a "hook" or a "checkpoint" a deliberate modification to a
+system's _kernel_ meant to inject a static piece of software to record
+activity occurring at this specific point of the _kernel_'s execution
+flow.
+
+This approach can be used for various purposes, for instance to
+examine the performance of a system (by defining checkpoints one could
+re-use to benchmark it) or to realize system call interposition (which
+we have treated as a special case earlier).
+
+This method has the advantage of being quite flexible in terms of
+penetration capabilities: it is possible to define static hooks for
+virtually any piece of software, as long as one can edit and recompile
+it for later, traced, execution. LTTng chose this approach while
+managing to incur a low performance impact when the checkpoints are
+not enabled [5].
+
+This technique is also used by projects whose goal is to monitor only
+a subset of the _kernel_ operations. Sarmoria [19] used it to modify
+the Linux kernel memory management code in order to instrument
+accesses to memory-mapped files.
+
+The obvious trade-off, however, is the complexity of such a
+manipulation, as it implies tampering with software components of the
+_kernel_. Such a system has to be validated against every new update of
+the _kernel_ source tree to ensure that new code changes do not affect
+its behavior and/or stability. Sun's Solaris OS, on the other hand,
+does not incur this penalty, since DTrace is maintained and integrated
+by the _kernel_ developers themselves [4].
+
+In addition, static hooks are paradoxically inflexible, as one cannot
+redefine the behavior of a given checkpoint at a user's request and at
+any given time.
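+
+The same trade-off can be observed outside the kernel. As an analogy only (this is not how kernel hooks are implemented), CPython 3.8 and later ship static "audit" checkpoints compiled into the interpreter: `sys.addaudithook()` registers a handler that is called at each of them, but a hook, once added, cannot be removed or redefined for the lifetime of the process. The watch list below is a hypothetical selection of events.
+
```python
import sys
from typing import List, Tuple

# Hypothetical selection of CPython audit events worth recording
WATCHED_EVENTS = {"open", "os.system", "subprocess.Popen", "socket.connect"}
AUDIT_LOG: List[Tuple[str, tuple]] = []


def audit_hook(event: str, args: tuple) -> None:
    """Called by the interpreter at every built-in audit checkpoint."""
    if event in WATCHED_EVENTS:
        AUDIT_LOG.append((event, args))


# Static registration: there is no API to remove this hook afterwards.
sys.addaudithook(audit_hook)
```
+
+Any code path not covered by a built-in checkpoint stays invisible to the hook, which mirrors the limitation of static kernel hooks described above.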
+
+=== Software Probes ===
+
+Software _probes_ are the logical continuation of static system
+_hooks_. We call a software _probe_ any mechanism that allows another
+independent piece of software (the _probe_) to be injected at a given
+point of a program (the _hook_), usually for monitoring purposes.
+
+We call _kernel probes_ instances of such a system that are implemented
+in kernel land and allow an observer to monitor the _kernel_'s activity.
+
+Their underlying logic is the same as that of a system hook, except
+that they allow one to configure new probes with different behaviors
+(relatively) dynamically, and to attach them at any point of the
+execution flow, provided one knows which address to jump to.
+
+When a checkpoint or hook is reached, a _probe_ simply suspends the
+normal execution flow and calls its defined handler(s) as a "side
+behavior". Upon completion of the _probe_'s task(s), the _probe_
+redirects the execution flow to the exact point where it left off.
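+
+The pre-handler, execution, post-handler sequence can be sketched in userland Python. This is an analogy under stated assumptions, not Kprobes itself: the hypothetical `ProbeManager` swaps a named function for a wrapper at runtime, and its `register_probe`/`unregister_probe` methods merely mirror the names of the kernel's `register_kprobe()`/`unregister_kprobe()` functions.
+
```python
import functools
from typing import Any, Callable, Dict, Optional


class ProbeManager:
    """Attach and detach probes on named callables of an object at runtime."""

    def __init__(self, target: Any):
        self.target = target                       # object holding the probed callables
        self._originals: Dict[str, Callable] = {}  # saved code, restored on removal

    def register_probe(self, symbol: str,
                       pre_handler: Optional[Callable] = None,
                       post_handler: Optional[Callable] = None) -> None:
        if symbol in self._originals:
            raise ValueError(f"{symbol} is already probed")
        original = getattr(self.target, symbol)

        @functools.wraps(original)
        def probed(*args, **kwargs):
            if pre_handler:
                pre_handler(symbol, args, kwargs)   # side behavior before the call
            result = original(*args, **kwargs)      # resume the normal flow
            if post_handler:
                post_handler(symbol, result)        # side behavior after it returns
            return result

        self._originals[symbol] = original
        setattr(self.target, symbol, probed)

    def unregister_probe(self, symbol: str) -> None:
        setattr(self.target, symbol, self._originals.pop(symbol))
```
+
+Unlike Kprobes, which patch the probed instruction itself, this wrapper only affects callers that look the function up through `target`; code holding a direct reference to the original bypasses it.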
+
+Various systems have been experimented with, benchmarking different
+scenarios to reach a decent level of granularity and verbosity for
+low-level tracing with very little overhead. On Linux, the most
+common ones are *Kprobe* [14] and *Djprobe* [12], which differ in their
+implementation: Kprobe places a breakpoint instruction at the probed
+address, whereas Djprobe (direct jump probe) patches in a jump
+instruction. Djprobe's approach thus has a lower performance impact,
+although in some cases the Kprobe mechanism needs to be used as a
+fall-back.
+
+These probing facilities have led to some interesting projects, such
+as the *Systemtap* architecture [7], which builds on these probes to
+offer a more generic approach to generating probing points, or
+*Uprobes* [11], which replicates the behavior of *Kprobes* for
+userland applications.
+
+== Information Reporting Specifics ==
+
+Reporting is the act of transferring information from the source
+system (also referred to as the audited system), where we gather
+information from, to the storage system (also referred to as the watch
+system), where we store the relevant collected data and process it to
+extract information.
+
+Reporting can be done in various ways, and even though it isn't in
+itself a specific research point we are focusing on, it poses some
+interesting security issues and design constraints, which impact its
+use and the functionalities of the complete surveillance system.
+
+=== Reporting Architecture ===
+
+We distinguish three different approaches to the design of the reporting
+architecture.
+
+==== Local ====
+
+Some systems use a simple local reporting and storage system. This
+design is the simplest: it requires no additional hardware and avoids
+the overhead of having to manage separate systems.
+
+However, it can also be rendered completely pointless if an attacker
+gets full control of the compromised system and tampers with the
+recorded data after they have been collected. Similarly, if the system
+is brought to a complete failure, it might not be possible to recover
+the collected data, thus rendering the whole surveillance system
+ineffective.
+
+Linux kernel modules, for instance, can use the relayfs [24] facility
+to do so, making their logs more easily accessible to userland
+software and avoiding bloating the logs captured by syslogd.
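+
+Because a local store can be rewritten by an attacker who gains full control, one partial mitigation (not used by the systems cited above) is to chain records with a cryptographic hash, so that later modifications become detectable. The Python sketch below, with hypothetical function names, uses SHA-256. An attacker able to recompute the whole chain, or to truncate its tail, still defeats it unless the latest hash is also kept off the host, which is precisely what the remote designs below provide.
+
```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # hash placed before the first record


def _digest(prev_hash: str, event: Dict) -> str:
    payload = json.dumps(event, sort_keys=True)   # canonical serialization
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_record(chain: List[Dict], event: Dict) -> Dict:
    """Append an event, binding it to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    chain.append(record)
    return record


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every link; editing, inserting or deleting a non-final record is detected."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash or record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```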
+
+==== Remote and Centralized ====
+
+This problem can be addressed by the use of a remote storage
+system. The reporting agents located on the source host transmit all
+the required data over a (preferably secure) communication
+channel. This is the architecture chosen by *Forensix* [9].
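+
+One way to make such a channel tamper-evident is to authenticate every record. The minimal Python sketch below is an assumption, not *Forensix*'s actual protocol: each record is framed as a length-prefixed JSON body followed by an HMAC-SHA256 tag computed with a key shared by the reporting agent and the storage point. HMAC only provides integrity and authenticity; confidentiality would additionally require encrypting the stream, for instance by sending the frames over a TLS socket created with Python's `ssl` module.
+
```python
import hashlib
import hmac
import json
import struct

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def encode_frame(key: bytes, record: dict) -> bytes:
    """4-byte big-endian body length, JSON body, then HMAC tag over the body."""
    body = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return struct.pack("!I", len(body)) + body + tag


def decode_frame(key: bytes, frame: bytes) -> dict:
    """Verify the tag before trusting the content; raise ValueError otherwise."""
    if len(frame) < 4:
        raise ValueError("truncated frame")
    (length,) = struct.unpack("!I", frame[:4])
    body = frame[4:4 + length]
    tag = frame[4 + length:4 + length + TAG_LEN]
    if len(body) != length or len(tag) != TAG_LEN:
        raise ValueError("truncated frame")
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return json.loads(body)
```
+
+A sequence number inside each record would additionally be needed to detect replayed or dropped frames.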
+
+As the collected data is being sent over the network to a separate
+host, we can assign another team to the surveillance task, thus
+separating concerns and restricting accesses. In the case of a
+complete failure of the monitored system, the data can still be
+accessed and the forensics analysis performed post-mortem.
+
+However, a remote architecture, as opposed to a local one, also poses
+several security issues. An attacker might be able to interfere with
+such a system by:
+
+ * Intercepting the collected data to gain knowledge of the surveillance system;
+ * Penetrating and compromising the remote storage point itself;
+ * Severing the connection between the remote storage point and the monitored host before attempting to break into the latter.
+
+==== Remote, Decentralized and Redundant ====
+
+A possible alternative to the previous solution is to use a
+decentralized and/or redundant architecture.
+
+By "decentralized", we mean an architecture where only partial
+information about the monitored host is being stored on a single
+storage point. Reconstructing the data could only be done by
+retrieving it from various storage points, thus adding another layer
+of anonymity (and possibly reducing load and separating concerns
+again, by allocating one storage point to network data, and another
+one to local IO, for instance).
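+
+A hypothetical Python sketch of this idea: each record is routed to the single storage point responsible for its category (network activity to one, local I/O to another), and a full timeline can only be rebuilt by merging all the partial streams by timestamp. The category names and the list-based "storage points" are illustrative assumptions.
+
```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical mapping from record category to storage point
ROUTES = {"net": "store-net", "io": "store-io", "proc": "store-proc"}


def distribute(records: Iterable[Dict]) -> Dict[str, List[Dict]]:
    """Send each record to the single storage point owning its category."""
    stores: Dict[str, List[Dict]] = defaultdict(list)
    for record in records:  # records are assumed to arrive in time order
        stores[ROUTES[record["kind"]]].append(record)
    return dict(stores)


def reconstruct(stores: Dict[str, List[Dict]]) -> List[Dict]:
    """Rebuild the global timeline; requires every store, each sorted by time."""
    return list(heapq.merge(*stores.values(), key=lambda r: r["ts"]))
```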
+
+By "redundant", we mean an architecture where the data is being
+duplicated on various hosts, ensuring that it can be cross-checked for
+future use. Thus, the risk of loss of information by accidental loss
+or malevolent tampering is reduced, as it becomes more difficult for
+the attacker to cover his tracks.
+
+To this day, however, we have not found any reference to a project
+using such a design.
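+
+Since no existing design could be found, the following is only a hypothetical Python sketch of the redundancy idea: every storage point keeps a full copy, each record is accepted on retrieval by majority vote, and replicas that disagree are flagged as possibly tampered with. Replicas are modelled as plain dictionaries mapping a record identifier to its content; a missing record counts as a `None` vote.
+
```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def cross_check(replicas: Dict[str, Dict[int, str]]
                ) -> Tuple[Dict[int, Optional[str]], List[str]]:
    """Return the majority value of every record and the replicas that diverge."""
    record_ids = set().union(*(r.keys() for r in replicas.values()))
    agreed: Dict[int, Optional[str]] = {}
    suspects = set()
    for rid in sorted(record_ids):
        votes = Counter(r.get(rid) for r in replicas.values())
        value, count = votes.most_common(1)[0]
        if count * 2 <= len(replicas):       # strict majority required
            raise ValueError(f"no majority for record {rid}")
        agreed[rid] = value
        suspects.update(name for name, r in replicas.items() if r.get(rid) != value)
    return agreed, sorted(suspects)
```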
+
+=== Reporting Modes ===
+
+We present here two different reporting modes.
+
+==== Uni-Directional ====
+
+We call "uni-directional" reporting the act of only reporting
+information from one host to another. In such a situation, only a
+monitored source host and a storage target host are involved.
+
+Though the information can be processed either in _near real-time_ or
+after a given delay, in this concept the surveillance system will not
+provide feedback to the monitored source to report on recommended
+preventive or mitigating actions and counter-measures, should an
+attack attempt be detected.
+
+==== Bi-Directional ====
+
+We call "bi-directional" a setting where both the monitored source and
+the storage target communicate and interact with one another.
+
+In this setting, the surveillance system makes it possible for the
+host to react to an attack attempt or a system failure by receiving
+information from the storage point, which takes care of the defensive
+decision-making process.
+
+Such a design allows for dynamic defenses, opens a very broad range of
+perspectives for surveillance systems, and relegates forensics
+analysis to a last resort.
+
+Furthermore, a monitored system might be instructed to disconnect from
+the network or shut down if an attack is detected and cannot be
+stopped, to prevent theft of sensitive information.
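+
+A minimal Python sketch of the decision-making side of such a bi-directional design: the watch system evaluates each reported event against a rule table and returns a verdict that the monitored host then enforces (allow, block, or disconnect from the network). The rules, field names and verdicts are hypothetical examples, not part of any system cited here.
+
```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"            # refuse the reported action
    DISCONNECT = "disconnect"  # isolate the host from the network


# Hypothetical rules, checked in order; the first match wins.
Rule = Tuple[Callable[[Dict], bool], Verdict]
RULES: List[Rule] = [
    (lambda e: e["syscall"] == "unlink"
     and e.get("path", "").startswith("/var/log/"), Verdict.BLOCK),
    (lambda e: e["syscall"] == "connect" and e.get("port") == 4444, Verdict.DISCONNECT),
]


def decide(event: Dict) -> Verdict:
    """Watch-system side: map a reported event to the feedback sent back."""
    for matches, verdict in RULES:
        if matches(event):
            return verdict
    return Verdict.ALLOW
```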
+
+However, because of their complexity, it is difficult to implement
+this kind of surveillance system in a practical setting, considering
+the current state of hardware components in terms of processing power
+and bandwidth.
+
+=== Response Times ===
+
+The response time of a surveillance system - be it an _IPS_, _IDS_, a
+hybrid system or a honeypot - can be sorted in one of the following
+categories.
+
+==== Delayed / Post-Mortem ====
+
+A "delayed" or "post-mortem" analysis is performed after the event
+occurred. Either the delay is so long that no relevant action can be
+undertaken to any noticeable effect, or the system is already
+considered compromised, and possibly damaged beyond repair.
+
+We consider it a "delayed" response time if we are in presence of a
+slow, but still reactive surveillance system (either automated or
+controlled by a human).
+
+We consider it a "post-mortem" response time if no attempt is being
+made to process the information at runtime, and the data is only being
+inspected to undertake a forensic examination after an incident.
+
+==== Near Real-Time ====
+
+We speak of a "_near real-time_" response time when a monitoring
+system is capable of reacting to an attempted attack within a very
+short delay, possibly while the attack is being performed.
+
+There is however still a delay, as the surveillance system only
+collects and records data after an activity occurs, which makes it
+impossible to guarantee (though it remains possible) the termination
+of an attack before it reaches a critical state.
+
+==== Real-Time ====
+
+A "_real-time_" surveillance system is one that can actually process and
+react to the collected information before any further action can be
+undertaken (either by the attacker or the monitored
+system).
+
+Typically, this would be possible with a bi-directional reporting
+system, in a setting allowing a monitored system to trace and
+intercept all user activity and report it to the watch system. The
+watch system would then validate the pending action based on its
+knowledge of the current state of the monitored system, and provide
+feedback to either block the action (and possibly contain or expel the
+attacker) or allow it to execute.
+
+This approach, as idealistic as it may appear, is theoretically
+possible. It is the logical evolution and combination of our hybrid
+defense systems and honeypots, combining reactive monitoring and
+control for dynamic intrusion prevention, detection, containment and
+mitigation. Unfortunately, considering the state of hardware
+components at the time of this writing, the impact on the monitored
+system's performance would probably render it unusable for most
+enterprise use cases, and we assume this is the only reason why such a
+system has not been designed and implemented yet.
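+
+A back-of-the-envelope model illustrates this performance concern. Assume, hypothetically, that each monitored action executes locally in t_exec, and that a synchronous real-time design adds one network round trip t_rtt to the watch system before the action may proceed. The slowdown factor S of the monitored workload is then:
+
```latex
S = \frac{t_{\text{exec}} + t_{\text{rtt}}}{t_{\text{exec}}}
  = 1 + \frac{t_{\text{rtt}}}{t_{\text{exec}}},
\qquad
t_{\text{exec}} = 1\,\mu\mathrm{s},\; t_{\text{rtt}} = 100\,\mu\mathrm{s}
\;\Rightarrow\; S = 101
```
+
+Both values are assumed orders of magnitude (a short system call versus a local-network round trip), not measurements. The point is that S is dominated by the ratio t_rtt / t_exec, which is why near real-time designs, which report asynchronously, keep the overhead tractable.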
+
+== Forensics Analysis Specifics ==
+
+We call here forensics analysis the act of processing collected data
+to examine a digital crime-scene, for instance by reproducing an
+attempted attack (or any other critical or casual activity).
+
+This is made possible through extensive tracing of all low-level user
+activity, as described in the previous sections, which allows
+forensics tools to follow the actions of a legitimate or illegitimate
+user step by step.
+
+Eventually, such an approach allows one to not only reproduce an
+attack, but also to:
+
+ * understand how the system has been penetrated and compromised,
+ * recover a damaged or destroyed system.
+
+We list below, for reference, some of the techniques available to
+reproduce an attack. Some approaches rely on the very fine
+granularity of the monitored data and on its timestamping to build a
+textual and possibly visual representation of the penetration.
+
+An example of a textual representation would consist of querying the
+collected database to extract only the information relative to a
+given time frame, for a given user's execution of a very specific
+command.
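+
+In practice such a query would be an SQL `SELECT` against the trace
+database; the in-memory filter below only illustrates the three
+selection criteria (user, command, time window). The `struct
+trace_record` layout and the `query_trace` helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trace record; Lemona's actual schema may differ. */
struct trace_record {
    uint64_t timestamp_ns; /* event time */
    uint32_t uid;          /* user who issued the call */
    char comm[16];         /* command name */
    int syscall_nr;        /* system call number */
};

/* Store in `out` the indices of records issued by `uid` running `comm`
 * within the half-open time window [from, to). Returns the match count. */
static size_t query_trace(const struct trace_record *recs, size_t n,
                          uint32_t uid, const char *comm,
                          uint64_t from, uint64_t to,
                          size_t *out, size_t out_cap)
{
    size_t found = 0;
    assert(from <= to);
    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (recs[i].uid == uid &&
            strncmp(recs[i].comm, comm, sizeof recs[i].comm) == 0 &&
            recs[i].timestamp_ns >= from && recs[i].timestamp_ns < to)
            out[found++] = i;
    }
    return found;
}
```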
+
+An example of a visual representation could be derived from such
+queries to form a graphical workflow of every _atomic_ action leading
+ultimately to the compromised state, from the evidence of the attacker
+exploiting a vulnerability to the attacker covering his tracks.
+
+== Reconstruction and Recovery Specifics ==
+
+We provide in this section a brief overview of existing data recovery
+methods, as well as of the system and scenario reconstruction
+techniques used by commercial and open source software solutions.
+
+We outline their specific strengths and weaknesses, which leads to the
+conclusion that they could be combined for improved performance,
+depending on which side of the size/speed/integrity trade-off is
+favored.
+
+=== Imaging: Snapshots, Clones and Ghosts ===
+
+In the world of software backup, an image refers to a bit-for-bit
+copy, which can be saved on storage units. In the event of a system
+crash, the whole image of the damaged system can be copied back onto
+the live hardware (or onto another machine with similar hardware
+specifications). This technique can be found in various commercial and
+open source software solutions.
+
+The term snapshot is often used in virtualized environments, as in
+the xVM VirtualBox [29] or VMware [30] products. In this case, a
+snapshot of the virtual machine is captured so that it can be reverted
+to at a later stage. Some file systems, like ZFS [27], also use the
+same mechanism to revert to previous states.
+
+The term clone is more general, and is usually used in the context of
+disk cloning, when a bit-for-bit copy of a single hard drive is
+produced. It is also sometimes referred to as ghosting, as the drive
+is completely cloned, hence frozen as a shadow copy. This term is used
+for instance by Symantec's Norton Ghost [31] product line.
+
+Imaging solutions provide a relatively convenient method for
+safeguarding against the loss of sensitive data. However, they also
+come with major drawbacks, as they may require a large amount of
+storage space, as well as a reliable infrastructure to automate the
+snapshots. The principle of "incremental backup" comes in very handy
+here: each new snapshot only stores what changed since the previous
+one, so that full copies of every previous image need not be kept.
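+
+The block-level comparison at the heart of incremental imaging can be
+sketched as follows. This is a simplified illustration, assuming
+fixed-size blocks (only 4 bytes here, to keep the example readable)
+and images held in memory; `changed_blocks` is a hypothetical helper,
+not a feature of any of the products cited above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4 /* tiny block size, for illustration only */

/* Compare the current image with the previous snapshot block by block and
 * record the indices of changed blocks (up to out_cap of them). Only these
 * blocks would need to be stored in an incremental backup.
 * Returns the total number of changed blocks. */
static size_t changed_blocks(const unsigned char *prev,
                             const unsigned char *cur, size_t len,
                             size_t *out, size_t out_cap)
{
    size_t count = 0;
    assert(len % BLOCK_SIZE == 0);
    for (size_t b = 0; b < len / BLOCK_SIZE; b++) {
        if (memcmp(prev + b * BLOCK_SIZE, cur + b * BLOCK_SIZE,
                   BLOCK_SIZE) != 0) {
            if (count < out_cap)
                out[count] = b;
            count++;
        }
    }
    return count;
}
```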
+
+In addition to this, if a snapshot is taken after the state of a
+machine has been (even partially) compromised, and the previous
+backups have not been preserved, then we are left with a potentially
+harmful snapshot.
+
+=== Revisions and Rollbacks ===
+
+Revision Control Systems provide incremental backup features to
+document management systems or file systems, while saving storage
+space. Such solutions allow the user to roll back to a previous state,
+as one would do with an image, but by using incremental differential
+revisions of a file. In this case, a given file is not backed up at
+regular time intervals, but every time a modification occurs. Of
+course, this means that the required storage space can grow
+significantly on very active systems.
+
+Such systems exist for various purposes, and are very common as
+Software Configuration Management suites or Version Control
+Systems. Classic examples include CVS [32], SubVersioN [33],
+Bazaar [34], Mercurial [35] and git [36], each of them offering
+similar core capabilities extended by more specific features,
+depending on the desired software development approach (distributed
+or centralized, for instance).
+
+Some experimental file systems also resort to revision control, like
+Wayback [28], RepoFS [25] or the Carbon Copy FileSystem [26].
+
+=== Virtualization / Sandboxing ===
+
+Virtualization is a technique allowing a complete system to run on top
+of another system without being aware of it. This has become quite
+common in the past couple of years, rendering hardware abstraction
+transparent, and allowing graceful degradation of damaged systems.
+
+ReVirt [6] has shown that, by capturing and logging all
+non-deterministic events generated by a target system running inside
+a virtual machine, it is possible to replay the execution of that
+system from a given checkpoint. However, this technique has the
+drawback that reconstructing what happened takes as much time as the
+intrusion itself took to unfold.
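+
+The record/replay principle can be illustrated with a toy example.
+This is not ReVirt's implementation: the "guest" is a hypothetical
+state machine whose only source of non-determinism is its input,
+which is logged in record mode and fed back in replay mode. Replaying
+from the same checkpoint yields the same final state, and every
+logged step has to be re-executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy guest whose state only depends on its inputs, standing in for a VM
 * whose non-determinism comes from interrupts, timers, network data, etc. */
struct guest { uint32_t state; };

static void guest_step(struct guest *g, uint32_t input)
{
    g->state = g->state * 31u + input; /* deterministic given the input */
}

/* Record mode: consume inputs from the outside world and log each one. */
static void run_record(struct guest *g, const uint32_t *external, size_t n,
                       uint32_t *event_log)
{
    assert(g != NULL && event_log != NULL);
    for (size_t i = 0; i < n; i++) {
        event_log[i] = external[i];
        guest_step(g, external[i]);
    }
}

/* Replay mode: restart from the same checkpoint and feed the logged
 * inputs back; every logged step is re-executed. */
static void run_replay(struct guest *g, const uint32_t *event_log, size_t n)
{
    assert(g != NULL && event_log != NULL);
    for (size_t i = 0; i < n; i++)
        guest_step(g, event_log[i]);
}
```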
+
+=== Reconstruction ===
+
+We have found two different but interesting methods for reconstructing
+a path of events.
+
+The first one is the approach taken by *Forensix* [9], which can
+generate SQL-based queries to retrieve a succession of events
+depending on different key inputs (PID, timeframe, paths, ...).
+
+The second approach, which Backtracker [12] implements, starts from
+the intrusion detection point and then goes backwards, building a
+graph of all possible chains of events that could have led to the
+intrusion. This allows the identification of the process used by the
+intruder to achieve his mischief.
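+
+A simplified backward dependency search can be sketched as follows.
+It is inspired by, but not identical to, Backtracker's algorithm:
+events are hypothetical (source, destination, time) triples between
+numbered objects (processes, files, sockets), and an event is only
+followed if it happened no later than the latest time at which its
+destination could still have influenced the detection point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16

/* Information flowed from `src` to `dst` at time `t` (e.g. a process wrote
 * a file, or a file was read by a process). Times are non-negative. */
struct event { int src, dst; long t; };

/* Walk dependencies backwards from `start`, detected at `t_detect`.
 * horizon[x] is the latest time at which x could still have influenced the
 * detection point; an event src->dst is relevant only if t <= horizon[dst].
 * Fills in_graph[] and returns the number of nodes in the graph. */
static int backtrack(const struct event *ev, size_t n_ev, int start,
                     long t_detect, bool in_graph[MAX_NODES])
{
    long horizon[MAX_NODES];
    int count = 1;
    bool changed = true;

    assert(start >= 0 && start < MAX_NODES);
    for (int i = 0; i < MAX_NODES; i++) {
        in_graph[i] = false;
        horizon[i] = -1;
    }
    in_graph[start] = true;
    horizon[start] = t_detect;

    /* Iterate to a fixed point: horizons only grow, so this terminates. */
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n_ev; i++) {
            const struct event *e = &ev[i];
            assert(e->src >= 0 && e->src < MAX_NODES);
            assert(e->dst >= 0 && e->dst < MAX_NODES);
            if (!in_graph[e->dst] || e->t > horizon[e->dst])
                continue;
            if (!in_graph[e->src]) {
                in_graph[e->src] = true;
                count++;
            }
            if (e->t > horizon[e->src]) {
                horizon[e->src] = e->t;
                changed = true;
            }
        }
    }
    return count;
}
```

+For example, with a socket (0) feeding a server process (1) that
+drops a file (2) later executed by a shell (3) which modifies a
+password file (4), starting from node 4 pulls in nodes 0 to 3, while
+writes that happened after the detection time are ignored.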
+
+
+---------------------------------------------------------------------------
+
+
+= Lemona =
+
+== Background ==
+
+=== System Calls ===
+
+In computer systems, the _kernel_ is the main component of the operating
+system. It is the piece of software in charge of managing the system
+resources (i.e. communicating with the hardware devices). Being the
+lowest abstraction layer between software and hardware, every
+operation needing support from the underlying hardware or requesting
+to update the hardware's state will go through it.
+
+Thus, applications that need to access the hardware use the _kernel_
+layer interface: the _system calls_ (commonly called _syscalls_).
+_Syscalls_ are functions invoked by user applications to request some
+service or resource from the _kernel_ [37].
+
+With this in mind, it becomes possible to completely monitor what
+applications request to do with the resources of the computer system,
+by adding logging mechanisms at the entry and exit points of these
+functions.
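+
+A user-space analogue of this entry/exit logging is sketched below.
+`traced_write` and the in-memory log are hypothetical; a kernel patch
+would place equivalent hooks in the system-call dispatch path rather
+than wrapping a single libc function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical log entry: `is_exit` is 0 on syscall entry, 1 on exit. */
struct sc_log { const char *name; int is_exit; long value; };

#define LOG_CAP 64
static struct sc_log sc_log_buf[LOG_CAP];
static size_t sc_log_len;

static void sc_log_add(const char *name, int is_exit, long value)
{
    assert(name != NULL);
    if (sc_log_len < LOG_CAP) /* entries are dropped once the log is full */
        sc_log_buf[sc_log_len++] = (struct sc_log){ name, is_exit, value };
}

/* Traced wrapper around write(2): logs the requested length on entry and
 * the return value on exit. */
static ssize_t traced_write(int fd, const void *buf, size_t count)
{
    sc_log_add("write", 0, (long)count);
    ssize_t ret = write(fd, buf, count);
    sc_log_add("write", 1, (long)ret);
    return ret;
}
```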
+
+=== Scheduler ===
+
+We have seen that we can monitor access to the computer system's
+resources by simply logging the usage of the _kernel_ _syscalls_.
+However, due to the nature of the _kernel_, we also need to capture
+logs from other parts of the _kernel_ to be able to draw a timeline of
+an application's execution, namely from the part in charge of
+allocating time slices to applications: the _scheduler_. The
+_scheduler_ is the part of the _kernel_ which decides which
+application can use the CPU at a given moment and for how long.
+Indeed, a system gives the illusion of having several applications
+running concurrently. However (multi-processor systems aside), only a
+single program can use the CPU at any given moment: the _scheduler_
+allocates a tiny amount of time to each application, which leads to
+this illusion of concurrency [38].
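+
+The following toy round-robin simulation shows why scheduling events
+matter for timelines: logging each time slice given to a task is
+enough to rebuild when each application actually ran. The single-CPU
+model, the fixed quantum and `simulate_rr` are simplifying
+assumptions; the Linux scheduler is considerably more elaborate.

```c
#include <assert.h>
#include <stddef.h>

/* One scheduling record: task `pid` held the CPU during [start, end). */
struct sched_event { int pid; int start; int end; };

/* Single-CPU round robin: task i needs work[i] time units and gets at most
 * `quantum` units per turn. Every slice is logged. work[] is consumed
 * (decremented to 0). Returns the total number of slices; at most `cap`
 * of them are stored in `out`. */
static size_t simulate_rr(const int *pids, int *work, size_t ntasks,
                          int quantum, struct sched_event *out, size_t cap)
{
    size_t n = 0;
    int now = 0, remaining = 0;
    assert(quantum > 0);
    for (size_t i = 0; i < ntasks; i++) {
        assert(work[i] >= 0);
        remaining += work[i];
    }
    while (remaining > 0) {
        for (size_t i = 0; i < ntasks; i++) {
            if (work[i] == 0)
                continue;
            int slice = work[i] < quantum ? work[i] : quantum;
            if (n < cap)
                out[n] = (struct sched_event){ pids[i], now, now + slice };
            n++;
            now += slice;
            work[i] -= slice;
            remaining -= slice;
        }
    }
    return n;
}
```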
+
+=== Memory Mapping ===
+
+Monitoring _syscalls_ makes it easy to see every change made to files
+on disk. However, modern _kernels_ allow an application to map files
+into memory using the mmap _syscall_. This lets an application access
+the content of a file directly, as if it were part of its own memory
+space (i.e. without having to issue the corresponding _syscall_ when
+it needs to read from or write to it). It also reduces the performance
+cost of data manipulation and transfers by avoiding the intermediate
+copy of the data in a _kernel_ buffer. Every time the application
+tries to access a part of the file not yet mapped into memory, the
+system raises a so-called _Page Fault_. This event suspends the
+application's execution and informs the _kernel_ that the application
+tried to read or write a not-yet-mapped position. The _kernel_ then
+loads the relevant data into memory and lets the application resume
+its execution by retrying the instruction that caused the
+_Page Fault_. The application can now read from or write to that
+portion of the file as intended. In order to log the data modified
+and accessed by this kind of application, it is thus necessary to
+alter the part of the _kernel_ in charge of handling _Page Faults_, in
+addition to logging the mmap-related _syscalls_.
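+
+The following POSIX sketch illustrates the point: data is written to a
+temporary file through a shared mapping using plain memory stores, so
+no write(2) call carries the data and a tracer limited to _syscalls_
+would miss it. `write_via_mmap` is a hypothetical helper; on Linux,
+the first store into each mapped page triggers a _Page Fault_ that the
+_kernel_ resolves transparently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `msg` into a temporary file through a shared mapping, then read it
 * back with pread(2) into `readback` (NUL-terminated). Returns 0 on success.
 * The data itself never goes through write(2). */
static int write_via_mmap(const char *msg, char *readback, size_t cap)
{
    size_t len = strlen(msg);
    int rc = -1;
    assert(len > 0 && len < cap);

    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);

    if (ftruncate(fd, (off_t)len) == 0) {
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(p, msg, len);    /* plain stores; first one faults the page */
            msync(p, len, MS_SYNC); /* push the changes to the file */
            munmap(p, len);
            if (pread(fd, readback, len, 0) == (ssize_t)len) {
                readback[len] = '\0';
                rc = 0;
            }
        }
    }
    fclose(f);
    return rc;
}
```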
+
+Because these file portions are loaded at the granularity of a
+so-called _Page Size_ (usually 4 Kilobytes), the logging mechanism is
+bound to use the same granularity. In practice, this means that the
+changes made to a portion of the file can only be logged when the
+application is removed from the CPU by the _scheduler_. Doing so
+earlier would not guarantee that every change made to a given portion
+(page) of the file is logged.
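+
+This page-granularity bookkeeping can be sketched as follows, assuming
+a 4 KiB page size and a hypothetical per-mapping dirty bitmap: the
+fault path marks pages as dirty, and the pages are only logged when
+the task is scheduled out. A real implementation would also have to
+write-protect the pages again after logging, so that the next write
+faults once more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u /* assumed page size (the common 4 KiB case) */
#define MAX_PAGES 64

/* Hypothetical per-mapping dirty bitmap. */
struct mapping { bool dirty[MAX_PAGES]; };

/* Fault-path side: mark every page touched by a write of `len` bytes
 * starting at byte offset `off` within the mapping. */
static void mark_dirty(struct mapping *m, size_t off, size_t len)
{
    assert(len > 0);
    size_t first = off / PAGE_SIZE_BYTES;
    size_t last = (off + len - 1) / PAGE_SIZE_BYTES;
    assert(last < MAX_PAGES);
    for (size_t p = first; p <= last; p++)
        m->dirty[p] = true;
}

/* Scheduler side: when the task leaves the CPU, report dirty pages whole,
 * clear them, and return how many were written to `out`. */
static size_t flush_on_schedule_out(struct mapping *m, size_t *out,
                                    size_t cap)
{
    size_t n = 0;
    for (size_t p = 0; p < MAX_PAGES && n < cap; p++) {
        if (m->dirty[p]) {
            out[n++] = p;
            m->dirty[p] = false;
        }
    }
    return n;
}
```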
+
+== Architecture ==
+
+http://lemona.googlecode.com/svn/docs/images/architecture/lemona-architecture-5.png
+
+*Lemona* is composed of several components (Figure [http://lemona.googlecode.com/svn/docs/images/architecture/lemona-architecture-5.png 4] and Figure 5), divided into three categories:
+
+{{{ TODO: get diagram from PDF version here }}}
+
+ * Monitoring Components
+   * Kernel Patches
+   * Loadable Driver Modules
+ * Logging Components
+   * Database Servers
+   * Logging Servers
+ * Forensics Components
+   * Database-Querying Applications
+   * Data Mining Tools
+
+=== Monitoring Components ===
+
+==== Kernel Patches ====
+
+These are alterations to the Linux Kernel codebase needed to monitor
+_syscalls_ entry and exit points, as well as application scheduling
+and the _Page Faults_ related to memory-mapped files. The patches are
+designed to allow a knowledgeable user to remove the *Lemona*
+functionalities (and thus their processing overhead and memory
+footprint) from the _kernel_ by disabling them at compile time.
+
+==== Loadable Kernel Modules ====
+
+While the patches add code at key _kernel_ points for logging, the
+driver does the actual logging. It structures the logged information
+and sends the result to the various designated backends. Even though
+we made it possible to dynamically load and unload the *Lemona*
+functionalities, we only did so to facilitate debugging and initial
+testing, or for end users' convenience. Administrators of production
+systems should instead build the module statically for security
+reasons. This will, for instance, prevent early logs from being lost.
+
+So far, the modules can only transmit the generated logs via two
+different methods:
+
+ * Via an (un)encrypted socket
+ * Via the kernel's relay functionality
+
+===== Socket Method =====
+
+Upon startup, the module will prompt the user for the passphrase used
+to encrypt the data sent to the logging server. Alternatively, this
+passphrase could be stored on the local file system for automatic
+startup, but this increases the security risk. If the link to the
+server is considered secure enough, it is possible to disable the
+encryption mechanism and thus reduce the CPU overhead of *Lemona*.
+
+===== Kernel Relay Method =====
+
+This method exists mainly for testing purposes. Using the relay
+functionality of the _kernel_, the logs are made available to user
+space applications through the debugfs file system. User applications
+are then free to transfer the data to a logging server or to a local
+database.
+
+It is important to note that data can easily be lost with either of
+these techniques. In the first case, if the server does not answer or
+is too slow to process the data, incoming log data will simply be
+dropped. The same applies to the relay facility: once the buffers are
+full, new incoming logs will be discarded until the user application
+processes enough data to free up one or more buffers.
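+
+The drop-when-full behaviour can be illustrated with a minimal ring
+buffer. This is a single-threaded sketch with hypothetical names; the
+kernel relay interface actually manages per-CPU buffers split into
+sub-buffers, and the structure below only mirrors the discard policy
+described above, not its API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 4

/* Fixed-size ring of log records between a producer (kernel side) and a
 * consumer (user-space reader). When full, new records are discarded and
 * counted instead of overwriting unread ones. */
struct ring {
    int slots[RING_CAP];
    size_t head, tail, count; /* head: next read, tail: next write */
    unsigned long dropped;
};

static bool ring_push(struct ring *r, int rec)
{
    if (r->count == RING_CAP) {
        r->dropped++;
        return false;
    }
    r->slots[r->tail] = rec;
    r->tail = (r->tail + 1) % RING_CAP;
    r->count++;
    return true;
}

static bool ring_pop(struct ring *r, int *rec)
{
    assert(rec != NULL);
    if (r->count == 0)
        return false;
    *rec = r->slots[r->head];
    r->head = (r->head + 1) % RING_CAP;
    r->count--;
    return true;
}
```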
+
+=== Logging Components ===
+
+==== Database Servers ====
+
+SQL databases are used to store the logs, making it easy for forensics
+experts to query them and request specific data. This allows for quick
+access to relevant datasets, and for easy and flexible narrowing of
+the search domain (e.g. by date, application, etc.). Depending on the
+database server used on the backend, useful queries could be recorded
+for future reference. In addition, we plan to rely heavily on database
+caching mechanisms to speed up information retrieval.
+
+==== Logging Servers ====
+
+This application is in charge of receiving the logs from the
+_kernel_. Upon startup, the *Lemona* driver will connect to the
+logging server and start sending it data as it comes through. In order
+to maximize throughput, the server should preferably be directly
+connected to the monitored machine via a fast link (e.g. Gigabit
+Ethernet). For security reasons, the machine hosting the server should
+be hardened, and unnecessary services should be disabled or
+uninstalled altogether.
+
+The server application is actually composed of two programs. The
+first one is in charge of receiving the data sent by the driver. It
+immediately appends the data to a set of buffer files without any
+modification. Once a file is full, another one is created. The second
+application reads the already filled files, decrypts them and inserts
+the data into the database. This architecture avoids heavy slowdowns
+upon data reception by offloading the decryption (if any) from the
+receiving application to the second one.
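+
+The split between a receiver that only appends raw data and a loader
+that processes complete files can be sketched with in-memory stand-ins
+for the buffer files. The file size, `struct spool` and the callback
+are hypothetical; decryption and database insertion are reduced to a
+callback that merely counts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILE_CAP 8 /* bytes per buffer file, tiny for illustration */
#define MAX_FILES 8

/* In-memory stand-ins for the buffer files on disk. */
struct spool {
    char data[MAX_FILES][FILE_CAP];
    size_t used[MAX_FILES];
    size_t current;      /* file currently being appended to */
    size_t next_to_load; /* first full file not yet processed */
};

/* Example loader callback: would decrypt and insert into the database. */
static size_t inserted_bytes;
static void count_insert(const char *data, size_t n)
{
    (void)data;
    inserted_bytes += n;
}

/* Stage 1 (receiver): append raw bytes unchanged, rolling over to a new
 * file when the current one is full. Returns the number of bytes stored. */
static size_t spool_append(struct spool *s, const char *buf, size_t len)
{
    size_t written = 0;
    while (written < len && s->current < MAX_FILES) {
        size_t room = FILE_CAP - s->used[s->current];
        size_t chunk = len - written < room ? len - written : room;
        memcpy(s->data[s->current] + s->used[s->current], buf + written, chunk);
        s->used[s->current] += chunk;
        written += chunk;
        if (s->used[s->current] == FILE_CAP)
            s->current++; /* file full: start a new one */
    }
    return written;
}

/* Stage 2 (loader): hand over only files that are already full.
 * Returns the number of files processed. */
static size_t spool_load_full(struct spool *s,
                              void (*insert)(const char *, size_t))
{
    size_t loaded = 0;
    assert(insert != NULL);
    while (s->next_to_load < s->current) {
        insert(s->data[s->next_to_load], FILE_CAP);
        s->next_to_load++;
        loaded++;
    }
    return loaded;
}
```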
+
+=== Forensics Components ===
+
+==== Database-Querying Applications ====
+
+We are currently working on a querying application to provide simple
+interactive access to the database data. This application should
+allow the creation of specific queries without the user having to
+know the database schema. It would present the output in a meaningful
+way targeted at forensics experts.
+
+This part of the *Lemona* architecture is still under heavy R&D though.
+
+
+== Experimentation ==
+
+Three different tests have been planned to evaluate *Lemona*'s output
+and assess its usability, one for each type of component.
+
+==== Monitoring Components ====
+
+For these components, we want to check the performance impact of the
+*Lemona* system for "standard" system usage, based on tests presented
+by *Forensix* [9]: the build of a _kernel_ from scratch and the total
+throughput of an Apache web server. Every test is carried out using 12
+different configurations:
+
+ * *Lemona* has neither been built, nor loaded
+ * *Lemona* has been built as a module, but the module is not loaded
+ * *Lemona* has been built as a module and the module has been loaded:
+   * Relay reporting is enabled / Socket reporting is disabled
+   * Relay reporting is disabled / Socket reporting is enabled with encryption
+   * Relay reporting is disabled / Socket reporting is enabled without encryption
+   * Relay and Socket reporting are enabled (with encryption)
+   * Relay and Socket reporting are enabled (without encryption)
+ * *Lemona* has been built statically:
+   * Relay reporting is enabled / Socket reporting is disabled
+   * Relay reporting is disabled / Socket reporting is enabled with encryption
+   * Relay reporting is disabled / Socket reporting is enabled without encryption
+   * Relay and Socket reporting are enabled (with encryption)
+   * Relay and Socket reporting are enabled (without encryption)
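+
+A benchmark harness might enumerate the configurations listed above
+as follows, which also confirms that the matrix has 12 entries (two
+baselines, plus five reporting variants for each of the two active
+build modes). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum build { NOT_BUILT, MODULE_NOT_LOADED, MODULE_LOADED, STATIC };

/* One test configuration (hypothetical representation). */
struct config {
    enum build build;
    bool relay, socket, encrypt;
};

/* Reporting variants tested whenever Lemona is active. The build field
 * is overwritten when the matrix is expanded. */
static const struct config reporting[] = {
    { MODULE_LOADED, true,  false, false },
    { MODULE_LOADED, false, true,  true  },
    { MODULE_LOADED, false, true,  false },
    { MODULE_LOADED, true,  true,  true  },
    { MODULE_LOADED, true,  true,  false },
};

/* Fill `out` with the full test matrix; returns its size. */
static size_t build_matrix(struct config *out, size_t cap)
{
    const size_t nrep = sizeof reporting / sizeof reporting[0];
    size_t n = 0;
    assert(cap >= 2 + 2 * nrep);
    out[n++] = (struct config){ NOT_BUILT, false, false, false };
    out[n++] = (struct config){ MODULE_NOT_LOADED, false, false, false };
    for (enum build b = MODULE_LOADED; b <= STATIC; b++) {
        for (size_t i = 0; i < nrep; i++) {
            out[n] = reporting[i];
            out[n].build = b;
            n++;
        }
    }
    return n;
}
```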
+
+
+==== Logging Components ====
+
+For each of the Monitoring Components' tests described above, the CPU
+and I/O usage of the logging server will be recorded, along with the
+average size of the log files and number of database entries.
+
+==== Forensics Components ====
+
+Using real case scenarios (to be determined), the application will be
+tested in order to determine:
+
+ * whether information relevant in the context of a security breach or a system failure can be easily retrieved from the database;
+ * the amount of time needed to query the database depending on its size.
+
+== Results ==
+
+ NOTE: As of this writing, results are still being collected. We are
+ just providing below some insights on our expectations and our
+ checkpoints.
+
+=== Coverage ===
+
+We intend to cover close to 100% of a system's activity, in that we
+actually trace all existing _system calls_ and monitor read and write
+accesses to memory-mapped areas. We should therefore be able to
+monitor all actions executed by all (human and machine) users logged
+onto a system, as well as the system's core activity. We expect to be
+able to collect enough data to virtually reconstruct a complete system
+from a given checkpoint.
+
+=== Performance ===
+
+==== CPU ====
+
+Though results are not available yet, we expect a significant impact
+on the monitored system's performance. The related projects have
+experienced similar difficulties, but remained usable. However, they
+did not use the same accuracy and granularity as *Lemona*. Our
+reference test will be similar to the one presented by *Forensix* [9],
+which uses a web server under medium load to produce benchmarks. We
+expect the performance impact to be about twice that of *Forensix*.
+This will of course also depend on the system's settings, and on the
+activation of optional modules (for instance, the encryption of the
+reported traces).
+
+==== Network ====
+
+As with CPU consumption, network resources will take a significant
+hit if *Lemona* is configured to transmit its traces over the
+network. Since *Lemona* transmits all the traces over a network
+interface, that interface will be under considerable load. However,
+we might implement buffering techniques to mitigate this, and we think
+the solution should provide satisfactory results on a _LAN_.
+Benchmarks will show whether this is a viable solution for systems
+residing on more distant networks, such as _WAN_/_VPNs_.
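+
+The expected network load can be estimated with simple arithmetic:
+traces per second times bytes per trace times 8 bits. The figures
+used below are purely hypothetical; for instance, 100,000 records per
+second of 24 bytes each amount to 100,000 x 24 x 8 = 19.2 Mbit/s,
+before any framing or encryption overhead.

```c
#include <assert.h>

/* Back-of-the-envelope estimate of the bandwidth (bits per second) needed
 * to ship traces: records per second times (record size + per-record
 * overhead) times 8. All inputs are assumptions to be replaced by
 * measured values. */
static unsigned long long trace_bandwidth_bps(unsigned long long records_per_s,
                                              unsigned record_bytes,
                                              unsigned overhead_bytes)
{
    assert(record_bytes > 0);
    return records_per_s *
           (unsigned long long)(record_bytes + overhead_bytes) * 8ULL;
}
```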
+
+==== Memory ====
+
+As of today, the tracing structures' memory footprint is less than 30
+bytes when they do not require additional parameters. Regarding memory
+consumption on the storage point's side, which we expect to be high,
+we are not able as of today to provide meaningful benchmarks, for the
+reasons explained below.
+
+ * *Lemona*'s internal storage structures are still changing because:
+   * We are still in the process of modifying our API;
+   * They depend on which system call is being traced.
+ * The rate at which the traces will be generated is highly variable because:
+   * It depends on the kind of load the monitored host is being put under;
+   * It depends on the kind of activity the monitored host is performing (watching a movie or doing a full-text search will not have the same system call throughput).
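+
+For illustration, a fixed-size trace header under 30 bytes could look
+like the following. The field list is an assumption, not *Lemona*'s
+actual structure; on common ABIs it occupies 24 bytes, and the
+compile-time check guards the size budget.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-size header for a syscall trace without extra
 * parameters; field choice is illustrative only. Fields are ordered from
 * largest to smallest to avoid internal padding. */
struct trace_header {
    uint64_t timestamp_ns; /* when the event happened */
    uint32_t pid;          /* process id */
    uint32_t tid;          /* thread id */
    uint32_t payload_len;  /* bytes of parameters that follow (0 here) */
    uint16_t syscall_nr;   /* system call number */
    uint8_t  is_exit;      /* 0 = entry, 1 = exit */
    uint8_t  argc;         /* number of logged arguments */
};

/* Compile-time guard in the spirit of the "< 30 bytes" figure. */
_Static_assert(sizeof(struct trace_header) < 30, "trace header too large");
```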
+
+
+---------------------------------------------------------------------------
+
+
+= Conclusion =
+
+ NOTE: Conclusions will be drawn once the results have been collected
+ and tested. Some limitations exist from the ground up and they are
+ listed below. We also reference future work and ideas we might deal
+ with in future versions of *Lemona*, but which are currently out of
+ the scope of our research project.
+
+
+== Limitations ==
+
+*Lemona*, though designed to be a more complete monitoring facility
+than the existing solutions, is not foolproof. These are the various
+ways we could think of so far to circumvent its surveillance or to
+render it ineffective.
+
+=== Break the Pipe or Break the Storage Point ===
+
+If the connection between *Lemona*'s storage point and the
+*Lemona*-enabled monitored host can be severed, then the system will
+of course no longer be recoverable from subsequent crashes.
+
+Similarly, an attacker could get control over the storage point, which
+means he or she could not only prevent the monitoring of malicious
+activities, but also destroy already collected evidence or recovery
+data, and erase the tracks of his or her presence.
+
+Possible solutions to this problem reside in the hardening of the
+transfer connections and host systems, and maybe also in the use of
+multiple storage points and connections. There is however no
+completely foolproof solution to this issue, and there never will be
+one, neither with *Lemona* nor with any other system.
+
+=== Clog the Pipe ===
+
+If an attacker can somehow manage to block outgoing connections or to
+overflow the network's throughput, then he or she may be able to
+perform malevolent activities that will not have the time to be
+reported to the storage point before the system is brought to an
+unrecoverable crash, or before he or she can tamper with *Lemona*'s
+system.
+
+A solution to this issue would be to force the system to attribute the
+highest priority to outgoing packets emitted by the *Lemona* reporting
+components. This is technically possible, but we have not implemented
+such a feature so far.
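+
+One way to give the reporting traffic precedence on Linux is the
+`SO_PRIORITY` socket option, shown in the sketch below. It is
+Linux-specific and only affects the local queueing of outgoing
+packets, not routers along the path; values from 0 to 6 can be set
+without the CAP_NET_ADMIN capability. `set_report_priority` is a
+hypothetical helper, not part of *Lemona*.

```c
#define _GNU_SOURCE /* exposes SO_PRIORITY in glibc's <sys/socket.h> */
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set the Linux-specific SO_PRIORITY option on the socket used to ship
 * traces, so that its packets are queued ahead of lower-priority traffic
 * on this host. Values above 6 require CAP_NET_ADMIN.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_report_priority(int sock, int prio)
{
    assert(prio >= 0);
    return setsockopt(sock, SOL_SOCKET, SO_PRIORITY, &prio, sizeof prio);
}
```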
+
+=== Break Lemona ===
+
+It is actually possible that an attacker who succeeds in performing a
+privilege escalation might render *Lemona* unusable. If combined with
+one of the previous methods, this means the system will no longer be
+monitored. If not, we will still be able to review the attack
+procedure used by the attacker, but we will not be able to examine the
+data that was compromised once *Lemona* was taken down. This could be
+a problem if an organization needs to assess its losses and their
+criticality.
+
+There is no real solution to avoid this. *Lemona* is enabled on a
+host, and if the host can be compromised, so can the tracing and
+reporting modules.
+
+
+== Future Work ==
+
+There are countless possible improvements to *Lemona*, some of which are
+listed below.
+
+=== IDS/IPS Integration ===
+
+Conceptually, *Lemona* might as well be used as an _IPS_/_IDS_ or
+interoperate with one. This would of course require the dynamic
+analysis of the monitored information. To have *Lemona* act as or
+collaborate with an _IDS_, its full
+tracing/reporting/analysis/response process would need to be
+completed in near real-time. Furthermore, to turn *Lemona* into, or
+have it operate with, an _IPS_, this same process would need to be
+_atomic_ and _real-time_, as we discussed earlier in our "related
+work" section. Both solutions would also require *Lemona* to be
+packaged with, or to communicate with, a database of exploits'
+workflows.
+
+=== Improvements to the Static Statistical Analyzer ===
+
+There are many methods to analyze traces of a monitored system. The
+flow of _system calls_ can for instance be compared to a database of
+well-known exploits, as stated earlier. We could then use pattern
+matching techniques to identify attacks spread over longer
+timeframes. Several pattern detection mechanisms would be suitable for
+this purpose, though they might not be usable in an _atomic_ and
+_real-time_ scheme to turn *Lemona* into an _IPS_. They would however be
+very useful to reduce the noise in the traces and analyze brand new
+vulnerabilities.
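+
+The simplest form of such pattern matching is a search for a
+contiguous signature of system call numbers in a trace, sketched
+below with a naive matcher. The signature and trace values are
+placeholders, not entries from a real exploit database; attacks
+spread over long timeframes would additionally require tolerating
+interleaved, unrelated calls.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first occurrence of the contiguous signature
 * `sig` (m syscall numbers, from a hypothetical exploit database) in
 * `trace` (n syscall numbers), or -1 if absent. Naive O(n * m) scan. */
static long find_signature(const int *trace, size_t n,
                           const int *sig, size_t m)
{
    assert(m > 0);
    if (m > n)
        return -1;
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m && trace[i + j] == sig[j])
            j++;
        if (j == m)
            return (long)i;
    }
    return -1;
}
```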
+
+The analyzer could also rely on new geometrical analysis
+methods. These methods typically imply the representation of the
+host's activity as a mathematical curve or geometric form, which can
+then be compared to a database of preset forms matching the exploits'
+database.
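+
+As a very simple stand-in for such geometric methods (not the methods
+themselves), host activity can be summarized as a vector of system
+call frequencies and compared with preset vectors using the squared
+Euclidean distance. The number of syscall classes and the preset
+forms are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NSYS 8 /* number of syscall classes tracked, illustrative */

/* Build a syscall-frequency vector: the "shape" of the host's activity. */
static void histogram(const int *trace, size_t n, long h[NSYS])
{
    for (size_t k = 0; k < NSYS; k++)
        h[k] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(trace[i] >= 0 && trace[i] < NSYS);
        h[trace[i]]++;
    }
}

/* Squared Euclidean distance between two activity vectors. */
static long dist2(const long a[NSYS], const long b[NSYS])
{
    long d = 0;
    for (size_t k = 0; k < NSYS; k++)
        d += (a[k] - b[k]) * (a[k] - b[k]);
    return d;
}

/* Index of the preset form closest to the observed vector. */
static size_t closest_form(const long h[NSYS], const long forms[][NSYS],
                           size_t nforms)
{
    size_t best = 0;
    assert(nforms > 0);
    for (size_t f = 1; f < nforms; f++)
        if (dist2(h, forms[f]) < dist2(h, forms[best]))
            best = f;
    return best;
}
```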
+
+Those are only two of the many improvements *Lemona* could benefit
+from, and numerous other variants are already being studied.
+
+= References =
+
+
+= Appendices =
+
+
+-----
+
+[WorkshopPapers Workshop Papers] > [Workshop20081114 IT-SOFT 2008 Workshop Paper (2008.11.14)]
\ No newline at end of file