Convert to trusted LibreOffice file


Michael Carbone

Nov 14, 2017, 12:37:43 PM
to qubes-devel, Raphaël Vinot
Hi folks,

A colleague at CIRCL recently released ODFCleaner:

https://github.com/CIRCL/ODFCleaner

Could be worth exploring integration as an additional feature similar to
Convert to trusted PDF.

--
Michael Carbone

Qubes OS | https://www.qubes-os.org
@QubesOS <https://www.twitter.com/QubesOS>

PGP fingerprint: D3D8 BEBF ECE8 91AC 46A7 30DE 63FC 4D26 84A7 33B4



Marek Marczykowski-Górecki

Nov 14, 2017, 6:07:21 PM
to Michael Carbone, qubes-devel, Raphaël Vinot
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Tue, Nov 14, 2017 at 12:37:27PM -0500, Michael Carbone wrote:
> Hi folks,
>
> A colleague at CIRCL recently released ODFCleaner:
>
> https://github.com/CIRCL/ODFCleaner
>
> Could be worth exploring integration as an additional feature similar to
> Convert to trusted PDF.

Well, this indeed could be useful. Also, running such a tool in a DispVM
makes sense. But the security model here is very different from the PDF
converter. In the PDF converter we have two parts:
 - complex one: rendering the PDF in a DispVM and returning a "simple
   representation"
 - simple one: running in the calling VM, responsible for parsing the
   trivial(!) format of the data returned by the first part and assembling
   it back into a PDF

In ODFCleaner I don't see any simple representation in between. So, if
that code got exploited(*), the resulting file may still be hostile.
So, running this tool in a DispVM may be useful to guard the file-storing
VM, but it will not guarantee that the output file is safe.

(*) which is IMO less likely for this code than for full LibreOffice.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJaCBAhAAoJENuP0xzK19csVNwH/RdxcXuwAQzj8qeNW+q+APQm
61bAvYpuUo9dmzF+t3rTxfiWUGDygKDhIu7M1UJL7QTCGeHZxjrsERx8luIg6+hy
ig7pm4sKHhnA/oA+EP54KudWYwJ7KGCDfs1nuZO6LEUC3NXsrFuFAc1yAQmVYkn2
XWo0E1gkBrVt8TG2Z4Dq/7ueFl0G63b00vuo4V4gA4uUV0i/5whnmGtmZqbwBvx3
yQDcCUmeNbESQXfxI79s/QlD8CKyNQCBld1dLxG/8aJCTmKXbS6Pwv+94xCtL6ht
5lGL8XhH7yb8a/6I2V7O5AyapylOEt6xUkYV8KxVUiDzsASRmdT+8bF8Mlu/CI8=
=v/S/
-----END PGP SIGNATURE-----
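To make Marek's two-part split concrete, here is a minimal Python sketch of what the "simple" half could look like. It assumes a hypothetical trivial format (a `WIDTH HEIGHT` header line followed by raw RGB bytes for one page); the names `read_page`, `FormatError` and `MAX_DIM` are illustrative and not taken from the actual Qubes converter. The point is that every byte coming back from the DispVM is checked against a tiny, fixed grammar, so an exploited renderer can at worst produce a wrong picture.

```python
from __future__ import annotations

import io
from typing import BinaryIO

MAX_DIM = 10_000  # hypothetical upper bound on page width/height, in pixels


class FormatError(ValueError):
    """Raised when data from the DispVM does not match the trivial format."""


def read_page(stream: BinaryIO) -> tuple[int, int, bytes]:
    """Parse one page: b"<width> <height>\\n" followed by width*height*3 RGB bytes."""
    header = stream.readline(32)  # bounded read; never trust the sender's line length
    if not header.endswith(b"\n"):
        raise FormatError("header missing or too long")
    parts = header[:-1].split(b" ")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        raise FormatError("malformed header")
    width, height = int(parts[0]), int(parts[1])
    if not (0 < width <= MAX_DIM and 0 < height <= MAX_DIM):
        raise FormatError("dimensions out of range")
    size = width * height * 3  # 3 bytes per pixel: R, G, B
    pixels = stream.read(size)
    if len(pixels) != size:
        raise FormatError("truncated pixel data")
    return width, height, pixels


if __name__ == "__main__":
    # A 2x1 page of black pixels, as the DispVM side might send it.
    w, h, px = read_page(io.BytesIO(b"2 1\n" + bytes(6)))
    print(w, h, len(px))
```

ODFCleaner has no such narrow channel: its output is a complete ODF file, which is why the result may still be hostile if the cleaner itself is compromised.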

Andrew Clausen

Nov 15, 2017, 3:51:22 AM
to Marek Marczykowski-Górecki, Michael Carbone, qubes-devel, Raphaël Vinot
Hi all,

On 14 November 2017 at 23:07, Marek Marczykowski-Górecki <marm...@invisiblethingslab.com> wrote:
> Well, this indeed could be useful. Also, running such a tool in a DispVM
> makes sense. But the security model here is very different from the PDF
> converter. In the PDF converter we have two parts:
>  - complex one: rendering the PDF in a DispVM and returning a "simple
>    representation"
>  - simple one: running in the calling VM, responsible for parsing the
>    trivial(!) format of the data returned by the first part and assembling
>    it back into a PDF
>
> In ODFCleaner I don't see any simple representation in between.

What about using pandoc? [1, 2] Markdown could serve as the intermediate representation.

I like that pandoc is almost entirely written in Haskell, which rules out a large class of potential vulnerabilities (memory-safety bugs such as buffer overflows).  In fact, I'm not sure there would be much to gain by using a disposable VM -- especially if the non-Haskell bits are disabled.

Kind regards,
Andrew
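A minimal Python sketch of Andrew's pandoc idea arranged along Marek's split: the DispVM turns the ODT into Markdown, and the calling VM does a cheap check and rebuilds an ODT. It assumes pandoc 2.19 or later (for `--sandbox`); the helper names and `MAX_MD_BYTES` are hypothetical, and the extension list should be double-checked against the installed pandoc's manual. Disabling `raw_attribute` matters here: with it enabled, a fenced block tagged `{=opendocument}` could carry raw ODF XML straight into the rebuilt file, defeating the point of the intermediate format.

```python
from __future__ import annotations

import subprocess

# Step 1, inside the DispVM: untrusted ODT in, Markdown out.
# --sandbox (pandoc 2.19 or later, assumed here) limits I/O to the files named on the command line.
ODT_TO_MD = ["pandoc", "--sandbox", "--from=odt", "--to=markdown"]

# Step 2, in the calling VM: Markdown in, ODT out. Raw-markup extensions are
# disabled so the Markdown cannot pass raw HTML, TeX or ODF XML into the output.
MD_TO_ODT = ["pandoc", "--sandbox",
             "--from=markdown-raw_html-raw_tex-raw_attribute", "--to=odt"]

MAX_MD_BYTES = 10 * 1024 * 1024  # hypothetical cap on the returned Markdown


def odt_to_md_command(odt_path: str) -> list[str]:
    """Command run in the DispVM; Markdown is written to stdout."""
    return ODT_TO_MD + [odt_path]


def md_to_odt_command(out_path: str) -> list[str]:
    """Command run in the calling VM; Markdown is read from stdin."""
    return MD_TO_ODT + ["--output", out_path]


def check_markdown(data: bytes) -> str:
    """Minimal sanity checks on the bytes returned by the DispVM."""
    if len(data) > MAX_MD_BYTES:
        raise ValueError("Markdown too large")
    text = data.decode("utf-8")  # strict decoding: invalid UTF-8 raises
    if "\x00" in text:
        raise ValueError("NUL byte in Markdown")
    return text


def md_to_odt(data: bytes, out_path: str) -> None:
    """Rebuild an ODT from checked Markdown; requires pandoc on PATH."""
    text = check_markdown(data)
    subprocess.run(md_to_odt_command(out_path), input=text.encode("utf-8"), check=True)
```

Unlike the PDF converter's bitmaps, Markdown still needs a full parser on the trusted side, so this design leans on Andrew's argument that pandoc's Haskell code is a much smaller risk than LibreOffice.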


Raphaël Vinot

Nov 15, 2017, 9:06:38 AM
to Marek Marczykowski-Górecki, Michael Carbone, qubes-devel
Hi all,

On 11/15/2017 12:07 AM, Marek Marczykowski-Górecki wrote:
> On Tue, Nov 14, 2017 at 12:37:27PM -0500, Michael Carbone wrote:
>> Hi folks,
>
>> A colleague at CIRCL recently released ODFCleaner:
>
>> https://github.com/CIRCL/ODFCleaner
>
>> Could be worth exploring integration as an additional feature similar to
>> Convert to trusted PDF.
>
> Well, this indeed could be useful. Also, running such a tool in a DispVM
> makes sense. But the security model here is very different from the PDF
> converter. In the PDF converter we have two parts:
>  - complex one: rendering the PDF in a DispVM and returning a "simple
>    representation"
>  - simple one: running in the calling VM, responsible for parsing the
>    trivial(!) format of the data returned by the first part and assembling
>    it back into a PDF
>
> In ODFCleaner I don't see any simple representation in between. So, if
> that code got exploited(*), the resulting file may still be hostile.
> So, running this tool in a DispVM may be useful to guard the file-storing
> VM, but it will not guarantee that the output file is safe.
>
> (*) which is IMO less likely for this code than for full LibreOffice.


Just a few more things: it is far from complete protection. It simply
does some cleanup of the XML content and removes extra parts. I'm fairly
certain it will let some potentially active code through, but it's
better than nothing.

I simply ported some personal code and XSLT by Jos van den Oever to
Python, and I still need to test it against malicious documents (I
haven't done that yet).

I don't think it will ever replace opening the document in a DispVM, but
it could be a starting point for a sane-ish sanitizer of ODF files.

Cheers,

--
Raphaël Vinot
CIRCL - Computer Incident Response Center Luxembourg

41, Avenue de la Gare
L-1611 Luxembourg

(+352) 247 88444 - in...@circl.lu - www.circl.lu
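For a rough idea of what "cleanup in the XML content and removing extra parts" can mean, here is a standard-library Python sketch. It is not ODFCleaner's code (which ports XSLT); the allowlist in `KEEP_PARTS` and the choice of `office:scripts` and `office:event-listeners` as the elements to drop are assumptions for illustration. Like the tool Raphaël describes, it lets plenty of active content through (macro URLs in hyperlinks, form controls). ElementTree also rewrites namespace prefixes and drops unused namespace declarations, which can break ODF attributes whose values are prefixed names, such as formulas; a real sanitizer would use XSLT or lxml.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET
import zipfile

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
MANIFEST = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"

# Hypothetical allowlist; everything else (Basic/, Scripts/, Object*/, ...) is dropped.
KEEP_PARTS = ("mimetype", "content.xml", "styles.xml", "meta.xml", "META-INF/manifest.xml")
# Containers that bind macros or scripts to the document.
DROP_TAGS = {f"{{{OFFICE}}}scripts", f"{{{OFFICE}}}event-listeners"}


def strip_active_elements(xml_bytes: bytes) -> bytes:
    root = ET.fromstring(xml_bytes)
    for parent in list(root.iter()):  # snapshot first; the tree changes below
        for child in list(parent):
            if child.tag in DROP_TAGS:
                parent.remove(child)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def prune_manifest(xml_bytes: bytes) -> bytes:
    """Remove manifest entries for parts that are no longer in the package."""
    root = ET.fromstring(xml_bytes)
    path_attr = f"{{{MANIFEST}}}full-path"
    for entry in list(root):
        if entry.get(path_attr) not in ("/", *KEEP_PARTS):
            root.remove(entry)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def clean_odf(data: bytes) -> bytes:
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        names = set(src.namelist())
        # ODF packages must start with an uncompressed "mimetype" entry.
        dst.writestr("mimetype", src.read("mimetype"), compress_type=zipfile.ZIP_STORED)
        for name in KEEP_PARTS[1:]:
            if name not in names:
                continue
            body = src.read(name)
            if name == "META-INF/manifest.xml":
                body = prune_manifest(body)
            else:
                body = strip_active_elements(body)
            dst.writestr(name, body, compress_type=zipfile.ZIP_DEFLATED)
    return out.getvalue()
```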

Vít Šesták

Nov 18, 2017, 6:24:27 AM
to qubes-devel
Running it in a DispVM can prevent some classes of attacks that extract data using techniques like path traversal, XXE (hmm, …), or other attacks short of RCE.

Regards,
Vít Šesták 'v6ak'
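A DispVM contains these attacks because nothing worth stealing is reachable from it. A sanitizer running outside one would need explicit guards instead; below is a minimal Python sketch of two such guards, with hypothetical helper names: flagging archive member names that escape the extraction directory, and refusing XML parts that declare a DOCTYPE (the entry point for XXE). The DOCTYPE check is a crude byte search that would miss, for example, a UTF-16-encoded part, so treat it as illustration only.

```python
from __future__ import annotations

import posixpath
from typing import Iterable


def escaping_names(names: Iterable[str]) -> list[str]:
    """Return archive member names that would land outside the extraction directory."""
    bad = []
    for name in names:
        norm = posixpath.normpath(name.replace("\\", "/"))
        if (norm.startswith("/")                            # absolute path
                or norm == ".." or norm.startswith("../")  # climbs out via ..
                or (len(norm) > 1 and norm[1] == ":")):    # Windows drive letter
            bad.append(name)
    return bad


def declares_doctype(xml_bytes: bytes) -> bool:
    """Crude XXE guard: ODF parts have no need for a DOCTYPE, so refuse any that has one."""
    return b"<!DOCTYPE" in xml_bytes.upper()
```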