Status update March 2019


Dieter Weber

Mar 8, 2019, 8:12:05 AM
to libe...@googlegroups.com
Dear all,

After a long pause, I am pleased to share several interesting updates!

# Google Summer of Code

LiberTEM is participating in the Google Summer of Code under the
umbrella of the Python Software Foundation:
https://libertem.github.io/LiberTEM/gsoc.html

If you know students who would like to get paid for three months to dive
into LiberTEM programming, please point them in our direction! This could
be good preparation for a Bachelor's or Master's thesis, for example to
level up skills in Python, NumPy, "big data" and software engineering.

# Cluster benchmark

After long delays, we were FINALLY able to benchmark LiberTEM on an
optimized cluster.

System configuration: Supermicro Microcloud 5038MD-H8TRF blade system
with 8 blades. Each blade: Intel Xeon CPU D-1541 @ 2.10GHz (8 cores), 32
GB RAM, 2x Samsung SSD 970 EVO 2TB as a software RAID0, CentOS 7.6, XFS
file system, 10 GbE networking, Python 3.6.6 and PyTorch 1.0.1.post2
with Intel MKL back-end. A separate head node with an Intel Xeon W-2195
CPU @ 2.30GHz (18 cores) runs the Dask scheduler and client.

In the IO-limited case (480 GiB of float32, frame size 128x128), the
throughput scaled nearly linearly with the number of nodes, reaching
46 GiB/s on eight nodes. In the CPU-bound case (24 GiB of float32), a
single cluster node reaches 11 GiB/s and the head node reaches 19 GiB/s.
The attached plots show the scaling behavior in more detail.
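
To put the IO-limited figures into perspective, here is a small
back-of-the-envelope calculation. It only uses the numbers quoted above;
the derived values (per-node share, frames per second, time per pass)
are plain arithmetic, not additional measurements, and the per-node
figure assumes the load is spread evenly across the eight nodes.

```python
# Figures quoted in the benchmark description.
NODES = 8
IO_THROUGHPUT_GIB_S = 46.0     # aggregate throughput, IO-limited, eight nodes
DATASET_GIB = 480.0            # size of the float32 data set
FRAME_SHAPE = (128, 128)       # frame size in pixels
BYTES_PER_VALUE = 4            # float32

GIB = 2**30

# Average share per node, assuming an even distribution of the load.
per_node_gib_s = IO_THROUGHPUT_GIB_S / NODES

# Size of one frame in bytes and the resulting frame rate.
frame_bytes = FRAME_SHAPE[0] * FRAME_SHAPE[1] * BYTES_PER_VALUE
frames_per_second = IO_THROUGHPUT_GIB_S * GIB / frame_bytes

# Time for one full pass over the data set at the aggregate rate.
seconds_per_pass = DATASET_GIB / IO_THROUGHPUT_GIB_S

print(f"per node:        {per_node_gib_s:.2f} GiB/s")
print(f"frame size:      {frame_bytes // 1024} KiB")
print(f"frames/second:   {frames_per_second:,.0f}")
print(f"time per pass:   {seconds_per_pass:.1f} s")
```

In other words, 46 GiB/s corresponds to roughly 5.75 GiB/s per node on
average, and one pass over the 480 GiB data set takes on the order of
ten seconds.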

Alex put in significant work to develop a reader for RAW files that uses
Direct IO to avoid thrashing the file system cache when the files are
larger than the available memory. That leads to a significant
improvement in this scenario. Memory mapping remains the most efficient
method to access files that are in the file system cache.
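
For readers unfamiliar with memory mapping, the following minimal sketch
shows the idea on a tiny RAW file of float32 frames. It is not
LiberTEM's reader: the helper names `write_demo_file` and
`frame_sums_mmap` are made up for this illustration, it uses only the
Python standard library (in practice `numpy.memmap` would be a more
typical choice), and it does not cover the Direct IO path at all.

```python
import mmap
import os
import tempfile
from array import array

FRAME_SHAPE = (128, 128)  # frame size used in the benchmark
VALUES_PER_FRAME = FRAME_SHAPE[0] * FRAME_SHAPE[1]


def write_demo_file(path, num_frames):
    """Write num_frames float32 frames; frame i is filled with the value i."""
    with open(path, "wb") as f:
        for i in range(num_frames):
            frame = array("f", [float(i)]) * VALUES_PER_FRAME
            f.write(frame.tobytes())


def frame_sums_mmap(path):
    """Sum every frame by reading through a read-only memory map.

    The operating system pages data in on demand, so frames that are
    already in the file system cache are served without an extra copy
    into a user-space buffer.
    """
    sums = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Interpret the mapped bytes as native float32 values.
            with memoryview(mm) as raw, raw.cast("f") as values:
                for start in range(0, len(values), VALUES_PER_FRAME):
                    sums.append(sum(values[start:start + VALUES_PER_FRAME]))
    return sums


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.raw")
        write_demo_file(demo_path, num_frames=3)
        print(frame_sums_mmap(demo_path))  # [0.0, 16384.0, 32768.0]
```

When the file is much larger than RAM, every page touched this way
passes through the file system cache and evicts older pages, which is
the thrashing that the Direct IO reader is meant to avoid.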

In general, we can recommend a RAID0 of 2x Samsung 970 EVO 2TB SSDs for
working with large data sets: it is fast, large and cost-efficient.

# Developments

Since the last update in November, the following features were added:

* Support for n-dimensional data sets in the back-end and API
* Support for SER files
* Support for complex numbers
* GUI usability improvements
* Support for cross-platform remote Dask clusters
* Improvements to the MIB and K2IS readers

A prototype for user-defined functions is working very well. We aim to
release version 0.2 as soon as this feature is finished. With that
release we plan to streamline our release process so that we can
release more often.

After the release, the following items are planned:

* Generalize and streamline the IO part of LiberTEM so that more than
just RAW files can benefit from optimized reading methods like Direct
IO, and allow more targeted reading of portions of a file.
* Full cluster user operation, including authentication and local
partitioned caching of remote data sets.

Furthermore, we are eagerly awaiting the decision on our ATTRACT
proposal HOLATEM that we hope will fund development of live data
acquisition and processing.

In addition to our ERC Proof-of-concept grant VIDEO and support from
Forschungszentrum Jülich, we now gratefully acknowledge funding from the
European Network for Electron Microscopy (ESTEEM3): "This project has
received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement No 823717 – ESTEEM3."

With best regards,
Dieter

--
Dr. Dieter WEBER

Peter Grünberg Institute, Microstructure Research (PGI-5)
Ernst Ruska-Centre for Microscopy and Spectroscopy with Electrons (ER-C)
Forschungszentrum Jülich
52425 Jülich, Germany

Email: d.w...@fz-juelich.de
Phone: +49 2461 61 85118



Attachments:
* IO bound madmax.png
* CPU bound single node.png
* IO bound madmax - read methods.png