Serialization in AiiDA


Otto Kohulák

Jun 30, 2022, 5:57:36 AM
to aiida...@googlegroups.com
Dear AiiDA community,

I have a project in which I am trying to serialize AiiDA Data nodes into files. Looking around, I noticed the 'aiida.orm.utils.serialize' module in AiiDA, but it behaves rather strangely:

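from aiida import load_profile
from aiida.orm import Int
from aiida.orm.utils.serialize import serialize

load_profile()  # not needed inside `verdi shell`, which loads the profile itself
x = Int(1)
x.store()
serialize(x)
# -> "!aiida_node '717f18d2-79f0-47bc-915a-e5a20b609b7c'\n"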

It only shows the UUID, not the data stored in the node. Is there a native serialization tool? I would expect one, since in the end everything is stored in the database.

This is for my project, in which I am trying to send data between different Python interpreters: one has access to the database and the other does not (it runs on a supercomputer).

Best regards,

O.

Sebastiaan Huber

Jun 30, 2022, 6:27:13 AM
to aiida...@googlegroups.com
Hi Otto,

The serialization module you reference is used to send nodes over RabbitMQ.
Since the receiving end (the daemon workers) has access to the same AiiDA instance (Python environment and database), it only needs the UUID to reload the node.
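To illustrate why a UUID is enough in that setting, here is a minimal, stdlib-only sketch of reference-based serialization. The dict `SHARED_STORE` is a hypothetical stand-in for the database both ends share, and `store`, `serialize_reference` and `deserialize_reference` are illustrative names, not AiiDA functions. The message carries only the identifier, so the receiver can resolve it only if it can reach the same store:

```python
import json
import uuid

# Hypothetical stand-in for the database that both ends can reach.
SHARED_STORE: dict[str, dict] = {}


def store(data: dict) -> str:
    """Save `data` under a fresh UUID and return that UUID."""
    key = str(uuid.uuid4())
    SHARED_STORE[key] = data
    return key


def serialize_reference(key: str) -> str:
    """Put only the identifier in the message, not the data itself."""
    return json.dumps({'uuid': key})


def deserialize_reference(message: str, lookup: dict[str, dict]) -> dict:
    """Resolve the identifier against `lookup`; fails if the data is not there."""
    key = json.loads(message)['uuid']
    return lookup[key]  # KeyError when the receiver cannot see the stored data
```

Called with an empty dict as the lookup (a machine without access to the database), `deserialize_reference` raises `KeyError`, which is essentially the situation of the supercomputer side in this thread.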

The serialization of node data to JSON format so it can be stored in Postgres is done by this function:
https://github.com/aiidateam/aiida-core/blob/cb82327e5e4587b8752674a6b57e64613a5e11f2/aiida/orm/implementation/utils.py#L40

This should do what you are looking for.
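Conceptually, such a function walks a value recursively and accepts only types that JSON can represent. The following is a simplified sketch of that idea, assuming plain Python inputs; `to_json_compatible` is a hypothetical name and this is not the AiiDA implementation, which handles more cases. Anything else, such as a node object, is rejected:

```python
import math
from typing import Any


def to_json_compatible(value: Any) -> Any:
    """Recursively convert `value` to types accepted by `json.dumps`.

    Simplified illustration only: tuples become lists, and anything that is not
    None, bool, int, float, str, list, tuple or dict (with str keys) is rejected.
    """
    if value is None or isinstance(value, (bool, int, str)):
        return value
    if isinstance(value, float):
        # JSON has no NaN or Infinity, so reject them instead of emitting invalid JSON.
        if not math.isfinite(value):
            raise ValueError(f'float {value!r} is not json-serializable')
        return value
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            if not isinstance(key, str):
                raise ValueError(f'dictionary key {key!r} is not a string')
            cleaned[key] = to_json_compatible(item)
        return cleaned
    if isinstance(value, (list, tuple)):
        return [to_json_compatible(item) for item in value]
    raise ValueError(f'type `{type(value)}` is not supported as it is not json-serializable')
```

The design choice is to fail loudly rather than silently stringify unknown objects, so that whatever ends up in the database can be read back as the same plain values.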

That being said, may I ask what you are planning on doing, or what problem you are trying to solve?

Hope that helps,

Sebastiaan
--
AiiDA is supported by the NCCR MARVEL (http://nccr-marvel.ch/), funded by the Swiss National Science Foundation, and by the European H2020 MaX Centre of Excellence (http://www.max-centre.eu/).
 
Before posting your first question, please see the posting guidelines at http://www.aiida.net/?page_id=356 .

Otto Kohulák

Jun 30, 2022, 7:41:21 AM
to aiida...@googlegroups.com
Dear Sebastiaan,

I tried your suggestion and it worked for Int; however, when I passed a SinglefileData node I got an error:

type `<class 'aiida.orm.nodes.data.singlefile.SinglefileData'>` is not supported as it is not json-serializable

Do I need to do something more?

My motivation is described in the emails I sent to this Google group two weeks ago (I don't know how to link them) and in the AiiDA GitHub discussions [1]. Basically, I have Python code that I would like to run not on my workstation but on a supercomputer. In other words, I would like to write a calcfunction that has all of the nifty features of CalcJobs (dedicated working directory, supercomputer resources, ExitCodes, input/output data validation, etc.).

One solution is to use aiida-dynamic-workflows, but I am trying to come up with a solution that takes less time to set up. The idea can be seen in a sketch I have made [2] (look at tests/classme.py). I wrote my own serializer for the most common data types in aiida.orm, but I would like to have something more general.
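A serializer for a fixed set of common types can be organized as a dispatch table from type to encoder. The sketch below assumes two hypothetical stand-in classes, `IntLike` and `FileLike`, instead of the real `aiida.orm` types, and `serialize_common` is an illustrative name. Each JSON document is tagged with the type name so the other side knows what it received, and file contents are base64-encoded because JSON cannot hold raw bytes:

```python
import base64
import json
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical stand-ins for two aiida.orm types; the real classes are not used here.
@dataclass
class IntLike:
    value: int


@dataclass
class FileLike:
    filename: str
    content: bytes


# One encoder per supported type, looked up by the exact class of the object.
ENCODERS: dict[type, Callable[[Any], dict]] = {
    IntLike: lambda obj: {'value': obj.value},
    FileLike: lambda obj: {
        'filename': obj.filename,
        # JSON cannot store raw bytes, so file contents are base64-encoded.
        'content': base64.b64encode(obj.content).decode('ascii'),
    },
}


def serialize_common(obj: Any) -> str:
    """Return `obj` as a JSON string tagged with its type name."""
    encoder = ENCODERS.get(type(obj))
    if encoder is None:
        raise TypeError(f'no serializer registered for {type(obj).__name__}')
    return json.dumps({'type': type(obj).__name__, 'data': encoder(obj)})
```

Lookup is by exact type, so a subclass of a registered type is rejected until it gets its own encoder; that is the limitation a hand-written table like this runs into.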


On Thu, 30 Jun 2022 at 12:27, 'Sebastiaan Huber' via aiidausers <aiida...@googlegroups.com> wrote:

Sebastiaan Huber

Jun 30, 2022, 8:03:27 AM
to aiida...@googlegroups.com
Dear Otto,

What you are referring to, a serializer for any `Node` type to JSON, doesn't exist (yet).
There is an open issue for this on the repository: https://github.com/aiidateam/aiida-core/issues/5464

The function I linked only serializes the "attributes" of a node, since that is what is stored as JSON in a database.
However, a node can of course also store files in the repository.

You could write your own generic serializer (essentially serializing all attributes, extras and repository objects); a generic deserializer, however, is not possible.
Since `Data` can be subclassed, an instance cannot be recreated unless the implementation of that subclass is available on the receiving side.
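The deserialization problem shows up clearly in a small sketch: the receiver needs a registry mapping each type tag to code that rebuilds the object, and every class must provide that code. Here `DECODERS`, `register`, the `from_dict` classmethod and the tag `'data.example.Point.'` are all hypothetical; only the `node_type` and `attributes` keys follow the field names used in this thread:

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry: type tag -> function that rebuilds an object from its attributes.
DECODERS: dict[str, Callable[[dict], object]] = {}


def register(type_tag: str):
    """Class decorator that registers `cls.from_dict` under `type_tag`."""
    def decorator(cls):
        DECODERS[type_tag] = cls.from_dict
        return cls
    return decorator


def deserialize(payload: str) -> object:
    """Rebuild an object, but only if its type was registered on this machine."""
    message = json.loads(payload)
    tag = message['node_type']
    decoder = DECODERS.get(tag)
    if decoder is None:
        raise LookupError(f'no implementation available for {tag!r}')
    return decoder(message['attributes'])


@register('data.example.Point.')  # hypothetical type tag
@dataclass
class Point:
    x: float
    y: float

    @classmethod
    def from_dict(cls, attributes: dict) -> 'Point':
        return cls(attributes['x'], attributes['y'])
```

A payload whose type was never registered (for instance a plugin's `Data` subclass that is not installed on the receiving machine) raises `LookupError`. Supporting a type means giving it a `from_dict`, which is why a generic solution would require touching every existing `Data` type.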

If you "just" need the JSON data on the other computer, you could write an off-the-cuff serializer by doing something like:
import base64
import json
import pathlib

# `node` is assumed to be a stored node, e.g. obtained with `aiida.orm.load_node`

# Get information stored in the database
serialized = {
    'pk': node.pk,
    'uuid': node.uuid,
    'label': node.label,
    'description': node.description,
    'node_type': node.node_type,
    'process_type': node.process_type,
    'extras': node.base.extras.all,
    'attributes': node.base.attributes.all,
    'objects': {},
}

# Get content from the repository. Files are read as bytes and base64-encoded,
# because JSON cannot hold raw bytes and not every file is valid text.
for root, dirnames, filenames in node.base.repository.walk():
    for filename in filenames:
        key = pathlib.PurePosixPath(root) / filename
        with node.base.repository.open(key, mode='rb') as handle:
            serialized['objects'][str(key)] = base64.b64encode(handle.read()).decode('ascii')

payload = json.dumps(serialized)
Note that I haven't tested this and it isn't complete.
It doesn't include things like the `ctime`, `mtime` and the user and computer of the node.
But maybe this can be useful for your use case.
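On the receiving side, which has no database, such a payload can be unpacked with the standard library alone. This sketch assumes the JSON layout of the snippet above (metadata keys plus an `objects` mapping from repository path to content) and additionally assumes each entry in `objects` holds base64-encoded bytes; if the sender stores plain text instead, replace the decoding step with `write_text`. `unpack_payload` is a hypothetical helper:

```python
import base64
import json
import pathlib


def unpack_payload(payload: str, target_dir: pathlib.Path) -> dict:
    """Write the repository objects to `target_dir` and return the remaining metadata."""
    data = json.loads(payload)
    root = target_dir.resolve()
    for key, encoded in data.get('objects', {}).items():
        path = (root / key).resolve()
        # Refuse keys such as '../x' that would write outside the target directory.
        if root not in path.parents:
            raise ValueError(f'object key {key!r} escapes {root}')
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(base64.b64decode(encoded))
    # Everything except the file contents is plain metadata (uuid, attributes, ...).
    return {key: value for key, value in data.items() if key != 'objects'}
```

This recovers the raw data and files, but not a Python instance of the original `Data` class; that still needs the class implementation on the receiving side.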

Again though, this is not useful if you are looking to automatically "reconstruct" the Python instance on the other side.
This generic approach doesn't exist yet and is not trivial without updating all existing `Data` types.

Hope that helps,

Sebastiaan