Hi Simon,
The primary object matminer works with is the pandas dataframe. You can use matminer without a dataframe, but it is a lot easier to just use one. You don't need a MongoDB database, just the dataframe.
Here's an example of how to go from a bunch of CIF files and their properties to a dataframe:
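import os
import pandas as pd
from pymatgen.core import Structure  # older pymatgen releases: from pymatgen import Structure

cif_dir = "path/to/cif/files"
properties = []
structures = []
# os.listdir returns bare file names in arbitrary order, so sort them
# and join each name with the directory before reading it
for i, structure_file in enumerate(sorted(os.listdir(cif_dir))):
    prop = get_property_from_index(i)  # placeholder: replace with your own property lookup
    structure = Structure.from_file(os.path.join(cif_dir, structure_file))
    properties.append(prop)
    structures.append(structure)

df = pd.DataFrame({"some_property": properties, "structure": structures})
print(df)  # make sure the dataframe appears like you intended
df.to_pickle("/path/where/u/want/to/save/ur/dataframe.p")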
You can then load your dataset later with:
df = pd.read_pickle("/path/where/u/want/to/save/ur/dataframe.p")
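Note that get_property_from_index in the example above is only a placeholder. If your properties happen to live in a CSV file keyed by file name, a lookup by name is usually less fragile than a lookup by index. Here is a rough sketch of that idea using only the standard library; the function names, the column names "filename" and "some_property", and the CSV layout are all assumptions made for illustration, so adapt them to however your data is actually stored:

```python
import csv
import os


def load_property_table(csv_path, key_column="filename", value_column="some_property"):
    """Read a CSV and return {file name: property value as float}.

    The column names are hypothetical defaults, not a matminer convention.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        return {row[key_column]: float(row[value_column]) for row in reader}


def pair_files_with_properties(cif_dir, property_table):
    """Return (full path, property) pairs for each .cif file found in the table.

    Files without a table entry, and non-CIF files, are skipped.
    """
    pairs = []
    for name in sorted(os.listdir(cif_dir)):
        if name.endswith(".cif") and name in property_table:
            pairs.append((os.path.join(cif_dir, name), property_table[name]))
    return pairs
```

You could then loop over the returned pairs, call Structure.from_file on each path, and append the matching value to your properties list, which keeps every structure lined up with its own property even if the directory listing order changes.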
By the way, if your dataset is open source, published in a peer-reviewed journal, and not already in matminer, please consider adding it to matminer via our dataset addition guide!