Best way to insert your own dataset?

simonb...@gmail.com

Mar 30, 2019, 10:52:50 PM
to matminer
Hi all, 

I was wondering what would be the best way to use MatMiner with my own dataset. I'd like to use the featurization and plotting capabilities, but have a separate data set. Ideally, I'm looking for something where I can start from .cif files + properties and integrate that with MatMiner. Would the best way be to load those into pymatgen, build my own MongoDB database from that and use that to interface with MatMiner? Or is there a better way to go about this? 

Thanks, 
Simon

ard...@lbl.gov

Apr 1, 2019, 12:41:52 AM
to matminer
Hi Simon,

The primary object matminer works with is the pandas dataframe. You can use matminer without a dataframe, but it is a lot easier to just use one. You don't need a MongoDB database, just the dataframe.

Here's an example of how to go from a bunch of cif files and properties to a dataframe:

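import os
import pandas as pd
from pymatgen.core import Structure  # older pymatgen releases: from pymatgen import Structure

cif_dir = "path/to/cif/files"

properties = []
structures = []
# Sort the listing so the index-to-file mapping is reproducible (os.listdir order is arbitrary)
for i, structure_file in enumerate(sorted(os.listdir(cif_dir))):
    # get_property_from_index is a placeholder: supply your own lookup for the i-th file
    prop = get_property_from_index(i)
    # os.listdir returns bare file names, so join them with the directory
    structure = Structure.from_file(os.path.join(cif_dir, structure_file))
    properties.append(prop)
    structures.append(structure)

df = pd.DataFrame({"some_property": properties, "structure": structures})
print(df)  # make sure the dataframe appears like you intended
df.to_pickle("/path/where/u/want/to/save/ur/dataframe.p")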
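
The get_property_from_index call in the example is a placeholder, and looking properties up by position depends on the order in which the files are listed. One alternative, sketched here under assumptions, is to key the properties by cif file name. The CSV layout (columns named "filename" and "value"), the load_properties helper, and the file names are all hypothetical and only serve the illustration:

```python
import csv
import io


def load_properties(csv_file):
    """Read a CSV with 'filename' and 'value' columns into a dict.

    The column names are an assumption for this sketch; adapt them to your data.
    """
    reader = csv.DictReader(csv_file)
    return {row["filename"]: float(row["value"]) for row in reader}


# In-memory stand-in for a real properties file (hypothetical contents).
example_csv = io.StringIO("filename,value\nNaCl.cif,5.0\nSi.cif,1.1\n")
property_by_file = load_properties(example_csv)

# Stand-in for os.listdir(...): one entry has no recorded property.
cif_files = sorted(["Si.cif", "NaCl.cif", "notes.txt"])

# Keep only files with a known property, pairing each name with its value.
rows = [(name, property_by_file[name]) for name in cif_files if name in property_by_file]
print(rows)
```

With a mapping like this, the loop can look up the property by structure_file instead of by index, and files without a recorded property are skipped rather than mismatched.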


You can then load your dataset later with:

df = pd.read_pickle("/path/where/u/want/to/save/ur/dataframe.p")


By the way, if your dataset is open source, published in a peer-reviewed journal, and not already in matminer, please consider adding it to matminer via our dataset addition guide!

Simon Batzner

Apr 1, 2019, 1:03:12 AM
to ard...@lbl.gov, matminer
Great, thank you! 