The only reason to use XML is its DTD support and, specifically,
when you want strict typing. There are other
possible frameworks though, like duck typing. That's why, so far, skdb
has been using YAML. There have been people arguing for DTDs though
(Smári McCarthy, Sam Rose, and maybe Matt Campbell IIRC).
> Pros and Cons of XML:
>
> Pro:
> * Human readable
debatable
> * Easy to edit
debatable
> * 1 XML sheet per material.
meh, a lot of people violate this with giant XML archives (but whatever)
> Cons:
> * Text-based, inherently takes more storage
> * A comprehensive material list would require >50,000 files
number of files doesn't really matter to anyone these days. Besides,
once you have a data set, you can translate it back and forth after
writing translators, and if you use XML with a schema or DTD you can
use one of the automatic dtd2sql scripts.
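for illustration, a rough sketch of the XML-to-SQL direction in
Python (the element names, columns, and file names here are all made
up; a real dtd2sql script would derive the schema from the DTD
itself rather than hardcoding it):

    import sqlite3
    import xml.etree.ElementTree as ET

    # hypothetical: assumes one file per material shaped like
    # <material><name>steel 1018</name><density>7.87</density></material>
    conn = sqlite3.connect("materials.db")
    conn.execute("CREATE TABLE IF NOT EXISTS materials (name TEXT, density REAL)")
    for path in ["steel_1018.xml", "aluminum_6061.xml"]:  # made-up filenames
        root = ET.parse(path).getroot()
        conn.execute("INSERT INTO materials VALUES (?, ?)",
                     (root.findtext("name"), float(root.findtext("density"))))
    conn.commit()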
> * XML can be slow at times.
Those are not the real "cons" of XML, my friend. ;-) Well, perhaps in
this context, just maybe. There's a broader overview of problems with
XML, not really relevant in this context but worth knowing about; I
owe you a link (I have someone searching for it, so I'll get back to
this shortly and send this email for now).
> Pros and Cons of a DB:
>
> Pro:
> * Smaller storage
> * Faster Execution
btw, just about everything on the face of the planet is faster than YAML parsing
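if you want to check that claim yourself, here's a rough
micro-benchmark sketch (PyYAML vs the stdlib json module; the
document is made up and the exact numbers will vary by parser and
machine):

    import json
    import timeit
    import yaml  # PyYAML

    doc = {"materials": [{"name": "steel", "density": 7.85}
                         for _ in range(1000)]}
    yaml_text = yaml.safe_dump(doc)
    json_text = json.dumps(doc)

    # pure-python yaml.safe_load is typically much slower than json.loads;
    # libyaml's CSafeLoader narrows the gap but rarely closes it
    print("yaml:", timeit.timeit(lambda: yaml.safe_load(yaml_text), number=10))
    print("json:", timeit.timeit(lambda: json.loads(json_text), number=10))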
> Cons:
> * Binary Format
> * Potential unused rows
> * More difficult to create subsets
i'd also add "you can't commit it to a distributed revision control
system and expect to get out a usable, human-readable diff", which is
a biggie!
> * Harder to add additional properties (possible, depends on DB
> design)
that's the same with DTDs in general
> Personally I'm leaning toward an XML based format, however I'd like
> some input into that decision from you.
Have you checked the skdb samples?
a possible way to represent materials in yaml
http://designfiles.org/skdb/doc/proposals/materials.yaml
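to give the flavor without pasting the whole file, a material entry
might load roughly like this (the entry below is my own hypothetical
illustration, not a verbatim excerpt from materials.yaml):

    import yaml  # PyYAML

    # hypothetical sketch of a material entry
    doc = """
    steel_1018:
      category: metal
      density: 7.87            # g/cm^3
      tensile_strength: 440    # MPa
    """
    materials = yaml.safe_load(doc)
    print(materials["steel_1018"]["density"])  # 7.87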
if not, and if you haven't seen any of the *.yaml stuff yet, you
should spend some time clicking around here:
http://designfiles.org/skdb/
there's a readme for the directory structure stuff here:
http://designfiles.org/skdb/readme
also, i spent some time (and so did others) documenting different
material properties in a list:
http://designfiles.org/skdb/doc/lists/material_properties.txt
someone (probably Smári or Christian Siefkes) did write an XML example
of a manufacturing process, though:
http://designfiles.org/skdb/doc/proposals/hall-heroult.process
based on this DTD:
http://www.tangiblebit.org/xml/process-1.0.dtd
... and fenn spent a lot of time representing some manufacturing processes:
http://designfiles.org/skdb/processes.yaml
a really poorly thought-out dependency tree of parts/tools for transhuman tech:
http://designfiles.org/skdb/doc/proposals/trans-tech.yaml
basic example of using tags (!foo) in yaml:
http://designfiles.org/skdb/doc/proposals/tags.yaml
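the tags are what let a loader map entries onto program objects; as a
rough sketch of the mechanism (the !material tag and Material class
here are my own illustration, not necessarily what tags.yaml uses):

    import yaml  # PyYAML

    class Material:
        def __init__(self, **props):
            self.props = props

    def material_constructor(loader, node):
        # build a Material from the mapping under the !material tag
        return Material(**loader.construct_mapping(node))

    yaml.SafeLoader.add_constructor("!material", material_constructor)

    doc = yaml.safe_load("steel: !material {density: 7.87, category: metal}")
    print(doc["steel"].props["density"])  # 7.87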
general architecture description readme thing:
http://designfiles.org/skdb/doc/architecture
there's a lot of background, but suffice it to say that as part of the
Automated Design Lab i gave an update to some of the other VOICED
participants on some of the technical details, though this email has
probably been more informative:
http://adl.serveftp.org/lab/presentations/updates-from-austin.pdf
Ben Lipkowitz, Smári McCarthy, and I were talking about this in
#hplusroadmap (on irc.freenode.net) almost exactly a year ago, if you want
to see the logs:
http://gnusha.org/logs/2009-07-17.log
and if you feel like downloading 20 MB of logs: http://gnusha.org/irclogs.txt
> Also, if this works out, I wouldn't mind people emailing me any
> material spec sheets that they have, I have a few thousand, but more
> never hurts!
Has anyone noticed that octopart.com has somehow been able to convince
some of the suppliers to submit data for their electronics? Not just
pdf datasheets, but actual data. I talked with Andre once and I think
what they are doing is a little sly: (1) sometimes they actually get
data from their supplier in a Microsoft Excel spreadsheet, CSV file,
or something else, and they are very happy, but most commonly (2) they
just have their pdf text search engine look at the different values
and parameters in each of the pdf data sheets. It's a horrible,
horrible data wrangling problem, but if it could be solved for all the
millions of data sheets out there, we all know how wonderful life can
become.
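a toy sketch of approach (2), for flavor (this assumes you've already
run something like pdftotext; the excerpt and the regex are made up,
and real data sheets are far messier than this):

    import re

    # made-up excerpt of what pdftotext might give you from a datasheet
    text = """
    Supply Voltage ............ 2.7 V to 5.5 V
    Operating Temperature ..... -40 C to +85 C
    """

    # crude pattern: a label, dot leaders, then the value with units
    pattern = re.compile(r"^(?P<param>[A-Za-z ]+?)\s*\.+\s*(?P<value>.+)$", re.M)
    for m in pattern.finditer(text):
        print(m.group("param").strip(), "=>", m.group("value").strip())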
Strangely nobody has figured out how to *do* this from a practical
standpoint. Octopart.com's strategy seems to be "become popular and
use that to get the electronics manufacturers to submit data to us,
and we pray that they have digitized their catalog sufficiently". I
have been thinking about spending some of my funds for SKDB on just
converting data sheets over to a parseable format, for some subset of
interesting components or whatever, but my funds are not infinite and
chipping away at this problem by using manual human labor (like via
Amazon's Mechanical Turk or even just hiring goons off craigslist) is
a really easy way to burn money for little comparable gain. My
preference would be to find something with more of a geometric
return, where data sheets gained grow faster than funds spent, or
something.
So anyway.. regardless of the initial file format, a big remaining
unsolved problem is the process by which data gets put into the
system. fenn has talked about this a few times on this mailing list,
like giant proprietary materials data sets, whether or not the
hardness of steel is "public data" or what (legal issues), whether we
could just start transcribing from library books without any fretting,
and how that is all supposed to work. Are package maintainers for individual
hardware projects to be responsible for documenting the unique
materials in their projects? Or is openmaterials.org going to work on
some data sets for us (hi Catarina!)? Or is there a way that a
"business" front like octopart.com can convince manufacturers,
material suppliers, etc., to become more standardized? My guess is no:
business standardization initiatives are ridiculously hard work and
have been around forever; something different has to happen (like
public discussion of WTF, as we're doing now).
Also, I'll get that link about actual cons re: XML soon. Thank you for
the email!
XML is slow if you use a DOM parser to read the whole thing into
memory. When multi-GB files are involved (e.g., using XML as an
intermediate format for SPICE-to-Verilog-AMS netlist translation), you
definitely want a more streamed/serialized approach, doing multiple
passes over the data to access what you need.
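In Python terms that's roughly the difference between ElementTree's
parse() and iterparse(); a minimal sketch of the streamed version (the
file name and the <part> tag are placeholders):

    import xml.etree.ElementTree as ET

    # streamed parse: only one element subtree lives in memory at a time
    for event, elem in ET.iterparse("netlist.xml", events=("end",)):
        if elem.tag == "part":
            print(elem.get("name"))  # do this pass's work here
            elem.clear()             # free the subtree so memory stays bounded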
Using the streamed XML approach also means you can zip the text for
better compression. Then you can pipe the unzipped stream straight
into the XML parser; there's no need to decompress the contents to an
intermediate file or hold them in memory.
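And since iterparse() accepts any file-like object, the pipe is one
line, with gzip here as a stand-in for whatever compressor you use:

    import gzip
    import xml.etree.ElementTree as ET

    # decompress on the fly: no intermediate file, no full in-memory copy
    with gzip.open("netlist.xml.gz", "rb") as stream:
        for event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == "part":
                elem.clear()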
Andrew.
--
"The future is already here. It's just not very evenly distributed" -- William Gibson