Re: Struggling.....

Michael Hunger

Aug 6, 2017, 4:39:43 PM

to Roger Stone, ne...@googlegroups.com

Hi Roger,

It's best to join the neo4j-users Slack (neo4j.com/slack) or the Google group; there are a lot of helpful people there.

For your questions:

1. You don't need the "root" nodes; the label itself takes over that function.

2. For continuous variables I wouldn't create "categorical" nodes; I would just store them as indexed properties.

3. If you want, you can add some "range" nodes that each cover a semantic range, if that helps you with modeling or querying.

4. You can still create "categorical" nodes for other attributes that are more like enumerations.

5. For time, it's up to you whether you employ a time-tree or not; range searches on indexed properties work too.
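
As a minimal sketch of points 2 and 5: assuming a hypothetical `Engine` label with `name`, `fuel`, `bore` (in mm) and `introducedYear` properties (none of these names come from your model), your example query could look roughly like the following. Index syntax differs between Neo4j versions; this is the 3.x form.

```cypher
// Assumed schema: (:Engine {name, fuel, bore, introducedYear}).
// All label and property names here are hypothetical.

// Index the continuous property once.
CREATE INDEX ON :Engine(bore);

// Range search on the indexed property: petrol engines with a bore
// between 60 mm and 90 mm, introduced from 2012 onwards.
MATCH (e:Engine)
WHERE e.bore > 60 AND e.bore < 90
  AND e.fuel = 'petrol'
  AND e.introducedYear >= 2012
RETURN e.name, e.bore, e.introducedYear
ORDER BY e.bore;
```

With the index in place, the planner should be able to seek the bore range instead of checking the value on every engine, so no cloud of per-value bore nodes is needed for this kind of question.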

Much of the modeling depends on what you want to do with the data later on.
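
On the worry about the import turning into a monster: if the continuous values are plain properties, each extra column is only one more line in the import. A rough sketch, assuming a hypothetical `engines.csv` with `name`, `fuel`, `bore` and `stroke` columns (the file name, columns, labels and relationship type are all made up for illustration):

```cypher
// Hypothetical file and column names; adjust to the real spreadsheet export.
LOAD CSV WITH HEADERS FROM 'file:///engines.csv' AS row
MERGE (e:Engine {name: row.name})
// Continuous values become plain numeric properties, one line each.
SET e.bore   = toFloat(row.bore),
    e.stroke = toFloat(row.stroke)
// Enumeration-like values can still become shared "categorical" nodes.
MERGE (f:Fuel {name: row.fuel})
MERGE (e)-[:USES_FUEL]->(f);
```

`MERGE` keeps the `Engine` and `Fuel` nodes unique across repeated imports, so re-running the file should not create duplicates.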

HTH Michael

On Sun, Aug 6, 2017 at 9:27 PM, Roger Stone <rds...@rdsengineering.com> wrote:

Hi Michael,

I am afraid that I am not a database person (so at least I don't bring any RDBMS baggage), but I am not a coder either. I have an idea for a database I want to create - which means gathering and formatting the data as well as sorting out the db itself. No simple "download the data, clean it up, convert it into a .csv and import" for me! I have been working on understanding Cypher, but I am not fluent in Java, JS, Python or any of these languages.

PLEASE DON'T HESITATE TO SAY IF THIS IS TOO BIG A QUESTION! I have no wish to impose upon you and I will be attending some of the London training over the coming months - but I have been trying to get ahead.

I am an engines nerd and I want to create a database of engines - I won't bore you with the astonishing variety and complexity of the piston engines that have been built over the years - I am struggling with basic things relating to bog-standard passenger car engines!

A lot of the data I will be dealing with are continuous variables - like time - but I don't want to create dozens and dozens of clouds of nodes representing 1 mm to 3000 mm or more in 0.01 mm steps only to ever use 5% of them! I know there are time trees that let you generate missing nodes on demand and insert them into the tree. I have written some Cypher code to do this for my variables - for the cylinder bore it is shown in the attachment below (these are the notes I write myself both as a manual to help me use the db later and so I can find bits of code I want to re-use). This works a treat and I was very pleased... I want to generate these so that searches such as "show me the petrol engines of bore greater than 60 mm and less than 90 mm introduced in the last 5 years" can be answered without having to check the bore value for every engine in the database.

Also attached are:

1) A sample input spreadsheet - at a very Mickey Mouse level! A real input sheet could have very many columns but only a few rows - maybe even only 1; and

2) An Arrows file showing a whiteboard of the basis of the db structure.

So, my immediate question is this: I can see how to import from the spreadsheet into the db, and if I import a new value for bore, for example, I understand how I can make sure that a new, unique node is created. But I can't see how to integrate it into my "Bore" array unless I run that piece of code as part of my csv import, and I suppose I could. I am just concerned that if I do that for bore, stroke, cylinder centres, rod length, big end dia, main dia, gudgeon pin dia, ring widths etc. - then my import code is going to be a monster.

This made me wonder if I was just taking completely the wrong approach?

Kind Regards,

Roger
