Skybuck's Universal Data Structure

skybuck2000

Apr 3, 2021, 9:09:45 AM

Today I present to the world "Skybuck's Universal Data Structure".

This new invention describes how to use "Skybuck's Universal Code".

This new invention is meant to describe high-level data structures that offer the same kind of flexibility as Skybuck's Universal Code, but at a high level.

Take note that this document is only a "draft" and might need further work, but it does describe the general idea.

The general idea behind Skybuck's Universal Data Structure is, again, to describe the data in terms of "interleaving". This time, however, the meta data is not a terminator but a type field. Humans like describing data in terms of types, and types are crucial and essential to give data meaning. A terminator, for example, is already a type: basically an escape code.

However, it is undesirable to introduce escape codes into a universal data structure or encoding. Thus, instead of terminating and scanning, interleaving is used. Scanning for terminators, or encoding them, becomes problematic because raw binary data must then be transformed (escaped) so that it is not misinterpreted, and a missing terminator leaves the data impossible to interpret.
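To make the terminator problem concrete, here is a minimal Rust sketch. It uses byte stuffing, a standard escaping technique swapped in for illustration; the `TERMINATOR` and `ESCAPE` byte values are assumptions. A naive reader stops at the first 0 byte and truncates raw data that contains a 0, so the writer has to transform the data, and the reader has to report a missing terminator as an error.

```rust
/// Hypothetical marker bytes for this sketch.
pub const TERMINATOR: u8 = 0;
pub const ESCAPE: u8 = 0xFF;

/// Naive reader: returns everything before the first TERMINATOR,
/// so raw data that itself contains a 0 byte is cut short.
pub fn read_terminated(stream: &[u8]) -> &[u8] {
    let end = stream
        .iter()
        .position(|&b| b == TERMINATOR)
        .unwrap_or(stream.len());
    &stream[..end]
}

/// Escaping writer: puts ESCAPE in front of every TERMINATOR or ESCAPE
/// byte in the data, then appends the real TERMINATOR.
pub fn write_escaped(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len() + 1);
    for &b in data {
        if b == TERMINATOR || b == ESCAPE {
            out.push(ESCAPE);
        }
        out.push(b);
    }
    out.push(TERMINATOR);
    out
}

/// Matching reader: undoes the escaping and stops at the first unescaped
/// TERMINATOR. Returns None if the terminator is missing.
pub fn read_escaped(stream: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut bytes = stream.iter();
    while let Some(&b) = bytes.next() {
        match b {
            TERMINATOR => return Some(out),
            ESCAPE => out.push(*bytes.next()?), // take the next byte literally
            _ => out.push(b),
        }
    }
    None
}
```

Every data byte that collides with a marker costs an extra byte, and the reader must inspect every byte. This is the transformation that interleaving is meant to avoid.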

Also, the meta bit of 1 in Skybuck's Universal Code can be considered a switch statement: it indicates to the machine/reader that it is now switching to a different field.

This combined insight is what led to the determination that a type field should be introduced which performs three functions:

1. Switch between "meta data" and "raw data".

2. Terminate data structures.

3. Describe the contents of data structures.

Basically this leads to the following design:

<type><data><type><data><type><data>
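A minimal Rust sketch of the `<type><data>` interleaving. The post does not say how a reader knows where a data field ends, so this sketch assumes a one-byte length after each one-byte type (a tag-length-value layout, swapped in for illustration). `Field`, `encode` and `decode` are hypothetical names.

```rust
/// One interleaved field: a type tag followed by its data.
/// Assumption: 1-byte type, 1-byte length, then up to 255 data bytes.
#[derive(Debug, PartialEq)]
pub struct Field {
    pub ty: u8,
    pub data: Vec<u8>,
}

/// Writes <type><len><data><type><len><data>...
pub fn encode(fields: &[Field]) -> Vec<u8> {
    let mut out = Vec::new();
    for f in fields {
        assert!(f.data.len() <= 255, "this sketch limits data to 255 bytes");
        out.push(f.ty);
        out.push(f.data.len() as u8);
        out.extend_from_slice(&f.data);
    }
    out
}

/// Reads the stream back; returns None if it is truncated.
pub fn decode(mut bytes: &[u8]) -> Option<Vec<Field>> {
    let mut fields = Vec::new();
    while !bytes.is_empty() {
        let ty = bytes[0];
        let len = *bytes.get(1)? as usize;
        let data = bytes.get(2..2 + len)?.to_vec();
        fields.push(Field { ty, data });
        bytes = &bytes[2 + len..];
    }
    Some(fields)
}
```

Because the length tells the reader where the data ends, raw data containing any byte value, including 0, passes through without escaping.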

To see why this could be a superior data structure we could take a look at "Unicode".

In Unicode (not to be confused with Skybuck's Universal Code, which is meant for raw data description) all alphabets of the world are thrown together to create one big mess of alphabet soup.

Why was this done? To facilitate communication between computers?

But could it not have been done differently? Russians complain that Unicode text is twice as big for them because their part of the alphabet is encoded inefficiently (in UTF-8 each Cyrillic letter takes two bytes instead of one), and that is a valid objection against Unicode.

In the past there were codepages which described the alphabet soup in a more efficient way.

Perhaps the problem back then was the lack of software to universally describe these code pages and to embed them into a universal data structure.

Now, with this new invention and insight, in hindsight Unicode could have been designed as follows:

<code page><alphabet string soup><code page><alphabet string soup>

and so forth. However the necessary software and hardware to facilitate this switching between types was not present.
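A minimal Rust sketch of `<code page><alphabet string soup>` segments. The page numbers `PAGE_ASCII` and `PAGE_CYRILLIC` are hypothetical, and the Cyrillic page only maps the 64 basic letters А to я, at the byte positions ISO-8859-5 uses for them (0xB0 to 0xEF); a real code page covers more characters.

```rust
/// Hypothetical code page numbers for this sketch (not a real registry).
pub const PAGE_ASCII: u8 = 0;
pub const PAGE_CYRILLIC: u8 = 1;

/// Decodes one byte under a code page. Bytes below 0x80 are ASCII on both
/// pages; PAGE_CYRILLIC maps 0xB0..=0xEF to U+0410..=U+044F as ISO-8859-5
/// does. Anything else, or an unknown page, becomes U+FFFD.
fn decode_byte(page: u8, b: u8) -> char {
    match page {
        PAGE_ASCII | PAGE_CYRILLIC if b < 0x80 => char::from(b),
        PAGE_CYRILLIC if (0xB0..=0xEF).contains(&b) => {
            // 0xB0 is А (U+0410) and 0xEF is я (U+044F)
            char::from_u32(0x0410 + u32::from(b - 0xB0)).unwrap()
        }
        _ => char::REPLACEMENT_CHARACTER,
    }
}

/// Decodes a sequence of (code page, bytes) segments into one string.
pub fn decode_segments(segments: &[(u8, &[u8])]) -> String {
    segments
        .iter()
        .flat_map(|&(page, bytes)| bytes.iter().map(move |&b| decode_byte(page, b)))
        .collect()
}
```

With a code page, "Привет" takes six bytes, one per letter; the same word in UTF-8 takes twelve. The price is that the reader must track which page is active for every segment.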

Now back to Skybuck's Universal Data Structure. One of the immediate desires is to create a list of available free memory for further segmentation, allocation and use by data structures, data fields and such.

The design of the Windows operating system immediately comes to mind, where lists of pages are used to segment and describe the available memory.

So for operating system design it is essential to be able to describe a list of some sort.

Here is where it does become a bit fuzzy and it might require further work.

One possible idea is to define a "Universal Type" table, like Unicode, in which each data structure type is identified by a number.

Type 0 would be raw binary data, basically unknown data.
Type 1 would be the start of a list of universal data structures
Type 2 would be the end of a list of universal data structures.
Type 3 would be the start of a list of same type data. "efficient list"
Type 4 would be the end of a list of same type data. "efficient list"

Example of a generic list:

<generic list begin><data type><data content><data type><data content><generic list end>

Example of an efficient list:

<efficient list begin><data><data><data><data><efficient list end>
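A minimal Rust sketch of a reader for the generic list, using type numbers 0, 1 and 2 from the table above. It assumes the stream has already been decoded into numbers (tokens), that each data content is a single token, and that a data type of 1 starts a nested list. Only the generic list is sketched, because the post does not say how a data value equal to 4 inside an efficient list would be told apart from its end marker. `Item` and `parse_list` are hypothetical names.

```rust
/// Type numbers from the "Universal Types" table.
pub const RAW: u64 = 0;
pub const LIST_BEGIN: u64 = 1;
pub const LIST_END: u64 = 2;

#[derive(Debug, PartialEq)]
pub enum Item {
    /// A <data type><data content> pair; content is one token here.
    Value { ty: u64, content: u64 },
    /// A nested generic list.
    List(Vec<Item>),
}

/// Parses a generic list starting at tokens[*pos], which must be LIST_BEGIN.
/// Advances *pos past LIST_END. Returns None on malformed or truncated input.
pub fn parse_list(tokens: &[u64], pos: &mut usize) -> Option<Vec<Item>> {
    if *tokens.get(*pos)? != LIST_BEGIN {
        return None;
    }
    *pos += 1;
    let mut items = Vec::new();
    loop {
        // The end marker appears where the next data type would be.
        let ty = *tokens.get(*pos)?;
        match ty {
            LIST_END => {
                *pos += 1;
                return Some(items);
            }
            LIST_BEGIN => items.push(Item::List(parse_list(tokens, pos)?)),
            _ => {
                let content = *tokens.get(*pos + 1)?;
                items.push(Item::Value { ty, content });
                *pos += 2;
            }
        }
    }
}
```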

These types could be collected and described in "Universal Types" like unicode.

Bye for now,
Skybuck.

Melzzzzz

Apr 3, 2021, 10:43:55 AM

How is that different from a tagged union, which is already present in some
languages?

--
current job title: senior software engineer
skills: x86 assembler,c++,c,rust,go,nim,haskell...

press any key to continue or any other to quit...

skybuck2000

Apr 5, 2021, 10:13:00 PM

You are getting warm.

From the document I wrote you could indeed infer that it is programming language related.

However there is some difference.

This document is about storing data in binary form. Not in source code form.

The document is a bit misleading because it uses <>, which might remind some readers of XML, JSON, HTML and such.

Perhaps I should have used commas like so (as was the case in Skybuck's Universal Code):

type, data, type, data, type, data

The sad part about programming languages is that they throw away all their information/source code and produce instructions and addresses; the rest is basically lost (because memory/RAM was expensive and CPUs were relatively slow).

Out of necessity and for flexibility, this new idea aims to store information in binary form so that it is not lost and becomes part of the data.

Perhaps this document/idea should be expanded with "names" as well, which basically function as the address of the variable or data structure, then it may look like:

name, type, data, name, type, data, name, type, data

This is basically what a Pascal structure/record looks like in source code form:

type
  TSomeType = integer;

  TSomeRecord = record
    SomeField : TSomeType;
    SomeField2 : TSomeType;
  end;

var
  SomeRecord : TSomeRecord;

Now this can be stored in binary form as follows:

TSomeType = 0
TSomeRecord = 1

SomeRecord = 1000;
SomeField = 1001;
SomeField2 = 1002;

Binary Storage using Skybuck's Universal Code:

Conceptually:

SomeRecord, TSomeRecord, SomeField, TSomeType, <SomeFieldData>, SomeField2, TSomeType, <SomeField2Data>

Digital (commas added for clarity):

10 00 00 01, 11, 10 00 00 11, 01, X, etc
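The digits above appear to follow one rule: every binary digit of a value is followed by a meta bit, 0 meaning "more digits follow" and 1 meaning "last digit". This is a minimal Rust sketch under that reading. It treats 1000, 1001, 1 and 0 as binary numbers and is an interpretation of the example, not a specification of Skybuck's Universal Code. `encode_universal`, `decode_universal` and `to_pairs` are hypothetical names.

```rust
/// Encodes a number as its binary digits (most significant first), each
/// digit followed by a meta bit: 0 = more digits follow, 1 = last digit.
/// Zero is encoded as the single digit 0.
pub fn encode_universal(value: u64) -> Vec<u8> {
    let digits = if value == 0 { 1 } else { 64 - value.leading_zeros() };
    let mut bits = Vec::with_capacity(2 * digits as usize);
    for i in (0..digits).rev() {
        bits.push(((value >> i) & 1) as u8); // data bit
        bits.push(u8::from(i == 0)); // meta bit: 1 marks the last data bit
    }
    bits
}

/// Decodes one number from the front of `bits`; returns it with the
/// remaining bits, or None if the stream ends before a meta bit of 1.
pub fn decode_universal(bits: &[u8]) -> Option<(u64, &[u8])> {
    let mut value = 0u64;
    for (k, pair) in bits.chunks_exact(2).enumerate() {
        value = (value << 1) | pair[0] as u64;
        if pair[1] == 1 {
            return Some((value, &bits[2 * (k + 1)..]));
        }
    }
    None
}

/// Renders bits as space-separated (data, meta) pairs, like the post.
pub fn to_pairs(bits: &[u8]) -> String {
    bits.chunks(2)
        .map(|p| p.iter().map(|b| b.to_string()).collect::<String>())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Since every value carries its own end marker, values can be concatenated into one bit stream and read back one after another without separators, at the cost of doubling the number of bits.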

To decode it, some code is now needed that knows how to interpret it. Exceptions can be made when certain types are encountered, and these are encoded with if statements:

ReadName
ReadType

if Type = Record then
  ReadName
  ReadType
  ReadData

^ It is clear that to read more fields, a record terminator is required.

For now this is just a vague example of what might be possible. It's late and I am tired, but I hope you now get the idea a little better :)

It's about storing information in pure binary, with meta bits, and also interleaving of bits and fields to keep it somewhat efficient and especially flexible.

Better code would be:

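TRecordBegin = 10
TRecordEnd = 11

ReadName
ReadType
if Type = TRecordBegin then
  repeat
    ReadRecordField
  until Name = TRecordEnd

proc ReadRecordField
  ReadName    { the record terminator sits where a field name would be }
  if Name <> TRecordEnd then
    ReadType
    ReadData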
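The record-reading pseudocode can be sketched in Rust over a stream of already-decoded numbers. In this sketch the codes 10 and 11 are read as plain numbers, ordinary fields use `TSomeType` = 0 and carry one number of data, and the end marker is checked where the next field's name would start, which assumes no name ever equals `T_RECORD_END`. `Reader`, `read_field`, `Data` and `Field` are hypothetical names.

```rust
/// Type codes from the pseudocode, read as plain numbers.
pub const T_RECORD_BEGIN: u64 = 10;
pub const T_RECORD_END: u64 = 11;

#[derive(Debug, PartialEq)]
pub enum Data {
    Scalar(u64),
    Record(Vec<Field>),
}

/// One name, type, data triple.
#[derive(Debug, PartialEq)]
pub struct Field {
    pub name: u64,
    pub ty: u64,
    pub data: Data,
}

/// Cursor over already-decoded numbers.
pub struct Reader<'a> {
    tokens: &'a [u64],
    pos: usize,
}

impl<'a> Reader<'a> {
    pub fn new(tokens: &'a [u64]) -> Self {
        Reader { tokens, pos: 0 }
    }

    fn next_token(&mut self) -> Option<u64> {
        let t = *self.tokens.get(self.pos)?;
        self.pos += 1;
        Some(t)
    }

    fn peek(&self) -> Option<u64> {
        self.tokens.get(self.pos).copied()
    }

    /// ReadRecordField: name, type, then data. A T_RECORD_BEGIN type means
    /// the data is a nested record whose fields run until T_RECORD_END.
    pub fn read_field(&mut self) -> Option<Field> {
        let name = self.next_token()?;
        let ty = self.next_token()?;
        let data = if ty == T_RECORD_BEGIN {
            let mut fields = Vec::new();
            // Look at the name slot before reading another field.
            while self.peek()? != T_RECORD_END {
                fields.push(self.read_field()?);
            }
            self.next_token()?; // consume T_RECORD_END
            Data::Record(fields)
        } else {
            Data::Scalar(self.next_token()?)
        };
        Some(Field { name, ty, data })
    }
}
```

Reading the SomeRecord example, with SomeField = 5 and SomeField2 = 7 as made-up data, yields one record holding two scalar fields; a stream that ends before `T_RECORD_END` yields `None`.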

Bye for now,
Skybuck.

Melzzzzz

Apr 6, 2021, 11:11:28 AM

You can do in Rust:
// assumes TRecordBegin/TRecordEnd constants and a get_next_field() reader
if get_next_field() == TRecordBegin {
    loop {
        if get_next_field() == TRecordEnd { break; }
    }
}
So?

skybuck2000

Apr 7, 2021, 5:10:30 PM

OK, so now encode that in an efficient binary way that is flexible too! =D

Bye,
Skybuck =D