A few ideas of engine framework

9 views
Skip to first unread message

KaiGai Kohei

unread,
Apr 1, 2010, 2:46:31 AM4/1/10
to trond....@gmail.com, memc...@googlegroups.com, dsal...@gmail.com, tmae...@gmail.com
Hello,

As we discussed before, I focus on access control facilities in memcached.
In the "memcached and access control" thread, I was suggested to implement
it as an engine module rather than core features.
Except for a few matters, it seems to be a feasible approach.

The issue about remove() method was resolved. (Thanks!)

However, we still have matters which can prevent access control feature
as an engine module. So, I'd like to discuss the following ideas.

* A new server API to obtain socket file descriptor.

SELinux provides an API to retrieve security context of the peer process
for the given socket file descriptor. (See, getpeercon(3))
It allows server processes to know privilege of the client process, and
we can utilize the privilege for access control decision inside server
process.
Right now, conn->sfd is socket file descriptor of the session. The "conn"
data is delivered to engine module as an opaque "cookie". In actually, we
can fetch the socket file descriptor based on the knowledge of internal
format of the structure. But it seems to me basically unpreferable.

I'd like to add a new server API, as follows:

int (*get_socket_file_descriptor)(const void *cookie);


* New APIs to retrieve length of key/data

Right now, the item structure has nbytes (uint32_t) and nkey (uint16_t) fields,
and the core memcached can see the length of key/data without any interposition
of engine modules.
Encapsulation of the references to nbytes/nkey is an awaited feature for me,
because it makes possible to store security properties of access control stuff,
such as ownership or security context.

I plan the upcoming selinux module intermediate between the core memcached
and the secondary module (such as default_engine.so). For example, when get()
method is called, selinux's get() shall be invoked at first. Then, it checks
user's privilege and calls the secondary module if allowed, like bucket engine.

I think the most portable approach to store security properties of items is
to inject these properties as a part of data field prior to invocation of
the secondary module.
For example, if user tries to store a pair of {key:"aaa", value:"foovarbaz"},
the memcached invokes the primary engine module with this key-value pair.
Assume the default security context of the item object in this case is
"classified". So, SELinux module shall inject into the value, then it calls
the secondary module. The secondary module see the modified key-value pair,
as if {key:"aaa", value:"classified\0foovarbaz"} is given.
^^^^^^^^^^^^
I believe this approach will work with various kind of secondary modules,
because they can see just a bit longer data than the original.
The get_item_data() method gives the primary module a chance to fix up
pointer of the data field. For example, SELinux can return the address
next to the first '\0' in the returned data field from the secondary
module.

However, the memcached refers nbytes and nkey field of item structure without
any method calls, so it misunderstand length of the data, because item->nbytes
contains total length of the security context and actual data.

So, I'd like to add the following two engine APIs:

uint16_t (*item_get_nkey)(const item *item);

uint32_t (*item_get_ndata)(const item *item);

It allows the primary module (selinux) to return the modified length of the
data, being consistent with item_get_data() method.

In my personal opinion, I don't think nkey and nbytes are necessary fields
in the common item structure. References to them should be encapsulated to
the engine module.


* Flags to show what features are provided with the engine module.

Right now, when the primary module (like selinux or bucket) load secondary
engine modules, there is no way to know what features are provided with
the secondary modules.

For example, if the secondary module does not have persistent storage support,
the selinux module wants to store the security attribute of the item in the
security identifier form rather than flat text form, from the performance
perspective.

I'd like to add a "flags" member within engine_interface structure, to inform
the caller
Perhasp, we can consider the following flags right now?

ENGINE_FEATURE_PSEUDO = 0x0001 /* set, if module does not manage items actually */
ENGINE_FEATURE_STORAGE = 0x0002 /* set, if item can be stored in persistent storage */
:

* Isn't settings.engine.v1->xxxx() every time ugly?

Right now, the core memcached code calls engine methods using function pointer
every time. In my sense, it should be wrapped with a thin layer, like:

static inline ENGINE_ERROR_CODE
engine_initialize(ENGINE_HANDLE* handle, const char* config_str)
{
if (settings.engine.v0.interface == 1)
return settings.engine.v1->initialize(handle, config_str);
else
return ENGINE_EINVAL;
}

Here is no functional differences, but it enables to keep code clean and
to handle v2, v3 or later version easily.


If we need a patch, I'll submit it later.
Any opinions?

Thanks,
--
KaiGai Kohei <kai...@ak.jp.nec.com>

KaiGai Kohei

unread,
Apr 1, 2010, 9:15:23 PM4/1/10
to trond....@gmail.com, memc...@googlegroups.com, dsal...@gmail.com, tmae...@gmail.com
The attached patch reworked the engine framework based on the following
three ideas.

* A new server API to obtain socket file descriptor.

The get_socket_fd() server API allows engines to know properties of
connection socket.

* New APIs to retrieve length of key/data

The item_get_nkey() and item_get_ndata() engine APIs replaced direct
references to item->nkey and item->nbytes.

* Flags to show what features are provided with the engine module.

The 'features' field of ENGINE_HANDLE enables to inform the core
memcached (or intermediation module, like bucket) features of the
loaded engine module.

Any comments are welcome.

Thanks,

mcd-engine-reworks.1.patch

Dustin Sallings

unread,
Apr 1, 2010, 10:29:34 PM4/1/10
to KaiGai Kohei, tmae...@gmail.com, dsal...@gmail.com, memc...@googlegroups.com, trond....@gmail.com

Can you submit this as a series of independent changes, or perhaps a just put up a repo somewhere?

We appreciate the work, but we'll need to break it up into consumable chunks.

dustin sallings

On Apr 1, 2010 6:35 PM, "KaiGai Kohei" <kai...@ak.jp.nec.com> wrote:

The attached patch reworked the engine framework based on the following
three ideas.


* A new server API to obtain socket file descriptor.

The get_socket_fd() server API allows engines to know properties of
connection socket.


* New APIs to retrieve length of key/data

The item_get_nkey() and item_get_ndata() engine APIs replaced direct
references to item->nkey and item->nbytes.


* Flags to show what features are provided with the engine module.

The 'features' field of ENGINE_HANDLE enables to inform the core
memcached (or intermediation module, like bucket) features of the
loaded engine module.

Any comments are welcome.

Thanks,


(2010/04/01 15:46), KaiGai Kohei wrote:
> Hello,
>

> As we discussed before, I focus on access con...

KaiGai Kohei

unread,
Apr 2, 2010, 1:31:16 AM4/2/10
to Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com, memc...@googlegroups.com, trond....@gmail.com
(2010/04/02 11:29), Dustin Sallings wrote:
> Can you submit this as a series of independent changes, or perhaps a
> just put up a repo somewhere?
>
> We appreciate the work, but we'll need to break it up into consumable
> chunks.

http://github.com/kaigai/memcached-selinux

I set up this repository.

The 'master' branch is a clone of the Trond's engine tree.

$ git clone git://github.com/kaigai/memcached-selinux.git
$ cd memcached-selinux
$ git diff origin/master origin/reworks_1
-> It adds get_socket_fd() server API to obtain socket file descriptor
for the given cookie.

$ git diff origin/reworks_1 origin/reworks_2
-> It adds item_get_nkey() and item_get_ndata() engine APIs, to inject
security attribute as a part of values by intermediation modules
(such as bucket or selinux).

$ git diff origin/reworks_2 origin/reworks_3
-> It adds the 'features' field at engine_interface structure, to
inform what features are supported by the loaded module.

$ git diff origin/reworks_3 origin/reworks_4
-> It replaces settings.engine.v1->xxx(...) invocations by wrapper
functions.

# The 'selinux' branch is not available yet.

Thanks,

> dustin sallings
>
>> On Apr 1, 2010 6:35 PM, "KaiGai Kohei" <kai...@ak.jp.nec.com

>> <mailto:kai...@ak.jp.nec.com>> wrote:
>>
>> The attached patch reworked the engine framework based on the following
>> three ideas.
>>
>>
>> * A new server API to obtain socket file descriptor.
>>
>> The get_socket_fd() server API allows engines to know properties of
>> connection socket.
>>
>>
>> * New APIs to retrieve length of key/data
>>
>> The item_get_nkey() and item_get_ndata() engine APIs replaced direct
>> references to item->nkey and item->nbytes.
>>
>>
>> * Flags to show what features are provided with the engine module.
>>
>> The 'features' field of ENGINE_HANDLE enables to inform the core
>> memcached (or intermediation module, like bucket) features of the
>> loaded engine module.
>>
>> Any comments are welcome.
>>
>> Thanks,
>>
>>
>> (2010/04/01 15:46), KaiGai Kohei wrote:
>> > Hello,
>> >
>> > As we discussed before, I focus on access con...
>>


--
KaiGai Kohei <kai...@ak.jp.nec.com>

Trond Norbye

unread,
Apr 2, 2010, 3:13:45 AM4/2/10
to KaiGai Kohei, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com, memc...@googlegroups.com

I'm currently up in the mountains at our cabin without my computer.
I'll look at the patches tomorrow when I get back home.

Trond

Sent from my iPhone

Trond Norbye

unread,
Apr 4, 2010, 3:17:53 PM4/4/10
to KaiGai Kohei, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com, memc...@googlegroups.com

On 2. apr. 2010, at 07.31, KaiGai Kohei wrote:

> (2010/04/02 11:29), Dustin Sallings wrote:
>> Can you submit this as a series of independent changes, or perhaps a
>> just put up a repo somewhere?
>>
>> We appreciate the work, but we'll need to break it up into consumable
>> chunks.
>
> http://github.com/kaigai/memcached-selinux
>
> I set up this repository.
>
> The 'master' branch is a clone of the Trond's engine tree.
>
> $ git clone git://github.com/kaigai/memcached-selinux.git
> $ cd memcached-selinux
> $ git diff origin/master origin/reworks_1
> -> It adds get_socket_fd() server API to obtain socket file descriptor
> for the given cookie.
>

I've applied this patch..


> $ git diff origin/reworks_1 origin/reworks_2
> -> It adds item_get_nkey() and item_get_ndata() engine APIs, to inject
> security attribute as a part of values by intermediation modules
> (such as bucket or selinux).
>

What is the primary motivation for doing this? I don't see why backends would "dynamically change" these values for an item. The reason I added a function to get the key and data was because one could imagine that they could be stored on different locations (or memory mapped data areas)... CAS is called through the api to allow the cas to be optional in the backend if you don't want to waste 8 bytes per item... From what I've seen earlier you chose to store your security information as a textual string after the key? you could still do that but then let nkey contain the number of bytes in the key, and keep the other information somewhere else..


> $ git diff origin/reworks_2 origin/reworks_3
> -> It adds the 'features' field at engine_interface structure, to
> inform what features are supported by the loaded module.
>

What are you going to use this information for??? Your patch doesn't contain any use of the field making it hard for me to understand the use case..

> $ git diff origin/reworks_3 origin/reworks_4
> -> It replaces settings.engine.v1->xxx(...) invocations by wrapper
> functions.
>

The source code does not follow the same coding standard as the rest of memcached...

Cheers,

Trond

KaiGai Kohei

unread,
Apr 4, 2010, 8:37:56 PM4/4/10
to Trond Norbye, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com, memc...@googlegroups.com
(2010/04/05 4:17), Trond Norbye wrote:
>
> On 2. apr. 2010, at 07.31, KaiGai Kohei wrote:
>
>> (2010/04/02 11:29), Dustin Sallings wrote:
>>> Can you submit this as a series of independent changes, or perhaps a
>>> just put up a repo somewhere?
>>>
>>> We appreciate the work, but we'll need to break it up into consumable
>>> chunks.
>>
>> http://github.com/kaigai/memcached-selinux
>>
>> I set up this repository.
>>
>> The 'master' branch is a clone of the Trond's engine tree.
>>
>> $ git clone git://github.com/kaigai/memcached-selinux.git
>> $ cd memcached-selinux
>> $ git diff origin/master origin/reworks_1
>> -> It adds get_socket_fd() server API to obtain socket file descriptor
>> for the given cookie.
>>
>
> I've applied this patch..

Thanks,

>> $ git diff origin/reworks_1 origin/reworks_2
>> -> It adds item_get_nkey() and item_get_ndata() engine APIs, to inject
>> security attribute as a part of values by intermediation modules
>> (such as bucket or selinux).
>>
>
> What is the primary motivation for doing this? I don't see why backends
> would "dynamically change" these values for an item. The reason I added
> a function to get the key and data was because one could imagine that
> they could be stored on different locations (or memory mapped data areas)...
> CAS is called through the api to allow the cas to be optional in the backend
> if you don't want to waste 8 bytes per item... From what I've seen earlier
> you chose to store your security information as a textual string after the
> key? you could still do that but then let nkey contain the number of bytes
> in the key, and keep the other information somewhere else..

What I try to do is to store the security information as a part of the value
for the secondary modules, rather than just after keys.
In this approach, the secondary module doesn't need special treatments on the
items with security information, because the it just stores the given value
as is.

As someone pointed out before, I don't think it is not good idea to handle
the security information specially and independently from existing keys and
values, because it is unclear whether the secondary modules pay mention about
this security properties.
If the secondary module see the security information just a part of values,
it shall be handled correctly. If not, it is just a bug in the secondary one.
So, I want to modify the value when it is delivered from the primary module
to the secondary one, and want to split up the item when it is delivered from
the primary module to the memcached core.

The existing get_item_data() allows the primary module to modify the pointer
of data field, although it might be an invention of new usage, but we cannot
fix up the length of the data field right now,
The purpose of get_item_ndata() is that the primary module inject its security
information transparently for both of the core memcached and the secondary
modules.

User
| ^
| | {key = "abcd", value = "foovarbaz"}
v |
Memcached
| ^
| | {key = "abcd", nkey=4, value = "foovarbaz", nbytes=9}
v |
Primary engine module
| ^
| | {key = "abcd", nkey=4, value = "secret\0foovarbaz", nbytes=16}
v | ^^^^^^^^ ... transparently injected
Secondary engine module <----> [it's item storage]


>> $ git diff origin/reworks_2 origin/reworks_3
>> -> It adds the 'features' field at engine_interface structure, to
>> inform what features are supported by the loaded module.
>>
>
> What are you going to use this information for???
> Your patch doesn't contain any use of the field making it hard for me to
> understand the use case..

Sorry, the upcoming selinux module shall see whether the secondary module
has persistent storage support, or not. It is a needed information to optimize
its performance and memory consumption.

If the secondary module has persistent storage support, the format of security
information must be portable, even if the memcached is rebooted or the data
files are copied.
The flat text representation (such as, "unclassified") is portable. It enables
us to understand security information of the items after rebooting, but needs
to parse when it makes access control decision.

If we can ensure duration of the item limited to memcached server process,
the security information is not necessary in text. For example, it can be
a pointer to the parsed security information for more quick access control
decision.

So, I would like to know what features are supported in the secondary module.


>> $ git diff origin/reworks_3 origin/reworks_4
>> -> It replaces settings.engine.v1->xxx(...) invocations by wrapper
>> functions.
>>
>
> The source code does not follow the same coding standard as the rest
> of memcached...

Does the coding standard mean tab, indent, case arc and others?
If so, I'll fix up them according to rest of memcached.

Or, are you saying the wrapper functions are not coding standard in the
memcached, so unnecessary?

Trond Norbye

unread,
Apr 7, 2010, 8:36:54 PM4/7/10
to KaiGai Kohei, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com, memc...@googlegroups.com

I'm still having a hard time to see how this will work in practice... How would you request the object from the underlying engine? The key would be longer and contain data you don't know about? or are you saying that engines have to call the function to internally determine the length of their keys for lookup?

Personally I don't think it is a good idea to store the security attribute on a per item basis. you are going to waste a _lot_ of memory (we made CAS optional so that users could save 8 bytes per item, this is going to be more). From a memory usage perspective I guess it's better to do something like Dustins bucket engine, and store items with the same label in a separate container. The drawback with this is that you have to search all the containers the connection dominates to find the object. In theory this could be a _lot_ of containers, but I would guess that in most setups it would most likely be a handful...

>
>>> $ git diff origin/reworks_3 origin/reworks_4
>>> -> It replaces settings.engine.v1->xxx(...) invocations by wrapper
>>> functions.
>>>
>>
>> The source code does not follow the same coding standard as the rest
>> of memcached...
>
> Does the coding standard mean tab, indent, case arc and others?
> If so, I'll fix up them according to rest of memcached.
>

Look at how the rest of the source code is formatted..

> Or, are you saying the wrapper functions are not coding standard in the
> memcached, so unnecessary?

But I don't see how it increase the readability of the source ;-) I would guess that when we are going to support more interface protocols we need to modify more than just the invocation of the function (new conditions etc). but who knows..

Cheers,

Trond

KaiGai Kohei

unread,
Apr 7, 2010, 10:15:11 PM4/7/10
to Trond Norbye, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com, memc...@googlegroups.com

In the allocate() method, selinux module calls the secondary allocate() with
modified length which allows to store both of security attribute and original
value.
Then, in the set() method, selinux module copies the original value on its
local buffer, and the combination of its security attribute and original value
will be delivered to the secondary engine.

It does not touch the key and nkey.

I'm not clear what means the function to internally determine the length of their
keys.
However, we may need to take "offset" argument to inform secondary (or third, ...)
engine module how much length of the data field from the head is used by the upper
engine modules. Of course, if we have single stack, the offset is always zero.


When we request an item, get() of selinux will be called. Then, it also calls
get() of the secondary engine with same key. If found, it checks permissions
between the client and item. Then, the item will be returned, if allowed.

When memcached core references the data of the item, it calls get_item_data()
and get_item_ndata(). On get_item_data(), selinux module calls the secondary
one, and increments the returned pointer by the length of security attribute.
On get_item_ndata(), selinux module also calls the secondary one, and decrements
the returned length by the length of security attribute.
In the result, the memcached core see the pointer next to end of the security
attribute, and the length which exclude the security attribute.

This idea enables intermediating module (such as selinux engine) to inject its
metadata transparently for both of the memcached core and secondary modules.

> Personally I don't think it is a good idea to store the security attribute on
> a per item basis. you are going to waste a _lot_ of memory (we made CAS optional
> so that users could save 8 bytes per item, this is going to be more). From a memory
> usage perspective I guess it's better to do something like Dustins bucket engine,
> and store items with the same label in a separate container. The drawback with this
> is that you have to search all the containers the connection dominates to find the
> object. In theory this could be a _lot_ of containers, but I would guess that in most
> setups it would most likely be a handful...

At first, system administrator needs to understand here is a trade-off between
better security and better performance or resource usage. The upcoming selinux
engine module will be installed, if they consider it is acceptable trade-off.

The approach with multiple containers will not work well, because we cannot
estimate how many containers are necessary in the start-up time.
Perhaps, we will have to initialize a new secondary container in run-time,
when a client with unprepared security attribute connected to.
If so, we have no way to control maximum usage of the memory consumption,
although user specified a certain value using -m.

In addition, here is no guarantee the traffics are flat per security attributes
of the clients. It can make dead space.

>>>> $ git diff origin/reworks_3 origin/reworks_4
>>>> -> It replaces settings.engine.v1->xxx(...) invocations by wrapper
>>>> functions.
>>>>
>>>
>>> The source code does not follow the same coding standard as the rest
>>> of memcached...
>>
>> Does the coding standard mean tab, indent, case arc and others?
>> If so, I'll fix up them according to rest of memcached.
>>
>
> Look at how the rest of the source code is formatted..
>
>> Or, are you saying the wrapper functions are not coding standard in the
>> memcached, so unnecessary?
>
> But I don't see how it increase the readability of the source ;-)
> I would guess that when we are going to support more interface protocols
> we need to modify more than just the invocation of the function (new conditions
> etc). but who knows..

Hmm. It depends on a subjective view which is more beautiful code. :-)
This wrapper functions are not affects to their functionalities, so I don't
push it strongly.

Trond Norbye

unread,
Apr 10, 2010, 2:52:31 AM4/10/10
to KaiGai Kohei, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com, memc...@googlegroups.com

Sorry for being silent for a bit, but I was trying to think this through. I've refactored the code now so that it should allow you to do what you want (but in a slightly different way). Instead of providing functions to query the individual parts of the item, the core calls a single function to fill a structure with all the information it needs. This means that all of the members of the old item structure is gone...

I've got all tests passing and will push the change when I'm done reviewing the change...

Trond

KaiGai Kohei

unread,
Apr 11, 2010, 8:56:29 PM4/11/10
to memc...@googlegroups.com, Trond Norbye, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com

Thanks for your efforts. It seems to me the right direction.

I understand an item object identified with a key has multiple properties
from perspective of the model.
The value is the most important property of the item. In addition, it already
has cas value and expired time; these are also properties of items.

It seems to me the new get_item_info() interface intends to contain various
kind of properties more than bare threes (value, cas and expired time),
because we can easily expand the number of 'struct iovec value[]' in the
new item_info.

If so, I think the allocate() interface should be also modified as follows:

ENGINE_ERROR_CODE (*allocate)(ENGINE_HANDLE* handle,
const void* cookie,
item **item,
const void* key,
! int nslots, /* # of nbytes[] array */
! const size_t nbytes[], /* # of properties */
const int flags);

It enables to suggest number of properties and length of them for each
property slot.
Maybe, we use three slots for value, cas and expired time.
The caller knows what slot is reserved for what purpose.

For example, if slot=0 is reserved for value, and slot=1 is reserved for
exptime, we calls the allocate() method with nbytes[0]=(length of value)
and nbytes[1]=sizeof(rel_time_t).

If here is an intermediating engine module (such as selinux), it can increment
the given nslot before calling the secondary engine, and the added slot will
be available for the intermediating engine transparently.

I'll try to implement a proof of concept patch.

KaiGai Kohei

unread,
Apr 12, 2010, 12:42:00 AM4/12/10
to Trond Norbye, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com

BTW, the current get_item_info() interface needs to share the identical
structure format between memcached-core and engine-module.
It may need to increment API version when we add minor changes into the
item_info structure in the future.

It seems to me the following interfaces are more portable to get/set
a certain attribute of the item. I'd like to see your opinion.

typedef enum {
ITEM_ATTR_KEY = 0,
ITEM_ATTR_VALUE,
ITEM_ATTR_EXPIRE_TIME,
:
ITEM_ATTR_CAS,
ITEM_LEAST_ATTRS
} ENGINE_ATTRIBUTE;
=> Any item have ITEM_LEAST_ATTRS of attributes, so all the engine
module has to provide a capability to store these attributes.
If here is an intermediating module, it can inject its attribute
with a larger attribute number.

const void *get_item_attr(ENGINE_HANDLE *handle,
ENGINE_ATTRIBUTE attnum,
const item *item,
uint32_t *nbytes);

It returns the pointer to the attribute field and its length.

bool set_item_attr(ENGINE_HANDLE *handle,
ENGINE_ATTRIBUTE attnum,
const item *item,
const uint8_t *data,
uint32_t nbytes);

It sets the given data with nbytes length on the specified number of
attribute field. The given nbytes length has to be less or equal with
the pre-allocated length in allocate() method.

Currently, we still have item_set_cas() method in engine.
It seems to me above two methods enables to get/set any attributes
without caller knowing any engine specific data layout.

Trond Norbye

unread,
Apr 12, 2010, 3:24:16 AM4/12/10
to KaiGai Kohei, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com

On 11. apr. 2010, at 17.56, KaiGai Kohei wrote:
>
> It seems to me the new get_item_info() interface intends to contain various
> kind of properties more than bare threes (value, cas and expired time),
> because we can easily expand the number of 'struct iovec value[]' in the
> new item_info.
>

That's not what the iovector is intended for. The network protocol specifies the attributes we use for an item (we need a new network protocol to encode other attributes), so that wasn't what I planned to use the iovector for. The usecase for the iovector is to allow the engines to store their items as multiple fragments (and not have to concatenate them into a single object before returning it to the core. (please note that currently the iovector is _never_ bigger than one element, but I'm planning to fix that this week.

The reason I didn't add that to the allocate function is that it would complicate the state machine in the core. I would think that most engine writers could live with allocating a continuous segment during allocation, and then store it in multiple segments when the user calls store().

Trond

KaiGai Kohei

unread,
Apr 12, 2010, 5:18:26 AM4/12/10
to Trond Norbye, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com
(2010/04/12 16:24), Trond Norbye wrote:
>
> On 11. apr. 2010, at 17.56, KaiGai Kohei wrote:
>>
>> It seems to me the new get_item_info() interface intends to contain various
>> kind of properties more than bare threes (value, cas and expired time),
>> because we can easily expand the number of 'struct iovec value[]' in the
>> new item_info.
>>
>
> That's not what the iovector is intended for. The network protocol specifies
> the attributes we use for an item (we need a new network protocol to encode
> other attributes), so that wasn't what I planned to use the iovector for.
> The usecase for the iovector is to allow the engines to store their items as
> multiple fragments (and not have to concatenate them into a single object
> before returning it to the core. (please note that currently the iovector
> is _never_ bigger than one element, but I'm planning to fix that this week.

Sorry, I'm a bit confusing.

Are you saying the data fragments are not necessary to be placed on the
continuous region in physically?
If so, it seems to me allocate() method needs to be revised to take an
arguments to indicate the number of fragments (with their lengths) on
allocation time?

However, it seems to me the following description introduces all the
needed region is allocated on the allocation time at once, then the data
region will be segmented into several chunks.

> The reason I didn't add that to the allocate function is that it would
> complicate the state machine in the core. I would think that most engine
> writers could live with allocating a continuous segment during allocation,
> and then store it in multiple segments when the user calls store().


BTW, here is a point I missed it.
Currently, do_add_delta() and do_store_item() with APPEND/PREPEND option
assume the data field only contains the original data. So, if we inject
extra attributes into the data region, it performs unexpectedly.

For example, if we append {key="aaa", value="mokemoke"} after the
{key="aaa", value="fugafuga"}, and both have "classified\0" as security
label, it will concatenate them without eliminating extra attributes,
like {key="aaa", value="fugafugaclassified\0mokemoke"}.
^^^^^^^^^^^^
So, we need to inform secondary module where is the data segment and where
is the extra attributes of items, even if the secondary module allocate
data region in flat.

Trond Norbye

unread,
Apr 12, 2010, 3:28:07 PM4/12/10
to memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com

On 12. apr. 2010, at 02.18, KaiGai Kohei wrote:

> Are you saying the data fragments are not necessary to be placed on the
> continuous region in physically?
> If so, it seems to me allocate() method needs to be revised to take an
> arguments to indicate the number of fragments (with their lengths) on
> allocation time?
>
> However, it seems to me the following description introduces all the
> needed region is allocated on the allocation time at once, then the data
> region will be segmented into several chunks.
>

Yes, the intention of the iovector is so that you can store the value for a key as multiple fragments that would be concatenated onto the wire in the return packet. This means that we could implement append / prepend really fast by just adding another element to the internal iovector of an item.

We don't need to modify the allocate functions, because the core will call the method to get the iovector for the item when it needs to access the value.

An important piece that is missing right now is the function the engine can call to get the maximum number of fragments that the core supports. I should get that in there as soon as possible.

>
> BTW, here is a point I missed it.
> Currently, do_add_delta() and do_store_item() with APPEND/PREPEND option
> assume the data field only contains the original data. So, if we inject
> extra attributes into the data region, it performs unexpectedly.
>
> For example, if we append {key="aaa", value="mokemoke"} after the
> {key="aaa", value="fugafuga"}, and both have "classified\0" as security
> label, it will concatenate them without eliminating extra attributes,
> like {key="aaa", value="fugafugaclassified\0mokemoke"}.
> ^^^^^^^^^^^^
> So, we need to inform secondary module where is the data segment and where
> is the extra attributes of items, even if the secondary module allocate
> data region in flat.

If you're storing meta-data inside the real data of an item, you would just have to add your own implementation for append/prepend where you do the operation on the items and then try to call a cas set operation to the engine.

Cheers,

Trond


Trond Norbye

unread,
Apr 12, 2010, 9:44:10 PM4/12/10
to KaiGai Kohei, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com, memc...@googlegroups.com

On 1. apr. 2010, at 22.31, KaiGai Kohei wrote:

>
> $ git diff origin/reworks_2 origin/reworks_3
> -> It adds the 'features' field at engine_interface structure, to
> inform what features are supported by the loaded module.

I couldn't see this branch anymore, but I pushed a proposal for this today. I changed the signature for the existing "get_info" call to allow the engine to return an arbitrary number of features it support. If I remember correctly from your proposal you proposed this as a bitmask? Since this isn't going to be part of any time/space-critical loops, I think we shouldn't limit the design to a fixed number of features.

To make it even more "flexible" I decided to pass this as an array of features, each containing a value and a description.

The next thing we need to do is to start "registering" different features :-)

Cheers,

Trond


KaiGai Kohei

unread,
Apr 13, 2010, 4:17:58 AM4/13/10
to Trond Norbye, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com
(2010/04/13 4:28), Trond Norbye wrote:
>
> On 12. apr. 2010, at 02.18, KaiGai Kohei wrote:
>
>> Are you saying the data fragments are not necessary to be placed on the
>> continuous region in physically?
>> If so, it seems to me allocate() method needs to be revised to take an
>> arguments to indicate the number of fragments (with their lengths) on
>> allocation time?
>>
>> However, it seems to me the following description introduces all the
>> needed region is allocated on the allocation time at once, then the data
>> region will be segmented into several chunks.
>>
>
> Yes, the intention of the iovector is so that you can store the value for
> a key as multiple fragments that would be concatenated onto the wire in
> the return packet. This means that we could implement append / prepend
> really fast by just adding another element to the internal iovector of
> an item.
>
> We don't need to modify the allocate functions, because the core will call
> the method to get the iovector for the item when it needs to access the
> value.
>
> An important piece that is missing right now is the function the engine
> can call to get the maximum number of fragments that the core supports.
> I should get that in there as soon as possible.

Rather than the server API to inform maximum number of fragments,
it seems to me the get_item_info() should provides total number of
fragments in the specified item, and the memcached core calls engine
api to set up an iovector of the specified fragment of the item.

If so, a part of intermediating module will be able to drop a part of
the fragments which contains security context.

Of course, a large number of iteration of append/prepend may make
the data structure inefficient, so I think any mechanism to reorganize
well hashed fragments...

>> BTW, here is a point I missed it.
>> Currently, do_add_delta() and do_store_item() with APPEND/PREPEND option
>> assume the data field only contains the original data. So, if we inject
>> extra attributes into the data region, it performs unexpectedly.
>>
>> For example, if we append {key="aaa", value="mokemoke"} after the
>> {key="aaa", value="fugafuga"}, and both have "classified\0" as security
>> label, it will concatenate them without eliminating extra attributes,
>> like {key="aaa", value="fugafugaclassified\0mokemoke"}.
>> ^^^^^^^^^^^^
>> So, we need to inform secondary module where is the data segment and where
>> is the extra attributes of items, even if the secondary module allocate
>> data region in flat.
>
> If you're storing meta-data inside the real data of an item, you would just
> have to add your own implementation for append/prepend where you do the
> operation on the items and then try to call a cas set operation to the
> engine.

If we can store meta-data in a certain fragment, is it unnecessary to handle
append/preprend and delta operations in the intermediating module?

KaiGai Kohei

unread,
Apr 13, 2010, 4:35:15 AM4/13/10
to Trond Norbye, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com
(2010/04/13 10:44), Trond Norbye wrote:
>
> On 1. apr. 2010, at 22.31, KaiGai Kohei wrote:
>
>>
>> $ git diff origin/reworks_2 origin/reworks_3
>> -> It adds the 'features' field at engine_interface structure, to
>> inform what features are supported by the loaded module.
>
> I couldn't see this branch anymore, but I pushed a proposal for this today.

Sorry, I already removed it...

> I changed the signature for the existing "get_info" call to allow the engine
> to return an arbitrary number of features it support.
> If I remember correctly from your proposal you proposed this as a bitmask?
> Since this isn't going to be part of any time/space-critical loops, I think
> we shouldn't limit the design to a fixed number of features.

Do you have a plan to provide guideline of the feature's signature?
Is it informed via feature_info->feature? or need to parse description?

> To make it even more "flexible" I decided to pass this as an array of features,
> each containing a value and a description.
>
> The next thing we need to do is to start "registering" different features :-)

In your tree, default_engine.c and corresponding files are deployed at the
top level directory... I think an engine module has an individual directory
to store corresponding files, when we register different features.

Trond Norbye

unread,
Apr 13, 2010, 12:06:32 PM4/13/10
to KaiGai Kohei, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com

On 13. apr. 2010, at 01.35, KaiGai Kohei wrote:

>
>> I changed the signature for the existing "get_info" call to allow the engine
>> to return an arbitrary number of features it support.
>> If I remember correctly from your proposal you proposed this as a bitmask?
>> Since this isn't going to be part of any time/space-critical loops, I think
>> we shouldn't limit the design to a fixed number of features.
>
> Do you have a plan to provide guideline of the feature's signature?
> Is it informed via feature_info->feature? or need to parse description?
>

The intention is that we define different features as constants in the headerfile. People may however define their own feature sets that they want to be able to report, but is meaningless for others to care about...

>> To make it even more "flexible" I decided to pass this as an array of features,
>> each containing a value and a description.
>>
>> The next thing we need to do is to start "registering" different features :-)
>
> In your tree, default_engine.c and corresponding files are deployed at the
> top level directory... I think an engine module has an individual directory
> to store corresponding files, when we register different features.

The default engine is special in the way that it is bundled with memcached. Other engines should be developed in separate git repositories...

Cheers,

Trond


Trond Norbye

unread,
Apr 13, 2010, 2:59:20 PM4/13/10
to KaiGai Kohei, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com

On 13. apr. 2010, at 01.17, KaiGai Kohei wrote:
>
> Rather than the server API to inform maximum number of fragments,
> it seems to me the get_item_info() should provides total number of
> fragments in the specified item, and the memcached core calls engine
> api to set up an iovector of the specified fragment of the item.
>
> If so, a part of intermediating module will be able to drop a part of
> the fragments which contains security context.
>
> Of course, a large number of iteration of append/prepend may make
> the data structure inefficient, so I think any mechanism to reorganize
> well hashed fragments...
>

I don't like the idea that the engine should be able to require the server to support an arbitrary number of fragments. The core needs to allocate space for the iovector for transfer of the item, so if we make this completely arbitrary we cannot determine the memory usage we need for an item transfer.


>>> BTW, here is a point I missed it.
>>> Currently, do_add_delta() and do_store_item() with APPEND/PREPEND option
>>> assume the data field only contains the original data. So, if we inject
>>> extra attributes into the data region, it performs unexpectedly.
>>>
>>> For example, if we append {key="aaa", value="mokemoke"} after the
>>> {key="aaa", value="fugafuga"}, and both have "classified\0" as security
>>> label, it will concatenate them without eliminating extra attributes,
>>> like {key="aaa", value="fugafugaclassified\0mokemoke"}.
>>> ^^^^^^^^^^^^
>>> So, we need to inform secondary module where is the data segment and where
>>> is the extra attributes of items, even if the secondary module allocate
>>> data region in flat.
>>
>> If you're storing meta-data inside the real data of an item, you would just
>> have to add your own implementation for append/prepend where you do the
>> operation on the items and then try to call a cas set operation to the
>> engine.
>
> If we can store meta-data in a certain fragment, is it unnecessary to handle
> append/preprend and delta operations in the intermediating module?
>

I really don't want to force all engines to support storing attributes (or use io vectors)

Trond

KaiGai Kohei

unread,
Apr 13, 2010, 8:54:13 PM4/13/10
to Trond Norbye, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com
(2010/04/14 1:06), Trond Norbye wrote:
>
> On 13. apr. 2010, at 01.35, KaiGai Kohei wrote:
>
>>
>>> I changed the signature for the existing "get_info" call to allow the engine
>>> to return an arbitrary number of features it support.
>>> If I remember correctly from your proposal you proposed this as a bitmask?
>>> Since this isn't going to be part of any time/space-critical loops, I think
>>> we shouldn't limit the design to a fixed number of features.
>>
>> Do you have a plan to provide guideline of the feature's signature?
>> Is it informed via feature_info->feature? or need to parse description?
>>
>
> The intention is that we define different features as constants in the
> headerfile. People may however define their own feature sets that they
> want to be able to report, but is meaningless for others to care about...

Sorry, the worth of this new method is to inform what features are supported
in this engine module, so it is completely meaningless if people defines
their own feature sets.

Could you define a set of features in the core, like error code?
And, I don't think description for each features are necessary, because it
should be obvious when we see the feature code.

Example)
typedef enum {
ENGINE_FEATURE_CAS = 0, /**< has compare-and-set operation */
ENGINE_FEATURE_PERSISTENT_STORAGE, /**< has persistent storage support */
ENGINE_FEATURE_SECONDARY_ENGINE, /**< performs as pseudo engine */
ENGINE_FEATURE_ACCESS_CONTROL, /**< has access control feature */
NUM_OF_ENGINE_FEATURES,
} ENGINE_FEATURE_CODE;

Perhaps, # of features is changed depending on initialization option.
If "use_cas=false" will be given, the default_engine does not need to
return ENGINE_FEATURE_CAS, and so on.

>>> To make it even more "flexible" I decided to pass this as an array of features,
>>> each containing a value and a description.
>>>
>>> The next thing we need to do is to start "registering" different features :-)
>>
>> In your tree, default_engine.c and corresponding files are deployed at the
>> top level directory... I think an engine module has an individual directory
>> to store corresponding files, when we register different features.
>
> The default engine is special in the way that it is bundled with memcached.
> Other engines should be developed in separate git repositories...

Hmm. It seems to me I misunderstood the meaning of the "registering" different

mcd-get_info.patch

Trond Norbye

unread,
Apr 13, 2010, 10:26:28 PM4/13/10
to KaiGai Kohei, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com

On 13. apr. 2010, at 17.54, KaiGai Kohei wrote:
>
> Example)
> typedef enum {
> ENGINE_FEATURE_CAS = 0, /**< has compare-and-set operation */
> ENGINE_FEATURE_PERSISTENT_STORAGE, /**< has persistent storage support */
> ENGINE_FEATURE_SECONDARY_ENGINE, /**< performs as pseudo engine */
> ENGINE_FEATURE_ACCESS_CONTROL, /**< has access control feature */
> NUM_OF_ENGINE_FEATURES,
> } ENGINE_FEATURE_CODE;
>


This is exactly what I was meaning by registering the stuff in the headed file. I still want to be able to present this in a textual form to the user, so that's why I want to the description there.

I know for sure that I got some ideas for thing I could want to put into a feature code in _my_ engines, but I'm pretty sure that they are meaningless to other folks. I don't want to inform the rest of the world about my private extensions, so I don't want to modify a public header. At the same time I don't want to hack my memcached server to be able to print the features in a more sane matter...

Trond

KaiGai Kohei

unread,
Apr 15, 2010, 2:27:04 AM4/15/10
to Trond Norbye, memc...@googlegroups.com, Dustin Sallings, tmae...@gmail.com, dsal...@gmail.com

Indeed, the textual description might be necessary to log private extensions also.

As long as we can have a common set of feature identifiers, I have no concerns
to implement my engine module.
OK, I'll begin to implement my engine module based on the current interface.

Please wait for while I'm working on. Thanks,
--
KaiGai Kohei <kai...@ak.jp.nec.com>

Reply all
Reply to author
Forward
0 new messages