[CSI] When is a ListVolumesRequest starting_token (in)valid?

15 views
Skip to first unread message

Nicolas Trangez

unread,
May 5, 2020, 5:14:56 AM5/5/20
to container-storage-...@googlegroups.com
All,

While building a CSI driver and using the `csi-sanity` test-suite to
validate it, I'm running in a couple of situations where either the
test-suite seems to make incorrect assumptions when compared to the
spec (filing PRs for those), or where a test fails against my driver
but the spec is *not* clear as to whether the test is 'right' or not.

In this particular case, there's a test [1] which sends a `ListVolumes`
request to the driver with a (supposedly) invalid `starting_token`.
Indeed, this `starting_token` is hard-coded in the test and *not*
received from the driver as `next_token` from a prior
`ListVolumesResponse`.

The spec indeed lists 'Invalid starting_token` as one of the
`ListVolumes` error conditions [2], however, I fail to find where
validity of such `starting_token` is defined (modulo string length).

If there are no rules w.r.t. token values, then the test seems
incorrect, making the assumption such token is to be the `next_token`
of a prior request, which would make the driver more stateful than it
should be (e.g., when the set of volumes was changed between the
various calls). If I'm not mistaken, a valid implementation of
`ListVolumes` for a driver where no volumes are created yet could even
be to return a response with 0 entries for *any* `starting_token`
(there won't be any volumes after any such token, ever).


I'd appreciate some clarification.

Thanks in advance,

Nicolas

[1]
https://github.com/kubernetes-csi/csi-test/blob/99b7db913362368d25246ab05261ca14e92c6534/pkg/sanity/controller.go#L195
[2]
https://github.com/container-storage-interface/spec/blob/87a27a2496565857ed564284eaa762587feb9453/spec.md#listvolumes-errors

Patrick Ohly

unread,
May 5, 2020, 6:02:52 AM5/5/20
to Nicolas Trangez, container-storage-...@googlegroups.com
'Nicolas Trangez' via container-storage-interface-community
<container-storage-...@googlegroups.com> writes:
> While building a CSI driver and using the `csi-sanity` test-suite to
> validate it, I'm running in a couple of situations where either the
> test-suite seems to make incorrect assumptions when compared to the
> spec (filing PRs for those), or where a test fails against my driver
> but the spec is *not* clear as to whether the test is 'right' or not.
>
> In this particular case, there's a test [1] which sends a `ListVolumes`
> request to the driver with a (supposedly) invalid `starting_token`.
> Indeed, this `starting_token` is hard-coded in the test and *not*
> received from the driver as `next_token` from a prior
> `ListVolumesResponse`.

That assumption is that this starting token ("invalid-token") is
unlikely to be valid in any CSI driver.

> The spec indeed lists 'Invalid starting_token` as one of the
> `ListVolumes` error conditions [2], however, I fail to find where
> validity of such `starting_token` is defined (modulo string length).

It isn't. The CSI driver itself may use whatever it wants as tokens.

> If there are no rules w.r.t. token values, then the test seems
> incorrect, making the assumption such token is to be the `next_token`
> of a prior request, which would make the driver more stateful than it
> should be (e.g., when the set of volumes was changed between the
> various calls). If I'm not mistaken, a valid implementation of
> `ListVolumes` for a driver where no volumes are created yet could even
> be to return a response with 0 entries for *any* `starting_token`
> (there won't be any volumes after any such token, ever).

So you are arguing that it is valid to skip input validation if there
are no volumes? That seems a bit fishy to me.

Is this usage of "invalid-token" causing problems in practice, i.e. are
there drivers where it is a valid token?

One pattern that has been used elsewhere
(https://github.com/kubernetes-csi/csi-test/blob/99b7db913362368d25246ab05261ca14e92c6534/pkg/sanity/sanity.go#L155-L158)
is to let driver-specific code provide strings for testing. But that
becomes a bit difficult when using the csi-sanity command.

It might be necessary to test advanced usages of ListVolumes,
though. Pagination testing had to be disabled because it was based on
too many assumptions to be useful:
https://github.com/kubernetes-csi/csi-test/commit/46ef6af5f8d1b4d6d8a8c701160c805810a1e584

--
Best Regards

Patrick Ohly

Nicolas Trangez

unread,
May 5, 2020, 7:16:05 AM5/5/20
to Patrick Ohly, container-storage-...@googlegroups.com
Thanks for getting back.

On Tue, 2020-05-05 at 12:02 +0200, Patrick Ohly wrote:
> 'Nicolas Trangez' via container-storage-interface-community
> <container-storage-...@googlegroups.com> writes:
> > While building a CSI driver and using the `csi-sanity` test-suite
> > to
> > validate it, I'm running in a couple of situations where either the
> > test-suite seems to make incorrect assumptions when compared to the
> > spec (filing PRs for those), or where a test fails against my
> > driver
> > but the spec is *not* clear as to whether the test is 'right' or
> > not.
> >
> > In this particular case, there's a test [1] which sends a
> > `ListVolumes`
> > request to the driver with a (supposedly) invalid `starting_token`.
> > Indeed, this `starting_token` is hard-coded in the test and *not*
> > received from the driver as `next_token` from a prior
> > `ListVolumesResponse`.
>
> That assumption is that this starting token ("invalid-token") is
> unlikely to be valid in any CSI driver.
>
> > The spec indeed lists 'Invalid starting_token` as one of the
> > `ListVolumes` error conditions [2], however, I fail to find where
> > validity of such `starting_token` is defined (modulo string
> > length).
>
> It isn't. The CSI driver itself may use whatever it wants as tokens.

OK, clear.

>
> > If there are no rules w.r.t. token values, then the test seems
> > incorrect, making the assumption such token is to be the
> > `next_token`
> > of a prior request, which would make the driver more stateful than
> > it
> > should be (e.g., when the set of volumes was changed between the
> > various calls). If I'm not mistaken, a valid implementation of
> > `ListVolumes` for a driver where no volumes are created yet could
> > even
> > be to return a response with 0 entries for *any* `starting_token`
> > (there won't be any volumes after any such token, ever).
>
> So you are arguing that it is valid to skip input validation if there
> are no volumes? That seems a bit fishy to me.

Input validation based on what? Especially since many CSI drivers are
likely 'fronts' for some other storage system, turning the driver
itself stateless (ideally), the following scenario seems not unlikely:

Let's assume the CSI driver is indeed a front, and the backing storage
system supports listing of volumes with pagination, and a starting/next
token. The ID of a volume is its (unique) name, and pagination happens
in alphabetical order over said name.

SP state: volumes = [avolume, invalid-token, zvolume]
CO -> CSI: ListVolumes(2, "")
CSI -> SP: ListVolumes(2, "")
SP -> CSI: Volumes([avolume, invalid-token], invalid-token)
CSI -> CO: Volumes([avolume, invalid-token], invalid-token)

Other SP client: DeleteVolume(zvolume)
SP state: volumes = [avolume, invalid-token]

CO -> CSI: ListVolumes(2, invalid-token)

If I understand correctly, the current test would require this request
to fail because the start token would be invalid (it's not, it was the
next token of a prior ListVolumes). To my understanding, the response
should now be

CSI -> SP: ListVolumes(2, invalid-token)
SP -> CSI: Volumes([], "") # No more volumes after 'invalid-token'
CSI -> CO: Volumes([], "") # Pagination is over

The time between the first CO->CSI ListVolumes call and the second CO-
>CSI ListVolumes call could be unbounded, *anything* could have
happened in-between (and the spec correctly explains there's no
consistency requirements of the pagination results towards the CO).

> Is this usage of "invalid-token" causing problems in practice, i.e.
> are
> there drivers where it is a valid token?

Unless the driver were stateful, there's no way for it to know whether
any kind of token is 'valid' or not, except if tokens would be signed
by the driver itself or something along those lines, which seems
overkill/useless (CO <-> driver communication is assumed 'privileged',
right?)

> One pattern that has been used elsewhere
> (
> https://github.com/kubernetes-csi/csi-test/blob/99b7db913362368d25246ab05261ca14e92c6534/pkg/sanity/sanity.go#L155-L158
> )
> is to let driver-specific code provide strings for testing. But that
> becomes a bit difficult when using the csi-sanity command.

Sure. Though again, this only works under the assumptions tokens *do*
adhere to some particular format which can be validated, which is not
necessarily the case, it depends on the backing storage system and its
database/API.

>
> It might be necessary to test advanced usages of ListVolumes,
> though. Pagination testing had to be disabled because it was based on
> too many assumptions to be useful:
> https://github.com/kubernetes-csi/csi-test/commit/46ef6af5f8d1b4d6d8a8c701160c805810a1e584

Since pagination is not necessarily consistent (by design), this is
indeed an issue.

Nicolas

Patrick Ohly

unread,
May 5, 2020, 8:59:00 AM5/5/20
to Nicolas Trangez, container-storage-...@googlegroups.com
Nicolas Trangez <nicolas...@scality.com> writes:
> Thanks for getting back.
>
> On Tue, 2020-05-05 at 12:02 +0200, Patrick Ohly wrote:
>> 'Nicolas Trangez' via container-storage-interface-community
>> <container-storage-...@googlegroups.com> writes:
>> > If there are no rules w.r.t. token values, then the test seems
>> > incorrect, making the assumption such token is to be the
>> > `next_token`
>> > of a prior request, which would make the driver more stateful than
>> > it
>> > should be (e.g., when the set of volumes was changed between the
>> > various calls). If I'm not mistaken, a valid implementation of
>> > `ListVolumes` for a driver where no volumes are created yet could
>> > even
>> > be to return a response with 0 entries for *any* `starting_token`
>> > (there won't be any volumes after any such token, ever).
>>
>> So you are arguing that it is valid to skip input validation if there
>> are no volumes? That seems a bit fishy to me.
>
> Input validation based on what?

Based on whatever tokens the CSI driver uses. My expectation is that the
CSI driver itself generates the tokens or at least knows what the tokens
should look like and thus can detect garbage tokens which do not match
the expected format.

If they are completely opaque to the driver itself, then it indeed
cannot validate them. Is that the case in your driver?

If the token was generated by the CSI driver and merely has expired,
then it is not invalid and the driver must do something else.

> CO <-> driver communication is assumed 'privileged', right?

Yes.

>> One pattern that has been used elsewhere
>> (
>> https://github.com/kubernetes-csi/csi-test/blob/99b7db913362368d25246ab05261ca14e92c6534/pkg/sanity/sanity.go#L155-L158
>> )
>> is to let driver-specific code provide strings for testing. But that
>> becomes a bit difficult when using the csi-sanity command.
>
> Sure. Though again, this only works under the assumptions tokens *do*
> adhere to some particular format which can be validated, which is not
> necessarily the case, it depends on the backing storage system and its
> database/API.

In that case a nil token generate could disable this check.

Nicolas Trangez

unread,
May 5, 2020, 10:16:34 AM5/5/20
to Patrick Ohly, container-storage-...@googlegroups.com
Kind-of. As in, I *could* validate them (the CSI driver is merely a
proxy/translator to the SP API) because I happen to know the SP
internals. However, this feels like a layering violation.

The backing SP performs pagination on # of entries, so the 'token' is
merely a number. If you ask for '100 entries after the 99999999th
entry', it will (IMHO correctly) return an empty list.

Anyway, if I 'sign' the 'tokens' (numbers) in the CSI driver, e.g. by
simply adding a hardcoded prefix, I should be able to 'validate' them.

> If the token was generated by the CSI driver and merely has expired,
> then it is not invalid and the driver must do something else.

Fair enough.

>
> > CO <-> driver communication is assumed 'privileged', right?
>
> Yes.
>
> > > One pattern that has been used elsewhere
> > > (
> > > https://github.com/kubernetes-csi/csi-test/blob/99b7db913362368d25246ab05261ca14e92c6534/pkg/sanity/sanity.go#L155-L158
> > > )
> > > is to let driver-specific code provide strings for testing. But
> > > that
> > > becomes a bit difficult when using the csi-sanity command.
> >
> > Sure. Though again, this only works under the assumptions tokens
> > *do*
> > adhere to some particular format which can be validated, which is
> > not
> > necessarily the case, it depends on the backing storage system and
> > its
> > database/API.
>
> In that case a nil token generate could disable this check.

Agree.

Thanks for the clarification,

Nicolas

Patrick Ohly

unread,
May 5, 2020, 3:02:12 PM5/5/20
to Nicolas Trangez, container-storage-...@googlegroups.com
I get your point. But the spec does ask for a specific error code in
this situation and so csi-test has a test for it.

What happens if you just pass through this invalid token (and
"invalid-token" is invalid, because it isn't the number that the backend
uses)? Does the backend validate its input?

It doesn't have to be the CSI driver which does the validation. It could
also be the backend as long as the error codes that it returns are then
mapped to what is in the spec.

I don't know how useful this specific error code is. Someone must have
cared enough to add it. :shrug:
Reply all
Reply to author
Forward
0 new messages