Thanks for getting back.
On Tue, 2020-05-05 at 12:02 +0200, Patrick Ohly wrote:
> 'Nicolas Trangez' via container-storage-interface-community
> <
container-storage-...@googlegroups.com> writes:
> > While building a CSI driver and using the `csi-sanity` test-suite
> > to
> > validate it, I'm running in a couple of situations where either the
> > test-suite seems to make incorrect assumptions when compared to the
> > spec (filing PRs for those), or where a test fails against my
> > driver
> > but the spec is *not* clear as to whether the test is 'right' or
> > not.
> >
> > In this particular case, there's a test [1] which sends a
> > `ListVolumes`
> > request to the driver with a (supposedly) invalid `starting_token`.
> > Indeed, this `starting_token` is hard-coded in the test and *not*
> > received from the driver as `next_token` from a prior
> > `ListVolumesResponse`.
>
> That assumption is that this starting token ("invalid-token") is
> unlikely to be valid in any CSI driver.
>
> > The spec indeed lists 'Invalid starting_token` as one of the
> > `ListVolumes` error conditions [2], however, I fail to find where
> > validity of such `starting_token` is defined (modulo string
> > length).
>
> It isn't. The CSI driver itself may use whatever it wants as tokens.
OK, clear.
>
> > If there are no rules w.r.t. token values, then the test seems
> > incorrect, making the assumption such token is to be the
> > `next_token`
> > of a prior request, which would make the driver more stateful than
> > it
> > should be (e.g., when the set of volumes was changed between the
> > various calls). If I'm not mistaken, a valid implementation of
> > `ListVolumes` for a driver where no volumes are created yet could
> > even
> > be to return a response with 0 entries for *any* `starting_token`
> > (there won't be any volumes after any such token, ever).
>
> So you are arguing that it is valid to skip input validation if there
> are no volumes? That seems a bit fishy to me.
Input validation based on what? Especially since many CSI drivers are
likely 'fronts' for some other storage system, turning the driver
itself stateless (ideally), the following scenario seems not unlikely:
Let's assume the CSI driver is indeed a front, and the backing storage
system supports listing of volumes with pagination, and a starting/next
token. The ID of a volume is its (unique) name, and pagination happens
in alphabetical order over said name.
SP state: volumes = [avolume, invalid-token, zvolume]
CO -> CSI: ListVolumes(2, "")
CSI -> SP: ListVolumes(2, "")
SP -> CSI: Volumes([avolume, invalid-token], invalid-token)
CSI -> CO: Volumes([avolume, invalid-token], invalid-token)
Other SP client: DeleteVolume(zvolume)
SP state: volumes = [avolume, invalid-token]
CO -> CSI: ListVolumes(2, invalid-token)
If I understand correctly, the current test would require this request
to fail because the start token would be invalid (it's not, it was the
next token of a prior ListVolumes). To my understanding, the response
should now be
CSI -> SP: ListVolumes(2, invalid-token)
SP -> CSI: Volumes([], "") # No more volumes after 'invalid-token'
CSI -> CO: Volumes([], "") # Pagination is over
The time between the first CO->CSI ListVolumes call and the second CO-
>CSI ListVolumes call could be unbounded, *anything* could have
happened in-between (and the spec correctly explains there's no
consistency requirements of the pagination results towards the CO).
> Is this usage of "invalid-token" causing problems in practice, i.e.
> are
> there drivers where it is a valid token?
Unless the driver were stateful, there's no way for it to know whether
any kind of token is 'valid' or not, except if tokens would be signed
by the driver itself or something along those lines, which seems
overkill/useless (CO <-> driver communication is assumed 'privileged',
right?)
Sure. Though again, this only works under the assumptions tokens *do*
adhere to some particular format which can be validated, which is not
necessarily the case, it depends on the backing storage system and its
database/API.
Since pagination is not necessarily consistent (by design), this is
indeed an issue.
Nicolas