Data volume cloning does not complete

Manuri Perera

Feb 10, 2020, 8:14:29 AM
to kubevirt-dev
Hi,

I cloned a data volume following [1] about two weeks ago and assumed it had completed.
But when I check the status now, I see the following:

Status:
  Phase:     CloneInProgress
  Progress:  6.24%

It does not seem to be progressing any further. Is there a way to find out why?


Thanks,
Manuri

Alexander Wels

Feb 10, 2020, 8:20:23 AM
to Manuri Perera, kubevirt-dev
Hi, there are 3 pods that are important for this:

1. There will be a source pod in your namespace
2. There will be a target pod in your namespace
3. The upload proxy in the cdi namespace.
 
Both the source and target pods should have the name of the PVC in their names. Checking the logs of those pods should give you a clue about what is wrong.

--
You received this message because you are subscribed to the Google Groups "kubevirt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubevirt-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubevirt-dev/19d63bfd-4b16-485a-85ee-3d62843288eb%40googlegroups.com.

Manuri Perera

Feb 10, 2020, 9:21:55 AM
to kubevirt-dev
Thanks for the response Alexander!

I checked the logs of the cdi pod in the cdi namespace and saw the following error.


1 clone-controller.go:226] error processing pvc "myns/myapp": error verifying token: square/go-jose/jwt: validation failed, token is expired (exp)

Wondering what access token this is about. Any idea?

Thanks,
Manuri


Alexander Wels

Feb 10, 2020, 9:33:09 AM
to Manuri Perera, kubevirt-dev
So a clone is actually a special case of an upload. To access the upload endpoint, you need a token (the token verifies that the user is allowed to upload to the namespace). Once the token is generated, it is passed to the endpoint as part of the header. The error you are seeing says the token has expired; a token is normally valid for only 5 minutes. The strange thing is that you said the transfer started, which means the token was accepted, and then the transfer stopped. So I think that particular error is not related to the stall: once the transfer has started, the token is not checked again.
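
As a rough illustration of the expiry check described above, here is a minimal Python sketch. It is not CDI's code (CDI is written in Go and verifies tokens with go-jose, including the signature); the helper names and the unsigned demo token are assumptions made up for this example. It only shows how an `exp` claim with a 5-minute lifetime turns into a "token is expired" result.

```python
import base64
import json
import time
from typing import Optional


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as unpadded base64url text."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_demo_token(issued_at: float, ttl_seconds: int = 300) -> str:
    """Build an unsigned, made-up token that expires ttl_seconds after issue.

    Hypothetical helper for illustration only; a real token is signed.
    """
    header = b64url_encode(json.dumps({"alg": "none"}).encode())
    payload = b64url_encode(json.dumps({"exp": issued_at + ttl_seconds}).encode())
    return f"{header}.{payload}."


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token's `exp` claim is not in the future.

    Only the payload is inspected; the signature is NOT verified.
    """
    payload = json.loads(b64url_decode(token.split(".")[1]))
    current = time.time() if now is None else now
    return current >= payload["exp"]
```

A token issued two weeks ago would fail this check on every later attempt, which is consistent with the error showing up in the logs long after the transfer first started.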

To check whether any errors occurred, can you run kubectl describe pvc myapp -n myns? If any errors happened, the events on the PVC will tell us (assuming the DataVolume name is myapp and the namespace is myns).
 

Manuri Perera

Feb 10, 2020, 9:42:01 AM
to kubevirt-dev


Thanks for explaining this!

I ran kubectl describe pvc myapp -n myns: there aren't any events (Events: <none>), and the status is 'Bound'.

However, I can also see the following in the logs.

1 upload-controller.go:396] Error target resources requests storage size is smaller than the source validating clone spec, ignoring

Maybe this has something to do with the problem.
 

Alexander Wels

Feb 10, 2020, 9:53:53 AM
to Manuri Perera, kubevirt-dev
We do some checks before allowing a clone to happen. For instance, you are not allowed to clone from a PVC into a smaller one, as we don't know whether the contents will fit into the new PVC. But that blocks the clone completely, and the cloning process will not start at all. That is the message written to the log when that happens. Did you try to clone something into a smaller PVC?
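
To make the size rule above concrete, here is a small Python sketch of that kind of pre-clone check. It is an illustration under assumptions, not CDI's actual validation code: the function names are made up, and the parser only handles plain numbers and the common binary/decimal suffixes (it ignores other Kubernetes quantity forms such as exponent notation).

```python
from decimal import Decimal

# Multipliers for the common Kubernetes quantity suffixes.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a storage request such as '10Gi' or '500M' to bytes."""
    quantity = quantity.strip()
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)


def target_fits(source_request: str, target_request: str) -> bool:
    """Return True if the target PVC request is at least as large as the source."""
    return parse_quantity(target_request) >= parse_quantity(source_request)
```

With this sketch, `target_fits("10Gi", "5Gi")` is rejected before any data moves, which matches the point that a size mismatch prevents the clone from starting rather than stalling it midway.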
 

Manuri Perera

Feb 10, 2020, 10:00:16 AM
to kubevirt-dev


In that case this is also not relevant, because cloning did start and it is in "CloneInProgress" status. Unfortunately, I can't see any other errors in the cdi pod's logs :(
 

Manuri Perera

Feb 10, 2020, 12:19:56 PM
to kubevirt-dev
I started over and found an error stack trace in the source pod's logs saying the connection timed out when sending a POST request to the cdi upload endpoint. However, I could see that the cdi-upload pod is running fine and there is a service running on the IP/port the POST request is being sent to.

Alexander Wels

Feb 10, 2020, 1:06:28 PM
to Manuri Perera, kubevirt-dev
Anything interesting in the upload server pod? You can run kubectl logs -p upload-server-pod to get the logs from the pod before it restarted, in case it crashed for some reason. Also, is the restart count on the upload server pod higher than 0?
 

Manuri Perera

Feb 11, 2020, 3:26:21 AM
to kubevirt-dev
The upload pod is running fine and there are no restarts at all.

Following are the logs from the pod.

I0210 16:36:37.444122       1 uploadserver.go:62] Upload destination: /data/disk.img
I0210 16:36:37.444579       1 uploadserver.go:64] Running server on 0.0.0.0:8443

Manuri Perera

Feb 11, 2020, 4:15:07 AM
to kubevirt-dev
Since both source and target pods are running fine, I suspect the problem is communication across namespaces? Are there any limitations in doing this, or configurations that could be missing?

Manuri Perera

Feb 11, 2020, 5:56:08 AM
to kubevirt-dev
Finally figured out it was a network policy blocking the communication! Thanks for the help!
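
For anyone hitting the same symptom, one way to confirm that traffic is being blocked is a plain TCP check run from a pod in the source namespace. The Python sketch below is only a suggestion: `can_connect` is a made-up helper, and the address you pass is whatever IP and port the source pod's POST request targets (port 8443 appears in the upload server log above; the IP is specific to your cluster).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

If the upload pod is running and listening but this check times out from the source namespace, something in between, such as a network policy, is likely dropping the traffic.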

Alexander Wels

Feb 11, 2020, 8:23:08 AM
to Manuri Perera, kubevirt-dev
If you want to clone across namespaces, you will need a specific RBAC rule for your users. This is explained here: https://github.com/kubevirt/containerized-data-importer/blob/master/doc/RBAC.md#pvc-cloning
