Please read and adhere to the contribution guidelines.
Please tick the following:
https://github.com/SyneRBI/SIRF/pull/1305
@Imraj-Singh pushed 1 commit.
@KrisThielemans requested changes on this pull request.
Some detailed comments, but overall, I think we still need to decide on how to wrap AcquisitionModel, AcquisitionSensitivityModel, Resampler etc, i.e. anything that has forward and backward members. This should be written only once. You had https://github.com/SyneRBI/SIRF-Exercises/blob/71df538fe892ce621440544f14fa992c230fd120/notebooks/Deep_Learning_PET/sirf_torch.py#L38-L39, which seems completely generic, aside from naming of variables. So, something like this
class OperatorModule(torch.nn.Module):
    def __init__(self, sirf_operator, sirf_src, sirf_dest):
        """constructs wrapper. WARNING operations will overwrite content of `sirf_src` and `sirf_dest`"""
        ...
with usage
acq_model = pet.AcquisitionModelParallelProj() # etc etc
torch_acq_model = sirf.torch.OperatorModule(acq_model, image, acq_data)
Ideally, we also provide a way to call save_for_backward.
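For concreteness, a minimal sketch of such a generic wrapper (ignoring batching, dtype handling and save_for_backward; all names are illustrative and this is not the final implementation):

```python
import torch

class _OperatorFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, sirf_operator, sirf_src, sirf_dest):
        ctx.sirf_operator = sirf_operator
        ctx.sirf_dest = sirf_dest
        ctx.device = x.device
        # WARNING: overwrites the content of `sirf_src`
        sirf_src.fill(x.detach().cpu().numpy())
        result = sirf_operator.forward(sirf_src)
        return torch.tensor(result.as_array()).to(ctx.device)

    @staticmethod
    def backward(ctx, grad_output):
        # WARNING: overwrites the content of `sirf_dest`
        ctx.sirf_dest.fill(grad_output.detach().cpu().numpy())
        grad = ctx.sirf_operator.backward(ctx.sirf_dest)
        # one gradient per forward() argument (only x needs one)
        return torch.tensor(grad.as_array()).to(ctx.device), None, None, None

class OperatorModule(torch.nn.Module):
    def __init__(self, sirf_operator, sirf_src, sirf_dest):
        """constructs wrapper. WARNING operations will overwrite content of
        `sirf_src` and `sirf_dest`"""
        super().__init__()
        self.sirf_operator = sirf_operator
        self.sirf_src = sirf_src
        self.sirf_dest = sirf_dest

    def forward(self, x):
        return _OperatorFunction.apply(x, self.sirf_operator,
                                       self.sirf_src, self.sirf_dest)
```

The same module could then wrap AcquisitionModel, AcquisitionSensitivityModel, Resampler, etc., since it only relies on the forward and backward members.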
> @@ -0,0 +1,123 @@
+try:
+ import torch
+except ImportError:
+ raise ImportError('Failed to import torch. Please install PyTorch first.')
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+class _objectiveFunctionModule3D(torch.autograd.Function):
Why 3D? And why Module? (It is currently not derived from nn.Module.)
> + components that vary between samples from the dataset.
+ We need to be able to choose the correct the acquisition model wrapper based
+ on the data, also there is the assumption that we have all the same data
+ components for every sample in the dataset, but heyho this is a start."""
I still don't know why we need this. Constructing all the acquisition objects is not torch functionality, in my view.
Of course, we could try to do multiplication with sensitivities in torch as opposed to SIRF/numpy, but that is out-of-scope for this PR in my opinion.
I see this situation more like
> + # find a way of checking if the input or output is an image
+ # then throw a warning and choose whether the wrapper should be forward
+ # or backward
don't do this. This is the job of the user. A wrapper cannot know. (take for instance Resampler where both are images.)
> + def __init__(self, acq_model, sirf_data_in_template, sirf_data_out_template, device, *args): + # find a way of checking if the input or output is an image + # then throw a warning and choose whether the wrapper should be forward + # or backward + if len(args) > 0: + # detemine the modality of the data + self.modality = args[0] + if len(args) > 1: + # determine the additional components of the data + self.second_arg = args[1] + # ... and so on + # then begin to assign the correct acquisition model wrapper + + + +class DatasetAcquisitionModels(torch.nn.Module):
I can't see why we need this, see above
> + ctx.sirf_obj = sirf_obj
+ ctx.image_template = image_template
+ ctx.x = x.detach().cpu().numpy().squeeze()
+ ctx.x = ctx.image_template.fill(ctx.x)
+ value_np = ctx.sirf_obj.get_value(ctx.x)
+ return torch.tensor(value_np).to(ctx.device)
+
+ @staticmethod
+ def backward(
+ ctx,
+ in_grad):
+ grads_np = ctx.sirf_obj.get_gradient(ctx.x).as_array()
+ grads = torch.from_numpy(grads_np).to(ctx.device) * in_grad
+ return grads.unsqueeze(dim=0), None, None, None
+
+class _AcquisitionModelForward(torch.autograd.Function):
Why Forward? This should be the default. If there isn't a torch function to swap forward/backward, then we could have a BackwardAcquisitionModel, I guess.
> + ctx.x = x.detach().cpu().numpy().squeeze()
+ ctx.x = ctx.image_template.fill(ctx.x)
These operations are going to occur in many places and are prone to change, so I suggest having a function
def fill_from_torch(sirf_object, tensor)
I suppose we could even put this somewhere in the main SIRF DataContainer as a member as well, but let's leave that for later.
> + grads_np = ctx.sirf_obj.get_gradient(ctx.x).as_array()
+ grads = torch.from_numpy(grads_np).to(ctx.device) * in_grad
Similarly to above, we should isolate this into a function
def torch_from_sirf(sirf_object, device):
    return torch.from_numpy(sirf_object.as_array()).to(device)
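With these two helpers (signatures as suggested above; their bodies are assumed to do the detach/cpu/numpy and as_array conversions shown in the quoted code), the snippets quoted earlier would collapse to something like:

```python
# sketch only; `fill_from_torch` and `torch_from_sirf` are the helpers suggested above
ctx.x = fill_from_torch(ctx.image_template, x.squeeze())                           # torch -> SIRF
grads = torch_from_sirf(ctx.sirf_obj.get_gradient(ctx.x), ctx.device) * in_grad   # SIRF -> torch
```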
@Imraj-Singh pushed 1 commit.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh pushed 1 commit.
Added a README with some notes for a design discussion here.
@KrisThielemans commented on this pull request.
I'd prefer not to have too much code in the design discussion (duplication as well as distracting from actual design). You could include pointers to implementation in the README, but it's still small enough that this probably isn't worth the effort.
Just keep a skeleton in the README, the shorter the better...
How do you want people to react to your questions in the README? Via a review? (not conventional, but could work)
@KrisThielemans commented on this pull request.
> +
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, torch_dest, requires_grad=False):
+ if requires_grad:
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(torch_dest.device)
+ else:
+ return torch.tensor(sirf_src.as_array()).to(torch_dest.device)
+
+def torch_to_sirf(torch_src, sirf_dest):
+ return sirf_dest.fill(torch_src.detach().cpu().numpy())
+
+
+class _ObjectiveFunctionModule(torch.autograd.Function):
naming: currently not a Module.
> +
+ sirf_measurements = ctx.sirf_acq_mdl.forward(torch_to_sirf(grad_output, ctx.sirf_backward_projected))
+ grad = sirf_to_torch(sirf_measurements, ctx.torch_measurements, requires_grad=True)
+ return grad, None, None, None, None
+
+class AcquisitionModelForward(torch.nn.Module):
+ def __init__(self, acq_mdl, sirf_image_template, sirf_measurements_template, device = "cpu", ):
+ super(AcquisitionModelForward, self).__init__()
+ # get the shape of image and measurements
+ self.acq_mdl = acq_mdl
+ self.sirf_image_shape = sirf_image_template.as_array().shape
+ self.sirf_measurements_shape = sirf_measurements_template.as_array().shape
+ self.sirf_image_template = sirf_image_template
+ self.sirf_measurements_template = sirf_measurements_template
+
+ self.torch_measurements_template = torch.tensor(sirf_measurements_template.as_array(), requires_grad=False).to(device)*0
do we need this template? These things are large...
> +
+ sirf_measurements = ctx.sirf_acq_mdl.forward(torch_to_sirf(grad_output, ctx.sirf_backward_projected))
+ grad = sirf_to_torch(sirf_measurements, ctx.torch_measurements, requires_grad=True)
+ return grad, None, None, None, None
+
+class AcquisitionModelForward(torch.nn.Module):
+ def __init__(self, acq_mdl, sirf_image_template, sirf_measurements_template, device = "cpu", ):
+ super(AcquisitionModelForward, self).__init__()
+ # get the shape of image and measurements
+ self.acq_mdl = acq_mdl
+ self.sirf_image_shape = sirf_image_template.as_array().shape
+ self.sirf_measurements_shape = sirf_measurements_template.as_array().shape
+ self.sirf_image_template = sirf_image_template
+ self.sirf_measurements_template = sirf_measurements_template
+
+ self.torch_measurements_template = torch.tensor(sirf_measurements_template.as_array(), requires_grad=False).to(device)*0
As far as I can see, sirf_to_torch only uses the device from the template.
By the way, we should avoid running CI on this until it makes sense. Easiest is to put [ci skip] in the message of your last commit that you will push.
@Imraj-Singh pushed 3 commits.
@mehrhardt commented on this pull request.
>
def torch_to_sirf(torch_src, sirf_dest):
return sirf_dest.fill(torch_src.detach().cpu().numpy())
-class _ObjectiveFunctionModule(torch.autograd.Function):
+class _ObjectiveFunction(torch.autograd.Function):
@staticmethod
def forward(ctx,
In order to have a clean implementation, it might be nice to evaluate the ObjectiveFunction on the torch object only. One could for instance give the other arguments to the constructor.
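A rough sketch of that pattern (the class name is illustrative; it reuses the _ObjectiveFunction autograd Function from the diff above, whose apply signature is assumed to be (torch_image, sirf_image_template, sirf_obj_func)):

```python
import torch

class ObjectiveFunctionModule(torch.nn.Module):
    """Sketch: the SIRF objects go to the constructor; forward() only sees the tensor."""
    def __init__(self, sirf_obj_func, sirf_image_template):
        super().__init__()
        self.sirf_obj_func = sirf_obj_func
        self.sirf_image_template = sirf_image_template

    def forward(self, torch_image):
        return _ObjectiveFunction.apply(torch_image,
                                        self.sirf_image_template,
                                        self.sirf_obj_func)
```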
@mehrhardt commented on this pull request.
>
def torch_to_sirf(torch_src, sirf_dest):
return sirf_dest.fill(torch_src.detach().cpu().numpy())
-class _ObjectiveFunctionModule(torch.autograd.Function):
+class _ObjectiveFunction(torch.autograd.Function):
@staticmethod
def forward(ctx,
Just noticed that you did this already for the AcquisitionModel. Why not here, too?
@ckolbPTB commented on this pull request.
> @@ -0,0 +1,276 @@
+try:
+ import torch
+except ModuleNotFoundError:
+ raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, device, requires_grad=False):
+ if requires_grad:
+ # use torch.tensor to infer data type
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(device)
Why can't we just pass requires_grad to torch.tensor?
> + raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, device, requires_grad=False):
+ if requires_grad:
+ # use torch.tensor to infer data type
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(device)
+ else:
+ return torch.tensor(sirf_src.as_array()).to(device)
+
+def torch_to_sirf(torch_src, sirf_dest):
+ return sirf_dest.fill(torch_src.detach().cpu().numpy())
Is fill() not always an in-place operation?
> + device = torch_image.device
+ sirf_image_template = torch_to_sirf(torch_image, sirf_image_template)
+ value_np = sirf_obj_func.get_value(sirf_image_template).as_array()
+ if torch_image.requires_grad:
+ ctx.save_for_backward(device, sirf_image_template, sirf_obj_func)
+ return torch.tensor(value_np, requires_grad=True).to(device)
+ else:
+ return torch.tensor(value_np).to(torch_image.device)
+
+ @staticmethod
+ def backward(ctx,
+ grad_output
+ ):
+
+ device, sirf_image_template, sirf_obj_func = ctx.saved_tensors
+ tmp_grad = sirf_obj.get_gradient(sirf_image_template)
Suggested change:
- tmp_grad = sirf_obj.get_gradient(sirf_image_template)
+ sirf_grad = sirf_obj.get_gradient(sirf_image_template)
> + sirf_image_template = torch_to_sirf(torch_image, sirf_image_template)
+ value_np = sirf_obj_func.get_value(sirf_image_template).as_array()
+ if torch_image.requires_grad:
+ ctx.save_for_backward(device, sirf_image_template, sirf_obj_func)
+ return torch.tensor(value_np, requires_grad=True).to(device)
+ else:
+ return torch.tensor(value_np).to(torch_image.device)
+
+ @staticmethod
+ def backward(ctx,
+ grad_output
+ ):
+
+ device, sirf_image_template, sirf_obj_func = ctx.saved_tensors
+ tmp_grad = sirf_obj.get_gradient(sirf_image_template)
+ grad = sirf_to_torch(tmp_grad, device, requires_grad=True)
Suggested change:
- grad = sirf_to_torch(tmp_grad, device, requires_grad=True)
+ grad = sirf_to_torch(sirf_grad, device, requires_grad=True)
> + @staticmethod
+ def backward(ctx,
+ grad_output
+ ):
+
+ device, sirf_image_template, sirf_obj_func = ctx.saved_tensors
+ tmp_grad = sirf_obj.get_gradient(sirf_image_template)
+ grad = sirf_to_torch(tmp_grad, device, requires_grad=True)
+ return grad_output*grad, None, None, None
+
+class _AcquisitionModelForward(torch.autograd.Function):
+ @staticmethod
+ def forward(ctx,
+ torch_image,
+ sirf_image_template,
+ sirf_acq_mdl
Suggested change:
- sirf_acq_mdl
+ sirf_acq_model
> +
+ n_batch = image.shape[0]
+ if self.sirf_image_shape[0] == 1:
+ # if 2D
+ if len(image.shape) == 3:
+ # if 2D and no channel then add singleton
+ # add singleton for SIRF
+ image = image.unsqueeze(1)
+ else:
+ # if 3D
+ if len(image.shape) == 4:
+ # if 3D and no channel then add singleton
+ image = image.unsqueeze(1)
+ n_channel = image.shape[1]
+
+ # This looks horrible, but PyTorch should be able to trace.
Maybe something similar to
https://github.com/PTB-MR/mrpro/blob/534f324d71d224eaedb8bcfe7b11aedc263a9984/src/mrpro/utils/smap.py#L8
could be used here to make it easier to read and more generalizable.
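For illustration, a small helper roughly in the spirit of the mrpro smap utility linked above (not that implementation; the name and exact behaviour here are just a sketch) could factor out the batch/channel looping:

```python
import torch

def apply_per_sample(fn, x, n_spatial_dims):
    """Apply `fn` (acting on a single [*spatial] tensor) over all leading
    (batch/channel) dimensions of `x` and stack the results."""
    lead_shape = x.shape[:-n_spatial_dims]
    flat = x.reshape(-1, *x.shape[-n_spatial_dims:])
    out = torch.stack([fn(sample) for sample in flat])
    return out.reshape(*lead_shape, *out.shape[1:])
```

The nested 2D/3D and channel checks could then reduce to a single call that maps the per-sample SIRF operation over all leading dimensions.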
@KrisThielemans commented on this pull request.
> @@ -0,0 +1,276 @@
+try:
+ import torch
+except ModuleNotFoundError:
+ raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, device, requires_grad=False):
+ if requires_grad:
+ # use torch.tensor to infer data type
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(device)
Sorry, I don't understand this. The point is that we want to have some functions that isolate the murky details. At some point we might change the as_array() stuff, or even just replace it with sirf_src.as_torch(device), and then it can be done in a single place.
> + raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, device, requires_grad=False):
+ if requires_grad:
+ # use torch.tensor to infer data type
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(device)
+ else:
+ return torch.tensor(sirf_src.as_array()).to(device)
+
+def torch_to_sirf(torch_src, sirf_dest):
+ return sirf_dest.fill(torch_src.detach().cpu().numpy())
same comment as above, but currently, SIRF's fill() functions copy data.
> + @staticmethod
+ def backward(ctx,
+ grad_output
+ ):
+
+ device, sirf_image_template, sirf_obj_func = ctx.saved_tensors
+ tmp_grad = sirf_obj.get_gradient(sirf_image_template)
+ grad = sirf_to_torch(tmp_grad, device, requires_grad=True)
+ return grad_output*grad, None, None, None
+
+class _AcquisitionModelForward(torch.autograd.Function):
+ @staticmethod
+ def forward(ctx,
+ torch_image,
+ sirf_image_template,
+ sirf_acq_mdl
As mentioned in #1305 (review), I think we should generalise AcquisitionModel to Operator or similar.
@KrisThielemans commented on this pull request.
> +SIRF
+├── src
+│ ├── torch
+│ ├── CMakeLists.txt
+│ ├── README.md
+│ └── SIRF_torch.py
+```
+Currently installed in:
+```md
+INSTALL
+├── python
+│ ├── sirf
+│ └── SIRF_torch.py
+```
+
+The classes are accessed as `sirf.SIRF_torch.TheClass`.
I'd prefer sirf.torch.TheClass, which presumably could be done by having __init__.py importing everything from SIRF_torch.
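For example (the file locations are an assumption), a minimal package init could simply re-export everything:

```python
# hypothetical sirf/torch/__init__.py, so that users can write sirf.torch.TheClass
from SIRF_torch import *
```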
> + if requires_grad:
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(torch_dest.device)
+ else:
+ return torch.tensor(sirf_src.as_array()).to(torch_dest.device)
+```
+
+```python
+def torch_to_sirf(torch_src, sirf_dest):
+ return sirf_dest.fill(torch_src.detach().cpu().numpy())
+```
+
+
+---
+Next we have subclassed `torch.autograd.Function`s these allow for PyTorch to make use of SIRF's acquisition models and objective functions.
+
+Generally they require a combinations of `torch_image`, `torch_measurements`, `sirf_image`, `sirf_measurements`, `sirf_acq_mdl`, and/or `sirf_obj_func`.
As mentioned in #1305 (review), we need to generalise acq_model to other SIRF operations. Examples being Reg.Resampler and STIR.AcquisitionSensitivityModel, which have forward and backward operators.
@ckolbPTB commented on this pull request.
> @@ -0,0 +1,276 @@
+try:
+ import torch
+except ModuleNotFoundError:
+ raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, device, requires_grad=False):
+ if requires_grad:
+ # use torch.tensor to infer data type
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(device)
Yes, but instead of the if statement and having the same code twice, simply say
return torch.tensor(sirf_src.as_array(), requires_grad=requires_grad).to(device)
> + raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, device, requires_grad=False):
+ if requires_grad:
+ # use torch.tensor to infer data type
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(device)
+ else:
+ return torch.tensor(sirf_src.as_array()).to(device)
+
+def torch_to_sirf(torch_src, sirf_dest):
+ return sirf_dest.fill(torch_src.detach().cpu().numpy())
Sorry, what I meant was: will sirf_dest be changed by this, or will it create a copy of sirf_dest and then fill that? Currently the code suggests that sirf_dest remains unchanged, but I am not sure this is the case.
@Imraj-Singh pushed 2 commits.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh pushed 1 commit.
@KrisThielemans commented on this pull request.
> + raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, device, requires_grad=False):
+ if requires_grad:
+ # use torch.tensor to infer data type
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(device)
+ else:
+ return torch.tensor(sirf_src.as_array()).to(device)
+
+def torch_to_sirf(torch_src, sirf_dest):
+ return sirf_dest.fill(torch_src.detach().cpu().numpy())
Discussed in the meeting. Yes, this operation modifies sirf_dest (and returns the same handle). It needs to be commented and the function potentially renamed with a trailing _.
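For illustration, the renamed helper could then look like this (the name and comment wording are only a suggestion):

```python
def torch_to_sirf_(torch_src, sirf_dest):
    """Fill `sirf_dest` in place with the data of `torch_src` and return the same
    object (the trailing underscore follows PyTorch's in-place naming convention)."""
    return sirf_dest.fill(torch_src.detach().cpu().numpy())
```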
@Imraj-Singh pushed 3 commits.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh commented on this pull request.
I think the test_cases need to be reviewed again. Perhaps grad_check is unnecessary, since SIRF already does tests for adjointness etc. Could we make do with simpler tests checking that things are passed to and from torch/SIRF correctly?
> + # etc etc
+ # self.acq_model = AcquisitionFactory(acq_model, sirf_image_template,
+ # sirf_measurement_template, deviceMR, coil_sens, device)
+ # else:
+ # raise ValueError('acq_model must be a sirf object from STIR or Gadgetron')
+ self.acq_model = acq_model
+ self.sirf_data_in_template = sirf_data_in_template
+ self.sirf_data_out_template = sirf_data_out_template
+ # have a forward that can take a dynamic amount of arguments
+ def forward(self, data_in, *args):
+ # check if data of form [batch, channel, *sirf_data_in_template.shape]
+ if data.shape[2:] != sirf_data_in_template.shape:
+ raise ValueError('Data must be of the form [batch, channel, *sirf_data_in_template.shape]')
+
+ data_out = torch.zeros(data_in.shape[0], data_in.shape[1], *sirf_data_out_template.shape)
+ for i in data.shape[0]:
#1305 This code is just meant to be illustrative (i.e. it doesn't work) of what I tried to explain in the meeting today. If we want to use an acquisition wrapper with a dataset, we run into issues with needing extra data components (coil maps, multiplicative and additive components); these are often specific to the dataset sample. In this code I have tried to outline an approach that could potentially work. However, I am not sure if this is useful, as perhaps people may just write their own.
> + "Sum of Differences: ", numpy.abs(torch_image.detach().cpu().numpy() - bp_sirf_measurements.as_array()).sum())
+ torch_measurements.retain_grad()
+ torch_image.sum().backward()
+ # grad sum(A^Tx) = A 1
+ comparison = acq_mdl.forward(bp_sirf_measurements.get_uniform_copy(1))
+ print("Sum of torch: ", torch_measurements.grad.sum(), "Sum of sirf: ", comparison.sum(), "Sum of Differences: ", \
+ numpy.abs(torch_measurements.grad.detach().cpu().numpy() - comparison.as_array()).sum())
+
+
+ # form objective function
+ print("Objective Function Test")
+ obj_fun = pet.make_Poisson_loglikelihood(sirf_measurements)
+ print("Made poisson loglikelihood")
+ obj_fun.set_acquisition_model(sirf_acq_mdl)
+ print("Set acquisition model")
+ obj_fun.set_up(sirf_image_template)
When setting up this 2D PET objective function test, I am getting a segmentation fault, specifically at the obj_fun.set_up call. SIRF seems to be installed correctly, with all STIR ctests in SIRF running successfully. @KrisThielemans, can you see anything wrong?
> + + @staticmethod + def backward(ctx, + grad_output: torch.Tensor + ): + + sirf_obj = ctx.sirf_obj_func + sirf_image_template = ctx.sirf_image_template + device = ctx.device + # Negative for Gradient Descent as per torch convention + sirf_grad = -sirf_obj.get_gradient(sirf_image_template) + grad = sirf_to_torch(sirf_grad, device, requires_grad=True) + return grad_output*grad, None, None, None + + +class _ObjectiveGradient(torch.autograd.Function):
This is currently commented out and not implemented; I am just checking that my understanding of this implementation is correct: notebooks/Deep_Learning_listmode_PET/snippets/solution_4_1.py. @KrisThielemans
> + components that vary between samples from the dataset.
+ We need to be able to choose the correct the acquisition model wrapper based
+ on the data, also there is the assumption that we have all the same data
+ components for every sample in the dataset, but heyho this is a start."""
The pseudo-code was not meant to construct acquisition objects, but to take lists of acq_models (more exactly, the components of the data that change from dataset sample to sample). The factory was meant to return a class specific to the components that change; for example, a class that changes only the background and not, say, the attenuation. This class would manipulate the inputs (updating the background of the acquisition model) for each dataset sample. The class would call the torch.autograd.Function, which would be "abstracted" and only care about having templates and the acquisition model.
For now, though, I have removed the dataset part and am first building up the torch.autograd.Function(s).
> + # find a way of checking if the input or output is an image
+ # then throw a warning and choose whether the wrapper should be forward
+ # or backward
took that out
> + def __init__(self, acq_model, sirf_data_in_template, sirf_data_out_template, device, *args): + # find a way of checking if the input or output is an image + # then throw a warning and choose whether the wrapper should be forward + # or backward + if len(args) > 0: + # detemine the modality of the data + self.modality = args[0] + if len(args) > 1: + # determine the additional components of the data + self.second_arg = args[1] + # ... and so on + # then begin to assign the correct acquisition model wrapper + + + +class DatasetAcquisitionModels(torch.nn.Module):
Yes, I will remove it.
> + ctx.sirf_obj = sirf_obj + ctx.image_template = image_template + ctx.x = x.detach().cpu().numpy().squeeze() + ctx.x = ctx.image_template.fill(ctx.x) + value_np = ctx.sirf_obj.get_value(ctx.x) + return torch.tensor(value_np).to(ctx.device) + + @staticmethod + def backward( + ctx, + in_grad): + grads_np = ctx.sirf_obj.get_gradient(ctx.x).as_array() + grads = torch.from_numpy(grads_np).to(ctx.device) * in_grad + return grads.unsqueeze(dim=0), None, None, None + +class _AcquisitionModelForward(torch.autograd.Function):
I couldn't find a function that swaps this... so I created an AcquisitionModelBackward.
> + ctx.x = x.detach().cpu().numpy().squeeze()
+ ctx.x = ctx.image_template.fill(ctx.x)
Yes, I made the following functions:
sirf_to_torch(sirf_src, torch_dest, requires_grad)
torch_to_sirf(torch_src, sirf_dest)
>
def torch_to_sirf(torch_src, sirf_dest):
return sirf_dest.fill(torch_src.detach().cpu().numpy())
-class _ObjectiveFunctionModule(torch.autograd.Function):
+class _ObjectiveFunction(torch.autograd.Function):
@staticmethod
def forward(ctx,
Hi @mehrhardt, thanks! So I think this is a feature of using torch.autograd.Function: we don't override the constructor, see here.
However, for torch.nn.Module we can instantiate it with what we need in order to then "cleanly" run the torch.autograd.Function. This is what is done in the AcquisitionModelForward module; it only takes the image when called (also potentially other things, TBD).
I updated this PR and gave a better description at the beginning that I hope makes more sense.
> @@ -0,0 +1,276 @@
+try:
+ import torch
+except ModuleNotFoundError:
+ raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, device, requires_grad=False):
+ if requires_grad:
+ # use torch.tensor to infer data type
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(device)
You're right, I have updated it now.
> + raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(sirf_src, device, requires_grad=False):
+ if requires_grad:
+ # use torch.tensor to infer data type
+ return torch.tensor(sirf_src.as_array(), requires_grad=True).to(device)
+ else:
+ return torch.tensor(sirf_src.as_array()).to(device)
+
+def torch_to_sirf(torch_src, sirf_dest):
+ return sirf_dest.fill(torch_src.detach().cpu().numpy())
I am not sure; it may suffice to remove the return.
> + + n_batch = image.shape[0] + if self.sirf_image_shape[0] == 1: + # if 2D + if len(image.shape) == 3: + # if 2D and no channel then add singleton + # add singleton for SIRF + image = image.unsqueeze(1) + else: + # if 3D + if len(image.shape) == 4: + # if 3D and no channel then add singleton + image = image.unsqueeze(1) + n_channel = image.shape[1] + + # This looks horrible, but PyTorch should be able to trace.
This looks great, I will look at this again. Thanks!
> + if requires_grad: + return torch.tensor(sirf_src.as_array(), requires_grad=True).to(torch_dest.device) + else: + return torch.tensor(sirf_src.as_array()).to(torch_dest.device) +``` + +```python +def torch_to_sirf(torch_src, sirf_dest): + return sirf_dest.fill(torch_src.detach().cpu().numpy()) +``` + + +--- +Next we have subclassed `torch.autograd.Function`s these allow for PyTorch to make use of SIRF's acquisition models and objective functions. + +Generally they require a combinations of `torch_image`, `torch_measurements`, `sirf_image`, `sirf_measurements`, `sirf_acq_mdl`, and/or `sirf_obj_func`.
@KrisThielemans I changed to operators, and also abstracted the functionality to ensure that we are able to take care of different src and dest. We enforce [batch, channel, *src_shape] or [batch, *src_shape] as input, and [batch, channel, *dest_shape] or [batch, *dest_shape] as output.
@Imraj-Singh commented on this pull request.
> + check_shapes(torch_image_shape[1:], self.sirf_image_shape) + if self.sirf_image_shape == torch_image_shape[1:]: + torch_image = torch_image.unsqueeze(1) # add channel dimension + self.channel = True + elif len(torch_image_shape) == len(self.sirf_image_shape) + 2: + check_shapes(torch_image_shape[2:], self.sirf_image_shape) + + n_batch = torch_image.shape[0] + n_channel = torch_image.shape[1] + + # This looks horrible, but PyTorch should be able to trace. + batch_values = [] + for batch in range(n_batch): + channel_values = [] + for channel in range(n_channel): + channel_values.append(_ObjectiveFunction.apply(torch_image[batch, channel], self.sirf_image_template, self.sirf_obj_func))
@KrisThielemans we need to update SIRF's objective functions "on the fly"; could we have an input of multiple objective functions?
@Imraj-Singh pushed 1 commit.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh pushed 5 commits.
@Imraj-Singh commented on this pull request.
@KrisThielemans @evgueni-ovtchinnikov am I right in thinking that this should be raising an error if the operator is non-linear?
@Imraj-Singh pushed 3 commits.
@KrisThielemans requested changes on this pull request.
Various comments in the review. A general comment: I'm not convinced about hiding the "times -1" for the objective function. For one thing, it will fail with CIL functions such as KL. There are a few possible strategies here, which we should discuss.
> + Returns the adjoint operator of the operator, where the adjoint operator
+ is the forward method and the forward operator is the backward method.
Suggested change:
- Returns the adjoint operator of the operator, where the adjoint operator
- is the forward method and the forward operator is the backward method.
+ Creates the adjoint operator of a linear operator `lin_op`.
> + """Swaps the forward and adjoint operators"""
+ # By applying the adjoint operator the error catching is done by SIRF
+ # as operators without adjoint will throw an error, and these operators
+ # are not supported by the adjoint operator
Suggested change:
- """Swaps the forward and adjoint operators"""
- # By applying the adjoint operator the error catching is done by SIRF
- # as operators without adjoint will throw an error, and these operators
- # are not supported by the adjoint operator
+ """Calls the adjoint method of the original linear operator"""
+ # Note: calling `adjoint` will raise an error in SIRF if the operator is not linear.
> + """Swaps the backward and forward operators"""
+ # As the adjoint is called when forward is called the forward will be
+ # correct.
+ return self.operator.forward(x)
Suggested change:
- """Swaps the backward and forward operators"""
- # As the adjoint is called when forward is called the forward will be
- # correct.
- return self.operator.forward(x)
+ """Calls the `direct` method of the original linear operator"""
+ # Note: calling `direct` will raise an error in SIRF if the operator is not linear.
+ return self.operator.direct(x)
> @@ -734,3 +734,24 @@ def get_index_to_physical_point_matrix(self):
arr = numpy.ndarray((4,4), dtype = numpy.float32)
try_calling (pysirf.cSIRF_GeomInfo_get_index_to_physical_point_matrix(self.handle, arr.ctypes.data))
return arr
+
+class AdjointOperator(object):
add direct and adjoint as aliases.
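A sketch of the wrapper with those aliases (constructor and attribute names are inferred from the quoted code and the suggestions above, so treat them as assumptions):

```python
class AdjointOperator(object):
    """Creates the adjoint operator of a linear operator `lin_op`."""
    def __init__(self, lin_op):
        self.operator = lin_op

    def forward(self, x):
        # calling `adjoint` will raise an error in SIRF if the operator is not linear
        return self.operator.adjoint(x)

    def backward(self, x):
        return self.operator.direct(x)

    # aliases, so the wrapper can also be used where direct/adjoint are expected
    direct = forward
    adjoint = backward
```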
> @@ -0,0 +1,231 @@
+try:
+ import torch
+ import sirf
+
+ from sirf.STIR import *
not so happy with that...
> +except ModuleNotFoundError:
+ raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
It could have been sirf or sirf.STIR that failed to import, so make the try ... except statement only apply to torch.
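i.e. something along these lines (sketch only):

```python
try:
    import torch
except ModuleNotFoundError:
    raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')

import sirf                 # a failure here now reports the real missing module
from sirf.STIR import *     # (see the separate comment about this import)
```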
> @@ -0,0 +1,231 @@
+try:
+ import torch
+ import sirf
+
+ from sirf.STIR import *
+except ModuleNotFoundError:
+ raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(
+ sirf_src: object,
what is object?
> @@ -0,0 +1,231 @@
+try:
+ import torch
+ import sirf
+
+ from sirf.STIR import *
+except ModuleNotFoundError:
+ raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(
+ sirf_src: object,
+ device: torch.device,
is that really a class for typing? I doubt it.
> + # This is an in-place operation - CAREFUL
+ # Only really to be used within torch.autograd.Function
should be part of docstring
> + # v = grad_output = ∂L/∂g(x), where L is the loss function and g(x) is the
+ # *conceptual* output of this Function (the gradient of f(x)). v is a vector.
+ # The Hessian is ∇²f(x) = H|x, evaluated at the input point x.
+ # The HVP is v^T * H|x.
+ # This HVP computes ∂L/∂x, the gradient of the loss with respect to the *input* (x)
+ # of the forward pass, by applying the chain rule: ∂L/∂x = (∂L/∂g(x)) * (∂g(x)/∂x) = v^T * H.
Using f in forward doc above.
I can't imagine that this documentation is correct though; it certainly gives me a headache. (It might be computing the HVP of $f$, but it should compute the VJP of $g$.)
> +
+ @staticmethod
+ def backward(ctx,
+ grad_output: torch.Tensor
+ ):
+
+ sirf_obj = ctx.sirf_obj_func
+ sirf_image_template = ctx.sirf_image_template
+ device = ctx.device
+ # Negative for Gradient Descent as per torch convention
+ sirf_grad = -sirf_obj.get_gradient(sirf_image_template)
+ grad = sirf_to_torch(sirf_grad, device, requires_grad=True)
+ return grad_output*grad, None, None, None
+
+
+class _ObjectiveFunctionGradient(torch.autograd.Function):
I still don't understand the doc for this class. I understand forward implements
$$ g(x) = \nabla f |_x $$
for some function $f$ (which is minus a SIRF objective function).
The backward operation (or vjp) has therefore to implement
$$ backward(y) = \nabla g|_x * y = H|_x * y $$
I'm not very careful with row-column vectors here, but I believe that for use in an explicit GD or EMML layer, the output of $g(x)$ has to be a tensor of the same shape as $x$. Similarly, backward will be called with an argument of the same shape.
correct?
> + # g(x) = ∇f(x) = J^T (where J is the Jacobian of f(x)). g(x) is a row vector.
+ # The VJP is [1] * J = ∇f(x), where [1] is a scalar (or a 1x1 tensor).
+ # We use [1] because we're computing the full gradient of the scalar output f(x).
+ # The result of the VJP is the gradient itself. The "upstream gradient"
+ # is implicitly [1] because there are no prior operations.
+ # Here we pass this gradient.
Suggested change:
- # g(x) = ∇f(x) = J^T (where J is the Jacobian of f(x)). g(x) is a row vector.
- # The VJP is [1] * J = ∇f(x), where [1] is a scalar (or a 1x1 tensor).
- # We use [1] because we're computing the full gradient of the scalar output f(x).
- # The result of the VJP is the gradient itself. The "upstream gradient"
- # is implicitly [1] because there are no prior operations.
- # Here we pass this gradient.
+ # g(x) = J^T (where J is the Jacobian of f at x).
+ # The return value is a tensor of the type and shape as x
reason:
forward does not compute a VJP, as far as I understand. The text confuses me a lot. (Maybe it's correct of course, in which case, keep it.) I'm not sure about the transpose, by the way. Is it ever defined anywhere whether x is a row or column vector for these types of operations? I'm not sure what the torch conventions are for Jacobians. But I am fairly sure you'd want the output of forward to be the same type of "vector" as x.
> + if len(torch_src_shape) == len(sirf_src_shape):
+ raise ValueError(f"Invalid shape of src. Expected batch dim.")
unclear error.
> +import sirf
+msg = sirf.STIR.MessageRedirector(info=None, warn=None, errr=None)
+from sirf.Utilities import examples_data_path
+import matplotlib.pyplot as plt
+import torch
+# set deterministic computing
+torch.use_deterministic_algorithms(True)
+
+from SIRF_torch import *
+# - Gradient checks
+# - Adjointness checks
+
+def get_pet_2d():
+ pet_data_path = examples_data_path('PET')
+ pet_2d_raw_data_file = existing_filepath(pet_data_path, 'thorax_single_slice/template_sinogram.hs')
+ pet_2d_acq_data = AcquisitionData(pet_2d_raw_data_file)
pretty weird you don't need pet here. Presumably because of from sirf.STIR import *. I don't like it though.
> @@ -0,0 +1,156 @@
+import os
this file should be converted to a pytest. Maybe @paskino can help.
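For illustration only, one of the script's checks re-expressed as a pytest test could look like this (the get_pet_2d_image helper is hypothetical, and sirf_to_torch/torch_to_sirf are assumed to be imported from SIRF_torch as in the quoted script):

```python
import numpy
import pytest

torch = pytest.importorskip('torch')  # skip the whole module if torch is missing


def test_torch_sirf_roundtrip():
    # checks that data is passed to and from torch/SIRF correctly
    sirf_image = get_pet_2d_image()   # hypothetical helper/fixture
    torch_image = sirf_to_torch(sirf_image, device='cpu')
    sirf_back = torch_to_sirf(torch_image, sirf_image.clone())
    assert numpy.allclose(sirf_back.as_array(), sirf_image.as_array())
```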
@Imraj-Singh pushed 3 commits.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh commented on this pull request.
> @@ -2017,8 +2017,13 @@ def adjoint(self, ad, out=None):
Added for CCPi CIL compatibility
https://github.com/CCPPETMR/SIRF/pull/237#issuecomment-439894266
'''
- return self.backward(ad, subset_num=self.subset_num,
- num_subsets=self.num_subsets, out=out)
+ if self.is_linear():
@KrisThielemans am I right in thinking that this isn't necessary? I will change the comment on the use of adjoint in the adjoint operator class in that case.
@Imraj-Singh commented on this pull request.
> @@ -0,0 +1,231 @@
+try:
+ import torch
+ import sirf
+
+ from sirf.STIR import *
+except ModuleNotFoundError:
+ raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(
+ sirf_src: object,
sirf.STIR.ImageData, sirf.STIR.AcquisitionData, etc.; it is just a catch-all at the moment, as I am not sure of all the data objects that would be typed in this case...
@KrisThielemans commented on this pull request.
> @@ -2017,8 +2017,13 @@ def adjoint(self, ad, out=None):
Added for CCPi CIL compatibility
https://github.com/CCPPETMR/SIRF/pull/237#issuecomment-439894266
'''
- return self.backward(ad, subset_num=self.subset_num,
- num_subsets=self.num_subsets, out=out)
+ if self.is_linear():
what do you mean? adjoint makes no sense unless the operator is linear. That's why we suggested to put this condition here. Otherwise, you have to do checks yourself.
The error message could mention this better of course.
@KrisThielemans commented on this pull request.
> @@ -0,0 +1,231 @@
+try:
+ import torch
+ import sirf
+
+ from sirf.STIR import *
+except ModuleNotFoundError:
+ raise ModuleNotFoundError('Failed to import torch. Please install PyTorch first.')
+
+# based on
+# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py
+
+
+def sirf_to_torch(
+ sirf_src: object,
type hints have to be valid python. Maybe you could do
from typing import TypeAlias
SIRFContainer: TypeAlias = sirf.DataContainer
essentially, it needs to have as_array() and fill methods. CIL DataContainers would work as well, I suppose.
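The helper signature could then read, for example (the body is the current one-liner from the diff; the alias usage is illustrative):

```python
def sirf_to_torch(sirf_src: SIRFContainer,
                  device: torch.device,
                  requires_grad: bool = False) -> torch.Tensor:
    # use torch.tensor to infer data type
    return torch.tensor(sirf_src.as_array(), requires_grad=requires_grad).to(device)
```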
@Imraj-Singh commented on this pull request.
> + # g(x) = ∇f(x) = J^T (where J is the Jacobian of f(x)). g(x) is a row vector.
+ # The VJP is [1] * J = ∇f(x), where [1] is a scalar (or a 1x1 tensor).
+ # We use [1] because we're computing the full gradient of the scalar output f(x).
+ # The result of the VJP is the gradient itself. The "upstream gradient"
+ # is implicitly [1] because there are no prior operations.
+ # Here we pass this gradient.
yes, you're right it's still JVP forward! Thanks
> + # v = grad_output = ∂L/∂g(x), where L is the loss function and g(x) is the
+ # *conceptual* output of this Function (the gradient of f(x)). v is a vector.
+ # The Hessian is ∇²f(x) = H|x, evaluated at the input point x.
+ # The HVP is v^T * H|x.
+ # This HVP computes ∂L/∂x, the gradient of the loss with respect to the *input* (x)
+ # of the forward pass, by applying the chain rule: ∂L/∂x = (∂L/∂g(x)) * (∂g(x)/∂x) = v^T * H.
I am going to move this to the readme
> + if len(torch_src_shape) == len(sirf_src_shape):
+ raise ValueError(f"Invalid shape of src. Expected batch dim.")
I will change
@Imraj-Singh commented on this pull request.
> @@ -2017,8 +2017,13 @@ def adjoint(self, ad, out=None):
Added for CCPi CIL compatibility
https://github.com/CCPPETMR/SIRF/pull/237#issuecomment-439894266
'''
- return self.backward(ad, subset_num=self.subset_num,
- num_subsets=self.num_subsets, out=out)
+ if self.is_linear():
For the operators we have at the moment, backward and adjoint would always give the same results. It's only the direct/forward where things can go horribly wrong. I guess this is what we talked about regarding being more rigorous with naming. I will leave this in and expand the error message.
@Imraj-Singh pushed 2 commits.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh commented on this pull request.
> @@ -0,0 +1,156 @@
+import os
@paskino I tried to turn this into a pytest
@Imraj-Singh pushed 1 commit.
@Imraj-Singh commented on this pull request.
@paskino so this is an attempt at pytest
@KrisThielemans commented on this pull request.
> @@ -2017,8 +2017,13 @@ def adjoint(self, ad, out=None):
Added for CCPi CIL compatibility
https://github.com/CCPPETMR/SIRF/pull/237#issuecomment-439894266
'''
- return self.backward(ad, subset_num=self.subset_num,
- num_subsets=self.num_subsets, out=out)
+ if self.is_linear():
My point is that "adjoint" operation should not work on a non-linear operator, as the concept doesn't make sense. So, I prefer it to raise an error.
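In other words, something like this sketch (the exception type and message are illustrative, not the actual SIRF code):

```python
def adjoint(self, ad, out=None):
    '''Added for CCPi CIL compatibility; only defined for linear operators.'''
    if self.is_linear():
        return self.backward(ad, subset_num=self.subset_num,
                             num_subsets=self.num_subsets, out=out)
    raise RuntimeError('adjoint is only defined for linear operators')
```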
@Imraj-Singh commented on this pull request.
> + `∂L/∂x = (∂L/∂(-f(x))) * (∂(-f(x))/∂x) = grad_output * (-∇f(x))`. Since the forward pass output is a scalar, the Jacobian is simply the gradient `∇f(x)`. The upstream gradient `grad_output` represents `∂L/∂(-f(x))`, and `sirf_obj_func.get_gradient()` provides `-∇f(x)`. The multiplication `grad_output * (-∇f(x))` correctly computes the VJP. + +### `_ObjectiveFunctionGradient` (Forward and Backward Passes) + +* **Forward Pass (JVP with v=1):** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Computes the *gradient* of the *negative* SIRF objective function using `sirf_obj_func.get_gradient()`, which returns `-∇f(x)`. This is equivalent to computing a JVP where the input vector `v` is implicitly `1` (because we are taking the full gradient of the *scalar* output of the objective function). The result is a vector, the gradient. + 3. Returns the (negative) gradient as a PyTorch tensor. + +* **Backward Pass (Hessian-Vector Product - HVP):** + 1. Receives the upstream gradient (`grad_output`), which now represents a *vector* (not a scalar). This vector corresponds to `∂L/∂g(x)`, where `g(x) = -∇f(x)` is the *output* of the forward pass (the negative gradient of the objective function). + 2. Converts `grad_output` to a SIRF object. + 3. Computes the Hessian-vector product (HVP) using `sirf_obj_func.multiply_with_Hessian()`. This method takes two arguments: the current estimate (`sirf_image`, where the Hessian is evaluated) and the input vector (`sirf_grad`, which is `grad_output` converted to a SIRF object). + 4. Returns the (negative) HVP. + + **More on the HVP:** Let `f(x)` be the SIRF objective function, and let `g(x) = -∇f(x)` be the output of the forward pass. We want to compute `∂L/∂x`, where `L` is the overall loss. The backward pass receives `v = grad_output = ∂L/∂g(x)`. The Hessian of `f(x)` is `H = ∇²f(x)`. We compute the HVP: `H|x * v`, where `H|x` denotes the Hessian evaluated at point x. Applying the chain rule:
@KrisThielemans does this seem like a better explanation? Also, I added GD with the SIRF acquisition model. There are other use cases in the use_case.py.
@KrisThielemans commented on this pull request.
> @@ -734,3 +734,24 @@ def get_index_to_physical_point_matrix(self):
arr = numpy.ndarray((4,4), dtype = numpy.float32)
try_calling (pysirf.cSIRF_GeomInfo_get_index_to_physical_point_matrix(self.handle, arr.ctypes.data))
return arr
+
+class AdjointOperator(object):
resolved how?
@KrisThielemans requested changes on this pull request.
We seem to have a disagreement on how this should be documented. Ideally @mehrhardt @gschramm provide an alternative opinion to mine...
By the way, I think using funky characters in the doc is a bad idea and that it is better to use Latex-doc (although it is less readable of course). This is because you don't know what language settings the user has (unless there's a way to enforce unicode in Markdown).
> @@ -0,0 +1,123 @@ +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Core Concepts: Autodiff, Reverse Mode, JVP, VJP, and HVP + +This wrapper leverages PyTorch's `torch.autograd.Function` to achieve integration with SIRF. Understanding the following concepts is crucial for the design: + +* **Automatic Differentiation (Autodiff):** A technique for numerically evaluating the derivative of a function specified by a computer program. PyTorch uses a "reverse mode" autodiff system. + +* **Reverse Mode Autodiff:** In reverse mode, the gradients are computed by traversing the computational graph backwards, from the output (typically a scalar loss function) to the inputs. This is highly efficient for functions with many inputs and a single output, which is common in machine learning. + +* **Jacobian-Vector Product (JVP):** Let `y = f(x)` be a vector-valued function, where `x` is an input vector and `y` is an output vector. The Jacobian matrix `J` contains all the first-order partial derivatives of `f`. The JVP is the product `Jv`, where `v` is a vector. It gives the directional derivative of `f` at `x` in the direction of `v`. *Forward mode* autodiff computes JVPs. + +* **Vector-Jacobian Product (VJP):** With `y = f(x)` as above, the VJP is the product `w^T J`, where `w` is a vector (often called the "upstream/puput gradient"). The VJP represents the gradient of a scalar-valued function that depends on `y` (e.g., a loss function), with respect to the input `x`. *Reverse mode* autodiff computes VJPs.
typo "puput"
> @@ -0,0 +1,123 @@ +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Core Concepts: Autodiff, Reverse Mode, JVP, VJP, and HVP + +This wrapper leverages PyTorch's `torch.autograd.Function` to achieve integration with SIRF. Understanding the following concepts is crucial for the design: + +* **Automatic Differentiation (Autodiff):** A technique for numerically evaluating the derivative of a function specified by a computer program. PyTorch uses a "reverse mode" autodiff system. + +* **Reverse Mode Autodiff:** In reverse mode, the gradients are computed by traversing the computational graph backwards, from the output (typically a scalar loss function) to the inputs. This is highly efficient for functions with many inputs and a single output, which is common in machine learning. + +* **Jacobian-Vector Product (JVP):** Let `y = f(x)` be a vector-valued function, where `x` is an input vector and `y` is an output vector. The Jacobian matrix `J` contains all the first-order partial derivatives of `f`. The JVP is the product `Jv`, where `v` is a vector. It gives the directional derivative of `f` at `x` in the direction of `v`. *Forward mode* autodiff computes JVPs.
"It gives the directional derivative of f at x in the direction of v. " I think that strictly speaking this is only true for unit vectors. @mehrhardt could confirm.
(same comment for other statements below)
> +* **Automatic Differentiation (Autodiff):** A technique for numerically evaluating the derivative of a function specified by a computer program. PyTorch uses a "reverse mode" autodiff system. + +* **Reverse Mode Autodiff:** In reverse mode, the gradients are computed by traversing the computational graph backwards, from the output (typically a scalar loss function) to the inputs. This is highly efficient for functions with many inputs and a single output, which is common in machine learning. + +* **Jacobian-Vector Product (JVP):** Let `y = f(x)` be a vector-valued function, where `x` is an input vector and `y` is an output vector. The Jacobian matrix `J` contains all the first-order partial derivatives of `f`. The JVP is the product `Jv`, where `v` is a vector. It gives the directional derivative of `f` at `x` in the direction of `v`. *Forward mode* autodiff computes JVPs. + +* **Vector-Jacobian Product (VJP):** With `y = f(x)` as above, the VJP is the product `w^T J`, where `w` is a vector (often called the "upstream/puput gradient"). The VJP represents the gradient of a scalar-valued function that depends on `y` (e.g., a loss function), with respect to the input `x`. *Reverse mode* autodiff computes VJPs. + +* **Hessian-Vector Product (HVP):** The Hessian matrix `H` contains all the second-order partial derivatives of a scalar-valued function `f(x)`. The HVP is the product `Hv`, where `v` is a vector. It gives the second-order directional derivative of `f` at `x` in the direction of `v`. + +## Wrapper Design + +The wrapper provides three main classes: + +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse. +2. `SIRFTorchObjectiveFunction`: Wraps a SIRF `ObjectiveFunction` for calculating its value forward, and gradient in reverse mode.
it still computes VJP, i.e. whatever*gradient.
> + +* **Automatic Differentiation (Autodiff):** A technique for numerically evaluating the derivative of a function specified by a computer program. PyTorch uses a "reverse mode" autodiff system. + +* **Reverse Mode Autodiff:** In reverse mode, the gradients are computed by traversing the computational graph backwards, from the output (typically a scalar loss function) to the inputs. This is highly efficient for functions with many inputs and a single output, which is common in machine learning. + +* **Jacobian-Vector Product (JVP):** Let `y = f(x)` be a vector-valued function, where `x` is an input vector and `y` is an output vector. The Jacobian matrix `J` contains all the first-order partial derivatives of `f`. The JVP is the product `Jv`, where `v` is a vector. It gives the directional derivative of `f` at `x` in the direction of `v`. *Forward mode* autodiff computes JVPs. + +* **Vector-Jacobian Product (VJP):** With `y = f(x)` as above, the VJP is the product `w^T J`, where `w` is a vector (often called the "upstream/puput gradient"). The VJP represents the gradient of a scalar-valued function that depends on `y` (e.g., a loss function), with respect to the input `x`. *Reverse mode* autodiff computes VJPs. + +* **Hessian-Vector Product (HVP):** The Hessian matrix `H` contains all the second-order partial derivatives of a scalar-valued function `f(x)`. The HVP is the product `Hv`, where `v` is a vector. It gives the second-order directional derivative of `f` at `x` in the direction of `v`. + +## Wrapper Design + +The wrapper provides three main classes: + +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse.
I don't agree. It doesn't compute the JVP in the forward step. It just computes Operator(x). This is NOT the same. (certainly not for non-linear operators).
> + +## Wrapper Design + +The wrapper provides three main classes: + +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse. +2. `SIRFTorchObjectiveFunction`: Wraps a SIRF `ObjectiveFunction` for calculating its value forward, and gradient in reverse mode. +3. `SIRFTorchObjectiveFunctionGradient`: Wraps a SIRF `ObjectiveFunction` for calculating its gradient forward and computing the Hessian-vector product in reverse mode. + +These classes use custom `torch.autograd.Function` implementations (`_Operator`, `_ObjectiveFunction`, and `_ObjectiveFunctionGradient`) to define the forward and backward passes, handling the conversions between PyTorch tensors and SIRF objects. + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object).
This is potentially misleading, as it implies the SIRF Operator has to be linear. I'm not sure if it's useful to keep the text from "Let's" onwards.
> + +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse. +2. `SIRFTorchObjectiveFunction`: Wraps a SIRF `ObjectiveFunction` for calculating its value forward, and gradient in reverse mode. +3. `SIRFTorchObjectiveFunctionGradient`: Wraps a SIRF `ObjectiveFunction` for calculating its gradient forward and computing the Hessian-vector product in reverse mode. + +These classes use custom `torch.autograd.Function` implementations (`_Operator`, `_ObjectiveFunction`, and `_ObjectiveFunctionGradient`) to define the forward and backward passes, handling the conversions between PyTorch tensors and SIRF objects. + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object). + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):**
it is only the adjoint (applied to a vector) in the linear case. I'd delete this. It is just VJP.
> +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse. +2. `SIRFTorchObjectiveFunction`: Wraps a SIRF `ObjectiveFunction` for calculating its value forward, and gradient in reverse mode. +3. `SIRFTorchObjectiveFunctionGradient`: Wraps a SIRF `ObjectiveFunction` for calculating its gradient forward and computing the Hessian-vector product in reverse mode. + +These classes use custom `torch.autograd.Function` implementations (`_Operator`, `_ObjectiveFunction`, and `_ObjectiveFunctionGradient`) to define the forward and backward passes, handling the conversions between PyTorch tensors and SIRF objects. + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object). + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):** + 1. Receives the "upstream gradient" (`grad_output`) – the gradient of the loss `L` with respect to the *output* of the forward pass: `∂L/∂y`.
I find the grad_output name very confusing, but maybe it's standard in torch doc. In any case, I would remove any reference to loss 'L'. Who knows where this is used. It could easily be part of a chain of operations for instance. If you want to keep it, then I would say "for instance, if this is part of a sequence of operations like L(operator(x))... (but does that really help?)
> +3. `SIRFTorchObjectiveFunctionGradient`: Wraps a SIRF `ObjectiveFunction` for calculating its gradient forward and computing the Hessian-vector product in reverse mode. + +These classes use custom `torch.autograd.Function` implementations (`_Operator`, `_ObjectiveFunction`, and `_ObjectiveFunctionGradient`) to define the forward and backward passes, handling the conversions between PyTorch tensors and SIRF objects. + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object). + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):** + 1. Receives the "upstream gradient" (`grad_output`) – the gradient of the loss `L` with respect to the *output* of the forward pass: `∂L/∂y`. + 2. Converts `grad_output` to a SIRF object. + 3. Applies the SIRF `Operator.backward()` method (which computes the adjoint operation). The adjoint operation, `A^T`, effectively computes the VJP. If the forward pass computes `y = Ax`, the backward pass computes `x̄ = A^T ȳ`, where `A^T` is the adjoint of `A`, `ȳ` is `grad_output = ∂L/∂y`, and `x̄` is the gradient with respect to the input `x`. So, `x̄ = ∂L/∂x = (∂L/∂y)(∂y/∂x) = ȳ^T J = (A^T)ȳ`. Here, `J` is the Jacobian of the forward operation `y=Ax`, and it is implicitly represented by the operator `A` itself.
It is more general than the adjoint operation. I personally find it unhelpful to have these examples here. I would prefer to say "Applies the SIRF `Operator.backward()` method on the output of step 2". That is really the only thing it does. Any other meanings depend on whatever `backward` happens to do (but this function doesn't know).
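As a sketch of what that reduces to in code (hypothetical class name, conversion helpers simplified; not the PR's implementation), the backward literally just calls `Operator.backward()` on the converted `grad_output`:

```python
import torch

class _OperatorSketch(torch.autograd.Function):
    """Minimal sketch; error handling, devices and batch dims omitted."""

    @staticmethod
    def forward(ctx, x, sirf_operator, sirf_src_template):
        sirf_x = sirf_src_template.clone()
        sirf_x.fill(x.detach().cpu().numpy())
        sirf_y = sirf_operator.forward(sirf_x)   # just Operator(x), nothing more
        ctx.sirf_operator = sirf_operator
        ctx.sirf_y = sirf_y                      # reused as a template in backward
        return torch.from_numpy(sirf_y.as_array()).to(x.device)

    @staticmethod
    def backward(ctx, grad_output):
        # convert grad_output and apply Operator.backward() to it; any further
        # interpretation (adjoint, VJP, ...) depends on what backward() does
        sirf_grad_out = ctx.sirf_y.clone()
        sirf_grad_out.fill(grad_output.detach().cpu().numpy())
        sirf_grad_in = ctx.sirf_operator.backward(sirf_grad_out)
        return torch.from_numpy(sirf_grad_in.as_array()).to(grad_output.device), None, None
```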
> + +These classes use custom `torch.autograd.Function` implementations (`_Operator`, `_ObjectiveFunction`, and `_ObjectiveFunctionGradient`) to define the forward and backward passes, handling the conversions between PyTorch tensors and SIRF objects. + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object). + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):** + 1. Receives the "upstream gradient" (`grad_output`) – the gradient of the loss `L` with respect to the *output* of the forward pass: `∂L/∂y`. + 2. Converts `grad_output` to a SIRF object. + 3. Applies the SIRF `Operator.backward()` method (which computes the adjoint operation). The adjoint operation, `A^T`, effectively computes the VJP. If the forward pass computes `y = Ax`, the backward pass computes `x̄ = A^T ȳ`, where `A^T` is the adjoint of `A`, `ȳ` is `grad_output = ∂L/∂y`, and `x̄` is the gradient with respect to the input `x`. So, `x̄ = ∂L/∂x = (∂L/∂y)(∂y/∂x) = ȳ^T J = (A^T)ȳ`. Here, `J` is the Jacobian of the forward operation `y=Ax`, and it is implicitly represented by the operator `A` itself. + 4. Converts the resulting SIRF object back to a PyTorch tensor and returns it. This is the gradient with respect to the *input* of the forward pass (`∂L/∂x`).
I'd cut the second sentence
> + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object). + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):** + 1. Receives the "upstream gradient" (`grad_output`) – the gradient of the loss `L` with respect to the *output* of the forward pass: `∂L/∂y`. + 2. Converts `grad_output` to a SIRF object. + 3. Applies the SIRF `Operator.backward()` method (which computes the adjoint operation). The adjoint operation, `A^T`, effectively computes the VJP. If the forward pass computes `y = Ax`, the backward pass computes `x̄ = A^T ȳ`, where `A^T` is the adjoint of `A`, `ȳ` is `grad_output = ∂L/∂y`, and `x̄` is the gradient with respect to the input `x`. So, `x̄ = ∂L/∂x = (∂L/∂y)(∂y/∂x) = ȳ^T J = (A^T)ȳ`. Here, `J` is the Jacobian of the forward operation `y=Ax`, and it is implicitly represented by the operator `A` itself. + 4. Converts the resulting SIRF object back to a PyTorch tensor and returns it. This is the gradient with respect to the *input* of the forward pass (`∂L/∂x`). + + **Connection to VJP:** The backward pass directly computes the VJP. The upstream gradient `grad_output` (which is `∂L/∂y`) acts as the vector `w^T` in the VJP formula `w^T J`. The SIRF `Operator.backward()` method implicitly performs the multiplication by the Jacobian transpose (`J^T`, which is equivalent to the adjoint `A^T`).
not happy with this...
> + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object). + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):** + 1. Receives the "upstream gradient" (`grad_output`) – the gradient of the loss `L` with respect to the *output* of the forward pass: `∂L/∂y`. + 2. Converts `grad_output` to a SIRF object. + 3. Applies the SIRF `Operator.backward()` method (which computes the adjoint operation). The adjoint operation, `A^T`, effectively computes the VJP. If the forward pass computes `y = Ax`, the backward pass computes `x̄ = A^T ȳ`, where `A^T` is the adjoint of `A`, `ȳ` is `grad_output = ∂L/∂y`, and `x̄` is the gradient with respect to the input `x`. So, `x̄ = ∂L/∂x = (∂L/∂y)(∂y/∂x) = ȳ^T J = (A^T)ȳ`. Here, `J` is the Jacobian of the forward operation `y=Ax`, and it is implicitly represented by the operator `A` itself. + 4. Converts the resulting SIRF object back to a PyTorch tensor and returns it. This is the gradient with respect to the *input* of the forward pass (`∂L/∂x`). + + **Connection to VJP:** The backward pass directly computes the VJP. The upstream gradient `grad_output` (which is `∂L/∂y`) acts as the vector `w^T` in the VJP formula `w^T J`. The SIRF `Operator.backward()` method implicitly performs the multiplication by the Jacobian transpose (`J^T`, which is equivalent to the adjoint `A^T`). + +### `_ObjectiveFunction` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor (representing an image) to a SIRF object.
call the output x, such that you can refer to it in step 2.
> + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):** + 1. Receives the "upstream gradient" (`grad_output`) – the gradient of the loss `L` with respect to the *output* of the forward pass: `∂L/∂y`. + 2. Converts `grad_output` to a SIRF object. + 3. Applies the SIRF `Operator.backward()` method (which computes the adjoint operation). The adjoint operation, `A^T`, effectively computes the VJP. If the forward pass computes `y = Ax`, the backward pass computes `x̄ = A^T ȳ`, where `A^T` is the adjoint of `A`, `ȳ` is `grad_output = ∂L/∂y`, and `x̄` is the gradient with respect to the input `x`. So, `x̄ = ∂L/∂x = (∂L/∂y)(∂y/∂x) = ȳ^T J = (A^T)ȳ`. Here, `J` is the Jacobian of the forward operation `y=Ax`, and it is implicitly represented by the operator `A` itself. + 4. Converts the resulting SIRF object back to a PyTorch tensor and returns it. This is the gradient with respect to the *input* of the forward pass (`∂L/∂x`). + + **Connection to VJP:** The backward pass directly computes the VJP. The upstream gradient `grad_output` (which is `∂L/∂y`) acts as the vector `w^T` in the VJP formula `w^T J`. The SIRF `Operator.backward()` method implicitly performs the multiplication by the Jacobian transpose (`J^T`, which is equivalent to the adjoint `A^T`). + +### `_ObjectiveFunction` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor (representing an image) to a SIRF object. + 2. Calls the SIRF `ObjectiveFunction.get_value()` method. Let `f(x)` represent the SIRF objective function. This step computes `f(x)`.
say "get_value() at x"
> + 2. Converts `grad_output` to a SIRF object. + 3. Applies the SIRF `Operator.backward()` method (which computes the adjoint operation). The adjoint operation, `A^T`, effectively computes the VJP. If the forward pass computes `y = Ax`, the backward pass computes `x̄ = A^T ȳ`, where `A^T` is the adjoint of `A`, `ȳ` is `grad_output = ∂L/∂y`, and `x̄` is the gradient with respect to the input `x`. So, `x̄ = ∂L/∂x = (∂L/∂y)(∂y/∂x) = ȳ^T J = (A^T)ȳ`. Here, `J` is the Jacobian of the forward operation `y=Ax`, and it is implicitly represented by the operator `A` itself. + 4. Converts the resulting SIRF object back to a PyTorch tensor and returns it. This is the gradient with respect to the *input* of the forward pass (`∂L/∂x`). + + **Connection to VJP:** The backward pass directly computes the VJP. The upstream gradient `grad_output` (which is `∂L/∂y`) acts as the vector `w^T` in the VJP formula `w^T J`. The SIRF `Operator.backward()` method implicitly performs the multiplication by the Jacobian transpose (`J^T`, which is equivalent to the adjoint `A^T`). + +### `_ObjectiveFunction` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor (representing an image) to a SIRF object. + 2. Calls the SIRF `ObjectiveFunction.get_value()` method. Let `f(x)` represent the SIRF objective function. This step computes `f(x)`. + 3. Returns the *negative* of the objective function value as a PyTorch tensor: `-f(x)`. We negate the value because PyTorch optimizers perform *minimization*, and we typically want to maximize an objective function (or minimize its negative) in image reconstruction. + 4. Saves relevant information to the `ctx` if gradients are needed. + +* **Backward Pass (VJP):** + 1. Receives the upstream gradient (`grad_output`), which is typically a tensor containing the scalar `1` (since the output of the forward pass is a scalar, and `∂L/∂(-f(x)) = 1` if `L = -f(x)`).
possibly it is typically 1, but it isn't in the GD use case.
—
gschramm
For me, the description of the autograd in pytorch makes sense:
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#optional-reading-vector-calculus-using-autograd
This explains why, in the backward pass of a vector-valued function y = f(x), we have to implement the (Jacobian transpose) applied to a vector (supplied to the layer in the backward pass).
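As a pure-PyTorch illustration of that point (nothing SIRF-specific), `torch.autograd.functional.vjp` applies the transposed Jacobian to the supplied vector, which is exactly what a custom `backward` has to reproduce:

```python
import torch

def f(x):
    # some vector-valued function y = f(x)
    return torch.stack([x[0] * x[1], x[0] ** 2, torch.sin(x[1])])

x = torch.tensor([1.5, -0.5])
v = torch.tensor([1.0, 2.0, 3.0])   # the vector supplied to the layer in the backward pass

y, jTv = torch.autograd.functional.vjp(f, x, v)   # returns f(x) and J(x)^T v
J = torch.autograd.functional.jacobian(f, x)
assert torch.allclose(jTv, J.t() @ v)
```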
—
@Imraj-Singh commented on this pull request.
> @@ -0,0 +1,123 @@ +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Core Concepts: Autodiff, Reverse Mode, JVP, VJP, and HVP + +This wrapper leverages PyTorch's `torch.autograd.Function` to achieve integration with SIRF. Understanding the following concepts is crucial for the design: + +* **Automatic Differentiation (Autodiff):** A technique for numerically evaluating the derivative of a function specified by a computer program. PyTorch uses a "reverse mode" autodiff system. + +* **Reverse Mode Autodiff:** In reverse mode, the gradients are computed by traversing the computational graph backwards, from the output (typically a scalar loss function) to the inputs. This is highly efficient for functions with many inputs and a single output, which is common in machine learning. + +* **Jacobian-Vector Product (JVP):** Let `y = f(x)` be a vector-valued function, where `x` is an input vector and `y` is an output vector. The Jacobian matrix `J` contains all the first-order partial derivatives of `f`. The JVP is the product `Jv`, where `v` is a vector. It gives the directional derivative of `f` at `x` in the direction of `v`. *Forward mode* autodiff computes JVPs.
Yes, I think you're right too now... Perhaps "scaled" directional derivative, or the change in f in the direction of v?
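In symbols, just restating that suggestion (v need not be a unit vector):

```latex
f(x + \epsilon v) = f(x) + \epsilon\, J_f(x)\, v + O(\epsilon^2),
\qquad
J_f(x)\, v = \|v\| \left.\frac{d}{dt} f\!\left(x + t\,\tfrac{v}{\|v\|}\right)\right|_{t=0}
```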
—
> + +* **Automatic Differentiation (Autodiff):** A technique for numerically evaluating the derivative of a function specified by a computer program. PyTorch uses a "reverse mode" autodiff system. + +* **Reverse Mode Autodiff:** In reverse mode, the gradients are computed by traversing the computational graph backwards, from the output (typically a scalar loss function) to the inputs. This is highly efficient for functions with many inputs and a single output, which is common in machine learning. + +* **Jacobian-Vector Product (JVP):** Let `y = f(x)` be a vector-valued function, where `x` is an input vector and `y` is an output vector. The Jacobian matrix `J` contains all the first-order partial derivatives of `f`. The JVP is the product `Jv`, where `v` is a vector. It gives the directional derivative of `f` at `x` in the direction of `v`. *Forward mode* autodiff computes JVPs. + +* **Vector-Jacobian Product (VJP):** With `y = f(x)` as above, the VJP is the product `w^T J`, where `w` is a vector (often called the "upstream/puput gradient"). The VJP represents the gradient of a scalar-valued function that depends on `y` (e.g., a loss function), with respect to the input `x`. *Reverse mode* autodiff computes VJPs. + +* **Hessian-Vector Product (HVP):** The Hessian matrix `H` contains all the second-order partial derivatives of a scalar-valued function `f(x)`. The HVP is the product `Hv`, where `v` is a vector. It gives the second-order directional derivative of `f` at `x` in the direction of `v`. + +## Wrapper Design + +The wrapper provides three main classes: + +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse.
Yes I have confused forward mode autodiff with the forward pass. I will update the docs to make this clearer...
—
> +* **Automatic Differentiation (Autodiff):** A technique for numerically evaluating the derivative of a function specified by a computer program. PyTorch uses a "reverse mode" autodiff system. + +* **Reverse Mode Autodiff:** In reverse mode, the gradients are computed by traversing the computational graph backwards, from the output (typically a scalar loss function) to the inputs. This is highly efficient for functions with many inputs and a single output, which is common in machine learning. + +* **Jacobian-Vector Product (JVP):** Let `y = f(x)` be a vector-valued function, where `x` is an input vector and `y` is an output vector. The Jacobian matrix `J` contains all the first-order partial derivatives of `f`. The JVP is the product `Jv`, where `v` is a vector. It gives the directional derivative of `f` at `x` in the direction of `v`. *Forward mode* autodiff computes JVPs. + +* **Vector-Jacobian Product (VJP):** With `y = f(x)` as above, the VJP is the product `w^T J`, where `w` is a vector (often called the "upstream/puput gradient"). The VJP represents the gradient of a scalar-valued function that depends on `y` (e.g., a loss function), with respect to the input `x`. *Reverse mode* autodiff computes VJPs. + +* **Hessian-Vector Product (HVP):** The Hessian matrix `H` contains all the second-order partial derivatives of a scalar-valued function `f(x)`. The HVP is the product `Hv`, where `v` is a vector. It gives the second-order directional derivative of `f` at `x` in the direction of `v`. + +## Wrapper Design + +The wrapper provides three main classes: + +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse. +2. `SIRFTorchObjectiveFunction`: Wraps a SIRF `ObjectiveFunction` for calculating its value forward, and gradient in reverse mode.
for the objective function the gradient output is usually just 1 (i.e. the seed for reverse-mode autodiff), which is why I omitted it, but it is clearer if I add this in
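A tiny pure-torch illustration of the seed, and of the case where it is not 1 because the scalar value is used further downstream (e.g. scaled by a step size):

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)

value = (x ** 2).sum()              # stand-in for a wrapped scalar objective value
value.backward()                    # seed grad_output = 1
print(x.grad)                       # tensor([4., 6.])

x.grad = None
(0.1 * (x ** 2).sum()).backward()   # value used further, so grad_output = 0.1
print(x.grad)                       # tensor([0.4000, 0.6000])
```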
—
> + +## Wrapper Design + +The wrapper provides three main classes: + +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse. +2. `SIRFTorchObjectiveFunction`: Wraps a SIRF `ObjectiveFunction` for calculating its value forward, and gradient in reverse mode. +3. `SIRFTorchObjectiveFunctionGradient`: Wraps a SIRF `ObjectiveFunction` for calculating its gradient forward and computing the Hessian-vector product in reverse mode. + +These classes use custom `torch.autograd.Function` implementations (`_Operator`, `_ObjectiveFunction`, and `_ObjectiveFunctionGradient`) to define the forward and backward passes, handling the conversions between PyTorch tensors and SIRF objects. + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object).
Okay I will remove!
—
> + +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse. +2. `SIRFTorchObjectiveFunction`: Wraps a SIRF `ObjectiveFunction` for calculating its value forward, and gradient in reverse mode. +3. `SIRFTorchObjectiveFunctionGradient`: Wraps a SIRF `ObjectiveFunction` for calculating its gradient forward and computing the Hessian-vector product in reverse mode. + +These classes use custom `torch.autograd.Function` implementations (`_Operator`, `_ObjectiveFunction`, and `_ObjectiveFunctionGradient`) to define the forward and backward passes, handling the conversions between PyTorch tensors and SIRF objects. + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object). + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):**
I guess Jacobian transpose/adjoint could work, but I will remove it
—
thanks for the link @gschramm. We could maybe include it in the README. I think the main question for me is how much we should elaborate in our own doc, especially if it gets into detail on what is common usage vs what it actually does. I've put some detailed comments, and @Imraj-Singh will address those he agrees with :-). I'd personally prefer not too much (potentially confusing) detail in the README, but point to torch doc and our use cases (which could have a bit more doc then) or exercises.
—
> +3. `SIRFTorchObjectiveFunctionGradient`: Wraps a SIRF `ObjectiveFunction` for calculating its gradient forward and computing the Hessian-vector product in reverse mode. + +These classes use custom `torch.autograd.Function` implementations (`_Operator`, `_ObjectiveFunction`, and `_ObjectiveFunctionGradient`) to define the forward and backward passes, handling the conversions between PyTorch tensors and SIRF objects. + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object). + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):** + 1. Receives the "upstream gradient" (`grad_output`) – the gradient of the loss `L` with respect to the *output* of the forward pass: `∂L/∂y`. + 2. Converts `grad_output` to a SIRF object. + 3. Applies the SIRF `Operator.backward()` method (which computes the adjoint operation). The adjoint operation, `A^T`, effectively computes the VJP. If the forward pass computes `y = Ax`, the backward pass computes `x̄ = A^T ȳ`, where `A^T` is the adjoint of `A`, `ȳ` is `grad_output = ∂L/∂y`, and `x̄` is the gradient with respect to the input `x`. So, `x̄ = ∂L/∂x = (∂L/∂y)(∂y/∂x) = ȳ^T J = (A^T)ȳ`. Here, `J` is the Jacobian of the forward operation `y=Ax`, and it is implicitly represented by the operator `A` itself.
I would argue that it's always the Jacobian adjoint, as it's always some first-order approximation. I am greatly simplifying it but making sure I am consistent with the explanation... Let me know what you think once I have rewritten it
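In the notation used above, a short restatement of that point: the backward always applies the adjoint of the Jacobian at the linearisation point, which for a linear operator `A` is just `A^T`.

```latex
\bar{x} = J_A(x)^{T}\,\bar{y},
\qquad
A \text{ linear} \;\Rightarrow\; J_A(x) = A \;\Rightarrow\; \bar{x} = A^{T}\bar{y}
```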
—
@Imraj-Singh pushed 1 commit.
—
@Imraj-Singh commented on this pull request.
@KrisThielemans, @gschramm and @mehrhardt. I think this may be enough technicality/clarification to be useful for users? I may add more with regards to use cases/how people interact with the wrapper and gradchecking, but I think the technical aspects should (I hope) be okay.
—
@Imraj-Singh commented on this pull request.
> +def test_objective_function_gradcheck(test_data):
+ acq_data, image_data, acq_model, modality, data_type = test_data
+ if modality == "PET":
+ obj_fun = pet.make_Poisson_loglikelihood(acq_data)
+ else:
+ pytest.skip("Tests not set up for other modalities at this time.")
+
+ obj_fun.set_acquisition_model(acq_model)
+ obj_fun.set_up(image_data)
+
+ torch_obj_fun = SIRFTorchObjectiveFunction(obj_fun, image_data.clone())
+ torch_image_data = torch.tensor(image_data.as_array(), requires_grad=True).unsqueeze(0).cuda()
+
+
+ run_gradcheck(torch_obj_fun, torch_image_data, (modality, data_type), "Objective",
+ nondet_tol=1e-1, fast_mode=True, eps=1e-3, atol=1e-1, rtol=1e-1)
@gschramm I had to set the tolerance for non-determinism extremely high... I think computing the numerical Jacobian of a PNLL-type function may cause issues with numerical precision? I was wondering if you see this with your gradcheck in parallelproj. As a comparison, I wrapped only the acquisition model and used torch's PNLL for the objective, and I still need rather high tolerances for non-determinism.
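As a pure-torch point of comparison (no SIRF involved), a hedged sketch suggesting part of this may be float32 numerics in the numerical Jacobian rather than non-determinism: a Poisson NLL gradcheck passes with default tolerances in double precision.

```python
import torch

torch.manual_seed(0)
target = torch.poisson(5.0 * torch.ones(32, dtype=torch.float64))

def neg_poisson_ll(x):
    # Poisson negative log-likelihood, non-log input, summed over bins
    return torch.nn.functional.poisson_nll_loss(x, target, log_input=False,
                                                reduction='sum')

x = (5.0 + torch.rand(32, dtype=torch.float64)).requires_grad_(True)

# passes with the default (tight) tolerances in float64; the same check on
# float32 inputs typically needs eps/atol/rtol relaxed considerably
torch.autograd.gradcheck(neg_poisson_ll, (x,))
```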
—
@mehrhardt commented on this pull request.
> @@ -0,0 +1,113 @@ +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Forward and backward clarification + +The use of the terms forward and backward have different meaning given the context: +* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input.
Why do you refer to the derivatives as "scaled"?
—
> +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Forward and backward clarification + +The use of the terms forward and backward have different meaning given the context: +* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input. +* Automatic differentiation: Forward (tangent) mode autodiff computes the (scaled directional) derivative, the Jacobian-Vector-Product (JVP), of the function along with the evaluation - computing derivatives from input to output. Backward (or reverse/adjoint) mode autodiff is the VJP - computing derivatives from output to input. +* Reverse-mode autodiff: Forward pass evaluates the function saving intermediate values from input to output. Backward pass uses the chain rule and intermediate values computing the derivatives from output to inputs/variables. +* `SIRFOperator`: For example acquistion models etc. In forward call we apply the operator, and in the backward we apply the Jacobian adjoint of the operator. + +### Importance in the implementation + +The wrapper is currently only for **reverse-mode** autodiff - there are JVP methods (for forward-mode autodiff) of `torch.autograd.Function` that are not used at present. + +For `SIRFOperator` we can wrap the adjoint operator with `AdjointOperator`, there quite confusingly the forward is the application of the adjoint, and the backward *is* the linear (Jacobian) component/approximation of the operator.
> backward is the linear (Jacobian) component/approximation of the operator
I don't know what this means
—
@Imraj-Singh commented on this pull request.
> @@ -0,0 +1,113 @@ +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Forward and backward clarification + +The use of the terms forward and backward have different meaning given the context: +* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input.
The vector isn't necessarily a unit vector. That said, I think I will just say "gradient" as that is correct and less confusing; maybe it will be more obvious that this is just the chain rule. Also, I say "Vector-Jacobian-adjoint-Product"; I need to take the adjoint out... I will work on this a bit more. Thanks!
—
> +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Forward and backward clarification + +The use of the terms forward and backward have different meaning given the context: +* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input. +* Automatic differentiation: Forward (tangent) mode autodiff computes the (scaled directional) derivative, the Jacobian-Vector-Product (JVP), of the function along with the evaluation - computing derivatives from input to output. Backward (or reverse/adjoint) mode autodiff is the VJP - computing derivatives from output to input. +* Reverse-mode autodiff: Forward pass evaluates the function saving intermediate values from input to output. Backward pass uses the chain rule and intermediate values computing the derivatives from output to inputs/variables. +* `SIRFOperator`: For example acquistion models etc. In forward call we apply the operator, and in the backward we apply the Jacobian adjoint of the operator. + +### Importance in the implementation + +The wrapper is currently only for **reverse-mode** autodiff - there are JVP methods (for forward-mode autodiff) of `torch.autograd.Function` that are not used at present. + +For `SIRFOperator` we can wrap the adjoint operator with `AdjointOperator`, there quite confusingly the forward is the application of the adjoint, and the backward *is* the linear (Jacobian) component/approximation of the operator.
If we have some operator A(x) and its adjoint B(x), the forward pass would be B(x), and the backward pass would be J_A(x) * grad_out. So for the operator Ax + b, the adjoint is A^T x and the backward pass is A grad_out? grad_out is PyTorch's name for the product of upstream gradients, i.e. the upstream chain rule...
Perhaps I should change it to:
For `SIRFOperator`, wrapping its adjoint with `AdjointOperator` results in: forward executing the adjoint operation, and backward computing the Jacobian-based gradient of the original `SIRFOperator`.
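For concreteness, a hedged usage sketch in the spirit of the test code later in this thread (assumes `acq_model`, `image`, `acq_data`, `x_torch` and the `SIRFTorchOperator` wrapper are set up as in this PR; constructor signature as used in the tests):

```python
import sirf

# forward operator A and its adjoint A^T wrapped for torch
fwd = SIRFTorchOperator(acq_model, image.get_uniform_copy(1))
adj = SIRFTorchOperator(
    sirf.SIRF.AdjointOperator(acq_model.get_linear_acquisition_model()),
    acq_data.get_uniform_copy(1))

y = fwd(x_torch)   # torch forward applies A(x); torch backward applies the Jacobian adjoint
z = adj(y)         # torch forward applies A^T y; torch backward applies A
```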
—
@KrisThielemans commented on this pull request.
> @@ -0,0 +1,113 @@ +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Forward and backward clarification + +The use of the terms forward and backward have different meaning given the context: +* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input.
I don't agree that "gradient is correct". The backward method multiplies its argument with the gradient of the function, which is the VJP of course. I'd personally remove "from output to input", as this terminology only makes sense if you think about this is inserted as a layer in a feed-forward network.
⬇️ Suggested change
-* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input.
+* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) multiplies its argument with the gradient of the function, i.e. computes the Vector-Jacobian-adjoint-Product (VJP).
—
> @@ -0,0 +1,113 @@ +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Forward and backward clarification + +The use of the terms forward and backward have different meaning given the context: +* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input. +* Automatic differentiation: Forward (tangent) mode autodiff computes the (scaled directional) derivative, the Jacobian-Vector-Product (JVP), of the function along with the evaluation - computing derivatives from input to output. Backward (or reverse/adjoint) mode autodiff is the VJP - computing derivatives from output to input. +* Reverse-mode autodiff: Forward pass evaluates the function saving intermediate values from input to output. Backward pass uses the chain rule and intermediate values computing the derivatives from output to inputs/variables. +* `SIRFOperator`: For example acquistion models etc. In forward call we apply the operator, and in the backward we apply the Jacobian adjoint of the operator. + +### Importance in the implementation
not sure what this title is supposed to convey
—
> +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Forward and backward clarification + +The use of the terms forward and backward have different meaning given the context: +* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input. +* Automatic differentiation: Forward (tangent) mode autodiff computes the (scaled directional) derivative, the Jacobian-Vector-Product (JVP), of the function along with the evaluation - computing derivatives from input to output. Backward (or reverse/adjoint) mode autodiff is the VJP - computing derivatives from output to input. +* Reverse-mode autodiff: Forward pass evaluates the function saving intermediate values from input to output. Backward pass uses the chain rule and intermediate values computing the derivatives from output to inputs/variables. +* `SIRFOperator`: For example acquistion models etc. In forward call we apply the operator, and in the backward we apply the Jacobian adjoint of the operator. + +### Importance in the implementation + +The wrapper is currently only for **reverse-mode** autodiff - there are JVP methods (for forward-mode autodiff) of `torch.autograd.Function` that are not used at present. + +For `SIRFOperator` we can wrap the adjoint operator with `AdjointOperator`, there quite confusingly the forward is the application of the adjoint, and the backward *is* the linear (Jacobian) component/approximation of the operator.
I'm really not sure if this is helpful to mention at this point. It could be done in a use-case. But maybe we could say
⬇️ Suggested change
-For `SIRFOperator` we can wrap the adjoint operator with `AdjointOperator`, there quite confusingly the forward is the application of the adjoint, and the backward *is* the linear (Jacobian) component/approximation of the operator.
+Note that in certain cases it can be necessary to use the adjoint of an operator as a forward step, e.g. when adding a gradient-descent step as a layer. This can be achieved by wrapping `sirf.AdjointOperator(sirf_operator)`.
—
> + +1. `SIRFTorchOperator`: Wraps a SIRF `Operator` (e.g., a projection operator). Computes the JVP forward and VJP in reverse. +2. `SIRFTorchObjectiveFunction`: Wraps a SIRF `ObjectiveFunction` for calculating its value forward, and gradient in reverse mode. +3. `SIRFTorchObjectiveFunctionGradient`: Wraps a SIRF `ObjectiveFunction` for calculating its gradient forward and computing the Hessian-vector product in reverse mode. + +These classes use custom `torch.autograd.Function` implementations (`_Operator`, `_ObjectiveFunction`, and `_ObjectiveFunctionGradient`) to define the forward and backward passes, handling the conversions between PyTorch tensors and SIRF objects. + +### `_Operator` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Applies the SIRF `Operator.forward()` method. Let's represent the SIRF Operator as `A`. So, this step computes `y = Ax`, where `x` is the input (SIRF object) and `y` is the output (SIRF object). + 3. Converts the result back to a PyTorch tensor. + 4. If the input tensor requires gradients, it saves relevant information (the output SIRF object and the operator) in the context (`ctx`) for use in the backward pass. + +* **Backward Pass (Adjoint Operation - VJP):**
comment was resolved, but text not removed yet.
—
> + +* **Backward Pass (VJP):** + 1. Receives the upstream gradient (`grad_output`), in this case it is always a scalar. + 2. Gets the gradient of the *negative* objective function using `sirf_obj_func.get_gradient()`, which computed at the input and multiplied by the upstream gradient. + 3. Converts the SIRF gradient to a PyTorch tensor. + 4. Returns the gradient multiplied by `grad_output`. + + +### `_ObjectiveFunctionGradient` (Forward and Backward Passes) + +* **Forward Pass:** + 1. Converts the input PyTorch tensor to a SIRF object. + 2. Computes the *gradient* of the *negative* SIRF objective function using `sirf_obj_func.get_gradient()`, which is computed on the input. + 3. Returns the (negative) gradient as a PyTorch tensor. + +* **Backward Pass (JVP):**
VJP
—
@KrisThielemans commented on this pull request.
> + self.unrolled_backward_test() + elif test_name == 'objective_function': + self.objective_function_test() + + + + def learned_primal_dual(self): + # use seed for reproducibility + + # set up learned primal dual network + PrimalConvBlocks = [ConvBlock() for i in range(3)] + DualConvBlocks = [ConvBlock() for i in range(3)] + # set up the forward and adjoint operators + sirf_image_template = self.image.get_uniform_copy(1) + FwdOperator = SIRFTorchOperator(self.acq_model, self.image.get_uniform_copy(1)) + AdjOperator = SIRFTorchOperator(sirf.SIRF.AdjointOperator(self.acq_model.get_linear_acquisition_model()), self.acq_data.get_uniform_copy(1))
I'm surprised by the `sirf.SIRF.AdjointOperator`. Can't we use `sirf.AdjointOperator`? If not, shouldn't we be able to? (presumably some stuff to do with `__init__.py`)
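If it is indeed an `__init__.py` re-export issue, a minimal sketch of what could make `sirf.AdjointOperator` work (hypothetical; the real packaging layout may differ):

```python
# sirf/__init__.py (sketch only; assumes the classes live in sirf/SIRF.py)
from sirf.SIRF import AdjointOperator, DataContainer   # re-export selected names

__all__ = ['AdjointOperator', 'DataContainer']
```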
—
@KrisThielemans commented on this pull request.
> @@ -0,0 +1,164 @@
+import os
+import numpy
+# Import the PET reconstruction engine
+import sirf.STIR as pet
+# Set the verbosity
+pet.set_verbosity(1)
+# Store temporary sinograms in RAM
+pet.AcquisitionData.set_storage_scheme("memory")
+import sirf
+msg = sirf.STIR.MessageRedirector(info=None, warn=None, errr=None)
+from sirf.Utilities import examples_data_path
+import matplotlib.pyplot as plt
+
+
+from SIRF_torch import *
let's not do that.
—
@KrisThielemans requested changes on this pull request.
> +# +# based on +# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py + + +def sirf_to_torch( + sirf_src: sirf.SIRF.DataContainer | float, + device: torch.device, + requires_grad: bool = False + ) -> torch.Tensor: + """ + Converts a SIRF object to a PyTorch tensor. + + Args: + sirf_src: The SIRF object to convert. This can be a SIRF + `DataContainer`, `AcquisitionData`, `ImageData`, or a scalar (float).⬇️ Suggested change
- `DataContainer`, `AcquisitionData`, `ImageData`, or a scalar (float).
+ `DataContainer` such as `AcquisitionData`, `ImageData`, or a scalar (float).
> +# +# based on +# https://github.com/educating-dip/pet_deep_image_prior/blob/main/src/deep_image_prior/torch_wrapper.py + + +def sirf_to_torch( + sirf_src: sirf.SIRF.DataContainer | float, + device: torch.device, + requires_grad: bool = False + ) -> torch.Tensor: + """ + Converts a SIRF object to a PyTorch tensor. + + Args: + sirf_src: The SIRF object to convert. This can be a SIRF + `DataContainer`, `AcquisitionData`, `ImageData`, or a scalar (float).
Now that you've added float, it's tempting to add numpy objects...
—
@Imraj-Singh commented on this pull request.
> @@ -0,0 +1,164 @@
+import os
+import numpy
+# Import the PET reconstruction engine
+import sirf.STIR as pet
+# Set the verbosity
+pet.set_verbosity(1)
+# Store temporary sinograms in RAM
+pet.AcquisitionData.set_storage_scheme("memory")
+import sirf
+msg = sirf.STIR.MessageRedirector(info=None, warn=None, errr=None)
+from sirf.Utilities import examples_data_path
+import matplotlib.pyplot as plt
+
+
+from SIRF_torch import *
Sorry, test_cases.py is old; I will remove that file.
—
@Imraj-Singh pushed 1 commit.
—
@Imraj-Singh pushed 3 commits.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh commented on this pull request.
> @@ -0,0 +1,113 @@ +# SIRF-PyTorch Wrapper +This wrapper provides a bridge between the [SIRF](https://github.com/SyneRBI/SIRF) (Synergistic Image Reconstruction Framework) library and PyTorch, enabling the use of SIRF's image reconstruction operators and objective functions within PyTorch's automatic differentiation (autodiff) framework. + +## Forward and backward clarification + +The use of the terms forward and backward have different meaning given the context: +* `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input. +* Automatic differentiation: Forward (tangent) mode autodiff computes the (scaled directional) derivative, the Jacobian-Vector-Product (JVP), of the function along with the evaluation - computing derivatives from input to output. Backward (or reverse/adjoint) mode autodiff is the VJP - computing derivatives from output to input. +* Reverse-mode autodiff: Forward pass evaluates the function saving intermediate values from input to output. Backward pass uses the chain rule and intermediate values computing the derivatives from output to inputs/variables. +* `SIRFOperator`: For example acquistion models etc. In forward call we apply the operator, and in the backward we apply the Jacobian adjoint of the operator. + +### Importance in the implementation
I took this out, but I was just trying to explain the awkwardness associated with wrapping an adjoint operator
—
@Imraj-Singh pushed 1 commit.
—
@Imraj-Singh pushed 1 commit.
@Imraj-Singh pushed 1 commit.
@Imraj-Singh commented on this pull request.
> + self.unrolled_backward_test() + elif test_name == 'objective_function': + self.objective_function_test() + + + + def learned_primal_dual(self): + # use seed for reproducibility + + # set up learned primal dual network + PrimalConvBlocks = [ConvBlock() for i in range(3)] + DualConvBlocks = [ConvBlock() for i in range(3)] + # set up the forward and adjoint operators + sirf_image_template = self.image.get_uniform_copy(1) + FwdOperator = SIRFTorchOperator(self.acq_model, self.image.get_uniform_copy(1)) + AdjOperator = SIRFTorchOperator(sirf.SIRF.AdjointOperator(self.acq_model.get_linear_acquisition_model()), self.acq_data.get_uniform_copy(1))
I added something that did that... Will start a convo on the "fix".
—
@Imraj-Singh commented on this pull request.
@KrisThielemans this is one way of allowing `sirf.DataContainer` rather than `sirf.SIRF.DataContainer`. This may cause more overhead when just running `import sirf`. Perhaps @casperdcl or @paskino may know a better fix? Did we want the new torch things to be `sirf.torch.Blah` or just `sirf.Blah`... Also I am not sure if having the file named `torch` and then importing torch will have some horrible circular import.
—
@Imraj-Singh commented on this pull request.
Now works with MR. I am getting some very sinister intermittent segfault when setting up the sirf.STIR objective function @KrisThielemans @paskino. I do think I saw this in the PETRIC challenge. Really not sure how to diagnose it; it may be a SIRF/PyTorch compatibility issue, but I really don't know.
—
@Imraj-Singh commented on this pull request.
Currently only PET; it can be extended to MR. For the pet-varnet, grad_output is sometimes negative, giving errors with multiply_with_Hessian. Everything runs, and gradchecks do succeed, so I think this is all correct. These are minimal examples though.
—
@Imraj-Singh pushed 1 commit.
—
@KrisThielemans commented on this pull request.
> Did we want the new torch things to be sirf.torch.Blah or just sirf.Blah...
Definitely not sirf.Blah. I'd be fine with sirf.torch.Blah
> Also I am not sure if having the file named torch then importing torch will have some horrible circular import.
I don't think so, but maybe it confuses people. I suppose we could have sirf.torch_interface, but I find that rather long.
—
@Imraj-Singh pushed 1 commit.
—
@Imraj-Singh pushed 1 commit.
@Imraj-Singh pushed 2 commits.
@Imraj-Singh commented on this pull request.
@KrisThielemans @paskino @casperdcl so this preloads the modules, probably a bad idea. I was trying to avoid needing `import sirf` and `import sirf.STIR` and `import sirf.Gadgetron` and `import sirf.torch`; happy to change back. For sirf.SIRF I added the `__all__` into sirf.SIRF.
—
@Imraj-Singh pushed 1 commit.
—
@Imraj-Singh pushed 1 commit.
@Imraj-Singh commented on this pull request.
On src/torch/tests/varnet_minimal.py:
@gschramm this is a minimal reproduction of your varnet. I am still getting minuses in my projected input. I think this is expected, as the relu only ensures non-negativity in the forward pass.
Given the loss function, there can be negative elements in the gradient even if the relu is enforcing non-negativity in the forward pass.
@KrisThielemans this may mean that we require negative values in the input.
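A tiny pure-torch check of that statement (the ReLU output is non-negative, but the upstream gradient flowing into the projected quantity can still be negative):

```python
import torch

x = torch.randn(5, requires_grad=True)
y = torch.relu(x)            # forward output is non-negative
y.retain_grad()
loss = ((y - 10.0) ** 2).sum()
loss.backward()

print(bool((y >= 0).all()))      # True
print(bool((y.grad < 0).all()))  # True: grad_output into the projected input is negative
```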
—
Related to UCL/STIR#1477
—