NotImplementedError with PyTorch

Ryan Friedman

Mar 9, 2021, 6:12:57 PM

to Selene (sequence-based deep learning package)

Hi all,

I'm trying to use Selene to create my own architecture to predict enhancer activity from Massively Parallel Reporter Assays. In the process of testing things out, I ran into a NotImplementedError. It appears to come from PyTorch and not Selene per se, but I suspect it has to do with how I set up my model so I'm hoping somebody can help me troubleshoot.

Currently, I am setting things up as a multi-class classification problem. I have a set of sequences, each binned into one of four classes: Silencer, Inactive, Weak enhancer, and Strong enhancer. I am trying to build a CNN followed by a fully connected layer and then a LogSoftmax transform to predict the log probability that a sequence belongs to each of the four classes. Right now it is configured without CUDA, but once things are working I plan to switch over to our GPUs.
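For readers following along, here is a minimal sketch of the kind of architecture described above: a convolution, a fully connected layer, then LogSoftmax over four classes. The class name, layer sizes, kernel width, and sequence length are illustrative assumptions and are not taken from the attached enhancer_model.py. The input layout (batch, 4, sequence length) is inferred from the `inputs.transpose(1, 2)` call visible in the traceback.

```python
import torch
from torch import nn


class EnhancerCNNSketch(nn.Module):
    """Hypothetical stand-in for the attached model; all sizes are assumptions."""

    def __init__(self, n_classes=4, n_filters=32, kernel_size=8):
        super().__init__()
        # Convolve over one-hot encoded DNA (4 channels), then pool over positions.
        self.conv = nn.Sequential(
            nn.Conv1d(4, n_filters, kernel_size),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Fully connected layer followed by log-probabilities over the classes.
        self.classifier = nn.Sequential(
            nn.Linear(n_filters, n_classes),
            nn.LogSoftmax(dim=1),
        )

    def forward(self, x):
        # x: (batch size, 4, sequence length)
        features = self.conv(x).squeeze(-1)  # (batch size, n_filters)
        return self.classifier(features)     # (batch size, n_classes)


model = EnhancerCNNSketch()
batch = torch.zeros(64, 4, 164)  # sequence length 164 is an arbitrary choice
log_probs = model(batch)
```

Each row of `log_probs` holds log-probabilities, so exponentiating a row and summing it should give 1.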

I've attached three files: the Python file containing the model class, the YAML file with the configs, and a Jupyter notebook that I used to test things out. Briefly, the log looks as follows:

Outputs and logs saved to ./Data/SeleneFiles/enhancer_model_outputs/2021-03-09-16-55-19

2021-03-09 16:55:19,913 - Training parameters set: batch size 64, number of steps per 'epoch': 180, maximum number of steps: 80000

2021-03-09 16:55:19,914 - Creating validation dataset.

2021-03-09 16:55:19,918 - 0.0038712024688720703 s to load 1472 validation examples (23 validation batches) to evaluate after each training step.

---------------------------------------------------------------------------

NotImplementedError Traceback (most recent call last)

<ipython-input-15-9d4cdb07b008> in <module>

1 configs = selene_sdk.utils.load_path(os.path.join(output_dir, "train_eval_model.yml"))

----> 2 selene_sdk.utils.parse_configs_and_run(configs, lr=0.001)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py in parse_configs_and_run(configs, create_subdirectory, lr)

339 "Using a random seed ensures results are reproducible.")

340

--> 341 execute(operations, configs, current_run_output_dir)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py in execute(operations, configs, output_dir)

186 "evaluate" in operations:

187 train_model.create_test_set()

--> 188 train_model.train_and_validate()

189

190 elif op == "evaluate":

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/train_model.py in train_and_validate(self)

380 for step in range(self._start_step, self.max_steps):

381 t_i = time()

--> 382 train_loss = self.train()

383 t_f = time()

384 time_per_step.append(t_f - t_i)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/train_model.py in train(self)

462 targets = Variable(targets)

463

--> 464 predictions = self.model(inputs.transpose(1, 2))

465 loss = self.criterion(predictions, targets)

466

~/miniconda/envs/selene/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)

725 result = self._slow_forward(*input, **kwargs)

726 else:

--> 727 result = self.forward(*input, **kwargs)

728 for hook in itertools.chain(

729 _global_forward_hooks.values(),

~/miniconda/envs/selene/lib/python3.7/site-packages/torch/nn/modules/module.py in _forward_unimplemented(self, *input)

173 registered hooks while the latter silently ignores them.

174 """

--> 175 raise NotImplementedError

176

177

NotImplementedError:

Thanks in advance for any help!

Best,

Ryan Friedman

selene_setup-2021-03-01.ipynb

train_eval_model.yml

enhancer_model.py

chen.ka...@gmail.com

Mar 9, 2021, 6:31:47 PM

to Selene (sequence-based deep learning package)

Apparently I can't update the google groups with my response through email... probably a setting I should fix haha. For now, reposting it here:

Hi Ryan,

Thanks for your message, great to hear about your use case and I'd love to get more feedback on your experience with Selene as you use it more. :)

I think this is because your forward function is indented too much: it looks like it's defined inside __init__ rather than as a separate method of the model class. PyTorch then falls back to the default nn.Module forward, which raises NotImplementedError.
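A small sketch of how this kind of indentation slip plays out. The two classes below are made-up examples, not the attached model: in the first, `forward` is a local function inside `__init__`, so the class never gets a `forward` method of its own.

```python
import torch
from torch import nn


class BrokenModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

        # Over-indented: this is a local function inside __init__,
        # not a method of the class, so nn.Module never sees it.
        def forward(self, x):
            return self.fc(x)


class FixedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

    # Correct indentation: forward is a method of the class.
    def forward(self, x):
        return self.fc(x)


x = torch.zeros(2, 8)

try:
    BrokenModel()(x)
    broken_raises = False
except NotImplementedError:
    broken_raises = True

fixed_shape = tuple(FixedModel()(x).shape)
```

Calling the broken model reaches the base-class placeholder and raises NotImplementedError, while the fixed model returns a (2, 4) tensor.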

Hope that solves it!

Kathy

Ryan Friedman

Mar 9, 2021, 6:33:19 PM

to Selene (sequence-based deep learning package)

Hi Kathy,

Oops -- that's an embarrassingly simple error I should've caught. That appears to solve the problem. I'll be sure to give you feedback as I use Selene more.

Thanks for the quick response!

Best,

Ryan

Ryan Friedman

Mar 10, 2021, 10:00:33 PM

to Selene (sequence-based deep learning package)

Ok, I'm running into a new problem now that has to do with the multi-class targets. Since I am doing a multi-class classification problem, I am using NLLLoss as my loss function. The expected shape of the input is (batch size, number of classes), and I have confirmed that my model output matches it. The expected shape of the target is (batch size,).

Since I am using custom FASTA files, I created MAT files with the sequences and their targets based on the download_data.py script in the regression MPRA example. Based on the documentation for MatFileSampler, the targets get loaded as a (batch size, number of features) matrix -- so in this case a column vector. I suspect this is the reason for my error ("log 1" below), but I'm not sure how to sidestep it since it appears to be built into Selene.
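The shape mismatch described here can be reproduced outside Selene. This is a sketch with random tensors standing in for the model output and the sampler's targets; the batch size of 64 and the four classes follow the post, everything else is made up.

```python
import torch
from torch import nn

torch.manual_seed(0)
criterion = nn.NLLLoss()

# Stand-in for the model output: log-probabilities, shape (batch size, classes).
log_probs = torch.log_softmax(torch.randn(64, 4), dim=1)

# Stand-in for targets loaded as a column vector, shape (batch size, 1).
column_targets = torch.randint(0, 4, (64, 1))

try:
    criterion(log_probs, column_targets)
    column_targets_rejected = False
except RuntimeError:
    # NLLLoss refuses a 2D target for a 2D input.
    column_targets_rejected = True

# Dropping the trailing axis gives the (batch size,) shape NLLLoss accepts.
flat_loss = criterion(log_probs, column_targets[:, 0])
```

The column-vector targets are rejected, while the same values with the trailing axis removed produce a scalar loss.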

I also tried to write the targets to the MAT file as a list rather than a (number of sequences, 1) column vector, but that gives me a different error ("log 2" below).
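One possible reading of the IndexError in log 2, offered as an assumption rather than a diagnosis: if the MAT file was written with scipy.io.savemat, note that it stores 1-D arrays as row vectors by default (`oned_as='row'`), so a list of N targets comes back with shape (1, N). Row-wise indexing of the kind shown in the traceback (`self._sample_tgts[use_indices, :]`) then fails as soon as an index above 0 is requested. The NumPy-only sketch below mimics that situation with made-up values.

```python
import numpy as np

targets_1d = np.array([0, 1, 2, 3, 1, 0])

# Assumption: the 1-D targets come back from the file as a single row.
targets_row = targets_1d.reshape(1, -1)  # shape (1, 6)
# The layout that worked in log 1: one row per sequence.
targets_col = targets_1d.reshape(-1, 1)  # shape (6, 1)

use_indices = [1, 3, 4]  # hypothetical batch of sample indices

# Row-wise selection works when the batch axis is axis 0.
batch = targets_col[use_indices, :].astype(float)

try:
    targets_row[use_indices, :]
    row_fails = False
except IndexError:
    # Axis 0 has size 1, so any index above 0 is out of bounds.
    row_fails = True
```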

Thanks for your help!

Best,

Ryan

LOG 1

Outputs and logs saved to ./Data/SeleneFiles/enhancer_model_outputs/2021-03-10-20-50-47

2021-03-10 20:50:47,412 - Training parameters set: batch size 64, number of steps per 'epoch': 180, maximum number of steps: 80000

2021-03-10 20:50:47,412 - Creating validation dataset.

2021-03-10 20:50:47,416 - 0.0034127235412597656 s to load 1472 validation examples (23 validation batches) to evaluate after each training step.

---------------------------------------------------------------------------

RuntimeError Traceback (most recent call last)

<ipython-input-12-9d4cdb07b008> in <module>

1 configs = selene_sdk.utils.load_path(os.path.join(output_dir, "train_eval_model.yml"))

----> 2 selene_sdk.utils.parse_configs_and_run(configs, lr=0.001)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py in parse_configs_and_run(configs, create_subdirectory, lr)

339 "Using a random seed ensures results are reproducible.")

340

--> 341 execute(operations, configs, current_run_output_dir)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py in execute(operations, configs, output_dir)

186 "evaluate" in operations:

187 train_model.create_test_set()

--> 188 train_model.train_and_validate()

189

190 elif op == "evaluate":

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/train_model.py in train_and_validate(self)

380 for step in range(self._start_step, self.max_steps):

381 t_i = time()

--> 382 train_loss = self.train()

383 t_f = time()

384 time_per_step.append(t_f - t_i)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/train_model.py in train(self)

463

464 predictions = self.model(inputs.transpose(1, 2))

--> 465 loss = self.criterion(predictions, targets)

466

467 self.optimizer.zero_grad()

~/miniconda/envs/selene/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)

725 result = self._slow_forward(*input, **kwargs)

726 else:

--> 727 result = self.forward(*input, **kwargs)

728 for hook in itertools.chain(

729 _global_forward_hooks.values(),

~/miniconda/envs/selene/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)

211

212 def forward(self, input: Tensor, target: Tensor) -> Tensor:

--> 213 return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)

214

215

~/miniconda/envs/selene/lib/python3.7/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)

2262 .format(input.size(0), target.size(0)))

2263 if dim == 2:

-> 2264 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

2265 elif dim == 4:

2266 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: 1D target tensor expected, multi-target not supported

LOG 2

Outputs and logs saved to ./Data/SeleneFiles/enhancer_model_outputs/2021-03-10-21-00-04

2021-03-10 21:00:04,255 - Training parameters set: batch size 64, number of steps per 'epoch': 180, maximum number of steps: 80000

2021-03-10 21:00:04,256 - Creating validation dataset.

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

<ipython-input-14-9d4cdb07b008> in <module>

1 configs = selene_sdk.utils.load_path(os.path.join(output_dir, "train_eval_model.yml"))

----> 2 selene_sdk.utils.parse_configs_and_run(configs, lr=0.001)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py in parse_configs_and_run(configs, create_subdirectory, lr)

339 "Using a random seed ensures results are reproducible.")

340

--> 341 execute(operations, configs, current_run_output_dir)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py in execute(operations, configs, output_dir)

181 train_model_info.bind(output_dir=output_dir)

182

--> 183 train_model = instantiate(train_model_info)

184 # TODO: will find a better way to handle this in the future

185 if "load_test_set" in configs and configs["load_test_set"] and \

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config.py in instantiate(proxy, bindings)

237 bindings = {}

238 if isinstance(proxy, _Proxy):

--> 239 return _instantiate_proxy_tuple(proxy, bindings)

240 elif isinstance(proxy, dict):

241 # Recurse on the keys too, for backward compatibility.

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config.py in _instantiate_proxy_tuple(proxy, bindings)

142 kwargs = dict((k, instantiate(v, bindings))

143 for k, v in six.iteritems(proxy.keywords))

--> 144 obj = proxy.callable(**kwargs)

145 try:

146 obj.yaml_src = proxy.yaml_src

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/train_model.py in __init__(self, model, data_sampler, loss_criterion, optimizer_class, optimizer_kwargs, batch_size, max_steps, report_stats_every_n_steps, output_dir, save_checkpoint_every_n_steps, save_new_checkpoints_after_n_steps, report_gt_feature_n_positives, n_validation_samples, n_test_samples, cpu_n_threads, use_cuda, data_parallel, logging_verbosity, checkpoint_resume, metrics)

243 verbosity=logging_verbosity)

244

--> 245 self._create_validation_set(n_samples=n_validation_samples)

246 self._validation_metrics = PerformanceMetrics(

247 self.sampler.get_feature_from_index,

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/train_model.py in _create_validation_set(self, n_samples)

312 self._validation_data, self._all_validation_targets = \

313 self.sampler.get_validation_set(

--> 314 self.batch_size, n_samples=n_samples)

315 t_f = time()

316 logger.info(("{0} s to load {1} validation examples ({2} validation "

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/samplers/multi_file_sampler.py in get_validation_set(self, batch_size, n_samples)

170 """

171 return self._samplers["validate"].get_data_and_targets(

--> 172 batch_size, n_samples)

173

174 def get_test_set(self, batch_size, n_samples=None):

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/samplers/file_samplers/mat_file_sampler.py in get_data_and_targets(self, batch_size, n_samples)

243 count = batch_size

244 while count < n_samples:

--> 245 seqs, tgts = self.sample(batch_size=batch_size)

246 sequences_and_targets.append((seqs, tgts))

247 targets_mat.append(tgts)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/samplers/file_samplers/mat_file_sampler.py in sample(self, batch_size)

162 if self._sample_tgts is not None:

163 if self._tgts_batch_axis == 0:

--> 164 targets = self._sample_tgts[use_indices, :].astype(float)

165 else:

166 targets = self._sample_tgts[:, use_indices].astype(float)

IndexError: index 1 is out of bounds for axis 0 with size 1

chen.ka...@gmail.com

Mar 14, 2021, 3:34:26 PM

to Selene (sequence-based deep learning package)

Hi Ryan,

Thanks for your question - I'm a little confused about the difference between LOG1 and LOG2. Specifically, are you using MatFileSampler in both cases? Is the branching for the training setup different (i.e., does `_create_validation_set` not get called in the LOG1 setup)?

Also, what are the expected inputs to NLLLoss? Is the issue that the output from your model `forward` function doesn't match the expected input to `NLLLoss`? (Or is there some additional postprocessing required after passing it into the model?)

Kathy

Ryan Friedman

Mar 15, 2021, 11:37:02 AM

to Selene (sequence-based deep learning package)

Hi Kathy,

Sorry for the confusion. Yes, I am using MatFileSampler in both cases. I haven't changed anything in the config file between the two cases. The only difference between the two is that in LOG1 the targets have the shape (batch size, 1), while in LOG2 the targets have the shape (batch size,).

The expected input to NLLLoss is the log-probabilities of each class, with shape (batch size, C), where C is the number of classes. The target it expects is a vector of class indices in the range [0, C - 1], with shape (batch size,).
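To make those shapes concrete, here is a small worked example with made-up values. With the default mean reduction, NLLLoss picks out the log-probability of the true class for each sample, negates it, and averages over the batch.

```python
import torch
from torch import nn

torch.manual_seed(0)
n_classes = 4

# Input: log-probabilities, shape (batch size, C).
log_probs = torch.log_softmax(torch.randn(5, n_classes), dim=1)
# Target: class indices in [0, C - 1], shape (batch size,).
targets = torch.tensor([0, 3, 1, 2, 3])

loss = nn.NLLLoss()(log_probs, targets)

# The same quantity computed by hand.
manual = -log_probs[torch.arange(5), targets].mean()
```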

I have confirmed that the output of the forward function has the shape (64, 4), which is my expected batch size and number of classes. To be clear, I am doing multi-class classification and not multi-task prediction. In other words, every sequence should have the target 0, 1, 2, or 3, which are mutually exclusive classes.

Based on the error in LOG1 (RuntimeError: 1D target tensor expected, multi-target not supported), I suspect the problem is that Selene is representing the targets as a matrix, rather than a vector. This is based on the documentation for MatFileSampler.get_data_and_targets, which states that targets_matrix has the shape (number of samples, number of features). I tested this out and confirmed that the output of this has the shape (number of samples, 1). What I need is for it to be flattened into the shape (number of samples, ). I tried doing this by simply saving the data in the .mat file with that shape, but then MatFileSampler cannot load the data as indicated by the error in LOG2.

Is that clear? I presume that I could write some overhead code to deal with this, but I was hoping that I could just use selene_sdk.utils.parse_configs_and_run to fit the model.

Ryan

Jian Zhou

Mar 15, 2021, 12:49:52 PM

to Selene (sequence-based deep learning package)

Hi Ryan,

I think one solution is to keep the targets in the shape (number of samples, 1) and write a custom loss function that handles targets of this shape (and converts them to long). I believe something like the following should work:

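from torch import nn

class NLLLoss_(nn.Module):
    def __init__(self):
        super(NLLLoss_, self).__init__()
        # nn.NLLLoss expects class indices of shape (batch size,).
        self.nllloss = nn.NLLLoss()

    def forward(self, input, target):
        # Drop the trailing axis of the (batch size, 1) targets and cast to long.
        return self.nllloss(input, target[:, 0].long())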
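A quick way to check a wrapper like this outside of Selene is sketched below. The float targets of shape (64, 1) imitate what the sampler produces (the traceback shows an `.astype(float)` cast). Returning the wrapper from a `criterion()` function in the model file reflects my understanding of Selene's model-file convention and should be treated as an assumption.

```python
import torch
from torch import nn


class NLLLoss_(nn.Module):
    def __init__(self):
        super().__init__()
        self.nllloss = nn.NLLLoss()

    def forward(self, input, target):
        # (batch size, 1) float targets -> (batch size,) long class indices.
        return self.nllloss(input, target[:, 0].long())


def criterion():
    # Assumed Selene convention: the model file exposes the loss via criterion().
    return NLLLoss_()


torch.manual_seed(0)
log_probs = torch.log_softmax(torch.randn(64, 4), dim=1)
targets = torch.randint(0, 4, (64, 1)).float()  # column vector of floats

loss = criterion()(log_probs, targets)
reference = nn.NLLLoss()(log_probs, targets[:, 0].long())
```

The wrapped loss should match a plain NLLLoss applied to the flattened, integer-typed targets.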

Jian

Ryan Friedman

Mar 15, 2021, 4:18:41 PM

to Selene (sequence-based deep learning package)

Hi Jian,

Thanks, that suggestion works great for the loss problem. However, it still creates problems for the validation metrics (i.e., calculating the AUROC and AUPR). Based on the stack trace (below), it appears to be the same problem. Specifically, performance_metrics.compute_score seems to expect that targets and predictions have the same shape.

One possibility I thought of is to one-hot encode the targets. I think this would solve the problem with the validation metrics. I would need to further customize my Loss function to handle this, but I don't imagine this would be too much effort.
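A sketch of what such a customized loss could look like, assuming the targets arrive as one-hot rows of shape (batch size, 4). The class name `OneHotNLLLoss` is hypothetical. Taking the argmax of each row recovers the class index that NLLLoss expects.

```python
import torch
import torch.nn.functional as F
from torch import nn


class OneHotNLLLoss(nn.Module):
    """Hypothetical wrapper: NLLLoss for one-hot encoded targets."""

    def __init__(self):
        super().__init__()
        self.nllloss = nn.NLLLoss()

    def forward(self, input, target):
        # (batch size, C) one-hot rows -> (batch size,) class indices.
        return self.nllloss(input, target.argmax(dim=1))


torch.manual_seed(0)
class_indices = torch.randint(0, 4, (64,))
# Floats, to imitate targets that were cast to float when loaded.
one_hot_targets = F.one_hot(class_indices, num_classes=4).float()
log_probs = torch.log_softmax(torch.randn(64, 4), dim=1)

loss = OneHotNLLLoss()(log_probs, one_hot_targets)
reference = nn.NLLLoss()(log_probs, class_indices)
```

With one-hot targets, predictions and targets share the shape (batch size, 4), which is what the per-class validation metrics appear to need.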

Does this sound reasonable?

Thanks for all your help!

Ryan

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

<ipython-input-24-9d4cdb07b008> in <module>

1 configs = selene_sdk.utils.load_path(os.path.join(output_dir, "train_eval_model.yml"))

----> 2 selene_sdk.utils.parse_configs_and_run(configs, lr=0.001)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py in parse_configs_and_run(configs, create_subdirectory, lr)

339 "Using a random seed ensures results are reproducible.")

340

--> 341 execute(operations, configs, current_run_output_dir)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py in execute(operations, configs, output_dir)

186 "evaluate" in operations:

187 train_model.create_test_set()

--> 188 train_model.train_and_validate()

189

190 elif op == "evaluate":

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/train_model.py in train_and_validate(self)

410 step, 1. / np.average(time_per_step)))

411 time_per_step = []

--> 412 valid_scores = self.validate()

413 validation_loss = valid_scores["loss"]

414 self._train_logger.info(train_loss)

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/train_model.py in validate(self)

529 self._validation_data)

530 average_scores = self._validation_metrics.update(all_predictions,

--> 531 self._all_validation_targets)

532 for name, score in average_scores.items():

533 logger.info("validation {0}: {1}".format(name, score))

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/performance_metrics.py in update(self, prediction, target)

386 avg_score, feature_scores = compute_score(

387 prediction, target, metric.fn,

--> 388 report_gt_feature_n_positives=self.skip_threshold)

389 metric.data.append(feature_scores)

390 metric_scores[name] = avg_score

~/miniconda/envs/selene/lib/python3.7/site-packages/selene_sdk/utils/performance_metrics.py in compute_score(prediction, target, metric_fn, report_gt_feature_n_positives)

202 feature_scores = np.ones(target.shape[1]) * np.nan

203 for index, feature_preds in enumerate(prediction.T):

--> 204 feature_targets = target[:, index]

205 if len(np.unique(feature_targets)) > 0 and \

206 np.count_nonzero(feature_targets) > report_gt_feature_n_positives:

IndexError: index 1 is out of bounds for axis 1 with size 1
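The loop in this traceback pairs prediction column i with target column i, which is why a single-column target fails once i reaches 1. The NumPy sketch below imitates only that loop, with made-up data and a hypothetical helper name. It is not Selene's compute_score.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 8, 4
predictions = rng.random((n_samples, n_classes))
class_indices = np.array([0, 1, 2, 3, 0, 1, 2, 3])

column_targets = class_indices.reshape(-1, 1)       # shape (8, 1)
one_hot_targets = np.eye(n_classes)[class_indices]  # shape (8, 4)


def columns_visited(prediction, target):
    """Imitate the per-column pairing of predictions and targets."""
    visited = 0
    for index, feature_preds in enumerate(prediction.T):
        feature_targets = target[:, index]  # fails if target has too few columns
        assert feature_targets.shape == feature_preds.shape
        visited += 1
    return visited


try:
    columns_visited(predictions, column_targets)
    column_targets_fail = False
except IndexError:
    column_targets_fail = True

n_scored = columns_visited(predictions, one_hot_targets)
```

With one-hot targets every prediction column has a matching target column, so each class can be scored as its own binary feature.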

Jian Zhou

Mar 15, 2021, 5:19:30 PM

to Selene (sequence-based deep learning package)

Hi Ryan,

Yes, I agree that one-hot encoding the targets and changing the custom loss function to handle them is a good way to solve it!

Jian
