Can't Import Torch


Victoria Steigerwald

Aug 4, 2024, 9:24:17 PM
to rkureculam
I had already installed Python on my computer and it worked. I used it in Eclipse with PyDev, so I don't know if that could be the problem. Now I want to install PyTorch, so I installed Anaconda and entered the command for installing PyTorch. To get the right command, I used -started/locally/, where I tried the options both with and without CUDA. In both cases I get an error when I type "import torch".

I have also installed Miniconda and tried the same with that, without success. I also tried to work in IDLE instead of Eclipse, but I keep getting the "No module named 'torch'" error. Each time I run a command in Anaconda it reports that the installation was successful, but I still can't import torch.


Using Anaconda, you can check whether PyTorch is properly installed inside your conda environment by running conda list inside that environment. If it appears in the list of installed packages, you can try running python directly on the command line and importing torch, as in the official PyTorch tutorial.
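If conda list shows pytorch but import torch still fails, the usual culprit is that Eclipse/PyDev, IDLE, or Jupyter is running a different interpreter than the environment you installed into. A stdlib-only diagnostic sketch (torch here is just the module name being checked) shows which interpreter is active and whether it can see the module:

```python
import importlib.util
import sys

def module_available(name: str) -> bool:
    """Return True if `name` is importable in the *current* interpreter."""
    return importlib.util.find_spec(name) is not None

# Which Python is actually running? Eclipse/PyDev, IDLE, and conda can each
# point at a different interpreter; torch is only visible to the interpreter
# belonging to the environment it was installed into.
print(sys.executable)
print(module_available("torch"))
```

If this prints False, point your IDE's interpreter setting at the conda environment's python executable.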


But then when I try to import torch in a Jupyter notebook I get an error message that the module is not found. This seems bizarre, as I followed all the instructions correctly and the torch installation apparently worked.


You can see more about those newer magic commands (%pip and %conda) and how they are now the best way to install into the same environment your kernel is running from here. More explanation/background can be seen here (far down the page, about the new magics improving this mechanism) and here.
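The point of the %pip magic is that it installs with the kernel's own interpreter rather than whatever pip happens to be first on your PATH. A rough stdlib equivalent of what the magic does (a sketch, with torch as the example package; the command is shown, not executed):

```python
import sys

# Running `%pip install torch` in a notebook cell is roughly equivalent to
# invoking pip as a module of the kernel's own interpreter:
cmd = [sys.executable, "-m", "pip", "install", "torch"]

# Shown rather than run here; in a notebook, prefer the %pip magic itself.
print(" ".join(cmd))
```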


Cannot access pathlib.Path.mkdir.__call__ from inside a workflow. If this is code from a module not used in a workflow, or known to only be used deterministically from a workflow, mark the import as pass-through.


Sandbox validation occurs by default on every file a workflow is in. You should make sure to pass through any third-party modules (or, even better, make your workflow a separate module/file from the rest of your code and pass through imports for the activities that you reference in the workflows).
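A minimal sketch of the pass-through pattern from the Temporal Python docs (assumes the temporalio package; torch stands in for any heavyweight third-party module used by your activities):

```python
from temporalio import workflow

# Pass third-party imports through the sandbox so they are loaded once in the
# main interpreter instead of being re-imported (and re-validated) for every
# workflow run.
with workflow.unsafe.imports_passed_through():
    import torch  # any non-stdlib, non-temporalio module
```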


Can you provide a complete standalone replication? Also, are you passing through imports for non-stdlib/non-temporalio modules in files that contain workflows? This would include libraries like PyTorch and others. We strongly recommend you do this so they are not loaded in the sandbox. See related README sections here and here.




As for async activity completion: I personally like this pattern better, but I was asked to provide both a synchronous and an asynchronous worker (technically, activity) to ensure I support both styles of underlying Python script.


PyTorch with DirectML provides an easy-to-use way for developers to try out the latest and greatest AI models on their Windows machine. You can download PyTorch with DirectML by installing the torch-directml PyPI package. Once set up, you can start with our samples or use the AI Toolkit for VS Code.


Download and install the Miniconda Windows installer on your system. There's additional setup guidance on Anaconda's site. Once Miniconda is installed, create a Python environment named pytdml and activate it with the following commands.
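The commands themselves did not survive the copy; assuming a standard Miniconda install, they would be along these lines (the Python version is an example):

```shell
conda create -n pytdml python=3.10 -y
conda activate pytdml
```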


Once you've installed the torch-directml package, you can verify that it runs correctly by adding two tensors. First start an interactive Python session and import torch with the following lines:
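The verification lines are missing from the copy; based on the torch-directml samples, they look roughly like this (a sketch; requires the torch and torch-directml packages on a DirectML-capable Windows machine):

```python
import torch
import torch_directml

dml = torch_directml.device()  # wraps the "PrivateUse1" DirectML backend device

# Add two tensors on the DirectML device to confirm the backend works.
tensor1 = torch.tensor([1]).to(dml)
tensor2 = torch.tensor([2]).to(dml)
print((tensor1 + tensor2).item())  # expect 3 if the device is functional
```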


The current release of torch-directml is mapped to the "PrivateUse1" Torch backend. The torch_directml.device() API is a convenient wrapper for sending your tensors to the DirectML device.


The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables. For example:


SM_OUTPUT_DATA_DIR: A string representing the filesystem path to write output artifacts to. Output artifacts may include checkpoints, graphs, and other files to save, not including model artifacts. These artifacts are compressed and uploaded to S3 under the same S3 prefix as the model artifacts.


A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model to model_dir so that it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an argparse.ArgumentParser instance. For example, a training script might start with the following:
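A sketch of such a script opening (the --epochs and --lr names are illustrative hyperparameters; SM_MODEL_DIR is the real SageMaker environment variable, and /opt/ml/model is the conventional container path used as a fallback for local runs):

```python
import argparse
import os

def parse_args(argv=None):
    """Read hyperparameters (CLI args) and SageMaker environment variables."""
    parser = argparse.ArgumentParser()
    # Hyperparameters are forwarded by SageMaker as command-line arguments.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=0.01)
    # SageMaker sets this inside the training container.
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    return parser.parse_args(argv)

args = parse_args(["--epochs", "3", "--lr", "0.1"])
print(args.epochs, args.lr)  # → 3 0.1
```

In the real script you would call parse_args() with no arguments so it reads sys.argv, then train for args.epochs epochs.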


Because SageMaker imports your training script, you should put your training code in a main guard (if __name__ == '__main__':) if you are using the same script to host your model, so that SageMaker does not inadvertently run your training code at the wrong point in execution.


In order to save your trained PyTorch model for deployment on SageMaker, your training script should save your model to a certain filesystem path called model_dir. This value is accessible through the environment variable SM_MODEL_DIR. The following code demonstrates how to save a trained PyTorch model named model as model.pth at that path:
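A sketch of the save step; the torch.save call is shown in a comment because it needs a trained model object (SM_MODEL_DIR is the real variable; /opt/ml/model is the conventional container default):

```python
import os

def model_artifact_path() -> str:
    """Where SageMaker expects the serialized model to be written."""
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    return os.path.join(model_dir, "model.pth")

# In the training script, after training:
#     with open(model_artifact_path(), "wb") as f:
#         torch.save(model.state_dict(), f)
print(model_artifact_path())
```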


After your training job is complete, SageMaker compresses and uploads the serialized model to S3, and your model data will be available in the S3 output_path you specified when you created the PyTorch Estimator.


If there are other packages you want to use with your script, you can include a requirements.txt file in the same directory as your training script to install other dependencies at runtime. Both requirements.txt and your training script should be put in the same folder. You must specify this folder in the source_dir argument when creating the PyTorch estimator.
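For example, a minimal requirements.txt next to the training script (package names and pins are illustrative):

```
# requirements.txt, placed in source_dir alongside the training script
boto3>=1.28
numpy>=1.24
```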


Installing packages from requirements.txt is supported for all PyTorch versions during training. When serving a PyTorch model, support varies with the PyTorch version. For PyTorch 1.3.1 or newer, requirements.txt must be under the code folder. The SageMaker PyTorch Estimator will automatically save code in model.tar.gz after training (assuming you set up your script and requirements.txt correctly as stipulated in the previous paragraph). In the case of bringing your own trained model for deployment, you must place requirements.txt under the code folder in model.tar.gz yourself, or specify it through dependencies. For PyTorch 1.2.0, requirements.txt is not supported for inference. For PyTorch 0.4.0 to 1.1.0, requirements.txt must be in source_dir.


A requirements.txt file is a text file that contains a list of items that are installed by using pip install. You can also specify the version of an item to install. For information about the format of a requirements.txt file, see Requirements Files in the pip documentation.


inputs: This can take one of the following forms:
- A string S3 URI, for example s3://my-bucket/my-training-data. In this case, the S3 objects rooted at the my-training-data prefix will be available in the default train channel.
- A dict from string channel names to S3 URIs. In this case, the objects rooted at each S3 prefix will be available as files in each channel directory.
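Sketched as code (bucket names are illustrative):

```python
# Form 1: a single S3 URI -> objects land in the default "train" channel.
inputs_single = "s3://my-bucket/my-training-data"

# Form 2: a dict of channel name -> S3 URI; each prefix becomes a channel
# directory visible to the training script (e.g. via SM_CHANNEL_TRAIN,
# SM_CHANNEL_TEST environment variables).
inputs_dict = {
    "train": "s3://my-bucket/my-training-data/train",
    "test": "s3://my-bucket/my-training-data/test",
}

# estimator.fit(inputs_single)  or  estimator.fit(inputs_dict)
print(sorted(inputs_dict))  # → ['test', 'train']
```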


SageMaker supports the PyTorch DistributedDataParallel (DDP) package. You simply need to check the variables in your training script, such as the world size and the rank of the current host, when initializing process groups for distributed training. Then, launch the training job using the sagemaker.pytorch.estimator.PyTorch estimator class with the pytorchddp option as the distribution strategy.
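A sketch of both halves: the rank/world-size check inside the training script, and the estimator arguments on the launch side (the instance type is illustrative; the estimator construction is shown in a comment since it needs the sagemaker SDK and an IAM role):

```python
import os

# Inside the training script: values the distributed launcher exports.
world_size = int(os.environ.get("WORLD_SIZE", "1"))
rank = int(os.environ.get("RANK", "0"))

# Launch side: arguments for sagemaker.pytorch.estimator.PyTorch.
estimator_kwargs = dict(
    entry_point="train.py",
    instance_count=2,                 # >1 launches a multi-node job
    instance_type="ml.p4d.24xlarge",  # illustrative GPU instance
    distribution={"pytorchddp": {"enabled": True}},
)
# estimator = sagemaker.pytorch.estimator.PyTorch(role=role, **estimator_kwargs)
print(estimator_kwargs["distribution"])
```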


You can run multi-node distributed PyTorch training jobs using the sagemaker.pytorch.estimator.PyTorch estimator class. With instance_count=1, the estimator submits a single-node training job to SageMaker; with instance_count greater than one, a multi-node training job is launched.


With the pytorchddp option, the SageMaker PyTorch estimator runs a SageMaker training container for PyTorch, sets up the environment for MPI, and launches the training job using the mpirun command on each worker with the given information during the PyTorch DDP initialization.


SageMaker Training supports Amazon EC2 Trn1 instances powered by the AWS Trainium device, the second-generation purpose-built machine learning accelerator from AWS. Each Trn1 instance consists of up to 16 Trainium devices, and each Trainium device consists of two NeuronCores, as described in the AWS Neuron Documentation.


You can run distributed training jobs on Trn1 instances. SageMaker supports the xla package through torchrun. With this, you do not need to manually pass RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT. You can launch the training job using the sagemaker.pytorch.estimator.PyTorch estimator class with the torch_distributed option as the distribution strategy.
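A sketch of the launch-side arguments (the instance type is illustrative; torch_distributed is the documented option name, and the estimator construction is commented out since it needs the sagemaker SDK):

```python
# Arguments for sagemaker.pytorch.estimator.PyTorch targeting Trn1 (sketch).
estimator_kwargs = dict(
    entry_point="train.py",
    instance_count=2,                  # multi-node Trn1 job
    instance_type="ml.trn1.32xlarge",  # Trainium instance
    distribution={"torch_distributed": {"enabled": True}},
)
# With torch_distributed, torchrun sets RANK, WORLD_SIZE, MASTER_ADDR, and
# MASTER_PORT for you inside each container.
# estimator = sagemaker.pytorch.estimator.PyTorch(role=role, **estimator_kwargs)
print(estimator_kwargs["distribution"])
```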


This torch_distributed support is available in the AWS Deep Learning Containers for PyTorch Neuron starting with v1.11.0. To find a complete list of supported versions of PyTorch Neuron, see Neuron Containers in the AWS Deep Learning Containers GitHub repository.


You can run multi-node distributed PyTorch training jobs on Trn1 instances using the sagemaker.pytorch.estimator.PyTorch estimator class. With instance_count=1, the estimator submits a single-node training job to SageMaker; with instance_count greater than one, a multi-node training job is launched.
