Bug#1027851: pytorch FTBFS with Python 3.11 as default version


Aron Xu

Jan 26, 2023, 1:20:04 PM
Hi


On Fri, Jan 27, 2023 at 1:27 AM Andreas Tille <and...@an3as.eu> wrote:
>
> Hi,
>
> I was checking this bug log and realised that the "Forwarded" field
> links to an old PR[1] that was merged long before this bug was filed.
> Checking upstream I realised that the Debian package is lagging behind
> upstream releases.
>
> Could someone please give a status update on the plan to get
> pytorch back into testing (either by applying the PR patch to the old
> version or by upgrading to the latest upstream release, which will most
> probably support Python 3.11)?
>

Packaging work on 1.13.1[1] has started on Salsa. One fmtlib-related
failure still stops the build early, at step [5/1781] of the ninja run.
Both Mo and I have limited bandwidth here, so help is always
appreciated.

[1] https://salsa.debian.org/deeplearning-team/pytorch

Regards,
Aron

Andreas Tille

Jan 27, 2023, 3:00:04 AM
Hi Aron,

On Fri, Jan 27, 2023 at 02:09:05AM +0800, Aron Xu wrote:
>
> Packaging work on 1.13.1[1] has started on Salsa. One fmtlib-related
> failure still stops the build early, at step [5/1781] of the ninja run.
> Both Mo and I have limited bandwidth here, so help is always
> appreciated.

I've just checked the changelog and noticed:

Bump SOVERSION to 1.13

but we are in the transition freeze, so this needs to be coordinated
with the release team. If we argue that 1.13 will support Python 3.11,
that could serve as a justification, given the decision that Python 3.11
should be the supported Python 3 version for the next release.
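Why a SOVERSION bump involves the release team: under the Debian Policy shared-library naming rule, the runtime library package is named after the library plus its SONAME version, so a new SOVERSION means a new binary package name, and every reverse dependency has to be rebuilt against it. A minimal, simplified sketch of that naming rule, assuming SONAMEs of the form libtorch.so.1.13 (the exact SONAMEs are not shown in this thread, and the function names are hypothetical):

```python
def shlib_package_name(soname: str) -> str:
    """Derive a Debian runtime library package name from a SONAME.

    Simplified sketch of the Debian Policy rule: library name followed by
    the SONAME version, with a hyphen inserted if the name ends in a digit.
    """
    name, sep, version = soname.partition(".so.")
    if not sep or not name or not version:
        raise ValueError(f"unexpected SONAME format: {soname!r}")
    if name[-1].isdigit():
        return f"{name}-{version}"
    return f"{name}{version}"


def needs_transition(old_soname: str, new_soname: str) -> bool:
    """A changed package name means reverse dependencies must be rebuilt."""
    return shlib_package_name(old_soname) != shlib_package_name(new_soname)


# Hypothetical SONAMEs before and after "Bump SOVERSION to 1.13".
print(shlib_package_name("libtorch.so.1.12"))
print(shlib_package_name("libtorch.so.1.13"))
print(needs_transition("libtorch.so.1.12", "libtorch.so.1.13"))
```

The real policy has further details (for example, for unusual library names); this only covers the common case relevant here.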

Kind regards
Andreas.

> [1] https://salsa.debian.org/deeplearning-team/pytorch

--
http://fam-tille.de

Andreas Tille

Jan 27, 2023, 8:50:05 AM
On Fri, Jan 27, 2023 at 08:21:46PM +0800, Aron Xu wrote:
> On Fri, Jan 27, 2023 at 7:12 PM Andreas Tille <ti...@debian.org> wrote:
> > make: *** [debian/rules:83: binary] Terminated
> > ninja: build stopped: interrupted by user.
> >
> > could be a sign of this. Was I too naive to assume Salsa CI could
> > manage a pytorch build, and should we possibly switch this off again?
> >
>
> Not sure, but as a wild guess, could it be caused by running for too long?

I do not think so. Since I knew it would take long, I raised the
timeout from 1h (the default) to 4h. The log stops a bit after 3h, and
in my experience, when a timeout is the cause, the log ends by saying
so.

> I'm building and testing it with a quite high-end configuration that's
> able to finish the build stage in a few minutes...

Amazing ...

So what help can I provide (as someone who does not know pytorch at
all and just maintains some packages that depend on it)?

Kind regards
Andreas.


Aron Xu

Jan 27, 2023, 2:20:04 PM
On Fri, Jan 27, 2023 at 9:42 PM Andreas Tille <ti...@debian.org> wrote:
>
> So what help can I provide (as someone who does not know pytorch at
> all and just maintains some packages that depend on it)?
>

Here is the failure log, in case you can have a look:

/build/pytorch/build$ ninja
[1/5] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Exceptions.cpp.o
FAILED: caffe2/torch/CMakeFiles/torch_python.dir/csrc/Exceptions.cpp.o
/usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DBUILDING_TESTS
-DGFLAGS_IS_A_DLL=0 -DGLOG_CUSTOM_PREFIX_SUPPORT
-DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1
-DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx
-DTHP_BUILD_MAIN_LIB -DUSE_C10D -DUSE_C10D_GLOO -DUSE_DISTRIBUTED
-DUSE_EXTERNAL_MZCRC -DUSE_NUMPY -DUSE_RPC -DUSE_TENSORPIPE
-DUSE_VALGRIND -D_FILE_OFFSET_BITS=64 -Dtorch_python_EXPORTS
-I/build/pytorch/build/aten/src -I/build/pytorch/aten/src
-I/build/pytorch/build -I/build/pytorch
-I/build/pytorch/cmake/../third_party/benchmark/include
-I/build/pytorch/debian/foxi -I/build/pytorch/build/debian/foxi
-I/build/pytorch/torch/.. -I/build/pytorch/torch/../aten/src
-I/build/pytorch/torch/../aten/src/TH
-I/build/pytorch/build/caffe2/aten/src
-I/build/pytorch/build/third_party
-I/build/pytorch/build/third_party/onnx
-I/build/pytorch/torch/../third_party/valgrind-headers
-I/build/pytorch/torch/../third_party/gloo
-I/build/pytorch/torch/../third_party/onnx
-I/build/pytorch/torch/../third_party/flatbuffers/include
-I/build/pytorch/debian/kineto/libkineto/include
-I/build/pytorch/torch/csrc -I/build/pytorch/torch/csrc/api/include
-I/build/pytorch/torch/lib -I/build/pytorch/torch/lib/libshm
-I/build/pytorch/torch/csrc/api -I/build/pytorch/c10/..
-I/build/pytorch/torch/lib/libshm/../../../torch/lib -isystem
/build/pytorch/build/third_party/gloo -isystem
/build/pytorch/cmake/../third_party/gloo -isystem
/build/pytorch/cmake/../third_party/googletest/googlemock/include
-isystem /build/pytorch/cmake/../third_party/googletest/googletest/include
-isystem /usr/include/opencv4 -isystem /usr/include/eigen3 -isystem
/usr/include/python3.11 -isystem
/usr/lib/python3/dist-packages/numpy/core/include -Wdate-time
-D_FORTIFY_SOURCE=2 -g -O2 -ffile-prefix-map=/build/pytorch=.
-fstack-protector-strong -Wformat -Werror=format-security
-gsplit-dwarf -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp
-DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE
-DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra
-Werror=return-type -Werror=non-virtual-dtor
-Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds
-Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter
-Wno-unused-function -Wno-unused-result -Wno-strict-overflow
-Wno-strict-aliasing -Wno-error=deprecated-declarations
-Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic
-Wno-error=redundant-decls -Wno-error=old-style-cast
-fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable
-Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math
-Werror=format -Werror=cast-function-type -Wno-stringop-overflow
-DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O2 -g
-DNDEBUG -fPIC -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -Wno-unused-variable
-fno-strict-aliasing -Wno-write-strings -Wno-strict-aliasing
-std=gnu++14 -MD -MT
caffe2/torch/CMakeFiles/torch_python.dir/csrc/Exceptions.cpp.o -MF
caffe2/torch/CMakeFiles/torch_python.dir/csrc/Exceptions.cpp.o.d -o
caffe2/torch/CMakeFiles/torch_python.dir/csrc/Exceptions.cpp.o -c
/build/pytorch/torch/csrc/Exceptions.cpp
/build/pytorch/torch/csrc/Exceptions.cpp: In destructor 'torch::PyWarningHandler::~PyWarningHandler()':
/build/pytorch/torch/csrc/Exceptions.cpp:264:23: error: no matching function for call to 'format_to(fmt::v9::memory_buffer&, torch::PyWarningHandler::~PyWarningHandler()::<lambda()>::FMT_COMPILE_STRING, std::__cxx11::basic_string<char>&, const char*&, uint32_t&)'
264 | fmt::format_to(
| ~~~~~~~~~~~~~~^
265 | buf,
| ~~~~
266 | FMT_STRING("{} (Triggered internally at {}:{}.)"),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
267 | msg,
| ~~~~
268 | source_location.file,
| ~~~~~~~~~~~~~~~~~~~~~
269 | source_location.line);
| ~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/fmt/format.h:48,
from /build/pytorch/torch/csrc/Exceptions.cpp:10:
/usr/include/fmt/core.h:3233:17: note: candidate: 'template<class OutputIt, class ... T, typename std::enable_if<fmt::v9::detail::is_output_iterator<OutputIt, char>::value, int>::type <anonymous> > OutputIt fmt::v9::format_to(OutputIt, format_string<T ...>, T&& ...)'
3233 | FMT_INLINE auto format_to(OutputIt out, format_string<T...> fmt, T&&... args)
| ^~~~~~~~~
/usr/include/fmt/core.h:3233:17: note: template argument deduction/substitution failed:
/usr/include/fmt/core.h:3232:11: error: no type named 'type' in 'struct std::enable_if<false, int>'
3232 | FMT_ENABLE_IF(detail::is_output_iterator<OutputIt, char>::value)>
| ^~~~~~~~~~~~~
/usr/include/fmt/format.h:4202:17: note: candidate: 'template<class OutputIt, class Locale, class ... T, typename std::enable_if<(fmt::v9::detail::is_output_iterator<OutputIt, char>::value && fmt::v9::detail::is_locale<Locale>::value), int>::type <anonymous> > OutputIt fmt::v9::format_to(OutputIt, const Locale&, format_string<T ...>, T&& ...)'
4202 | FMT_INLINE auto format_to(OutputIt out, const Locale& loc,
| ^~~~~~~~~
/usr/include/fmt/format.h:4202:17: note: template argument deduction/substitution failed:
/usr/include/fmt/format.h:4200:11: error: no type named 'type' in 'struct std::enable_if<false, int>'
4200 | FMT_ENABLE_IF(detail::is_output_iterator<OutputIt, char>::value&&
| ^~~~~~~~~~~~~
ninja: build stopped: subcommand failed.

Aron Xu

Jan 28, 2023, 10:20:04 PM
On Fri, Jan 27, 2023 at 9:42 PM Andreas Tille <ti...@debian.org> wrote:
>
> On Fri, Jan 27, 2023 at 08:21:46PM +0800, Aron Xu wrote:
> > On Fri, Jan 27, 2023 at 7:12 PM Andreas Tille <ti...@debian.org> wrote:
> > > make: *** [debian/rules:83: binary] Terminated
> > > ninja: build stopped: interrupted by user.
> > >
> > > could be a sign of this. Was I too naive to assume Salsa CI could
> > > manage a pytorch build, and should we possibly switch this off again?
> > >
> >
> > Not sure, but as a wild guess, could it be caused by running for too long?
>
> I do not think so. Since I knew it would take long, I raised the
> timeout from 1h (the default) to 4h. The log stops a bit after 3h, and
> in my experience, when a timeout is the cause, the log ends by saying
> so.
>

Then I guess it could be running out of memory: the build process is
hungry for RAM, and in my quick observation a single cc1plus process
can use up to at least 2 GB of memory.
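If memory is the limit, one mitigation is to cap build parallelism by available RAM rather than by CPU count alone. A minimal sketch, assuming roughly 2 GiB per cc1plus process as observed above; the helper names and the 2 GiB reserve for the rest of the system are hypothetical choices, and how much RAM the Salsa CI runners actually have is not known from this thread:

```python
import os

GIB = 1024 ** 3


# Assumption: ~2 GiB peak per compiler process (cc1plus), per the
# observation in this thread; the reserve for everything else is a guess.
def safe_parallel_jobs(total_mem_bytes: int, cpu_count: int,
                       mem_per_job: int = 2 * GIB,
                       reserve: int = 2 * GIB) -> int:
    """Return a job count that fits in memory, never below 1 or above cpu_count."""
    usable = max(total_mem_bytes - reserve, 0)
    by_memory = usable // mem_per_job
    return max(1, min(cpu_count, by_memory))


def detect_resources() -> tuple[int, int]:
    """Physical RAM in bytes and logical CPU count (POSIX sysconf names)."""
    total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return total, os.cpu_count() or 1


if __name__ == "__main__":
    mem, cpus = detect_resources()
    print(f"ninja -j {safe_parallel_jobs(mem, cpus)}")
```

For example, a 16 GiB runner with 8 CPUs would get 7 jobs instead of 8. The result could be passed directly as ninja -j N, or as DEB_BUILD_OPTIONS=parallel=N, assuming debian/rules honours that option.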

Regards,
Aron

Andreas Tille

Jan 29, 2023, 3:10:04 AM
Hi,
I have no idea about fmtlib, but I noticed:

[2022-09-04] fmtlib 9.1.0+ds1-2 MIGRATED to testing (Debian testing watch)
[2022-09-04] Accepted fmtlib 9.1.0+ds1-2 (source) into unstable (Shengjing Zhu)
[2022-08-27] Accepted fmtlib 9.1.0+ds1-1 (source) into experimental (Shengjing Zhu)
[2022-08-24] fmtlib 9.0.0+ds1-4 MIGRATED to testing (Debian testing watch)

Does this failure date back to August last year, and is it possibly
connected to the version bump from 9.0.0 to 9.1.0?

Maybe my question is naive, but I'm just asking.

Andreas Tille

Jan 30, 2023, 1:00:05 AM
On Sun, Jan 29, 2023 at 10:22:24AM -0500, M. Zhou wrote:
> And Aron has uploaded pytorch to NEW.

It cleared NEW quite quickly but shows an autopkgtest
regression[1]:


Traceback (most recent call last):
  File "/tmp/autopkgtest-lxc.as10mbia/downtmp/build.o5C/src/test/run_test.py", line 22, in <module>
    from torch.testing._internal.common_utils import (
  File "/usr/lib/python3/dist-packages/torch/testing/_internal/common_utils.py", line 57, in <module>
    import expecttest
ModuleNotFoundError: No module named 'expecttest'


Since we do not have this module[2] in Debian (yet), we should probably
exclude all tests that need it, right? If you think it's a nice thing to
have, I would volunteer to package it in DPT.
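Judging from the traceback, common_utils.py imports expecttest at module level, so run_test.py itself fails on import and skipping individual test cases would not be enough. A minimal sketch of a guard an autopkgtest wrapper could run first, assuming the test is declared with the autopkgtest "skippable" restriction (under which exit status 77 marks the test as skipped); the helper names are hypothetical and not part of the current debian/tests:

```python
import importlib.util
import sys

# autopkgtest convention: a test declared with "Restrictions: skippable"
# in debian/tests/control may exit with status 77 to report "skipped".
SKIP_EXIT_CODE = 77


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def skip_status(required: list[str]) -> int:
    """Return 77 if any required module is missing, otherwise 0."""
    missing = missing_modules(required)
    if missing:
        print("skipping: missing Python module(s): " + ", ".join(missing),
              file=sys.stderr)
        return SKIP_EXIT_CODE
    return 0
```

A wrapper would call skip_status(["expecttest"]) before invoking run_test.py and exit early with that status when it is non-zero. Once expecttest is packaged, the guard simply returns 0.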

Kind regards
Andreas.

[1] https://ci.debian.net/data/autopkgtest/testing/amd64/p/pytorch/30823657/log.gz
[2] https://pypi.org/project/expecttest/
