Possible bug of invalid assembly code

53 views
Skip to first unread message

Siyuan Ren

unread,
Apr 20, 2015, 11:24:16 PM4/20/15
to cryptop...@googlegroups.com
I tried to compile Crypto++ under OS X Yosemite with assembly enabled, and got tons of incomprehensible compilation errors. Some of the earlier posts had me believe that it was the fault of the outdated assembler which comes with OS X, so I reported it as a potential bug to Apple. After several conversions, they replied that

This issue behaves as intended based on the following:

The problem here is that the integrated assembler is more strict than gas(1) or as(1) on operand sizes. The "rdi" reference here is not valid for a MOVD instruction, with requires a 32-bit GPR register. Depending on what was intended by the code, either "movd edi, xmm0" or "movq rdi, xmm0" will be a correct fix.

That is, this is a project bug that the compiler is correctly diagnosing. Previous assemblers incorrectly did not issue an error here.

The diagnostic, however, is downright terrible with the huge blob of inline asm all being on one line like that. We will work on fixing that.


For example, in the source of gcm.cpp expanded by the preprocessor, there are codes like

  "psrldq xmm0" ", " "15" ";"
  "movd rdi" ", " "xmm0" ";"
  "movzx eax" ", " "WORD PTR [r10 + rdi*2]" ";"
  "shl eax" ", " "8" ";"

Will these be fixed in future revisions?

Jeffrey Walton

unread,
Apr 21, 2015, 4:39:29 PM4/21/15
to cryptop...@googlegroups.com


On Monday, April 20, 2015 at 11:24:16 PM UTC-4, Siyuan Ren wrote:
I tried to compile Crypto++ under OS X Yosemite with assembly enabled, and got tons of incomprehensible compilation errors. Some of the earlier posts had me believe that it was the fault of the outdated assembler which comes with OS X, so I reported it as a potential bug to Apple. After several conversions, they replied that

This issue behaves as intended based on the following:

The problem here is that the integrated assembler is more strict than gas(1) or as(1) on operand sizes. The "rdi" reference here is not valid for a MOVD instruction, with requires a 32-bit GPR register. Depending on what was intended by the code, either "movd edi, xmm0" or "movq rdi, xmm0" will be a correct fix.

That is, this is a project bug that the compiler is correctly diagnosing. Previous assemblers incorrectly did not issue an error here.

The diagnostic, however, is downright terrible with the huge blob of inline asm all being on one line like that. We will work on fixing that.

Oh, that's interesting.

I usually fail at a different point when trying to enable ASM on OS X (its related to Intel vs AT&T assembler).
 
For example, in the source of gcm.cpp expanded by the preprocessor, there are codes like

  "psrldq xmm0" ", " "15" ";"
  "movd rdi" ", " "xmm0" ";"
  "movzx eax" ", " "WORD PTR [r10 + rdi*2]" ";"
  "shl eax" ", " "8" ";"

Will these be fixed in future revisions?

Hopefully.

 The first step is reaching out through the mailing list. The second step is writing to Wei directly. See the bottom of the Crypto++ homepage (http://www.cryptopp.com/).

Siyuan Ren

unread,
Apr 21, 2015, 9:36:35 PM4/21/15
to cryptop...@googlegroups.com
I usually fail at a different point when trying to enable ASM on OS X (its related to Intel vs AT&T assembler).

According to Apple, both their gas and the embedded assembler in clang++ support both Intel and AT&T style.

The second step is writing to Wei directly. 

Did he not monitor this user group?

Jeffrey Walton

unread,
Jul 22, 2015, 2:45:25 AM7/22/15
to Crypto++ Users


On Monday, April 20, 2015 at 11:24:16 PM UTC-4, Siyuan Ren wrote:

I finally was able to clear some time and deep dive this issue.

The answer to "will they be fix" is Yes.

There are four or five issues in play. Three or four stem from LLVM and the integrated assembler. The last  is undecided, but I *think* it has to do with the incompatibility discussed http://clang.llvm.org/compatibility.html#inline-asm.

Regarding the integrated assembler, there are two assemblers available: one is the integrated assembler built into Clang. The second is the traditional assembler, like GNU AS (GAS). The former is giving us the trouble at the moment (not the latter)

Issue 1: Crypto++ misidentifies the opportunity to provide ASM implementation because Clang returns an error rather than a version string. You see the misidentification by way of -DCRYPTOPP_DISABLE_ASM. A bug report was filed at https://llvm.org/bugs/show_bug.cgi?id=24200. This will become an issue in the future, but its OK for now.

Issue 2: LLVM Bug 18916 - "don't err on '.att_syntax prefix'", https://llvm.org/bugs/show_bug.cgi?id=18916. Right now we are keying on a config.h define called WORKAROUND_LLVM_BUG_18916. We use it to fixup the string literals with the {prefix|noprefix} part. This issue is effectively cleared now.

Issue 3: We have no way to tell when the integrated assembler is being used at the source level (think preprocessor macros). That's going to create problems later, when the 18916 bug is fixed. Right now we are keying on a config.h define called WORKAROUND_LLVM_BUG_18916. I have an idea of how to fix it. I'll take it to the list when the time is right.

Issue 4: the integrated assembler requires each ASM statement to be on a separate line; while GCC effectively allows them to be concatenated with a semi-colon. Remember WORKAROUND_LLVM_BUG_18916? When it get fixed and WORKAROUND_LLVM_BUG_18916 goes away, we won't know when to add the newlines. This issue is effectively cleared now.

Issue 5: I suspect this issue is related to http://clang.llvm.org/compatibility.html#inline-asm. With the other issues cleared, that's all I have remaining to clear and then we can review and check-in. The issues are like the following, and produce an "Illegal operand size" or similar.

    movd rdi, mmx0

**********

Related to issue (1), we can do this to tame use of DCRYPTOPP_DISABLE_ASM because `GAS210_OR_LATER ?= ...` allows an environment or command line override.  Its kind of a hack, but it will be addressed properly in the future.

This will get you into the ASM code paths:

    make GAS210_OR_LATER=1 GAS217_OR_LATER=1 GAS219_OR_LATER=1

$ cat GNUmakefile.diff
diff --git a/GNUmakefile b/GNUmakefile
index 2b5bcc2..3f48cc5 100644
--- a/GNUmakefile
+++ b/GNUmakefile
@@ -25,9 +25,9 @@ ifeq ($(PREFIX),)
 PREFIX = /usr
 endif
 
-# Sadly, we can't actually use GCC_PRAGMA_AWARE because of GCC bug 53431.
-# Its a shame because GCC has so much to offer by the way of analysis.
-# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53431
+# Sadly, we can't actually use GCC_PRAGMA_AWARE with GCC because of GCC bug 53431.
+# Its a shame because GCC has so much to offer by the way of analysis. Clang works
+# as expected, though. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53431
 ifneq ($(CLANG_COMPILER),0)
 CXXFLAGS += -Wall
 endif
@@ -36,26 +36,41 @@ ifeq ($(IS_X86),1)
 
 GCC42_OR_LATER = $(shell $(CXX) -v 2>&1 | $(EGREP) -i -c "^gcc version (4.[2-9]|[5-9])")
 ICC111_OR_LATER = $(shell $(CXX) --version 2>&1 | $(EGREP) -c "\(ICC\) ([2-9][0-9]|1[2-9]|11\.[1-9])")
-GAS210_OR_LATER = $(shell $(CXX) -xc -c /dev/null -Wa,-v -o/dev/null 2>&1 | $(EGREP) -c "GNU assembler version (2\.[1-9][0-9]|[3-9])")
-GAS217_OR_LATER = $(shell $(CXX) -xc -c /dev/null -Wa,-v -o/dev/null 2>&1 | $(EGREP) -c "GNU assembler version (2\.1[7-9]|2\.[2-9]|[3-9])")
-GAS219_OR_LATER = $(shell $(CXX) -xc -c /dev/null -Wa,-v -o/dev/null 2>&1 | $(EGREP) -c "GNU assembler version (2\.19|2\.[2-9]|[3-9])")
+GAS210_OR_LATER ?= $(shell $(CXX) -xc -c /dev/null -Wa,-v -o/dev/null 2>&1 | $(EGREP) -c "GNU assembler version (2\.[1-9][0-9]|[3-9])")
+GAS217_OR_LATER ?= $(shell $(CXX) -xc -c /dev/null -Wa,-v -o/dev/null 2>&1 | $(EGREP) -c "GNU assembler version (2\.1[7-9]|2\.[2-9]|[3-9])")
+GAS219_OR_LATER ?= $(shell $(CXX) -xc -c /dev/null -Wa,-v -o/dev/null 2>&1 | $(EGREP) -c "GNU assembler version (2\.19|2\.[2-9]|[3-9])")
 
 # Enable PIC for x86_64 targets
 ifneq ($(IS_X86_64),0)
 CXXFLAGS += -fPIC
 endif # PIC for x86_64 targets
 
-# Undefined Behavior Sanitzier (Clang and G++)
+# Undefined Behavior Sanitizer (Clang 3.2 and GCC 4.8 and above)
 ifeq ($(findstring ubsan,$(MAKECMDGOALS)),ubsan)
 CXXFLAGS += -fsanitize=undefined
-# CXXFLAGS += -fsanitize-undefined-trap-on-error
+# CXXFLAGS += -fsanitize-undefined-trap-on-error
 endif # UBsan
 
-# Address Sanitzier (Clang and G++)
+# Address Sanitizer (Clang 3.2 and GCC 4.8 and above)
 ifeq ($(findstring asan,$(MAKECMDGOALS)),asan)
 CXXFLAGS += -fsanitize=address
 endif # Asan
 
+# Test CXXFLAGS in case the user passed the flags directly through it
+ASAN = 0
+UBSAN = 0
+ifeq ($(findstring -fsanitize=address,$(CXXFLAGS)),-fsanitize=address)
+ASAN = 1
+endif
+ifeq ($(findstring -fsanitize=undefined,$(CXXFLAGS)),-fsanitize=undefined)
+UBSAN = 1
+endif
+
+# Enforce Sanitizer business logic...
+ifeq ($(ASAN)$(UBSAN),11)
+$(error Asan and UBsan are mutually exclusive)
+endif
+
 # Cygwin work arounds
 ifneq ($(IS_CYGWIN),0)
 
**********

Jeff

Jeffrey Walton

unread,
Jul 22, 2015, 4:54:02 AM7/22/15
to Crypto++ Users


On Wednesday, July 22, 2015 at 2:45:25 AM UTC-4, Jeffrey Walton wrote:


On Monday, April 20, 2015 at 11:24:16 PM UTC-4, Siyuan Ren wrote:
I tried to compile Crypto++ under OS X Yosemite with assembly enabled, and got tons of incomprehensible compilation errors. Some of the earlier posts had me believe that it was the fault of the outdated assembler which comes with OS X, so I reported it as a potential bug to Apple. After several conversions, they replied that

This issue behaves as intended based on the following:

The problem here is that the integrated assembler is more strict than gas(1) or as(1) on operand sizes. The "rdi" reference here is not valid for a MOVD instruction, with requires a 32-bit GPR register. Depending on what was intended by the code, either "movd edi, xmm0" or "movq rdi, xmm0" will be a correct fix.

That is, this is a project bug that the compiler is correctly diagnosing. Previous assemblers incorrectly did not issue an error here.

The diagnostic, however, is downright terrible with the huge blob of inline asm all being on one line like that. We will work on fixing that.


For example, in the source of gcm.cpp expanded by the preprocessor, there are codes like

  "psrldq xmm0" ", " "15" ";"
  "movd rdi" ", " "xmm0" ";"
  "movzx eax" ", " "WORD PTR [r10 + rdi*2]" ";"
  "shl eax" ", " "8" ";"

Will these be fixed in future revisions?

I finally was able to clear some time and deep dive this issue.

The answer to "will they be fix" is Yes.

There are four or five issues in play. Three or four stem from LLVM and the integrated assembler. The last  is undecided, but I *think* it has to do with the incompatibility discussed http://clang.llvm.org/compatibility.html#inline-asm.

Regarding the integrated assembler, there are two assemblers available: one is the integrated assembler built into Clang. The second is the traditional assembler, like GNU AS (GAS). The former is giving us the trouble at the moment (not the latter)

Issue 1: Crypto++ misidentifies the opportunity to provide ASM implementation because Clang returns an error rather than a version string. You see the misidentification by way of -DCRYPTOPP_DISABLE_ASM. A bug report was filed at https://llvm.org/bugs/show_bug.cgi?id=24200. This will become an issue in the future, but its OK for now.

Issue 2: LLVM Bug 18916 - "don't err on '.att_syntax prefix'", https://llvm.org/bugs/show_bug.cgi?id=18916. Right now we are keying on a config.h define called WORKAROUND_LLVM_BUG_18916. We use it to fixup the string literals with the {prefix|noprefix} part. This issue is effectively cleared now.

Issue 3: We have no way to tell when the integrated assembler is being used at the source level (think preprocessor macros). That's going to create problems later, when the 18916 bug is fixed. Right now we are keying on a config.h define called WORKAROUND_LLVM_BUG_18916. I have an idea of how to fix it. I'll take it to the list when the time is right.

Issue 4: the integrated assembler requires each ASM statement to be on a separate line; while GCC effectively allows them to be concatenated with a semi-colon. Remember WORKAROUND_LLVM_BUG_18916? When it get fixed and WORKAROUND_LLVM_BUG_18916 goes away, we won't know when to add the newlines. This issue is effectively cleared now.

Issue 5: I suspect this issue is related to http://clang.llvm.org/compatibility.html#inline-asm. With the other issues cleared, that's all I have remaining to clear and then we can review and check-in. The issues are like the following, and produce an "Illegal operand size" or similar.

    movd rdi, mmx0

We loaded up a few issues in the issue tracker related to the 4 or 5 described above to help track the cause and correction

  * Library fails to detect when SSE, SSE2, SSE3 are available under Clang, https://github.com/weidai11/cryptopp/issues/9
  * Library does not compile due to inline assembly and ".att_sytax noprefix" and ".intel_syntax prefix" under Clang, https://github.com/weidai11/cryptopp/issues/10
  * Need extensible way to handle Clang Integrated Assembler/LLVM Issues 18916 and 24200, https://github.com/weidai11/cryptopp/issues/11
  * Clang integrated assembler and "invalid operand for instruction" error, https://github.com/weidai11/cryptopp/issues/12

 
 

Jeffrey Walton

unread,
Jul 23, 2015, 1:05:36 PM7/23/15
to Crypto++ Users, nolo...@gmail.com

Will these be fixed in future revisions?

I finally was able to clear some time and deep dive this issue.

The answer to "will they be fix" is Yes.

There are four or five issues in play. Three or four stem from LLVM and the integrated assembler. The last  is undecided, but I *think* it has to do with the incompatibility discussed http://clang.llvm.org/compatibility.html#inline-asm.

Regarding the integrated assembler, there are two assemblers available: one is the integrated assembler built into Clang. The second is the traditional assembler, like GNU AS (GAS). The former is giving us the trouble at the moment (not the latter)

Issue 1: Crypto++ misidentifies the opportunity to provide ASM implementation because Clang returns an error rather than a version string. You see the misidentification by way of -DCRYPTOPP_DISABLE_ASM. A bug report was filed at https://llvm.org/bugs/show_bug.cgi?id=24200. This will become an issue in the future, but its OK for now.

Issue 2: LLVM Bug 18916 - "don't err on '.att_syntax prefix'", https://llvm.org/bugs/show_bug.cgi?id=18916. Right now we are keying on a config.h define called WORKAROUND_LLVM_BUG_18916. We use it to fixup the string literals with the {prefix|noprefix} part. This issue is effectively cleared now.

Issue 3: We have no way to tell when the integrated assembler is being used at the source level (think preprocessor macros). That's going to create problems later, when the 18916 bug is fixed. Right now we are keying on a config.h define called WORKAROUND_LLVM_BUG_18916. I have an idea of how to fix it. I'll take it to the list when the time is right.

Issue 4: the integrated assembler requires each ASM statement to be on a separate line; while GCC effectively allows them to be concatenated with a semi-colon. Remember WORKAROUND_LLVM_BUG_18916? When it get fixed and WORKAROUND_LLVM_BUG_18916 goes away, we won't know when to add the newlines. This issue is effectively cleared now.

Issue 5: I suspect this issue is related to http://clang.llvm.org/compatibility.html#inline-asm. With the other issues cleared, that's all I have remaining to clear and then we can review and check-in. The issues are like the following, and produce an "Illegal operand size" or similar.

    movd rdi, mmx0

Add two more LLVM/Clang issues:

    * LLVM Bug 24232 - "[X86] Inline assembly operands don't work with .intel_syntax", https://llvm.org/bugs/show_bug.cgi?id=24232
    * LLVM Bug 24226, "Constant not propagated into inline assembly, results in "constraint 'I' expects an integer constant expression"", https://llvm.org/bugs/show_bug.cgi?id=24226

I have not seen Crypto++ break a compiler like this since the 1990s...

Jeff
Reply all
Reply to author
Forward
0 new messages