asm: Adding MVCLE Instruction into asm for s390x
MVCLE instruction is necessary to further improve Memmove routine performance.
This change will add MVCLE into the Go asm tool for s390x architecture.
diff --git a/src/cmd/asm/internal/asm/testdata/s390x.s b/src/cmd/asm/internal/asm/testdata/s390x.s
index 95a8c50..d6540bd 100644
--- a/src/cmd/asm/internal/asm/testdata/s390x.s
+++ b/src/cmd/asm/internal/asm/testdata/s390x.s
@@ -266,6 +266,7 @@
MVCIN $8, (R15), n-8(SP) // e807f010f000
CLC $8, (R15), n-8(SP) // d507f000f010
XC $256, -8(R15), -8(R15) // b90400afc2a8fffffff8d7ffa000a000
+ MVCLE 0, R4, R6 // a8640000
MVC $256, 8192(R1), 8192(R2) // b90400a2c2a800002000b90400b1c2b800002000d2ffa000b000
CMP R1, R2 // b9200012
diff --git a/src/cmd/internal/obj/s390x/a.out.go b/src/cmd/internal/obj/s390x/a.out.go
index 3eed462..1a64370 100644
--- a/src/cmd/internal/obj/s390x/a.out.go
+++ b/src/cmd/internal/obj/s390x/a.out.go
@@ -444,6 +444,7 @@
// storage-and-storage
AMVC
AMVCIN
+ AMVCLE
ACLC
AXC
AOC
diff --git a/src/cmd/internal/obj/s390x/anames.go b/src/cmd/internal/obj/s390x/anames.go
index ae86d20..c0a0c40 100644
--- a/src/cmd/internal/obj/s390x/anames.go
+++ b/src/cmd/internal/obj/s390x/anames.go
@@ -181,6 +181,7 @@
"CMPUBNE",
"MVC",
"MVCIN",
+ "MVCLE",
"CLC",
"XC",
"OC",
diff --git a/src/cmd/internal/obj/s390x/asmz.go b/src/cmd/internal/obj/s390x/asmz.go
index 6511549..509a9f5 100644
--- a/src/cmd/internal/obj/s390x/asmz.go
+++ b/src/cmd/internal/obj/s390x/asmz.go
@@ -449,6 +449,10 @@
// VRR-f
{i: 122, as: AVLVGP, a1: C_REG, a2: C_REG, a6: C_VREG},
+
+ // MVC storage and storage
+ {i: 128, as: AMVCLE, a1: C_LOREG, a2: C_REG, a6: C_REG},
+ {i: 128, as: AMVCLE, a1: C_LAUTO, a2: C_REG, a6: C_REG},
}
var oprange [ALAST & obj.AMask][]Optab
@@ -4453,6 +4457,19 @@
}
}
zRRF(opcode, uint32(p.Reg), 0, uint32(p.From.Reg), uint32(p.To.Reg), asm)
+
+ case 128:
+ v := c.regoff(&p.From)
+ if p.To.Reg&1 != 0 {
+ c.ctxt.Diag("output argument must be even register in %v", p)
+ }
+ if p.Reg&1 != 0 {
+ c.ctxt.Diag("input argument must be an even register in %v", p)
+ }
+ if (p.From.Reg == p.To.Reg) || (p.From.Reg == p.Reg) {
+ c.ctxt.Diag("Padding Byte Register cannot be same as input or output register %v", p)
+ }
+ zRS(op_MVCLE, uint32(p.To.Reg), uint32(p.Reg), uint32(p.From.Reg), uint32(v), asm)
}
}
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
asm: Adding MVCLE Instruction into asm for s390xcan you add full package details in the commit message as "cmd/internal/obj/s390x"
https://go.dev/doc/contribute#commit_messages
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
asm: Adding MVCLE Instruction into asm for s390x(nit) s/Adding/add/
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
asm: Adding MVCLE Instruction into asm for s390xKiran M Vijay IBM(nit) s/Adding/add/
Done
can you add full package details in the commit message as "cmd/internal/obj/s390x"
https://go.dev/doc/contribute#commit_messages
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
For MVCLE, R1 and R3 are the destination and source registers repectively.. The part or full contents of source is moved to the destination register. They both needed to be checked for even-odd register pairs. Please check whether R3 should be considered as "p.Reg" or "p.From.Reg".
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
For MVCLE, R1 and R3 are the destination and source registers repectively.. The part or full contents of source is moved to the destination register. They both needed to be checked for even-odd register pairs. Please check whether R3 should be considered as "p.Reg" or "p.From.Reg".
In Prog structure, From and To are the 2 fields that can denote addresses.
// MVCLE R1,R3,D2(B2)
In the MVCLE, D2(B2) needs to be denoted as address, which is not possible if its mapped to Reg field of Prog (which is of type int16).
So, From is mapped to Reg field of Prog in this case, because it is used to denote only a register number.
And D2(B2) is mapped to From field of Prog so that it can point to a heap address.
Will add a comment in the code to denote the same, for clarity.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
| Code-Review | +2 |
| Commit-Queue | +1 |
// Instruction Format: MVCLE R1,R3,D2(B2)D2 here is just a constant offset?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
@k...@golang.org Can you please re-trigger the x_tools-gotip-linux-amd64 job?
I tested with latest s390x and there seems to be no issue.
$ ./gcimporter.test -test.v -test.run "TestStdlib"
=== RUN TestStdlib
=== RUN TestStdlib/gotypesalias=0
=== RUN TestStdlib/gotypesalias=1
--- PASS: TestStdlib (9.89s)
--- PASS: TestStdlib/gotypesalias=0 (5.03s)
--- PASS: TestStdlib/gotypesalias=1 (4.76s)
PASS
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// Instruction Format: MVCLE R1,R3,D2(B2)D2 here is just a constant offset?
Yes, it is a constant offset value.
Depending on how MVCLE performs in practice these days it could be interesting to experiment with emitting it (in a loop in case it is preempted) as an inline memmove when we know the source and destination operands don't overlap.
// Instruction Format: MVCLE R1,R3,D2(B2)Kiran M Vijay IBMD2 here is just a constant offset?
Yes, it is a constant offset value.
It's maybe worth noting that the D2(B2) operand is not actually an address, instead it is the value of the padding byte that is used when the destination operand is longer than the source operand. The padding byte value is calculated as (D2+B2)%256 where D2 is an immediate and B2 is a register. Allowing C_LAUTO for this operand seems incorrect since that class is an address on the stack.
In practice I suspect in Go MVCLE is only likely to be used with operands of the same length (basically implementing a memcpy) in which case the padding byte value will be ignored and should be set to zero.
Perhaps it would be simpler to only allow a constant value for the padding for now:
e.g. MVCLE $0xFF, R2, R6 // padding byte is 0xFF - i.e. 255(R0)
If needed/desired now or later a variant with a register operand for the padding could also be introduced:
e.g. MVCLE R8, R2, R6 // padding is R8%256 - i.e. 0(R8)
It is not particularly useful to be able to set both D2 and B2 to non-zero values at the same time so I don't think we lose anything by not supporting it.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Depending on how MVCLE performs in practice these days it could be interesting to experiment with emitting it (in a loop in case it is preempted) as an inline memmove when we know the source and destination operands don't overlap.
Sure, MVCLE is found to perform well for larger sizes, so the inlining shall be considered further. Thank you!
Kiran M Vijay IBMD2 here is just a constant offset?
Michael MundayYes, it is a constant offset value.
It's maybe worth noting that the D2(B2) operand is not actually an address, instead it is the value of the padding byte that is used when the destination operand is longer than the source operand. The padding byte value is calculated as (D2+B2)%256 where D2 is an immediate and B2 is a register. Allowing C_LAUTO for this operand seems incorrect since that class is an address on the stack.
In practice I suspect in Go MVCLE is only likely to be used with operands of the same length (basically implementing a memcpy) in which case the padding byte value will be ignored and should be set to zero.
Perhaps it would be simpler to only allow a constant value for the padding for now:
e.g. MVCLE $0xFF, R2, R6 // padding byte is 0xFF - i.e. 255(R0)If needed/desired now or later a variant with a register operand for the padding could also be introduced:
e.g. MVCLE R8, R2, R6 // padding is R8%256 - i.e. 0(R8)It is not particularly useful to be able to set both D2 and B2 to non-zero values at the same time so I don't think we lose anything by not supporting it.
Thanks for the input Michael.
You are right. I have removed the C_LAUTO class for the 2nd operand accordingly.
Also, for handling the constants and register values alone for the 2nd operand, the current change can do so.
Both the examples you have mentioned above work fine. I have added test data to indicate the same.
Please review and let me know if any other changes needed.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
{i: 127, as: AMVCLE, a1: C_LOREG, a2: C_REG, a6: C_REG},@kiran....@ibm.com as @mike...@gmail.com mentioned can you also add simpler case to allow a constant value for the padding as below
`{i: 127, as: AMVCLE, a1: C_SCON, a2: C_REG, a6: C_REG},`
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
MVCLE 4095, R4, R6 // a8640fff
MVCLE $4095, R4, R6 // a8640fffNote: The generated machine instruction is same for both the cases.
{i: 127, as: AMVCLE, a1: C_LOREG, a2: C_REG, a6: C_REG},@kiran....@ibm.com as @mike...@gmail.com mentioned can you also add simpler case to allow a constant value for the padding as below
`{i: 127, as: AMVCLE, a1: C_SCON, a2: C_REG, a6: C_REG},`
Thanks Srinivas for the input.
I have added the SCON class for the second operand. However, please note that the instruction generated for both the cases (when using displacement alone without reg and when using constant literal value) will be the same.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
Your changes look good to me.. Thanks
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +2 |
{i: 127, as: AMVCLE, a1: C_LOREG, a2: C_REG, a6: C_REG},My preference would be to drop the LOREG support and only allow SCON (and maybe REG for memset-like use cases) for the padding value. Using LOREG does keep the syntax closer to other assemblers though so I'm happy either way.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
we see this error with Try Bot (although the TryBots_pass rating is +1):
run.go:232: FAIL: testdata/script/script_test_basics.txt:6: cc -c testdata/mumble.c: exec: WaitDelay expired before I/O complete
FAIL
FAIL cmd/link 29.226s
But I have tried running the complete Test Suite locally and everything works fine.
...
? cmd/internal/telemetry/counter [no test files]
ok cmd/internal/test2json 0.400s
ok cmd/link 11.388s
...
##### ../test
ok cmd/internal/testdir 53.732s
ALL TESTS PASSED
---
Installed Go for linux/s390x in /home/kmvijay/workspace/goroot_master
Installed commands in /home/kmvijay/workspace/goroot_master/bin
*** You need to add /home/kmvijay/workspace/goroot_master/bin to your PATH.
Can we go ahead and merge this change?
Or should the TryBots job be re-triggered?
{i: 127, as: AMVCLE, a1: C_LOREG, a2: C_REG, a6: C_REG},My preference would be to drop the LOREG support and only allow SCON (and maybe REG for memset-like use cases) for the padding value. Using LOREG does keep the syntax closer to other assemblers though so I'm happy either way.
Yes, I agree, but since the POP mentions the MVCLE format as below:
`MVCLE R1,R3,D2(B2)`
we decided to go ahead with LOREG support.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +2 |
| Commit-Queue | +1 |
we see this error with Try Bot (although the TryBots_pass rating is +1):
run.go:232: FAIL: testdata/script/script_test_basics.txt:6: cc -c testdata/mumble.c: exec: WaitDelay expired before I/O complete
FAIL
FAIL cmd/link 29.226s
But I have tried running the complete Test Suite locally and everything works fine....
? cmd/internal/telemetry/counter [no test files]
ok cmd/internal/test2json 0.400s
ok cmd/link 11.388s
...
##### ../test
ok cmd/internal/testdir 53.732sALL TESTS PASSED
---
Installed Go for linux/s390x in /home/kmvijay/workspace/goroot_master
Installed commands in /home/kmvijay/workspace/goroot_master/bin
*** You need to add /home/kmvijay/workspace/goroot_master/bin to your PATH.
Can we go ahead and merge this change?
Or should the TryBots job be re-triggered?
Can we go ahead and merge this change?
We're in the freeze, so this won't be merged until the freeze lifts (sometime in July).
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
cmd/internal/obj/s390x: add MVCLE instruction
MVCLE (Move Long Extended) instruction is used to move large data storage-to-storage.
This change will add MVCLE into the Go asm for s390x architecture.
Upcoming PR of runtime/memmove_s390x.s will use this instruction for performance improvement.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |