Skia CPU : Setting Rotation makes Image rendering 40 times slow

Bharat Ahuja

unread,

Nov 15, 2018, 6:40:25 AM11/15/18

to skia-discuss

Hi,

We have a scenario where we need to render an image onto a raster SkSurface. The image is rendered via an SkShader, i.e. we create a shader from the SkImage and attach the shader to the SkPaint. When we are creating Skshader with a rotation matrix we can see that rendering takes almost 40 times more time than normal scenarios which seems little odd. I expect that with rotation pixel processing time would increase but this increase seems too much if i consider other graphics libraries. I tried to use windows' D2D software back end to understand additional time in rendering and it seems like D2D can do the rendering in almost 2 * (time taken with out rotation). Are we doing something wrong here? If not, can someone please explain why skia software performs badly in comparison to others? Is there any pipe line change happening underneath?

FYI we are using Skia m64 build.

int width = 500;

int height = 500;

SkImageInfo imageInfo = SkImageInfo::Make(width, height, kRGBA_8888_SkColorType, kPremul_SkAlphaType);

size_t stride = width * 4;

SkBitmap bitmap;

(void)bitmap.tryAllocPixels(imageInfo,stride);

uint32_t* pixels = reinterpret_cast< uint32_t* >( bitmap.getPixels() );

for( int i = 0; i < width; i++ )

for( int j = 0; j < height; j++ )

{

*pixels = i % 8 ? ( (255 << 24) + 255 ) : ( (255 << 24) + (255 << 8) + 255 );

pixels++;

}

sk_sp<SkImage> image = SkImage::MakeFromBitmap( bitmap );

SkPaint paint;

SkMatrix matrix;

matrix.setRotate(45);

paint.setFilterQuality( kLow_SkFilterQuality );

paint.setShader(image->makeShader(SkShader::kClamp_TileMode,SkShader::kClamp_TileMode,

&matrix));

SkRect rect = { (float)0, (float)0, (float)width , (float)height };

auto start = std::chrono::high_resolution_clock::now();

canvas->drawRect(rect, paint);

auto stop = std::chrono::high_resolution_clock::now();

long long duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();

//LOGE( "total time microseconds = %llu", duration );

with rotation matrix // matrix.setRotate(45);

total time microseconds = 272965

total time microseconds = 275342

total time microseconds = 276618

total time microseconds = 274913

total time microseconds = 276534

with identity matrix // matrix.setIdentity( );

total time microseconds = 7996

total time microseconds = 3954

total time microseconds = 7328

total time microseconds = 4252

total time microseconds = 4350

with scaling matrix // matrix.setScale( 14.2, 1.8 );

total time microseconds = 15604

total time microseconds = 25151

total time microseconds = 15506

total time microseconds = 23355

total time microseconds = 18398

Mike Klein

unread,

Nov 15, 2018, 8:58:58 AM11/15/18

to skia-d...@googlegroups.com

Are you building with the MSVC compiler (cl.exe)? Some parts of Skia's software backend are optimized only when compiled by Clang (clang-cl.exe).

--
You received this message because you are subscribed to the Google Groups "skia-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To post to this group, send email to skia-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/skia-discuss.
For more options, visit https://groups.google.com/d/optout.

Bharat Ahuja

unread,

Nov 15, 2018, 9:13:14 AM11/15/18

to skia-discuss

Actually I *am* using clang.

I am building on a windows machine and using clang++ from androidndk.16.1.1

androidndk.16.1.1\toolchains\llvm\prebuilt\windows-x86_64\bin\clang++.exe

Andrew Ellerton

unread,

Nov 15, 2018, 9:20:05 AM11/15/18

to skia-d...@googlegroups.com

Hi Mike,

Although tangential to the OP's question, this has been on my mind for a while and now is a good time to ask: Is clang on windows a widely supported platform for skia? I ask because I tried (admittedly briefly) and then went back to vanilla msvc, in part because I found some old posts saying that clang wasn't widely supported.

Interested to get your view...

Regards

Andrew

Mike Klein

unread,

Nov 15, 2018, 9:21:46 AM11/15/18

to skia-d...@googlegroups.com

Yes, we're happy to support Clang across the board. Chromium's using Clang exclusively now, even on Windows.

Andrew Ellerton

unread,

Nov 15, 2018, 9:23:15 AM11/15/18

to skia-d...@googlegroups.com

Wow that's great!

Are there any particular setup instructions you recommend?

Mike Klein

unread,

Nov 15, 2018, 9:24:07 AM11/15/18

to skia-d...@googlegroups.com

Sorry, I saw the reference to Windows and D2D and assumed you were building for Windows. You're targeting Android? I'm not familiar with "androidndk.16.1.1"... is that somehow related to the r16 release of the Android NDK (https://developer.android.com/ndk/downloads/older_releases)?

Corey Lucier

unread,

Nov 15, 2018, 9:27:43 AM11/15/18

to skia-discuss

That particular build is very very old though, can you try with latest M70 or above if feasible?

Also are these profile times with a release build or debug? Debug Skia CPU can be very very slow in some code paths.

Bharat Ahuja

unread,

Nov 15, 2018, 9:29:14 AM11/15/18

to skia-discuss

yes sorry for the confusion yes i am using ndk r16 release. Actually i am building skia for android but why i mentioned windows and D2D is I wanted to understand whether other rendering engines also take such larger time on applying rotation matrix so i tried windows software D2D rendering engine on mobile platform with similar hardware capabilities as my android device.

Bharat Ahuja

unread,

Nov 15, 2018, 9:32:02 AM11/15/18

to skia-discuss

I 'll see if i can try the newer build. These are release build run time.

Corey Lucier

unread,

Nov 15, 2018, 9:33:25 AM11/15/18

to skia-discuss

Andrew,

In fact if you aren't using Clang on windows you are in for a much slower ride. The Skia team recently made some performance tradeoffs which essentially prefer Clang, resulting in builds with MSVC becoming much slower.

Just FYI.

Andrew Ellerton

unread,

Nov 15, 2018, 9:47:33 AM11/15/18

to skia-d...@googlegroups.com

Thanks Corey. My current work is windows-focussed, and I definitely had a few raised eyebrow moments at the msvc compiled binary performance, and not in a good way ;)

I'll definitely give clang a go. Thanks for the heads up, to you and Mike.

Mike Klein

unread,

Nov 15, 2018, 9:49:00 AM11/15/18

to skia-d...@googlegroups.com

Yep. We definitely have intentionally made MSVC-based builds slower so that we can make Clang-based builds faster. Or really, I should say, more capable and maintainable... most of the changes we've made restructure the software drawing backend so that it's not too much of a maintenance burden and help make it easy to keep correctness and feature parity with the other backends.

I should probably add that about 99% of our attention to Android and iOS is focused on performance when using GPU-accelerated drawing. All those devices have GPUs, and if you want good performance you'll really want to use them. Skia's GPU backend has been used by Chrome on Android for a long time, and is the backend for Flutter on Android and iOS, and lately draws the Android system UI itself.

Mike Reed

unread,

Nov 15, 2018, 11:39:25 AM11/15/18

to skia-d...@googlegroups.com

Some slow down is expected between identity bitmap draws and yours

1. rotated bitmaps means sampling memory(somewhat) out of order, so we get more cache misses

2. you've set kLow filter quality, which asks us to perform a blierp filter per pixel (which means read 4 pixels and compute a weighted average).

- this setting is cheaper/ignored if the matrix is identity.

Bharat Ahuja

unread,

Nov 15, 2018, 11:59:14 AM11/15/18

to skia-discuss

yes i expect a slowdown and do understand theses details but 40 times seems too much

Mike Klein

unread,

Nov 15, 2018, 12:32:59 PM11/15/18

to skia-d...@googlegroups.com

Want to post up your libskia.a / libskia.so somewhere and I can take a look at it to see what's up? Also, can you post up as much detail as possible about how you're building Skia?

Bharat Ahuja

unread,

Nov 16, 2018, 6:38:30 AM11/16/18

to skia-discuss

seems like rotation scenario goes through RasterPipeline instead of Legacy Pipeline.

bool SkImageShader::IsRasterPipelineOnly(const SkMatrix& ctm, SkColorType ct, SkAlphaType at,

SkShader::TileMode tx, SkShader::TileMode ty,

const SkMatrix& localM) {

if (ct != kN32_SkColorType) {

return true;

}

if (at == kUnpremul_SkAlphaType) {

return true;

}

if (tx != ty) {

return true;

}

if (!ctm.isScaleTranslate()) {

return true;

}

if (!localM.isScaleTranslate()) {

return true;

}

return false;

}

location for stripped and unstripped so

https://drive.google.com/file/d/1329Hnd45AmF8dkilaU3bLIdDBTWgbEEx/view?usp=sharing

https://drive.google.com/file/d/1AEjdkQZ3tVFVLaGAwMpfAehUr2Df3P1C/view?usp=sharing

we are building skia using clang ndk r16 release with below flag settings

LOCAL_ARM_NEON = 1
LOCAL_CFLAGS = $(LOCAL_CFLAGS) -mfloat-abi=softfp -mfpu=neon
LOCAL_ASFLAGS = $(LOCAL_ASFLAGS) -mfpu=neon

CFLAGS output in console

-target armv7-none-linux-androideabi -fdiagnostics-format=msvc -Werror=return-type -Werror=unused-variable -Werror=address-of-packed-member -gcc-toolchain e:/androidndk.16.1.1/toolchains/arm-linux-androideabi-4.9/prebuilt/windows-x86_64 -D__stdcall= -D_stdcall= -D__cdecl= -D_cdecl= -D__thiscall= -D_thiscall= -D__vectorcall= -D_vectorcall= -D_fastcall=__fastcall -fno-elide-type -fdiagnostics-show-template-tree -gdwarf-2 -fno-standalone-debug -DNDEBUG -Os -fomit-frame-pointer -Wno-c++1z-compat -Wno-deprecated-declarations -Wno-deprecated -Wno-expansion-to-defined -Wno-ignored-pragma-intrinsic -Wno-multichar -Wno-nonportable-include-path -Wno-undefined-var-template -Wno-write-strings -mthumb -march=armv7-a -mfloat-abi=softfp -D__ARM_ARCH_7__ -D__ARM_ARCH_7A__ -mfpu=vfp -fmessage-length=0 -fasm-blocks -fvisibility-inlines-hidden -DUNICODE -D_UNICODE -DANDROID -D__ANDROID_API__=21 -D_ANDROID -DUNDER_ANDROID=1 -DUNDER_ANDROID2=1 -DANDROID_TEMP -DUNDER_NONWIN=1 -DCOMPILER_GCCE -DSAFECRT_VISIBLE -DNSdk= -DANDROID_NO_WARNING -D_LCONV_DEFINED -DNO_ENUM_FORWARD_DECLARATION -DWIN32_LEAN_AND_MEAN -D_WIN7_PLATFORM_UPDATE=1 -D_WCHAR_IS_ALWAYS_SIGNED -DCRYPT_OID_INFO_HAS_EXTRA_FIELDS -DWINVER=0x0A00 -D_WIN32_WINNT=0x0A00 -D_WIN32_IE=0x0800 -DDEVL=1 -DWIN32_LEAN_AND_MEAN=1 -DSTD_CALL -DSHIP -DNDEBUG -DDBG=0 -DFPO=1 -DSK_BUILD_FOR_APP -DSK_SUPPORT_LEGACY_TILED_BITMAPS -DSK_ARM_HAS_NEON -DNOMINMAX -DWINVER=0x0A00 -DISOLATION_AWARE_ENABLED=0 -D_XMMINTRIN_H_INCLUDED -DARM -D_ARM_ -D_M_ARM -D_MoveToCoprocessor=_MoveToCoprocessorStub -D__emit=__emitStub -D__readfsdword=__readfsdwordStb -mstackrealign -mstack-alignment=16 -fshort-wchar -fsigned-char -ferror-limit=20 -fms-extensions -fmsc-version=1700 -D__PRETTY_FUNCTION__=__FUNCTION__ -Wno-microsoft -D MS_CLANG_MS_EXTENSIONS=1 -Wno-delete-non-virtual-dtor -fshort-wchar -D HAS_PRAGMA_MS_STRUCT=1 -frandom-seed=0 -fstack-protector -Winvalid-pch -D _FORTIFY_SOURCE=2 -fno-strict-overflow -Wno-attributes -Wno-comment -Wno-extra-tokens -Wno-constant-logical-operand -Wno-extern-initializer -Wno-ignored-attributes -Wno-invalid-token-paste -Wno-microsoft -Wno-missing-declarations -Wno-null-dereference -Wno-parentheses -Wno-unused-value -Wno-invalid-offsetof -ffunction-sections -fdata-sections -mfloat-abi=softfp -mfpu=neon

Mike Klein

unread,

Nov 16, 2018, 9:54:29 AM11/16/18

to skia-d...@googlegroups.com

That's funny, I'm able to disassemble your .so's, but they don't make a lot of sense inside: I'm seeing different instructions than I'd expect in known functions, and lots and lots of undecodeable instructions. It's like objdump is not getting the format right.

Anyway, from what I can glean without looking at the disassembly per se, it would be worth using a newer version of Skia (say, m71). That code in m64 is pretty old, and since then we've made some code generation improvements, and implemented the somewhat faster low-precision backend on more platforms like ARMv7.

What build system are you using to build Skia? Is it something other than skia.org/user/build? Some sort of Android.mk-based build system? It looks like you have a funny mashup of Windows, x86, Android, and ARM build flags set, and I'm seeing a bunch of flags (like -Wno-c++1z-compat) that I'm pretty sure we've never had in our build configuration. Between the two settings of -mfpu=vfp and -mfpu=neon, I'm not sure which wins. I'm also not sure what the effect of -mfloat-abi=softfp will be here... I think nothing, but it's worth fiddling around with.

Bharat Ahuja

unread,

Nov 16, 2018, 11:35:41 AM11/16/18

to skia-discuss

yes, I know this is messy we have our own Android.mk-based build system and don't use skia build system. From my experience mfpu=neon wins.

anyways I'll try on the newer version and check if anything changes.

Corey Lucier

unread,

Dec 3, 2018, 8:18:03 AM12/3/18

to skia-discuss

Any update Bharat?

Abhishek Gupta

unread,

Dec 3, 2018, 8:39:16 AM12/3/18

to skia-discuss

Hi Corey,

We have currently paused that task (which resulted in this scenario) as upgrading to a newer version of Skia would take some time in our current setting.

We will post our results here once we have upgraded to a newer version.

Thanks,

Abhishek

On Monday, December 3, 2018 at 6:48:03 PM UTC+5:30, Corey Lucier wrote:

Any update Bharat?

Bharat Ahuja

unread,

Jul 31, 2019, 8:46:12 AM7/31/19

to skia-discuss

HI, Corey, we updated to M74 it seems like skia has improved but still not comparable to D2D or GDI Plus. I have seen in other threads people reporting the same behaviour( skia is lagging from other rendering libraries ).

Skia probably has the best performance I've seen for rendering, but one area that it seems to struggle is when drawing/processing images. I would really like if we can fix it. If you can help me figure out the bottleneck code, maybe I can help in fixing the issue?

https://github.com/mono/SkiaSharp/issues/758

Mike Klein

unread,

Jul 31, 2019, 10:29:34 AM7/31/19

to skia-d...@googlegroups.com

Want to post up your new binaries like you did last time?

--

You received this message because you are subscribed to the Google Groups "skia-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/3d95d14b-f7b9-48da-a5d5-c8ef0ae63293%40googlegroups.com.

Bharat Ahuja

unread,

Jul 31, 2019, 10:59:02 AM7/31/19

to skia-discuss

Some performance numbers I measured

{"Skia Running": false, "Bitmap Tranformed": false, "time in microseconds": 1303}

{"Skia Running": false, "Bitmap Tranformed": true(45), "time in microseconds": 2462}

{"Skia Running": true, "Bitmap Tranformed": false, "time in microseconds": 912}

{"Skia Running": true, "Bitmap Tranformed": true(45), "time in microseconds": 7147}

Bharat Ahuja

unread,

Jul 31, 2019, 11:33:26 AM7/31/19

to skia-discuss

@Mike I have copied the binaries at below location. Build flags are same as earlier.

https://drive.google.com/file/d/1zcXDJnIdFPsGJ7Wsgr7oUk2avjKFEtzY/view?usp=sharing

https://drive.google.com/file/d/1pnhb_HUEA216H1k2iHJg83VG23b6Mxka/view?usp=sharing

Mike Klein

unread,

Jul 31, 2019, 12:04:17 PM7/31/19

to skia-d...@googlegroups.com

Hmm, that looks a lot more sensible, and probably everything is now working as intended, though I'm still a little surprised by the quality of the code generation. E.g. the function neon::lowp::clear(),

0006797c <neon::lowp::clear(neon::lowp::Params*, void**, unsigned short __vector(8), unsigned short __vector(8), unsigned short __vector(8), unsigned short __vector(8))>:
6797c: b5d0 push {r4, r6, r7, lr}
6797e: af02 add r7, sp, #8
67980: 466c mov r4, sp
67982: f36f 0401 bfc r4, #0, #2
67986: 46a5 mov sp, r4
67988: f1a7 0408 sub.w r4, r7, #8
6798c: f851 2b04 ldr.w r2, [r1], #4
67990: ef80 0050 vmov.i32 q0, #0 ; 0x00000000
67994: ef80 2050 vmov.i32 q1, #0 ; 0x00000000
67998: ef80 4050 vmov.i32 q2, #0 ; 0x00000000
6799c: ef80 6050 vmov.i32 q3, #0 ; 0x00000000
679a0: 46a5 mov sp, r4
679a2: e8bd 40d0 ldmia.w sp!, {r4, r6, r7, lr}
679a6: 4710 bx r2

That's fine, but I'd have expected something a little more streamlined,

00000000 <neon::lowp::clear(neon::lowp::Params*, void**, unsigned short __vector(8), unsigned short __vector(8), unsigned short __vector(8), unsigned short __vector(8))>:
0: f851 2b04 ldr.w r2, [r1], #4
4: ef80 0050 vmov.i32 q0, #0 ; 0x00000000
8: ef80 2050 vmov.i32 q1, #0 ; 0x00000000
c: ef80 4050 vmov.i32 q2, #0 ; 0x00000000
10: ef80 6050 vmov.i32 q3, #0 ; 0x00000000
14: 4710 bx r2

Or for a function that's more complicated and more relevant to bicubic sampling,

00063dd8 <neon::bicubic_n3x(neon::Params*, void**, float __vector(4), float __vector(4), float __vector(4), float __vector(4))>:
63dd8: b5d0 push {r4, r6, r7, lr}
63dda: af02 add r7, sp, #8
63ddc: b084 sub sp, #16
63dde: 466c mov r4, sp
63de0: f36f 0403 bfc r4, #0, #4
63de4: 46a5 mov sp, r4
63de6: 680a ldr r2, [r1, #0]
63de8: efc7 0f50 vmov.f32 q8, #1 ; 0x3f800000
63dec: f102 0380 add.w r3, r2, #128 ; 0x80
63df0: f963 2a0f vld1.8 {d18-d19}, [r3]
63df4: a312 add r3, pc, #72 ; (adr r3, 63e40 <neon::bicubic_n3x(neon::Params*, void**, float __vector(4), float __vector(4), float __vector(4), float __vector(4))+0x68>)
63df6: ef60 0de2 vsub.f32 q8, q8, q9
63dfa: f963 2acf vld1.64 {d18-d19}, [r3]
63dfe: a314 add r3, pc, #80 ; (adr r3, 63e50 <neon::bicubic_n3x(neon::Params*, void**, float __vector(4), float __vector(4), float __vector(4), float __vector(4))+0x78>)
63e00: f963 4acf vld1.64 {d20-d21}, [r3]
63e04: 466b mov r3, sp
63e06: ff40 2df2 vmul.f32 q9, q8, q9
63e0a: ff40 0df0 vmul.f32 q8, q8, q8
63e0e: ef42 2de4 vadd.f32 q9, q9, q10
63e12: f962 4a0f vld1.8 {d20-d21}, [r2]
63e16: f502 7280 add.w r2, r2, #256 ; 0x100
63e1a: ff40 0df2 vmul.f32 q8, q8, q9
63e1e: ffc7 2f58 vmov.f32 q9, #-1.5 ; 0xbfc00000
63e22: ef04 0de2 vadd.f32 q0, q10, q9
63e26: f943 0aef vst1.64 {d16-d17}, [r3 :128]
63e2a: f942 0a0f vst1.8 {d16-d17}, [r2]
63e2e: 684a ldr r2, [r1, #4]
63e30: 3108 adds r1, #8
63e32: 4790 blx r2
63e34: f1a7 0408 sub.w r4, r7, #8
63e38: 46a5 mov sp, r4
63e3a: bdd0 pop {r4, r6, r7, pc}

But I'd have expected something a bit more like this,

00000000 <neon::bicubic_n3x(neon::Params*, void**, float __vector(4), float __vector(4), float __vector(4), float __vector(4))>:
0: 680a ldr r2, [r1, #0]
2: efc7 0f50 vmov.f32 q8, #1 ; 0x3f800000
6: f102 0380 add.w r3, r2, #128 ; 0x80
a: f963 2a8f vld1.32 {d18-d19}, [r3]
e: a310 add r3, pc, #64 ; (adr r3, 50 <neon::bicubic_n3x(neon::Params*, void**, float __vector(4), float __vector(4), float __vector(4), float __vector(4))+0x50>)
10: ef60 0de2 vsub.f32 q8, q8, q9
14: f963 2aef vld1.64 {d18-d19}, [r3 :128]
18: a311 add r3, pc, #68 ; (adr r3, 60 <neon::bicubic_n3x(neon::Params*, void**, float __vector(4), float __vector(4), float __vector(4), float __vector(4))+0x60>)
1a: f963 4aef vld1.64 {d20-d21}, [r3 :128]
1e: f44f 7380 mov.w r3, #256 ; 0x100
22: ef40 4df2 vmla.f32 q10, q8, q9
26: ffc7 2f58 vmov.f32 q9, #-1.5 ; 0xbfc00000
2a: ff40 0df0 vmul.f32 q8, q8, q8
2e: ff40 0df4 vmul.f32 q8, q8, q10
32: f962 4a83 vld1.32 {d20-d21}, [r2], r3
36: ef04 0de2 vadd.f32 q0, q10, q9
3a: f942 0a8f vst1.32 {d16-d17}, [r2]
3e: f101 0208 add.w r2, r1, #8
42: 684b ldr r3, [r1, #4]
44: 4611 mov r1, r2
46: 4718 bx r3

I know that's a lot to digest, and I've not really worked through it myself, but it looks like mainly you've got,

1) a bunch of extra bookkeeping that's sometimes breaking the ability of these code snippets to tail-call into each other;

2) an extra memory write in bicubic_n3x that I can't really account for.

--

You received this message because you are subscribed to the Google Groups "skia-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/48939203-4d18-438d-9faf-afca7d770da1%40googlegroups.com.

Bharat Ahuja

unread,

Jul 31, 2019, 1:02:31 PM7/31/19

to skia-discuss

This might be related to some compiler flag? Do you see any setting we are missing? BTW since I have used kLow_SkFilterQuality so won't it go into bilerp func?

Mike Klein

unread,

Jul 31, 2019, 1:05:15 PM7/31/19

to skia-d...@googlegroups.com

Oh, yes, kLow_SkFilterQuality will bilerp.

--
You received this message because you are subscribed to the Google Groups "skia-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/aeefe8a7-482d-4484-a0ec-c3fd93e292db%40googlegroups.com.

Mike Reed

unread,

Aug 1, 2019, 9:09:58 AM8/1/19

to skia-d...@googlegroups.com

We just landed https://skia-review.googlesource.com/c/skia/+/231109

Curious to see if it impacts your performance.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/CAD%3DHvMsj0nRT4FVzbOC%3D92p7ga6Au2YOeu1kZ0%3DxBR4cWYKpbw%40mail.gmail.com.

Bharat Ahuja

unread,

Aug 1, 2019, 10:45:51 AM8/1/19

to skia-discuss

Thanks mikel I'll try that and will update ...

To unsubscribe from this group and stop receiving emails from it, send an email to skia-d...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/aeefe8a7-482d-4484-a0ec-c3fd93e292db%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "skia-discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-d...@googlegroups.com.

Abhishek Gupta

unread,

Aug 5, 2019, 11:09:35 AM8/5/19

to skia-discuss

Hi Mike,

That change had no perf impact for our scenario ( which is expected as well since we are using low filter quality and the change was for bicubic ).

Regards,

Abhishek

Mike Reed

unread,

Aug 5, 2019, 11:23:56 AM8/5/19

to skia-d...@googlegroups.com

Hmmm, ok. I thought the issue was the bicubic was too slow.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/72957f54-b418-41f2-a5de-6e46e2d34219%40googlegroups.com.

Bharat Ahuja

unread,

Aug 8, 2019, 11:56:29 AM8/8/19

to skia-discuss

Hey Mike,

Please see the variance in numbers( not fiddle numbers, on my device ) when I am applying the transforms. it is getting almost around 8 times, whereas for other libs(GDI) variance is 2 -3 times.

{"Skia Running": false, "Bitmap Tranformed": false, "time in microseconds": 1303}

{"Skia Running": false, "Bitmap Tranformed": true(45), "time in microseconds": 2462}

{"Skia Running": true, "Bitmap Tranformed": false, "time in microseconds": 912}

{"Skia Running": true, "Bitmap Tranformed": true(45), "time in microseconds": 7147}

Below is the simple skia fiddle repro that matches with our scenario.

int width = 500;

int height = 500;

SkImageInfo imageInfo = SkImageInfo::Make(width, height, kRGBA_8888_SkColorType, kPremul_SkAlphaType);

size_t rowBytes = imageInfo .minRowBytes();

size_t size = rowBytes * height;

std::vector<char> pixelMemory(size); // allocate memory

sk_sp<SkSurface> surface =

SkSurface::MakeRasterDirect(

imageInfo , &pixelMemory[0], rowBytes);

SkCanvas* pCanvas = surface->getCanvas();

size_t stride = width * 4;

SkBitmap bitmap;

(void)bitmap.tryAllocPixels(imageInfo,stride);

uint32_t* pixels = reinterpret_cast< uint32_t* >( bitmap.getPixels() );

for( int i = 0; i < width; i++ )

for( int j = 0; j < height; j++ )

{

*pixels = i % 8 ? ( (255 << 24) + 255 ) : ( (255 << 24) + (255 << 8) + 255 );

pixels++;

}

sk_sp<SkImage> image = SkImage::MakeFromBitmap( bitmap );

SkPaint paint;

SkMatrix matrix;

matrix.setRotate(45);

paint.setFilterQuality( kLow_SkFilterQuality );

paint.setShader(image->makeShader(SkTileMode::kClamp,SkTileMode::kClamp,

&matrix));

SkRect rect = { (float)0, (float)0, (float)width , (float)height };

auto start = std::chrono::high_resolution_clock::now();

pCanvas->drawRect(rect, paint);

auto stop = std::chrono::high_resolution_clock::now();

long long duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();

//LOGE( "total time microseconds = %llu", duration );

On Monday, August 5, 2019 at 8:53:56 PM UTC+5:30, Mike Reed wrote:

Hmmm, ok. I thought the issue was the bicubic was too slow.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/72957f54-b418-41f2-a5de-6e46e2d34219%40googlegroups.com.

Bharat Ahuja

unread,

Aug 13, 2019, 10:03:17 AM8/13/19

to skia-discuss

@Mike Klien As you said our binary doesn't have optimal code and there are some additional instructions(extra book keeping?)as well. Is it becuase we are missing some flag skia depends on? Can you please suggest steps( compiler flags etc ) I can try?

Mike Klein

unread,

Aug 13, 2019, 10:32:20 AM8/13/19

to skia-discuss

Hmm. I'm not really sure. Can you send the full command line that's used to compile src/core/SkOpts.cpp?

Also, just about everything about how you're drawing can affect performance... do you happen to know the SkColorType, SkAlphaType, and SkColorSpace of the surface you're drawing into? Are you dithering? Anything unusual like that? The more accurately you can reproduce what you're doing on fiddle.skia.org, the more we can help.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/9032075a-7a7d-47ab-a4a1-cbb385ce1140%40googlegroups.com.

Bharat Ahuja

unread,

Aug 13, 2019, 11:22:13 AM8/13/19

to skia-discuss

Below is the command we run

androidndk.16.1.12\toolchains\llvm\prebuilt\windows-x86_64\bin\clang++.exe @src\core\android_clang_cpp_flags.rsp -c src\core\SkOpts.cpp -o src\core\obj\androidarm\clang-temp\SkOpts.cpp\compile.tmp

where android_clang_cpp_flags.rsp contains below compile time flags

-target armv7-none-linux-androideabi -fdiagnostics-format=msvc -Werror=return-type -Werror=unused-variable -Werror=address-of-packed-member -gcc-toolchain e:/nugetcache/androidndk.16.1.12/toolchains/arm-linux-androideabi-4.9/prebuilt/windows-x86_64 -D__stdcall= -D_stdcall= -D__cdecl= -D_cdecl= -D__thiscall= -D_thiscall= -D__vectorcall= -D_vectorcall= -D_fastcall=__fastcall -fno-elide-type -fdiagnostics-show-template-tree -gdwarf-2 -fno-standalone-debug -DNDEBUG -Os -fomit-frame-pointer -Wno-c++1z-compat -Wno-deprecated-declarations -Wno-deprecated -Wno-expansion-to-defined -Wno-ignored-pragma-intrinsic -Wno-multichar -Wno-nonportable-include-path -Wno-undefined-var-template -Wno-write-strings -mthumb -march=armv7-a -D__ARM_ARCH_7__ -D__ARM_ARCH_7A__ -mfpu=neon -fmessage-length=0 -fasm-blocks -fvisibility-inlines-hidden -DUNICODE -D_UNICODE -D__ANDROID_API__=21 -DCOMPILER_GCCE -DSAFECRT_VISIBLE -DNSdk= -DANDROID_NO_WARNING -D_LCONV_DEFINED -DNO_ENUM_FORWARD_DECLARATION -D_WCHAR_IS_ALWAYS_SIGNED -DCRYPT_OID_INFO_HAS_EXTRA_FIELDS -DSTD_CALL -DSHIP -DNDEBUG -DDBG=0 -DFPO=1 -DSK_BUILD_FOR_OFFICE -DSK_SUPPORT_LEGACY_TILED_BITMAPS -DSK_ARM_HAS_NEON -DNOMINMAX -DISOLATION_AWARE_ENABLED=0 -D_XMMINTRIN_H_INCLUDED -DARM -D_ARM_ -D_M_ARM -D__readfsdword=__readfsdwordStb -mstackrealign -mstack-alignment=16 -fshort-wchar -fsigned-char -ferror-limit=20 -fms-extensions -fmsc-version=1700 -D__PRETTY_FUNCTION__=__FUNCTION__ -Wno-microsoft -D MS_CLANG_MS_EXTENSIONS=1 -Wno-delete-non-virtual-dtor -fshort-wchar -D HAS_PRAGMA_MS_STRUCT=1 -frandom-seed=0 -fstack-protector -D _FORTIFY_SOURCE=2 -fno-strict-overflow -Wno-attributes -Wno-comment -Wno-extra-tokens -Wno-constant-logical-operand -Wno-deprecated-dynamic-exception-spec -Wno-extern-initializer -Wno-ignored-attributes -Wno-invalid-token-paste -Wno-microsoft -Wno-missing-declarations -Wno-null-dereference -Wno-parentheses -Wno-unused-value -Wno-invalid-offsetof -ffunction-sections -fdata-sections -fexceptions -frtti -std=c++1z -DBOOST_HASH_CXX17=0 -DGSL_USE_STD_BYTE=0 -fno-aligned-allocation -DBOOST_ASIO_DISABLE_STD_STRING_VIEW -DBOOST_ASIO_HAS_STD_EXPERIMENTAL_STRING_VIEW -DBOOST_THREAD_CHRONO_POSIX_API.

SkColorType = kRGBA_8888_SkColorType, SkAlphaType = kPremul_SkAlphaType, and SkColorSpace = nullptr

Below is the sample fiddle code. Please note that we have seen significant improvement since we have updated to M74, though it is still not comparable to other libraries we are using. Variance in numbers( not fiddle numbers, on my device ) when I am applying the transforms is getting almost around 8 times whereas for other libs variance is 2 -3 times. So, we expect there is still room for the improvement.

{"Skia Running": false, "Bitmap Tranformed": false, "time in microseconds": 1303}

{"Skia Running": false, "Bitmap Tranformed": true(45), "time in microseconds": 2462}

{"Skia Running": true, "Bitmap Tranformed": false, "time in microseconds": 912}

{"Skia Running": true, "Bitmap Tranformed": true(45), "time in microseconds": 7147}

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/9032075a-7a7d-47ab-a4a1-cbb385ce1140%40googlegroups.com.

Mike Klein

unread,

Aug 13, 2019, 11:33:56 AM8/13/19

to skia-discuss

Oh right! You've got that crazy build with a mix of Clang and MSVC and Android and Windows settings. I think I'm going to have to leave you to fend for yourself to work through that. But it looks like you've got all the important stuff in there, like -march=armv7-a and -mfpu=neon. AndwWith those surface settings, I think how SkOpts.cpp is mostly moot. Let me take a quick scan through to make sure the code paths you're hitting are the one's I'd expect.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/46b328df-7c09-4f9d-823b-feaffaf5d593%40googlegroups.com.

Bharat Ahuja

unread,

Aug 13, 2019, 11:37:44 AM8/13/19

to skia-discuss

yeah, our build flags are weird. Isn't the surface and skimage setting I am using, is the optimal one? last time I had a discussion around this and I got the feedback premultiplied surfaces are supposed to be faster?

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/46b328df-7c09-4f9d-823b-feaffaf5d593%40googlegroups.com.

Mike Klein

unread,

Aug 13, 2019, 11:39:08 AM8/13/19

to skia-discuss

Yep, looks good. Might want to throw a static_assert(kN32_SkColorType == kRGBA_8888_SkColorType, "") in there somewhere just to double check. There's typically one byte order of 8888 that's fastest and the other is slower, but I think you've got the right one for your build.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/3b60ed35-d1ab-47dc-9657-14ee90a88a60%40googlegroups.com.

Mike Klein

unread,

Aug 13, 2019, 11:57:28 AM8/13/19

to skia-discuss

Ooh, how interesting. I thought we had a faster code path than we'd be hitting here. Turns out we're dumping this case into the slower general path (handled by that code in SkOpts.cpp) which explains the big step up in runtime. This check is failing before we can get into the fast stuff:

static bool legacy_shader_can_handle(const SkMatrix& inv) {
if (!inv.isScaleTranslate()) {
return false;
}

...

Mike Klein

unread,

Aug 13, 2019, 12:01:44 PM8/13/19

to skia-discuss

Ah, okay, looks like we deleted those fast paths here (https://skia-review.googlesource.com/c/skia/+/36800/) and here (https://skia-review.googlesource.com/c/skia/+/171581) after disabling them here (https://skia-review.googlesource.com/c/skia/+/33180).

Mike Klein

unread,

Aug 13, 2019, 12:11:55 PM8/13/19

to skia-discuss

Without upstream changes, I think there's only one tweak to make here, which is to make sure to tag your source image as kOpaque_SkAlphaType if indeed it is opaque. That'll cut a couple stages off the color pipeline we use here:

seed_shader
matrix_2x3
bilerp_clamp_8888
clamp_gamut
load_8888_dst
srcover
store_8888

~~~>

seed_shader
matrix_2x3
bilerp_clamp_8888
clamp_gamut
store_8888

On my laptop that's good for a 20% speedup, which isn't nothing. I think anything else (restoring old fast paths, making this sort of general path faster, cutting possibly redundant work like clamp_gamut) will need some work on our end.

Bharat Ahuja

unread,

Aug 13, 2019, 12:51:55 PM8/13/19

to skia-discuss

Yeah, Thanks mike sorry it got lost due to the long thread but I had mentioned somewhere above that code goes through RasterPipeline instead of Legacy Pipeline. Is new code not neon optimized?

Thanks for the suggestion of kOpaque_SkAlphaType but our Bitmaps mostly has an alpha channel so I don't think we can use kOpaque_SkAlphaType.

I am not much of skia expert but If someone can guide me I can contribute and integrate those faster paths in general path code.

Thanks

Bharat

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/3b60ed35-d1ab-47dc-9657-14ee90a88a60%40googlegroups.com.

Mike Klein

unread,

Aug 13, 2019, 12:56:50 PM8/13/19

to skia-discuss

Yeah, it uses NEON. You're just hitting the more flexible, slower, general path. Without rotation you're hitting common special case that's got its own optimized routines.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/11f1a2c2-df75-40a9-955a-cda69fe9047b%40googlegroups.com.

Bharat Ahuja

unread,

Aug 14, 2019, 12:04:29 PM8/14/19

to skia-discuss

Hey Mike,

I somehow hacked the faster path for rotated bitmaps and it seemed to work and step up in time seems to be gone. May I know why did we remove this path ? is it because of some bugs? normal rotation seems to be working fine though.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/11f1a2c2-df75-40a9-955a-cda69fe9047b%40googlegroups.com.

Mike Klein

unread,

Aug 14, 2019, 12:09:09 PM8/14/19

to skia-discuss

Oh that's great. I'm not really sure I remember why we removed that code path. Probably just an experiment to see if anyone cared? It's a pain in the butt to maintain that sort of code. We can't really care about all code paths being optimally fast... it's primarily driven by user needs. The rough balance we try to maintain is to keep most pixels being drawn by fast code, and most features being drawn by code that's simpler, more combinatoric, and harder to get wrong.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/85b8299c-a3c2-4880-89ba-79eaf7f21a2c%40googlegroups.com.

Bharat Ahuja

unread,

Aug 14, 2019, 12:22:51 PM8/14/19

to skia-d...@googlegroups.com

Ohh I see. Thanks, then I 'll probably integrate into our skia build. BTW same goes for tiled bitmap also. Maybe we removed tiled optimization for the same reason?

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/CAD%3DHvMuNQCTE_jiGTkTwfv%3D_jp03XjMF4p7kQuQA0jY1Ngq8zw%40mail.gmail.com.

Mike Klein

unread,

Aug 14, 2019, 12:26:05 PM8/14/19

to skia-discuss

Yep, as you see the only routines that are still there are the clamp-in-x/clamp-in-y ones. Mirror, repeat, and decal all go through the general case.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/CA%2B1kOPqYPQ30sdqZbQ5axspp-STefBT3gYmRABDSD%2BcEJbL7yg%40mail.gmail.com.

Bharat Ahuja

unread,

Oct 10, 2019, 1:50:31 PM10/10/19

to skia-discuss

Hi Mike,

As we discussed last time I am bringing those fast bitmap rendering paths back for affine transformation case. Changes mainly bring back matrix procs and sampling procs( i.e S32_alpha_D32_filter_DXDY ).I am almost done with changes and have validated my changes on ARM(Neon), EmulatorX86(SSE2), EmulatorX86_64( SSE3 ). I had earlier done all these changes on M74. In the latest version of Skia, I can see we have also added AVX based implementation of sampling function S32_alpha_D32_filter_DX. Currently, I am missing AVX implementation of S32_alpha_D32_filter_DXDY( I do have SSE2 and SSE3 implementation ) but still, my changes seem to compile fine if I build it using ninja -C out/Shared.

Is AVX implementation is not enabled by default? or am I just on compiling right platform?

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/85b8299c-a3c2-4880-89ba-79eaf7f21a2c%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "skia-discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-d...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/CAD%3DHvMuNQCTE_jiGTkTwfv%3D_jp03XjMF4p7kQuQA0jY1Ngq8zw%40mail.gmail.com.

--
You received this message because you are subscribed to the Google Groups "skia-discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-d...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/CA%2B1kOPqYPQ30sdqZbQ5axspp-STefBT3gYmRABDSD%2BcEJbL7yg%40mail.gmail.com.

Mike Klein

unread,

Oct 10, 2019, 2:18:32 PM10/10/19

to skia-discuss

It'd help to see your code, but you have to explicitly opt each function into the various better-than-default SkOpts_foo.cpp specializations. Take a look at SkOpts_hsw.cpp?

I think we've noticed that for most use cases, hsw::S32_alpha_D32_filter_DX doesn't actually help much if at all. Seems like once we got the pmaddubsw instructions in there with_mm_maddubs_epi16(), it's mostly memory limited. In fact, with AVX2, using _mm256_maddubs_epi16() really only helped make the code size smaller... a more naive approach resembling the SSE2 code ran at the same speed.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/4b22351a-0bc1-49ad-9a1d-043d752b037c%40googlegroups.com.

Bharat Ahuja

unread,

Oct 10, 2019, 2:27:14 PM10/10/19

to skia-discuss

Cool thanks for explaining.I will take a look into this file.

My Changes are little unclean. I will send you a clean change in a day or two.

Thanks
Bharat

Mike Klein

unread,

Oct 10, 2019, 2:28:55 PM10/10/19

to skia-discuss

No prob, take your time.

--
You received this message because you are subscribed to the Google Groups "skia-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/e4f37f54-55fe-4c51-aebb-a878f8a4f066%40googlegroups.com.

Bharat Ahuja

unread,

Oct 15, 2019, 10:56:58 AM10/15/19

to skia-discuss

Hi Mike,

I am finished with the change. Can you please take a look.

https://skia-review.googlesource.com/c/skia/+/248816 .

Thanks

Bharat

On Thursday, October 10, 2019 at 11:58:55 PM UTC+5:30, Mike Klein wrote:

No prob, take your time.

On Thu, Oct 10, 2019 at 1:27 PM Bharat Ahuja <ahujab...@gmail.com> wrote:

Cool thanks for explaining.I will take a look into this file.

My Changes are little unclean. I will send you a clean change in a day or two.

Thanks
Bharat

--
You received this message because you are subscribed to the Google Groups "skia-discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-d...@googlegroups.com.

Bharat Ahuja

unread,

Oct 17, 2019, 10:56:21 AM10/17/19

to skia-discuss

Thanks so much for the valuable feedback mike. I have updated the iteration incorporating feedback. I have also removed change which enables legacy pipeline when tile modes are not equal. I 'll send it as another cl. Please take a look.

Thanks

Bharat

Bharat Ahuja

unread,

Oct 17, 2019, 3:11:00 PM10/17/19

to skia-discuss

Hi Mike,

I have tried this change on x86 and x86_64 with a bitmap of size 600 x 600 rendered on RGBAp888 SkCanvas of size 1000 x 1000 with a 45-degree transform. I am noticing improvements for these platforms as well. Improvement is significantly large when extend mode is repeat or mirror. I have attached the benchmark results in the review. We primarily care about ARM devices so we don't mind limiting this CL to ARM platforms but I believe it can be useful for other platforms as well. Let me know what's your call on this?

Thanks

Bharat

Mike Klein

unread,

Oct 17, 2019, 3:14:29 PM10/17/19

to skia-discuss

You've split this in two, right, one part that would affect rotate, one part that would affect repeat and/or mirror?

To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/e1b581d3-4532-4bf7-bc9e-689bd2f72b12%40googlegroups.com.

Bharat Ahuja

unread,

Oct 17, 2019, 3:33:45 PM10/17/19

to skia-discuss

Actually I have split it into two parts. One is rotate and other one is when tilex mode is not same as tiley mode( irrespective of rotate or scale ).

Thanks
Bharat

Bharat Ahuja

unread,

Oct 18, 2019, 7:41:32 AM10/18/19

to skia-discuss

Hi Mike,

As discussed, I have limited this change to ARM\ARM64 devices and published new iteration. Please take a look.

Thanks

Bharat

Bharat Ahuja

unread,

Oct 21, 2019, 9:56:13 AM10/21/19

to skia-discuss

Hi Guys,

I have incorporated all CR comments and published a new iteration. Since I am going for a vacation (~10 days), I would really appreciate if we can close on this, if things look okay, before Wednesday.

Thanks

Bharat

Mike Klein

unread,

Oct 21, 2019, 12:18:03 PM10/21/19

to skia-discuss

I'd prefer we wait until you get back to land this. This isn't time critical and I don't want to artificially rush.

--

You received this message because you are subscribed to the Google Groups "skia-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/8d5808d2-56a8-48f1-b48d-64a5a5a7d26c%40googlegroups.com.

Bharat Ahuja

unread,

Oct 21, 2019, 1:15:50 PM10/21/19

to skia-discuss

Sure. sounds good.

Thanks so much for the feedback.

Thanks

Bharat

On Monday, October 21, 2019, at 9:48:03 PM UTC+5:30, Mike Klein wrote:

I'd prefer we wait until you get back to land this. This isn't time critical and I don't want to artificially rush.

On Mon, Oct 21, 2019 at 8:56 AM Bharat Ahuja <ahujab...@gmail.com> wrote:

Hi Guys,

I have incorporated all CR comments and published a new iteration. Since I am going for a vacation (~10 days), I would really appreciate if we can close on this, if things look okay, before Wednesday.

Thanks
Bharat

On Friday, October 18, 2019 at 5:11:32 PM UTC+5:30, Bharat Ahuja wrote:
Hi Mike,

As discussed, I have limited this change to ARM\ARM64 devices and published new iteration. Please take a look.

Thanks
Bharat

On Friday, October 18, 2019 at 1:03:45 AM UTC+5:30, Bharat Ahuja wrote:
Actually I have split it into two parts. One is rotate and other one is when tilex mode is not same as tiley mode( irrespective of rotate or scale ).
Thanks
Bharat

--
You received this message because you are subscribed to the Google Groups "skia-discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-d...@googlegroups.com.

Matthew Leibowitz

unread,

Nov 9, 2019, 9:38:46 PM11/9/19

to skia-discuss

I think I am following this discussion, but there seems to be talk of other variables.

I just did a test on fiddle and I see a 10x difference with 0 degrees vs 20 degrees. Is this what you folks are talking about?

https://fiddle.skia.org/c/b83a5b08608c2611f3cb88153548d70b

On Monday, October 21, 2019 at 6:18:03 PM UTC+2, Mike Klein wrote:

I'd prefer we wait until you get back to land this. This isn't time critical and I don't want to artificially rush.

On Mon, Oct 21, 2019 at 8:56 AM Bharat Ahuja <ahujab...@gmail.com> wrote:

Hi Guys,

I have incorporated all CR comments and published a new iteration. Since I am going for a vacation (~10 days), I would really appreciate if we can close on this, if things look okay, before Wednesday.

Thanks
Bharat

On Friday, October 18, 2019 at 5:11:32 PM UTC+5:30, Bharat Ahuja wrote:
Hi Mike,

As discussed, I have limited this change to ARM\ARM64 devices and published new iteration. Please take a look.

Thanks
Bharat

On Friday, October 18, 2019 at 1:03:45 AM UTC+5:30, Bharat Ahuja wrote:
Actually I have split it into two parts. One is rotate and other one is when tilex mode is not same as tiley mode( irrespective of rotate or scale ).
Thanks
Bharat

--
You received this message because you are subscribed to the Google Groups "skia-discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-d...@googlegroups.com.

Mike Klein

unread,

Nov 10, 2019, 6:20:08 AM11/10/19

to skia-discuss

Not exactly, no. Drawing at 0 degrees there is a very special case as far as speed goes, essentially as fast as memcpy(), while any rotation or scale that kicks us out of that special 1:1 case requires more work.

To unsubscribe from this group and stop receiving emails from it, send an email to skia-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/skia-discuss/05940091-793e-4e38-bc03-0acd0f783dd4%40googlegroups.com.

Reply all

Reply to author

Forward