Low-level graphics access - rendering huge grids

Creative Magic

Jun 17, 2016, 7:18:22 AM
to Haxe
I'm curious how large a grid I can render, so I've written a Game of Life sample where each tile is 1px. Now I want to run and render the sample full-screen on iOS, Android and HTML5.

This is not a real project, but I'm interested in learning how to deal with such "challenges".

I've tried rendering using OpenFL:

* Creating screenWidth*screenHeight Sprite objects (Graphics API) ===> unresponsive app
* Creating screenWidth*screenHeight Bitmap objects sharing 2 pre-calculated BitmapData instances ===> 1-2 FPS
* Using OpenFL's Tilesheet ===> 3-5 FPS
* Creating a single large BitmapData and setting its values via the setPixel() method ===> ~12 FPS
* Creating a single large BitmapData and setting its values via the setPixels() method (the pixel data is set from a ByteArray in a single pass) ===> 15-18 FPS
* Calculating the grid in memory without rendering ===> 59-61 FPS (so at least I know that rendering is the problem)
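
The single-pass variant from the list (compute the next generation and fill one pixel buffer, then hand the whole buffer to something like setPixels()) can be sketched roughly as follows. This is an illustrative Python sketch, not the original Haxe code: the name step_and_fill, the wrap-around edges and the black/white RGBA byte layout are all assumptions.

```python
def step_and_fill(cells, width, height):
    """Compute one Game of Life generation and an RGBA buffer in one pass.

    cells is a flat, row-major bytearray of 0/1 values. Edges wrap around
    (an assumption; the post does not say how edges are handled).
    """
    nxt = bytearray(width * height)
    pixels = bytearray(width * height * 4)  # 4 bytes per cell: R, G, B, A
    for y in range(height):
        up = ((y - 1) % height) * width
        row = y * width
        down = ((y + 1) % height) * width
        for x in range(width):
            left = (x - 1) % width
            right = (x + 1) % width
            # Count the eight neighbours of cell (x, y).
            n = (cells[up + left] + cells[up + x] + cells[up + right]
                 + cells[row + left] + cells[row + right]
                 + cells[down + left] + cells[down + x] + cells[down + right])
            alive = 1 if n == 3 or (n == 2 and cells[row + x]) else 0
            i = row + x
            nxt[i] = alive
            # Write the pixel for this cell while we are already here.
            value = 255 * alive
            p = i * 4
            pixels[p] = value
            pixels[p + 1] = value
            pixels[p + 2] = value
            pixels[p + 3] = 255  # opaque alpha
    return nxt, pixels
```

The point of the sketch is only that the simulation and the pixel fill share one loop, so the renderer receives one contiguous buffer per frame instead of one call per cell.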

The results were very close between Android, iOS, Neko and HTML5. I tried blitting on Flash and it kind of worked (27 FPS with a target of 30 FPS), but I don't think I'll use Flash for release.

I understand now that to render such a huge grid I need low-level access to the GPU. Any advice on how to do that with Haxe? How would you render a grid of, say, 1800x900px?


David Elahee

Jun 17, 2016, 8:39:25 AM
to haxe...@googlegroups.com
This is not really Haxe specific... 1800x900px, wow, that is a very large number of pixels, and the hardware is not built to work this way.

There are not many solutions. You have to map the GL texture into memory, write to it directly, and then upload only the written portion to the GPU. That is roughly what happens with Flash, but Flash handles it for you.

If you want to do it on phones, I believe you'll have to do it the raw OpenGL way, uploading and blitting only what is needed.

Good luck.

--
To post to this group haxe...@googlegroups.com
http://groups.google.com/group/haxelang?hl=en
---
You received this message because you are subscribed to the Google Groups "Haxe" group.
For more options, visit https://groups.google.com/d/optout.



--
David Elahee


Creative Magic

Jun 17, 2016, 8:52:45 AM
to Haxe
Not a Haxe-specific task, sure, but the question is how you would make an app with Haxe that could do that. If it's done with raw OpenGL (and I think it is), then maybe someone could shed some light on how they'd make a bridge using Haxe.
Again, I'm just trying to test the limits, see what's possible and I don't really care if it's hard. And since it's technically possible to manipulate each pixel on a device, I want to learn how and to what extent I can use it.

Robert Konrad

Jun 17, 2016, 9:28:53 AM
to Haxe
Try the g1 api in Kha -> https://github.com/KTXSoftware/Kha/blob/master/Sources/kha/graphics1/Graphics.hx
And then send me some test code so I can tune the performance :)

JLM

Jun 17, 2016, 9:29:15 AM
to Haxe
You could try looking at https://github.com/azrafe7/hxPixels. There is a Luxe example; it should allow easy manipulation of pixels, though I don't know whether it is optimal.

Luxe currently targets the GPU with shaders, so if possible move the heavy stuff to custom shaders.

Luxe and Kha are probably more focused on shaders than Flash-style toolkits are.

http://luxeengine.com/docs/guide.shaders.html
http://luboslenco.com/kha3d/

But you may find Nicolas's shader language of interest:
http://ncannasse.fr/projects/shaders?version=322


Creative Magic

Jun 17, 2016, 10:15:06 AM
to Haxe
> Try the g1 api in Kha -> https://github.com/KTXSoftware/Kha/blob/master/Sources/kha/graphics1/Graphics.hx
> And then send me some test code so I can tune the performance :)

Thanks, I'll try that and send you a link to GitHub. I'm interested to see how well it works. Hopefully its HTML5 (WebGL) support is also good. I really wish there were a renderer that could do it all :D

> You could try look at https://github.com/azrafe7/hxPixels there is a luxe example, this should allow easy manipulation of pixels but maybe not optimum I don't know?

Sure, I'll test it, the name sounds right! :) I'll write another reply to this post once I'm done with examples. Thank you for your reply! 

JLM

Jun 17, 2016, 11:04:01 AM
to Haxe
Kha does WebGL and C++; it is a renderer that can do it all, but it is quite low level compared to Flash. How much you can do in a shader and how much on the CPU with pixels depends on your algorithm.
This link might be useful for testing your maximum WebGL texture size:
https://www.khronos.org/registry/webgl/sdk/tests/conformance/limits/gl-max-texture-dimensions.html

Creative Magic

Jun 17, 2016, 12:21:54 PM
to Haxe
JLM, I've used hxPixels; the result was 30-31 FPS at full screen with OpenFL using -Dwebgl, out of a maximum of 60 FPS. If I reduce the screen by half, I easily surpass 60 FPS. I also disabled the update() method of my GameOfLife class so that I'd only see rendering performance.

I'll try Kha tomorrow and post results.

Also, my max WebGL size is 128^2, it seems... oddly small.

Mark Knol

Jun 17, 2016, 2:03:41 PM
to Haxe
1800x900 is large? Why?

I've done generative art at 15000x15000 px a few years ago in ActionScript, without GPU rendering.

Today the cool kids want 2x 4k for a decent VR experience :) so that's big, but it is becoming pretty much the norm.

For personal experiments, I would also love to know of a lib/target other than Flash with a fast and convenient graphics API.

David Elahee

Jun 17, 2016, 3:01:56 PM
to haxe...@googlegroups.com
My bad. I meant that uploading 1800x900 pixels from CPU to GPU every frame is a lot for mobile. ActionScript only uploads modified sections, which is a convenient scenario.

Robert Konrad

Jun 17, 2016, 7:48:08 PM
to Haxe
Most mobile hardware wouldn't even need to upload anything to the gpu because memory is usually shared by cpu and gpu. Maybe OpenGL implementations copy stuff around anyway but at least with Metal and Vulkan the proper behavior can be enforced.

David Elahee

Jun 18, 2016, 2:58:50 AM
to haxe...@googlegroups.com

OpenGL has a client-server memory model, so textures are copied by design; the same goes for buffers and uniforms.

Remember, you can totally erase them after upload. And phone memory is awfully slow...

Uploading a large 2k texture, if it is BGRA, costs no copy time because DMA carries it. But if you access it from the CPU during this DMA copy, you get a massive system stall.

At least that was the case a couple years ago...

David Elahee

Jun 18, 2016, 3:12:46 AM
to haxe...@googlegroups.com

For your grid problem, the good old solution when sub-texture blits are not available is to split your world into something like 128x128 tiles, each mapped 1 to 1 to a texture, and only upload the affected chunks. If you have a very dynamic system, you'll have to resort to buffer mapping or sub-texture upload. Good luck.
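
The bookkeeping for the tile approach described above might look something like this. It is a hedged Python sketch of the idea only: dirty_tiles and tile_rect are hypothetical helper names, and the actual upload call (whatever the engine or GL binding provides) is deliberately left out.

```python
TILE = 128  # tile edge in pixels, as suggested in the post above


def dirty_tiles(changed_cells, tile=TILE):
    """Return the set of (tile_x, tile_y) tiles touched by the changed cells."""
    return {(x // tile, y // tile) for (x, y) in changed_cells}


def tile_rect(tx, ty, width, height, tile=TILE):
    """Pixel rectangle (x, y, w, h) of a tile, clipped to the grid size."""
    x0, y0 = tx * tile, ty * tile
    return (x0, y0, min(tile, width - x0), min(tile, height - y0))


# Example: four changed cells on an 1800x900 grid touch only three tiles,
# so only those three rectangles would need to be re-uploaded this frame.
changed = [(0, 0), (5, 5), (130, 0), (1799, 899)]
rects = [tile_rect(tx, ty, 1800, 900) for tx, ty in sorted(dirty_tiles(changed))]
```

An 1800x900 grid splits into 15x8 such tiles, with the last column and row clipped. For a busy Game of Life board most tiles will be dirty every frame, which is the "very dynamic system" case mentioned above.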

Robert Konrad

Jun 18, 2016, 5:46:32 AM
to Haxe
GL does have memory mapping functions nowadays, so it's up to the implementation whether it copies stuff or not. Copying makes CPU/GPU synchronization easier. With modern APIs sync is your job, so you can do what you want. In any case, there is no "upload" on most mobile hardware. You're still taking a performance hit, because the layout of CPU-accessible textures is not optimized for GPU access, so cache usage is not optimal.

David Elahee

Jun 18, 2016, 10:33:14 AM
to haxe...@googlegroups.com
Let's clear up some things :)

As the OP mentioned mobile targets, I emphasized techniques for mobile devices of the GLES 2.0 generation. Robert is right for GLES 3 and modern architectures.

So even if Robert's points are true for modern GPUs (GLES 3.0+), in a production situation on mobile you cannot assume all machines will have fast texture "uploads" (or copies, whatever; the hardware is hardly capable of uploading more than a full frame buffer). The fastest you'll get is using glCopyTexSubImage2D, which can sometimes still be very slow compared to glMapBuffer.

In the end, and beyond technical details, Robert is totally right concerning GLES 3.0, Vulkan and Metal: their memory model is no longer a client-server one, and you can assume that writing to GPU-owned memory is "fast enough" for your use case using bindless textures or glMap of pipelines. (GL ES is not that fast, because memory sharing is still two times slower than bindless textures a la Metal/Vulkan.)
But this does not fit game production very well, since these APIs can only be used on a limited fraction of the market (https://developer.android.com/about/dashboards/index.html#Platform), especially if you are a solo indie and don't have a dedicated GPU engineer who will deliver the best pipeline everywhere :)

Now if you go the "desktop" way or if you work on small grids, you won't have most of these problems...
Let's hope we covered the whole landscape well enough. In any case you'll have to experiment with many solutions and take your lowest target device into account. For mobile, the Samsung Galaxy Tabs are usually the slowest devices to work with (big screen, slow memory).

Robert Konrad

Jun 18, 2016, 12:11:51 PM
to Haxe
Ya, I'm kind of focused on newer hardware recently :) All depends on your target audience and when you plan to release something (when you start a project today, especially if it's a game, it probably won't be out tomorrow).

Creative Magic

Jun 19, 2016, 2:15:35 AM
to Haxe
Thanks to everyone for their posts; it's really interesting to get more and more info on the subject. But please note again: it's not a real project, and it's not something I want to apply in day-to-day tasks. It's about learning and pushing the limits, like a programming challenge that I've set for myself. The results of this "journey" will later be written up as a post, so that others can see them without going through all the materials that I'm going through now.

I know I haven't posted here in a while, but it's because the deeper I go into this rabbit hole of GPU rendering, the more questions I need answers for :D

For future posters I would like to politely ask: please focus on the discussion of what framework/engine/renderer/algorithm etc. you think is viable for rendering huge grids fast. It can be a new thing that's not widely supported or a good old approach that would still work.

I can't promise to devote all my free time to this experiment, but I think I'll write the initial post once I have at least one good solution, and update it later if more solutions become available.

Creative Magic

Jun 19, 2016, 2:19:15 AM
to Haxe
BTW, just to add: I am looking for a way to render huge numbers of animations on screen, to make all kinds of silly games where you control swarms of units (each about 16x18 pixels) and have thousands of them fight. So if I can simulate the Game of Life full-screen with 1 cell = 1px, then the next step would be to have moving characters that each have a couple of animations.

I'm also looking into how modern particle systems are made, and maybe I'll get some inspiration out of that.

Robert Konrad

Jun 19, 2016, 6:53:06 AM
to Haxe
Oh, in this case we're talking about the wrong approach. If your sprite is 1 pixel in size you might be better off modifying a texture, but not if it's 288 pixels. You can draw hundreds of thousands or even millions of moving sprites by modifying a vertex buffer.
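
To illustrate what "modifying a vertex buffer" could mean for 16x18 sprites, here is a rough Python sketch that builds the CPU-side arrays for one indexed quad per sprite. The function name build_quads, the x/y/u/v vertex layout and the full-texture UVs are assumptions for illustration; handing the arrays to the GPU is engine-specific and not shown.

```python
from array import array

SPRITE_W, SPRITE_H = 16, 18  # sprite size mentioned in the thread


def build_quads(positions):
    """Build vertex and index arrays for one textured quad per sprite."""
    vertices = array('f')  # x, y, u, v for each of the 4 corners
    indices = array('I')   # two triangles (6 indices) per sprite
    for n, (x, y) in enumerate(positions):
        vertices.extend((
            x, y, 0.0, 0.0,                        # top-left
            x + SPRITE_W, y, 1.0, 0.0,             # top-right
            x + SPRITE_W, y + SPRITE_H, 1.0, 1.0,  # bottom-right
            x, y + SPRITE_H, 0.0, 1.0,             # bottom-left
        ))
        base = n * 4
        indices.extend((base, base + 1, base + 2, base, base + 2, base + 3))
    return vertices, indices
```

Moving a sprite then only means rewriting its 16 floats, and all sprites that share a texture can go out in a single draw call.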

Rafael Oliveira

Jun 19, 2016, 3:44:00 PM
to Haxe
I made an implementation in Kha this weekend

The cell size is adjustable, but with a size of 1 pixel the result isn't very different. I'm not an expert in graphics; I just used g1.setPixel for cells of 1 pixel and g2.fillRect for sizes bigger than 1.
The results are: 8 FPS with 1x1, 31 FPS with 2x2 and 60 FPS with 3x3.
Can you check on your machine?

Creative Magic

Jun 19, 2016, 9:32:04 PM
to Haxe
Wow, Rafael, good job! I really appreciate the effort you put into writing this. I got similar results: my machine could get to 51 FPS with a 2x2 cell size, but stayed at 8 FPS with 1x1. Although I haven't found a satisfactory implementation yet, I'm getting closer to understanding what the limits are and why.

I'll spend a couple more days researching this topic and then start writing my article about it.

Hugh

Jun 19, 2016, 9:46:28 PM
to Haxe
There is a demo implementation of this using byte-array access in nme:

You can test this with "haxelib run nme demo 16 cpp"

I think on mobile you will be limited by texture-upload speed.  In this case, I suggest you use a luma texture, rather than an rgba texture, for a 4x speedup.
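
As a back-of-the-envelope check of the 4x figure, the raw upload volume can be worked out directly. The sketch below assumes 4 bytes per pixel for RGBA, 1 byte per pixel for luma, a 60 FPS target and the 1800x900 grid from the first post; it says nothing about how a given driver actually stores or transfers the texture.

```python
def upload_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Raw bytes sent per second if the whole texture is re-uploaded each frame."""
    return width * height * bytes_per_pixel * fps


rgba = upload_bytes_per_second(1800, 900, 4, 60)  # 388,800,000 bytes/s
luma = upload_bytes_per_second(1800, 900, 1, 60)  # 97,200,000 bytes/s
ratio = rgba // luma                              # 4
```

So a full-frame RGBA upload at 60 FPS is roughly 389 MB/s, against roughly 97 MB/s for a single-channel texture.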

Even better would be to implement the algorithm in a shader and ping-pong between 2 render targets to avoid the texture-upload altogether.
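
The ping-pong pattern itself is easy to show on the CPU. In the Python sketch below the two bytearrays stand in for the two render targets and life_step stands in for the fragment shader; on the GPU nothing would be uploaded at all. The names life_step and run and the wrap-around edges are assumptions.

```python
def life_step(src, dst, width, height):
    """Write the next generation of src into dst (edges wrap; an assumption)."""
    for y in range(height):
        for x in range(width):
            n = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dx or dy:  # skip the cell itself
                        n += src[((y + dy) % height) * width + (x + dx) % width]
            alive = src[y * width + x]
            dst[y * width + x] = 1 if n == 3 or (n == 2 and alive) else 0


def run(initial, width, height, steps):
    """Ping-pong between two buffers: read one, write the other, then swap."""
    buffers = [bytearray(initial), bytearray(len(initial))]
    src, dst = 0, 1
    for _ in range(steps):
        life_step(buffers[src], buffers[dst], width, height)
        src, dst = dst, src  # the buffer just written becomes the next input
    return buffers[src]
```

The swap is what avoids reading and writing the same buffer in one pass, which is the same reason two render targets are needed on the GPU.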

Hugh

Hugh

Jun 19, 2016, 10:37:39 PM
to Haxe
Another idea would be to combine 16x1 'life cells' into a single tile index into a tilesheet.  You could then have a tilesheet with 2^16 16x1 bitmaps.
This should give you a 16x speed improvement over simple 1-pixel-tiles.
If you scoot along the row until you start with an 'on' pixel, you can halve the required texture memory, ie 2^15 bitmaps and easily avoid some empty cells.
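
The packing step for the 16x1 idea could look roughly like this in Python. The function name and the bit order (leftmost cell as the most significant bit) are assumptions; the tilesheet with 2^16 entries and the 2^15 optimisation are not built here.

```python
def row_to_tile_indices(row):
    """Pack a row of 0/1 cells into 16-bit tile indices, 16 cells per index."""
    indices = []
    for start in range(0, len(row), 16):
        chunk = row[start:start + 16]
        index = 0
        for bit, cell in enumerate(chunk):
            if cell:
                index |= 1 << (15 - bit)  # leftmost cell = most significant bit
        indices.append(index)
    return indices
```

A row of 1800 cells becomes 113 tile indices (the last tile is only half used), so an 1800x900 grid needs 113 x 900 tiles per frame instead of 1,620,000 one-pixel tiles.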

But if this is not really your goal, the quad-rendering routines, as defined by bunny mark, should give you some idea of what can be done.

Hugh

Rafael Oliveira

Jun 20, 2016, 1:07:19 AM
to Haxe
Shaders can be used in Kha, but I don't have enough knowledge for now; I will try another time.

PSvils

Jun 20, 2016, 10:11:04 AM
to Haxe
In your case I would suggest recording all the changes that happened in a frame, streaming those as vertices, and drawing them as points (glDrawArrays with GL_POINTS; there is probably something similar in Kha, Snow or Lime).
That way it's 1 draw call with no direct texture access, and streaming a series of vertices should be pretty lightweight.

From my minimal knowledge, that sounds like the fastest way, especially since you're not doing anything extra, only drawing the changed cells.
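
Collecting the changed cells into a flat point list might be sketched like this in Python. The function name changed_points and the x/y/state layout per point are assumptions; how the array is streamed to the GPU depends on the framework and is not shown.

```python
from array import array


def changed_points(prev, curr, width):
    """Build a flat [x, y, state, ...] float array for cells that changed."""
    points = array('f')
    for i, (before, after) in enumerate(zip(prev, curr)):
        if before != after:
            points.extend((i % width, i // width, after))
    return points


# Example on a 2x2 grid: cells 1 and 3 switch on, so two points are emitted.
pts = changed_points([0, 0, 0, 0], [0, 1, 0, 1], 2)
point_count = len(pts) // 3
```

Note that this only pays off if the points are drawn on top of the previous frame's image, so the render target must not be cleared between frames.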

JLM

Jun 20, 2016, 11:21:48 AM
to Haxe
I found this; it looks useful for your needs. It uses shaders and could be done in Kha or in WebGL with Haxe.
http://nullprogram.com/blog/2014/06/10/