AWS IoT, heap and stack usage

moriarty

Mar 9, 2016, 2:41:36 AM

to esp-open-rtos mailing list

Hi all,

I'm trying to use esp-open-rtos with AWS IoT ( https://aws.amazon.com/iot/sdk/ ). Porting the code is a piece of cake (implementing a simple timer interface + doing a bit of cert parsing), but the memory usage seems to be an issue - I'm either running out of heap or stack (depending on how big I make the stack :) ).

The code from high level to low level looks like this:

void user_init(void) {
    // ... set up station config
    xTaskCreate(doStuff, (const signed char*)"c", 2100, &mainqueue, 2, NULL);
}

void doStuff(void *pvParameters) {
    MQTTConnectParams connectParams; // ... set up params
    aws_iot_mqtt_connect(&connectParams); // this guy runs out of heap / stack
    printf("connected\n");
}

void aws_iot_mqtt_connect() {
    MQTTConnect();
}

// MQTT a few layers down calls iot_tls_connect

void iot_tls_connect() {
    parse_root_cert();
    parse_device_cert();
    parse_private_key();
    // by this point we've used up 10-15k of heap
    mbedtls_ssl_handshake(); // this is where things are getting interesting
}

int mbedtls_ssl_handshake() {
    while (state != HANDSHAKE_OVER) {
        perform_client_handshake_step();
    }
}

int perform_client_handshake_step() {
    switch (state) {
    case HANDSHAKE_STEP_1: return do_step1(); // possibly changes state
    case HANDSHAKE_STEP_2: return do_step2(); // possibly changes state
    // ...
    case HANDSHAKE_STEP_LAST: return do_stepLast(); // possibly changes state
    }
}

Somewhere inside one of those steps my heap and stack collide and things go haywire. What's interesting is that I'm monitoring stack usage (uxTaskGetStackHighWaterMark) at various points inside those do_step() functions and the stack usage always increases. I'd expect it to be fairly stable (modulo some variance because I'm measuring at random points at different 'depth'), as we're looping inside the same mbedtls_ssl_handshake() loop. But it seems to never decrease.

I did a few simple experiments to check that I'm still sane:

void printmem() {
    printf("MEM0 stack %lu\n", uxTaskGetStackHighWaterMark(NULL));
}

int heavy() {
    char buff[100];
    memset(buff, 0, 100);
    return buff[0] + buff[rand() % 100];
}

int light() {
    int aaa[50];
    memset(aaa, 0, 50);
    return aaa[rand() % 10];
}

int main() {
    int v = 0;
    // expect to always print the same watermark
    printmem(); // prints 2018
    printmem(); // prints 1951
    v += light();
    printmem(); // prints 1951
    v += heavy();
    printmem(); // prints 1951
}

...and I can't explain the first change in the stack size. I thought it might have to do with the first call to printf EVER (why does it affect the stack of main() and not printmem?), or maybe the first call to any function (function brought up from ROM to IRAM? why does it only happen for printmem() and not heavy() / light()? I'm experimenting with -O0 so inlining should be out of the picture), but none of those theories make sense. Perhaps, since I'm not really measuring the stack size but the watermark, it's dynamically pushed down by the growing heap?

My next step would be diving ears deep into the magnificent world of assembly, but perhaps someone already knows what I'm missing?

Angus Gratton

Mar 9, 2016, 4:22:54 PM

to moriarty, esp-open-rtos mailing list

Hi moriarty,

Very keen to have the Amazon IoT SDK supported, that sounds very useful. :)

On Tue, Mar 08, 2016 at 11:41:36PM -0800, moriarty wrote:
> Somewhere inside one of those steps my heap and stack collide and things go
> haywire. What's interesting is that I'm monitoring stack size
> (uxTaskGetStackHighWaterMark) in various points inside those do_step()
> functions and the stack size *always* increases. I'd expect it to be fairly
> stable (modulo some variance because I'm measuring at random points at
> different 'depth'), as we're looping inside the same mbedssl_handshake()
> loop. But it seems to never decrease.

I think the reason for this is that the interrupt handlers use the stacks of each task (the IRQ handler stack frames are pushed on top of the current task's stack). So depending on where the task's stack is when a particular interrupt fires, it can move the stack high water mark.

There's an issue open that tracks the desire to change this, to save memory:

https://github.com/SuperHouse/esp-open-rtos/issues/99

(I'm not 100% sure this is the reason, because measuring the stack high water mark itself happens inside the interrupt handler, so it may be more complex than that - but I think that's the root cause.)

Do you have enough spare heap to increase the stack size of the task whose stack is overflowing?

Regarding reducing memory usage overall, I think there are two open issues that will greatly help once solved. #99 (linked above) and #11:

https://github.com/SuperHouse/esp-open-rtos/issues/11#issuecomment-194511051

I have plans to work on these, just no time yet. If anyone else wants to work on those and has questions, I'm very happy to answer them (in that case I'd suggest commenting inside those individual issues, to help keep track).

Regards,

Angus

moriarty

Mar 11, 2016, 2:35:01 AM

to esp-open-rtos mailing list

OK, so I have good news and bad news.

Good news: managed to get it working: https://twitter.com/Moriarty/status/708187076219375616 . The stack weirdnesses I saw would have been easily explainable had I paid more attention to the documentation - uxTaskGetStackHighWaterMark reports the watermark (what a surprise!), i.e. the minimum free stack the task has ever had, not the current stack usage. So peak usage is monotonically growing by definition.
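To make the watermark behaviour concrete, here is a minimal host-side sketch in plain C. It is not FreeRTOS code: `fake_stack`, `paint_stack`, `use_stack` and `high_water_mark` are made-up names, and the "stack" is just an array. It only assumes what FreeRTOS documents, namely that the unused stack is pre-filled with a known byte (0xa5) and the watermark is found by counting how much of that fill is still intact. The real call reports the result in words, this sketch counts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_BYTES 64
#define FILL_BYTE   0xA5  /* FreeRTOS paints unused task stack with 0xa5 */

/* Simulated stack; it "grows" downward from the end of the array. */
static unsigned char fake_stack[STACK_BYTES];

/* Done once at task creation: fill the whole stack with the marker byte. */
static void paint_stack(void) {
    memset(fake_stack, FILL_BYTE, sizeof fake_stack);
}

/* Pretend some call chain reached `depth` bytes of stack and returned.
   Returning does not restore the marker bytes it overwrote. */
static void use_stack(size_t depth) {
    assert(depth <= sizeof fake_stack);
    memset(fake_stack + sizeof fake_stack - depth, 0, depth);
}

/* Count untouched marker bytes from the far end of the stack.
   This is the minimum free space ever seen, not the current free space. */
static size_t high_water_mark(void) {
    size_t n = 0;
    while (n < sizeof fake_stack && fake_stack[n] == FILL_BYTE) {
        n++;
    }
    return n;
}
```

Calling `use_stack(30)` and then `use_stack(5)` leaves `high_water_mark()` at the value set by the deeper call, which is why the reported number only ever moves in one direction.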

Bad news: memory is tight. After the MQTT connection is established, there's only 6k heap left (on the bright side, there's plenty of stack available - the task was configured with a stack size of 2100 words), and the stack watermark is just ~400 (yeah, that's pretty close).

And it took a few kinky hacks to get even to this tight state:

1) The bigint library used by mbedtls (unfortunately) follows the best security practices and zeroes all the dynamic memory it uses before and after usage. This means they only use a malloc() + free() pair and never realloc(). At the same time, they grow their memory regions quite often, and that leads to fragmentation and excessive heap usage. One of the changes I made was to use realloc(). This is a big no-no in secure code. I'll think about what we can do here without sacrificing security, and try to push a change to mbedtls.

2) mbedtls parameters had to be tweaked, most notably MBEDTLS_SSL_MAX_CONTENT_LEN was decreased (I also disabled everything except TLS 1.2, but I don't think that impacted memory usage significantly). This is more or less fine by itself, but there's no guarantee that the library will still work with other servers configured differently. Even more, I think if Amazon change their server's configuration (maybe add more ciphers), there's a chance the library could stop working. OTOH, I'm no expert here and these are just speculations.

I'll work on publishing the changes, and if that doesn't take long, I can take a look at storing .rodata in ROM.

Angus Gratton

Apr 6, 2016, 7:12:50 PM

to moriarty, esp-open-rtos mailing list

Hey Moriarty,

On Thu, Mar 10, 2016 at 11:35:01PM -0800, moriarty wrote:
> I'll work on publishing the changes, and if that doesn't take long, can
> take a look at storing .rodata in ROM

How'd you get on with this?

> Good news: managed to get it
> working: https://twitter.com/Moriarty/status/708187076219375616 . The stack
> weirdnesses I saw were easily explainable if I have paid more attention to
> the documentation - uxTaskGetStackHigh*WaterMark* measures *watermark* (what
> a surprise!), not the current stack usage. So, it's monotonically growing
> by default.

This is true, but it should still hit a maximum point after a while, or something has gone wrong... :)

> Bad news: memory is tight. After the MQTT connection is established,
> there's only 6k heap left (on the bright side, there's plenty of stack
> available - the task was configured with stack size of 2100 variables), and
> the stack watermark is just ~400 (yeah, that's pretty close).
>
> And it took a few kinky hacks to get even to this tight state:
> 1) bigint library used by mbedtls (unfortunately) follows the best
> security practices and zeroes all the dynamic memory it uses before and
> *after* usage. This means they only use malloc() + free() pair and never
> realloc(). At the same time, they grow their memory regions quite often,
> and that leads to fragmentation and excessive heap usage. One of the
> changes I made was to use realloc(). This is a big no-no in a secure code.
> I'll think what we can do here without sacrificing security, and try to
> push a change to mbedtls.

Hmm. :( Do you have any numbers on how fragmented the heap gets?

Having a glance at the bignum implementation, maybe we can add some tweaks to preemptively allocate larger buffers to avoid growing unnecessarily? If it dramatically reduces the amount of fragmentation then it might be something the mbedtls maintainers would accept upstream as well.
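One way to read the "allocate larger buffers up front" idea is sketched below. This is not the mbedtls bignum code: `secure_buf`, `secure_buf_grow`, `secure_buf_free`, `zeroize` and `GROW_CHUNK` are hypothetical names, and the chunk size of 64 bytes is an arbitrary assumption. The sketch keeps the allocate-copy-wipe-free pattern (no realloc()) but rounds every request up to a chunk boundary, so a buffer that grows a little at a time triggers far fewer allocations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GROW_CHUNK 64  /* assumed slack granularity in bytes; tune per workload */

/* Hypothetical buffer, loosely modelled on a bignum's backing storage. */
typedef struct {
    unsigned char *p;  /* allocated storage, or NULL */
    size_t cap;        /* bytes allocated */
    unsigned allocs;   /* allocation counter, only for illustration */
} secure_buf;

/* Wipe through a volatile pointer so the compiler cannot drop the stores. */
static void zeroize(void *v, size_t n) {
    volatile unsigned char *p = v;
    while (n--) {
        *p++ = 0;
    }
}

/* Ensure at least `need` bytes, rounded up to a multiple of GROW_CHUNK.
   Allocates a new zeroed block, copies, wipes and frees the old one.
   Returns 0 on success, -1 if allocation fails (buffer left unchanged). */
static int secure_buf_grow(secure_buf *b, size_t need) {
    if (need <= b->cap) {
        return 0;  /* existing slack is enough, no allocation */
    }
    size_t cap = (need + GROW_CHUNK - 1) / GROW_CHUNK * GROW_CHUNK;
    unsigned char *np = calloc(cap, 1);
    if (np == NULL) {
        return -1;
    }
    if (b->p != NULL) {
        memcpy(np, b->p, b->cap);
        zeroize(b->p, b->cap);
        free(b->p);
    }
    b->p = np;
    b->cap = cap;
    b->allocs++;
    return 0;
}

/* Wipe and release the buffer. */
static void secure_buf_free(secure_buf *b) {
    if (b->p != NULL) {
        zeroize(b->p, b->cap);
        free(b->p);
    }
    b->p = NULL;
    b->cap = 0;
}
```

Growing a buffer one byte at a time from 1 to 100 bytes costs two allocations here instead of one hundred. The trade-off is up to `GROW_CHUNK - 1` bytes of slack per buffer, so whether it helps on a heap this small would need measuring.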

> 2) mbedtls parameters had to be tweaked, most
> notably MBEDTLS_SSL_MAX_CONTENT_LEN was decreased (also disabled everything
> except TLS 1.2, but I don't think that impacted memory usage
> significantly). This is more or less fine by itself, but there's no
> guarantee that the library will still work with other servers configured
> differently. Even more, I think if Amazon change their server's
> configuration (maybe add more cyphers), there's a chance the library could
> stop working. OTOH, I'm no expert here and these are just speculations

MBEDTLS_SSL_MAX_CONTENT_LEN was already too low to be spec compliant. :( The spec says you have to accept up to 16KiB of data in a single "block" (record), unless you negotiate a smaller buffer size using the TLS Maximum Fragment Length extension. Which most servers don't enable :(.

Connecting to a server which doesn't support negotiating a smaller record size is playing with fire, because at any point they can send you a block that you can't decrypt (because it's too long) and then you have no choice but to close the session. However if you know a bit about the application layer then you may be able to be sure that the largest plaintext sent at once is X bytes, in which case you can round up to the cipher block size (I think?) and then you're OK...

The big change I'd like to make in mbedtls (and submit back upstream) is to allow an asymmetric MBEDTLS_SSL_MAX_CONTENT_LEN. Because even though you may not know how large the longest block you receive might be, you can certainly control the largest block you send. So I think having a 512 byte TLS transmit buffer is fine (if a bit less efficient on the wire), and that gives you more space to allocate for the receive buffer.
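As a rough back-of-the-envelope check of what the split would buy, the arithmetic can be written out as below. The function names are made up for illustration, and the numbers count record payload only: per-record header, MAC and padding overhead are deliberately ignored, so the real buffers are somewhat larger.

```c
#include <assert.h>
#include <stddef.h>

/* 2^14 bytes: the largest record plaintext a TLS peer may send
   when no smaller fragment length has been negotiated. */
#define TLS_MAX_PLAINTEXT 16384u

/* Payload bytes when receive and transmit share one content length. */
static size_t symmetric_bytes(size_t content_len) {
    return 2 * content_len;
}

/* Payload bytes with separate receive and transmit lengths
   (the hypothetical asymmetric setting). */
static size_t asymmetric_bytes(size_t in_len, size_t out_len) {
    return in_len + out_len;
}

/* Payload bytes saved by shrinking only the transmit side. */
static size_t bytes_saved(size_t in_len, size_t out_len) {
    return symmetric_bytes(in_len) - asymmetric_bytes(in_len, out_len);
}
```

With a spec-sized 16384 byte receive buffer and a 512 byte transmit buffer, `bytes_saved(TLS_MAX_PLAINTEXT, 512)` comes to 15872 bytes compared with two full-size buffers.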

I can't think of a reason the mbedtls maintainers won't accept that patch, although I'm not a cryptography expert so maybe it weakens the encryption somehow (I just can't see how...)

Angus

Drasko DRASKOVIC

May 4, 2016, 5:46:45 AM

to esp-open-rtos mailing list

@moriarty,

is the code published anywhere?

I am very interested in this implementation.

BR,

Drasko
