New functions in String class


TK

Apr 20, 2016, 11:41:49 AM
to Developers
I have found that the String class lacks some functionality that I need.
I would add it myself and post the changes here for review when done.

But before starting, I want to discuss a suitable implementation of this functionality, as suggested on the forum.

The functions I need and my proposals are as follows:

--- Padding ---
1a) PadLeft (long), PadLeft (long, char): Add a given character (a space if no character is provided) to the left side of the string until the string reaches a given length.
Same name and purpose as in .NET: https://msdn.microsoft.com/de-de/library/system.string.padleft%28v=vs.110%29.aspx

1b) PadRight (long), PadRight (long, char): Same as PadLeft (...), but on the other side.
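
To pin down the intended semantics, here is a minimal sketch of 1a/1b as free functions over std::string, used as a stand-in for Arduino's String. The names padLeft/padRight and the std::size_t length parameter are assumptions for illustration, not the final API. As in .NET, a string that is already long enough is returned unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical padLeft/padRight: add `fill` (space by default) to one side
// until the result is `totalLen` characters long. Strings that are already
// `totalLen` or longer are returned unchanged, as with .NET PadLeft/PadRight.
std::string padLeft(const std::string& s, std::size_t totalLen, char fill = ' ')
{
    if (s.size() >= totalLen) return s;
    return std::string(totalLen - s.size(), fill) + s;
}

std::string padRight(const std::string& s, std::size_t totalLen, char fill = ' ')
{
    if (s.size() >= totalLen) return s;
    return s + std::string(totalLen - s.size(), fill);
}
```

For example, padLeft("7", 3, '0') gives "007", which is the leading-zero case for hex bytes.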

--- Cropping ---
2a) CropLeft (long): Remove n characters from the left side of the string.

2b) CropRight (long): Same as CropLeft (...), but on the other side.
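
A matching sketch of 2a/2b, again on std::string as a stand-in, with hypothetical names. The proposal doesn't say what should happen when n exceeds the length; clamping to an empty result, as done here, is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical cropLeft/cropRight: remove n characters from one end.
// If n >= length, the result is empty (no out-of-range access).
std::string cropLeft(const std::string& s, std::size_t n)
{
    return n >= s.size() ? std::string() : s.substr(n);
}

std::string cropRight(const std::string& s, std::size_t n)
{
    return n >= s.size() ? std::string() : s.substr(0, s.size() - n);
}
```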

--- Keeping ---
3a) KeepLeft (long): Keep n characters from the left side of the string, remove the rest.

3b) KeepRight (long): Same as KeepLeft (...), but on the other side.
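
And for 3a/3b, under the same assumptions (std::string stand-in, hypothetical names). Here an n larger than the length simply keeps the whole string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical keepLeft/keepRight: keep n characters from one end.
std::string keepLeft(const std::string& s, std::size_t n)
{
    return s.substr(0, n);  // substr clamps the count to the available length
}

std::string keepRight(const std::string& s, std::size_t n)
{
    return n >= s.size() ? s : s.substr(s.size() - n);
}
```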

--- Trim ---
4a) Trim (char): Like Trim(), but with given character instead of space.

4b) TrimStart (), TrimStart (char): Like Trim (...), but only modifying the start of the string.

4c) TrimEnd (), TrimEnd (char): Like TrimStart (...), but only modifying the end of the string.

4d) All Trim...(...) functions should (additionally) return the object (currently they work on the object and return void), so that calls can be chained with others.
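
Point 4d is what makes chaining possible: if each trim modifies the object in place and returns a reference to it, calls can be strung together. Below is a minimal sketch with a hypothetical wrapper class Str around std::string. A String version would presumably return String& in the same way. The default character ' ' is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical String-like class: trims work in place and return *this,
// so calls can be chained, e.g. s.trimStart('0').trimEnd();
class Str {
public:
    explicit Str(const std::string& v) : buf(v) {}

    Str& trimStart(char c = ' ')
    {
        std::string::size_type first = buf.find_first_not_of(c);
        if (first == std::string::npos) buf.clear();  // every character was c
        else buf.erase(0, first);
        return *this;
    }

    Str& trimEnd(char c = ' ')
    {
        std::string::size_type last = buf.find_last_not_of(c);
        if (last == std::string::npos) buf.clear();
        else buf.erase(last + 1);
        return *this;
    }

    Str& trim(char c = ' ') { return trimStart(c).trimEnd(c); }

    const std::string& str() const { return buf; }

private:
    std::string buf;
};
```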

Awaiting your comments.

Best regards
Tobias

Rob Tillaart

Apr 20, 2016, 12:42:42 PM
to Arduino Developers
Hi,

You can create issues for the libraries on GitHub.

That said, given the memory constraints on Arduino, the 'long' parameters should be uint8_t or uint16_t.

void KeepLeft(uint16_t n)
{
  if (n < length()) CropRight(length() - n);  // guard: length() - n would underflow if n > length()
}

I would overload trim() to trim both ends: trim(char end = ' ', char start = '\0');


--
You received this message because you are subscribed to the Google Groups "Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@arduino.cc.

William Westfield

Apr 21, 2016, 12:56:04 PM
to devel...@arduino.cc
On Apr 15, 2016, at 5:30 AM, TK <flitz...@gmail.com> wrote:

> I have found some lack of functionality in the String class, which I need.
> I would add it by myself and post the changes for review here, when done.

I guess. Enhancing a feature widely believed to not work very well seems … less than ideal.
Maybe you could also implement strXXXXXX(char *, …) functions that do approximately the same thing on char arrays at the same time?
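
For the char-array side, here is a sketch of what a C-style counterpart to padLeft could look like. The name strpadleft and its signature are hypothetical. It works in place and takes the array size so that it can refuse to overflow, and it uses no heap allocation at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical C-style padLeft: right-aligns the text in `buf` to `width`
// characters by shifting it right and filling the gap with `fill`.
// `bufSize` is the total size of the array. Returns false (and leaves buf
// untouched) if width + 1 bytes would not fit.
bool strpadleft(char* buf, std::size_t bufSize, std::size_t width, char fill)
{
    std::size_t len = std::strlen(buf);
    if (len >= width) return true;            // already wide enough
    if (width + 1 > bufSize) return false;    // no room for result + '\0'
    std::size_t shift = width - len;
    std::memmove(buf + shift, buf, len + 1);  // move text and terminator right
    std::memset(buf, fill, shift);            // fill the gap on the left
    return true;
}
```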


> --- Keeping ---
> 3a) KeepLeft (long): Keep n characters from the left side of the string, remove the rest.

Isn’t that the same as the current “remove”?

BillW

Massimo Banzi

Apr 21, 2016, 1:56:58 PM
to devel...@arduino.cc

On 21 April 2016 at 17:56:00, William Westfield (wes...@mac.com) wrote:

> Enhancing a feature widely believed to not work very well seems …

Are you referring to String? What makes you say that it’s "widely believed to not work very well"?

I’m using them quite a bit on the SAMD21 boards and they are a lifesaver. Clearly, on an 8-bit micro with 1K of RAM they are a bit of overkill.


-- 
Massimo Banzi
Arduino.cc


Massimo Banzi

Apr 21, 2016, 1:57:46 PM
to devel...@arduino.cc
Tobias: Thanks for the contribution. There is a lot of useful stuff there.

I do think it adds too many methods, and it might make the code a bit too complicated to understand.

We can clean up the API a bit and make it a valuable addition to the library.



-- 
Massimo Banzi
Arduino.cc

TK

Apr 22, 2016, 1:22:58 AM
to Developers


On Thursday, 21 April 2016 19:57:46 UTC+2, Massimo Banzi wrote:
I do think it adds too many methods and it might make the code become a bit too complicated to understand. 

I have added only the methods I am currently using myself. In fact, the number of methods is three times that, because each one gained two overloaded variants (the ...C and ...S versions).
Take a look at the .NET String class; it has much more functionality. If it's well documented, each user can easily find their way and won't get confused by the amount of functionality. And I don't see why it is too complicated. Can you give me an example? Normal users won't deal with the implementation, so they only need to care about the .h file and the comments in it that explain how to use the functions. I have not added any comments yet because I thought the function and variable names were quite self-explanatory, but of course I will if requested.

Because I did not get a reply here for days (and this post was not visible for days), I have already added them on GitHub: https://github.com/arduino/Arduino/issues/4870
My post there also contains the modified and tested WString.cpp/.h files. The added functionality is confirmed to work well.
I haven't had time yet to create a fork as suggested there; I just uploaded the files.

@Bill Westfield:
I am using the String class without any problems. It works great! I don't know what you or others believe about it, but you should rethink your opinion.

KeepLeft (long) is not the same as remove(). remove() is rather the same as cropRight(), but it is still different: remove() takes a start index, whereas cropLeft/Right take a number of characters. KeepLeft/Right also take a number of characters.

The overloaded ...C and ...S variants make String operations much more convenient than before; see the uploaded test program on GitHub. Here are 3 short examples (all using the really great(!) streaming.h):
1) aligning numbers to 5 (4) characters before the decimal point:
m_oDisplay << String(m_fTWasser, 1).padLeftC(5) << " (" << String(m_fTWasserSoll, 1).padLeftC(4) << ")";
2) printing a byte array with leading zeros:
*m_poDisplay << String(*(m_pbyValue + ixCnt), HEX).padLeftC(2, '0');
3) printing a float (passed by reference as byte*), padding it with spaces on the left and limiting the total length. This is 1 line instead of 4! :)
*m_poDisplay << String(*(float*)m_pbyValue, m_byTextLenDecimals).padLeftC(m_byTextLenTotal).keepRightC(m_byTextLenTotal);
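
Example 3 relies on padding first and then keeping the rightmost characters, which yields a field of exactly the requested width. Here is a standalone sketch of that idea. In it, snprintf stands in for String(float, decimals), and std::string operations stand in for padLeftC/keepRightC. The name fixedField is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format `value` with `decimals` digits, pad on the left to `width`,
// then keep only the rightmost `width` characters.
std::string fixedField(double value, int decimals, std::size_t width)
{
    char tmp[32];
    std::snprintf(tmp, sizeof tmp, "%.*f", decimals, value);   // like String(value, decimals)
    std::string s(tmp);
    if (s.size() < width) s.insert(0, width - s.size(), ' ');  // like padLeft(width)
    if (s.size() > width) s.erase(0, s.size() - width);        // like keepRight(width)
    return s;
}
```

One consequence is worth knowing. When the formatted value is wider than the field, keepRight silently drops the leading (most significant) characters. For example, 12345.6 in a 5-character field becomes "345.6".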


I hope I was able to demonstrate why I need this functionality and why it is helpful for others. Please check out the files on GitHub.

TK

Apr 22, 2016, 1:29:42 AM
to Developers
After thinking about your comment again: you're right, keepLeft() is the same as remove(). I added keepLeft() only because of the corresponding function keepRight().
What would be a proper "remove" name for keepRight()? remove() uses a start index, while keep...() uses a length, so there's a difference in meaning. Therefore you likely can't port it back to a "remove...()" name. I suggest keeping both.
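
The relationship can be written down directly: keepLeft(n) is remove(n), and keepRight(n) is remove(0, length() - n), provided n is less than the length. Below is a sketch on std::string, where removeFrom/removeRange are hypothetical stand-ins for String::remove(index) and String::remove(index, count). The sketch does not check how the real remove() handles out-of-range arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-ins for remove(index) and remove(index, count), built on
// std::string::erase. An out-of-range index does nothing here.
void removeFrom(std::string& s, std::size_t index)
{
    if (index < s.size()) s.erase(index);
}

void removeRange(std::string& s, std::size_t index, std::size_t count)
{
    if (index < s.size()) s.erase(index, count);  // erase clamps count
}

// keepLeft(n) == remove(n); keepRight(n) == remove(0, length() - n),
// guarded so that n >= length() keeps the whole string.
void keepLeft(std::string& s, std::size_t n) { removeFrom(s, n); }

void keepRight(std::string& s, std::size_t n)
{
    if (n < s.size()) removeRange(s, 0, s.size() - n);
}
```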

Victor Aprea

Apr 22, 2016, 9:38:11 AM
to Arduino Developers
Hey all,

I'd like to follow up on Bill's comment, but I appreciate that this could get off topic. So I'll start by applauding and encouraging TK's proposed contributions that started the thread. I think many of us have at one point or another gone off and written very similar functions as necessary.

I'd guess that any doubts about the reliability of the String class stem from its use of dynamic memory allocation (i.e. realloc). While allocation failures are properly guarded within the library, most users don't account for that possibility at the application (sketch) level when such failures occur in practice. This can lead to interesting and unexpected runtime behavior for those who don't realize that method calls can leave the object invalidated by the library (e.g. due to memory fragmentation).

As Massimo and others have alluded to, this is clearly less likely to occur on targets with an abundance of RAM. So yes, the library might be bulletproof (for some definition of that term), but software writers can unquestionably abuse the library in spectacular albeit subtle ways, and then blame the library rather than their use of it. Let's face it, using dynamic memory in unmanaged languages like C/C++ can be fraught with peril and nuance, and that's especially true on a resource-constrained target. Perhaps one way to help would be a dedicated section on good error checking and handling practices in the documentation?

Kind Regards,
Vic

Victor Aprea // Wicked Device

--

William Westfield

Apr 22, 2016, 12:40:16 PM
to devel...@arduino.cc
> I think any doubts about the reliability of the String class probably stem from its use of dynamic memory allocation (i,e. realloc) would be my guess.

Yes; I guess the complaints I’ve heard are all against the AVR implementation, which uses a completely different malloc() implementation and has much less RAM than the ARM chips. It’s hard to say whether the bugs that people run into are their own fault, due to something in String, or due to something in malloc()/free(). I usually don’t get that far, since I immediately cringe when I see something like:

do {
  myString += getchr();
} while (!myString.endsWith("\n"));  // endsWith() takes a String, so pass "\n", not a char

(How many realloc() operations does that do on a 50-character “line”? Might be 50. Might be a more reasonable 4 or 5. It just bugs me not to KNOW.)
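
The question can at least be bounded with a simple model. This is not a measurement of the AVR or SAMD String code. If every append grows the buffer to exactly the size needed, a 50-character line costs 50 reallocations. A buffer that is already large enough costs none, for example after calling String::reserve() before the loop. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Model: append `n` characters one at a time to a buffer whose usable
// capacity (excluding '\0') starts at `initialCapacity`. Each time the next
// character doesn't fit, the buffer is grown to exactly the needed length.
unsigned countReallocsExactFit(std::size_t n, std::size_t initialCapacity)
{
    std::size_t capacity = initialCapacity;
    unsigned reallocs = 0;
    for (std::size_t len = 1; len <= n; ++len) {
        if (len > capacity) {  // no room for the new character
            capacity = len;    // grow just enough
            ++reallocs;
        }
    }
    return reallocs;
}
```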

BillW/WestfW


TK

Apr 22, 2016, 1:08:24 PM
to Developers


On Friday, 22 April 2016 18:40:16 UTC+2, Bill Westfield wrote:

(How many realloc() operations does that do on a 50-character “line”?  Might be 50.  Might be a more reasonable 4 or 5.  It just bugs me not to KNOW.)

You could avoid all those reallocations if you could preset the size of the internal buffer. We'd simply have to overload the class constructor to solve at least the one problem you mentioned.
I'll have a look at that later when I'm home from work.

Paul Stoffregen

Apr 22, 2016, 1:13:38 PM
to devel...@arduino.cc
On 04/22/2016 10:08 AM, TK wrote:


On Friday, 22 April 2016 18:40:16 UTC+2, Bill Westfield wrote:

(How many realloc() operations does that do on a 50-character “line”?  Might be 50.  Might be a more reasonable 4 or 5.  It just bugs me not to KNOW.)

You could get around all that reallocations if you could preset the size for the internal buffer.

We'd simply have to overload the constructor of the class to solve at least the one problem you mentioned.
 I am gonna have a look at that later when I'm home from work.

TK

Apr 22, 2016, 2:18:26 PM
to Developers
Yes, like this, but this function just reallocates an existing buffer to a new size. It should be part of the constructor, so that the very first allocation already gives the buffer the desired size.

Also, I think the buffer should not be grown only by the amount of extra space needed. It should always be doubled. .NET (C#) StringBuilder does this with its internal buffer: it always doubles, because its developers assumed that if a buffer has already become large (because large strings were added), this pattern will continue and more large strings will follow, so doubling makes sense.
It may not make sense to double on an MCU, so I suggest adding 25% or 50% of the current size when the buffer is exhausted.
If this is not wanted by default, a parameter could be added to the String class to enable this behaviour explicitly.
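
How much the growth rule matters can be estimated with the same kind of model. In this sketch (a hypothetical function, not the String implementation), whenever the buffer is too small it is reallocated to the needed length plus growthPercent percent of it, using integer arithmetic. A value of 0 means exact fit, and 100 means doubling.

```cpp
#include <cassert>
#include <cstddef>

// Model: append `n` characters one at a time to an initially empty buffer.
// When the next character doesn't fit, reallocate to the needed length plus
// growthPercent % of it (integer division). Returns the reallocation count.
unsigned countReallocs(std::size_t n, std::size_t growthPercent)
{
    std::size_t capacity = 0;
    unsigned reallocs = 0;
    for (std::size_t len = 1; len <= n; ++len) {
        if (len > capacity) {
            capacity = len + len * growthPercent / 100;
            ++reallocs;
        }
    }
    return reallocs;
}
```

For a 50-character line, this model gives 50 reallocations with exact fit, 13 with +25%, 8 with +50% and 5 with doubling. The price is a final buffer of 58, 60 or 62 characters instead of 50 (ignoring the '\0' terminator).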

@victor.aprea: I agree that the documentation is insufficient in many cases (e.g. the EEPROM documentation does not mention the length() function). It needs a lot of improvement. I will contribute; I just need to find the time (like all of us, probably) and the know-how. And every page should contain a note:
"We try to keep this page updated, but due to continuous development the information here may be outdated. Please refer to the (header files of the) library itself for the most up-to-date information."

TK

Apr 22, 2016, 2:37:20 PM
to Developers
Off-topic: why can't I edit my post? I have to add another one instead:
on-topic:


On Friday, 22 April 2016 20:18:26 UTC+2, TK wrote:
yes, like this, but this function just re-allocates an existing buffer with new size. It should be part of the constructor, so that the first allocation already sets the buffer to a desired size.
 
String has 13 different constructors. Overloading all of them just for buffer preallocation is not a sensible option.
Adding a public static member that holds the default buffer size will lead to very undesired behaviour and is not an option either.
So, up to now I have no idea how to achieve that.

 
Also, I think that the buffer should not be increased by the amount of needed extra space. It should be doubled always. This behaviour is also used in .net (C#) StringBuilder's internal buffer. It is always doubled, because the developers assumed that if a buffer already has a large size (due to large strings being added), this way of operation will continue and more large strings will be added, therefore a doubled buffer makes sense.
It may not make sense do double on MCU, so I suggest adding 25% or 50% of the current size if the buffer is exhausted.
If this is not desired, then a parameter could be added in the string class which explicitly enables this behaviour.
 
Here I think that we could add a public parameter to the String class that defines the percentage of reserved buffer size on each created string instance and for each resize.
By default this value is 0, and the unit is percent. Anyone who frequently concatenates Strings can increase this value as needed to reduce the number of buffer reallocations.
This is independent from the first idea and will help for sure, so I would definitely implement this.

TK

Apr 22, 2016, 5:38:58 PM
to Developers

Here I think that we could add a public parameter to the String class that defines the percentage of reserved buffer size on each created string instance and for each resize.
By default this value is 0, unit is percent. If someone frequently concats Strings, he should increase this value to whatever he likes in order to reduce the number of buffer reallocations.
This is independent from the first idea and will help for sure, so I would definitely implement this.

I have added setReservePercentage() to my GitHub branch of the Arduino master; it is tested and working. However, I found some strange behaviour that I don't think is related to this new functionality; it is probably a bug in malloc. I have created a new post about it. Maybe this is what @Bill Westfield was referring to.

 

Andrew Kroll

Apr 22, 2016, 6:07:02 PM
to devel...@arduino.cc

I use my own malloc, which lacks 'issues'. I actually include it to override the provided one on AVR, mainly because it is ISR-safe. It may also not have any of the bugs seen/triggered.

--

Matthew Ford

Apr 22, 2016, 7:26:03 PM
to devel...@arduino.cc
I keep thinking of Arduino as micro programming for beginners.

As such, I advise against C++ altogether. Even so, beginners still have to get over the wacky confusion around constructing a class object with no arguments, e.g.
pfodParser parser;   but not  pfodParser parser();
versus
pfodParser parser("1");
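
The pfodParser confusion is C++'s "most vexing parse". With empty parentheses, the line is read as a function declaration, not an object definition. Below is a small self-contained illustration with a hypothetical stand-in class, Parser. The brace form `Parser d{};` requires C++11.

```cpp
#include <type_traits>

// Hypothetical stand-in for a class such as pfodParser.
struct Parser {
    Parser() {}
    explicit Parser(const char*) {}
};

void vexingParseDemo()
{
    Parser a;       // object, default-constructed
    Parser b();     // NOT an object: declares a function b() returning Parser
    Parser c("1");  // object, constructed from "1"
    Parser d{};     // C++11 brace initialisation: object, default-constructed

    static_assert(std::is_function<decltype(b)>::value, "b is a function declaration");
    static_assert(std::is_same<decltype(d), Parser>::value, "d is an object");
    (void)a; (void)c; (void)d;  // silence unused-variable warnings
}
```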

If the users don't write libraries they can avoid/ignore C++ for a long time.

As for strings, there is already an unavoidable problem with C strings running out of memory, hence the F() macro.

The C++ String class adds another level of complexity that a beginner could well do without, along with anything else that does dynamic malloc and free.

matthew

William Westfield

Apr 22, 2016, 9:18:45 PM
to devel...@arduino.cc
> String has 13 different constructors. Overloading all of them just for buffer preallocation is not a sensible option.

Hmm. Why not? Isn’t it essentially part of the C++ “style” to hide “ugliness” in standard classes instead of exposing the user to it directly? Unused code gets optimized away…


> Adding a public static member that holds the default buffer size will lead to very undesired behaviour and is not an option either.

Can’t you just add a default “size = 10” (or whatever) to each of the existing constructors?

C’s lack of string support is sort of embarrassing. But that’s a separate issue :-(

I see that the “real” C++ string class (std::string) also lacks equivalents of the newly suggested methods.
Are there standardized alternatives? Matching others is good…

BillW/WestfW

TK

Apr 23, 2016, 3:56:59 AM
to Developers
On Saturday, 23 April 2016 03:18:45 UTC+2, Bill Westfield wrote:
> String has 13 different constructors. Overloading all of them just for buffer preallocation is not a sensible option.

Hmm.  Why not?  Isn’t it essentially part of the C++ “style” to hide “ugliness” in standard classes instead of directly exposing the user?  Unused code gets optimized away…


> Adding a public static member that holds the default buffer size will lead to very undesired behaviour and is not an option either.

Can’t you just add a default “size = 10” (or whatever) to each of the existing constructors?

Good idea! I completely forgot about default arguments. C# didn't have them for a long time and C++/CLI still doesn't, so I don't use them in my daily work.
This way we can add the functionality to all existing constructors without overloading.
But I would use "size = 0" so that the default String behaviour persists.
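
Here is a sketch of the default-argument approach on a hypothetical String-like class, PString. It is backed by std::string, and only two constructors are shown. The trailing bufferSize parameter defaults to 0, so every existing call compiles and behaves as before, while new code can request a larger initial buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical String-like class: each constructor gains a trailing
// `bufferSize` parameter that defaults to 0 (no pre-allocation).
class PString {
public:
    PString(const char* s, std::size_t bufferSize = 0) : buf(s)
    {
        if (bufferSize > buf.size()) buf.reserve(bufferSize);  // pre-allocate only if larger
    }

    PString(int value, std::size_t bufferSize = 0) : buf(std::to_string(value))
    {
        if (bufferSize > buf.size()) buf.reserve(bufferSize);
    }

    std::size_t capacity() const { return buf.capacity(); }
    const std::string& str() const { return buf; }

private:
    std::string buf;
};
```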

Not an alternative here, but fairly standard: the .NET String class, which I used as the model for the padding and trim functions.

TK

Apr 23, 2016, 9:24:40 AM
to Developers
10 out of 13 String constructors use the = assignment operator, thus I cannot pass the bufferSize to the underlying copy().
Therefore I decided to create a member
unsigned int minCapacity
which is used in reserve() in the same way as the previously added reservePercentage. I have intentionally NOT put it in changeBuffer(), so that the buffer can still be created smaller than minCapacity if needed.
changeBuffer(), which does the actual realloc, is used inside reserve(..) and replace (const String&, const String&).
In reserve(..) the adapted size is passed.
In replace(..) we could replace the changeBuffer() call with a call to reserve(), or keep it as is.

Overview of the latest changes:
- String constructors enhanced with parameter "unsigned int bufferSize = 0"
- void setReservePercentage (unsigned char percentage) added: on resizing, the buffer is created 'percentage' percent bigger than needed to avoid resizing by small values.
- static void setDefaultReservePercentage (unsigned char percentage); like setReservePercentage(), but as default for all new Strings and also on String object creation.
- variables added for these functions:
  unsigned int minCapacity;   // the minimum value of capacity
  unsigned char reservePercentage = 0;  // the percentage of reserved buffer when creating or resizing a String
  static unsigned char defaultReservePercentage;  // the default value of reservePercentage

All changes tested, no failures found. Ready for tests by other users.
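
The following sketch shows how the three new members could combine when reserve() picks a buffer size: add reservePercentage percent to the requested size, but never go below minCapacity. The member names follow the overview above. The class CapacityPolicy and the exact arithmetic are assumptions for illustration, not TK's code.

```cpp
#include <cassert>

// Hypothetical capacity rule combining minCapacity, reservePercentage and
// the static defaultReservePercentage described in the overview.
class CapacityPolicy {
public:
    explicit CapacityPolicy(unsigned int bufferSize = 0)
        : minCapacity(bufferSize), reservePercentage(defaultReservePercentage) {}

    static void setDefaultReservePercentage(unsigned char p) { defaultReservePercentage = p; }
    void setReservePercentage(unsigned char p) { reservePercentage = p; }

    // Buffer size to allocate so that `needed` characters fit.
    unsigned int capacityFor(unsigned int needed) const
    {
        unsigned int withReserve = needed + needed * reservePercentage / 100u;
        return withReserve < minCapacity ? minCapacity : withReserve;
    }

private:
    unsigned int minCapacity;                       // the minimum value of capacity
    unsigned char reservePercentage;                // extra % reserved on create/resize
    static unsigned char defaultReservePercentage;  // default for new objects
};

unsigned char CapacityPolicy::defaultReservePercentage = 0;
```

On AVR, unsigned int is 16 bits, so needed * reservePercentage can overflow for strings of a few hundred characters at high percentages. A real implementation would need to account for that.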

half off-topic:
One more thing: I have replaced each tab with 2 spaces, so that the visible indentation is the same for all users. Some use spaces (like me), some use tabs, and we all use tab widths (or space equivalents) of 2, 4, 8, or whatever. As a result, the indentation looked different for each of us because of the mix of tabs and spaces. Is there a guideline on which editor settings to use so that code looks the same for all of us? I haven't found anything, so I am going to ask in a new topic.