Average of two strings

jfcg...@gmail.com

Nov 1, 2021, 1:47:03 PM
to golang-nuts
Hi,

I wrote MeanStr(s1,s2) to calculate the lexicographic average of two strings. It works fine with ASCII strings, but I want to get feedback on general (UTF-8) inputs. It should satisfy:
  • a good average of two inputs (in some context): For most/all s (in that context) with s1 < s < s2, s is supposed to be < MeanStr(s1,s2) about half the time
  • MeanStr(s1,s2) = MeanStr(s2,s1)
  • For s1 < s2: s1 <= MeanStr(s1,s2) < s2
  • MeanStr(s,s) = s for all s
Can you take a look at the function and try to find (utf-8 or not) inputs that break these requirements?
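
To make hunting for counterexamples easier, here is a minimal sketch of a checker for the last three requirements (the "about half the time" one is statistical and is not covered). The names checkMean and minStr are made up for this example and are not part of sixb; minStr is only a trivial stand-in that happens to satisfy the three checks, so swap in the real MeanStr when trying it.

```go
package main

import "fmt"

// checkMean returns a description of the first requirement that mean
// violates for the pair (a, b), or "" if all three checks pass.
// Comparison is plain byte-wise <, as in the post.
func checkMean(mean func(x, y string) string, a, b string) string {
	m := mean(a, b)
	if m != mean(b, a) {
		return "not symmetric"
	}
	if a == b {
		if m != a {
			return "mean(s,s) != s"
		}
		return ""
	}
	lo, hi := a, b
	if lo > hi {
		lo, hi = hi, lo
	}
	if !(lo <= m && m < hi) {
		return "result outside [lo, hi)"
	}
	return ""
}

// minStr is a placeholder only (hypothetical): it passes the three
// checks but is of course not a useful average.
func minStr(x, y string) string {
	if x < y {
		return x
	}
	return y
}

func main() {
	// A few ASCII, multi-byte UTF-8, and invalid UTF-8 inputs.
	inputs := []string{"", "a", "a\x00", "ab", "é", "日本", "\xff", "\xff\xfe"}
	for _, a := range inputs {
		for _, b := range inputs {
			if msg := checkMean(minStr, a, b); msg != "" {
				fmt.Printf("%q, %q: %s\n", a, b, msg)
			}
		}
	}
}
```

Inputs that differ only in trailing zero bytes, or only after the 31st byte, are probably the most interesting ones to add to the list.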

Thanks..

Note: Ordering is regular <, not any fancy unicode order. The function considers the first 31 bytes.
Note: I am planning to switch from median-of-2n+1 pivot calculation to (arithmetic) median-of-2n in sorty, which is cheaper to calculate and gives a better pivot (smaller variance than median-of-2n+1).

jfcg...@gmail.com

Nov 3, 2021, 7:43:58 AM
to golang-nuts
Hi,

I thought averaging strings was the non-trivial case. I knew (x+y)/2 for averaging integers would have overflow problems and not work for all inputs. It turns out there is a short but non-trivial expression for averaging signed/unsigned integers for 'all' inputs. I've also added Mean*() for 32/64 bit signed/unsigned integers with comprehensive tests to sixb v1.3.0.

Let me know if you identify an input that breaks any of these functions.
Cheers..