non-continuous hash

Dr Eberhard W Lisse

unread,

Jul 3, 2022, 11:14:45 AM7/3/22

to

Having something like:

[...]
my $statement = qq/
SELECT exchangerate, transdate
FROM exchangerate
ORDER BY transdate
/;
$statement = $TABLE->prepare ($statement);
$statement->execute () or die $TABLE->errstr;
my %exchangrate;
while (@row = $statement->fetchrow_array) {
$exchangrate{$row[1]} = $row[0];
}
foreach $key ( sort (keys(%exchangrate))) {
print "$key: $exchangrate{$key}\n"
}
[...]

results in something like

[...]
2022-06-19: 0.855
2022-06-21: 0.8601
2022-06-22: 0.8589
2022-06-23: 0.8582
2022-06-29: 0.8646
[...]

which is non-continuous mainly because of weekends and so on.

How do I fill up the empty key-value pairs by way of the last existing
pair?

Ie in this example, the value for 2022-06-20 should also be 0.855, and
the values for 2022-06-24/5/6/7/8 should all be 0.8582.

Even cooler would be to take the "closes value, ie 2022-06-24/5 should
have the (same) value from 2022-06-23 and 2022-06-27/8 should get the
ones from 2022-06-29. But that's not a requirement,

greetings, el

JLM

unread,

Jul 3, 2022, 1:30:38 PM7/3/22

to

El 3/7/22 a las 17:14, Dr Eberhard W Lisse escribió:

>   foreach $key ( sort (keys(%exchangrate))) {
>         print "$key: $exchangrate{$key}\n"
>     }
>     [...]
>
> results in something like
>
>     [...]
>     2022-06-19: 0.855
>     2022-06-21: 0.8601
>     2022-06-22: 0.8589
>     2022-06-23: 0.8582
>     2022-06-29: 0.8646
>     [...]
>
> which is non-continuous mainly because of weekends and so on.
>
>
> How do I fill up the empty key-value pairs by way of the last existing
> pair?
>
> Ie in this example, the value for 2022-06-20 should also be 0.855, and
> the values for 2022-06-24/5/6/7/8 should all be 0.8582.
>

for $key ( sort keys %exchangerate ) {
if ($key =~ /(\d+)\-(\d+)\-(\d+)/) {
my $date = $3;
if ($date <= 30){
my $c = 1
until (exists $exchagerate{"$1-$2-".$date+$c}){
$exchangerate{"$1-$2-".$date+$c} =$exchangerate{$key};
$c++;
}
}
}
}

Try this. Untested.

--
http://gamo.sdf-eu.org/index.html # ?
World peace, no less!

JLM

unread,

Jul 5, 2022, 1:12:55 AM7/5/22

to

El 3/7/22 a las 19:30, JLM escribió:

> El 3/7/22 a las 17:14, Dr Eberhard W Lisse escribió:
>>    foreach $key ( sort (keys(%exchangrate))) {
>>          print "$key: $exchangrate{$key}\n"
>>      }
>>      [...]
>>
>> results in something like
>>
>>      [...]
>>      2022-06-19: 0.855
>>      2022-06-21: 0.8601
>>      2022-06-22: 0.8589
>>      2022-06-23: 0.8582
>>      2022-06-29: 0.8646
>>      [...]
>>
>> which is non-continuous mainly because of weekends and so on.
>>
>>
>> How do I fill up the empty key-value pairs by way of the last existing
>> pair?
>>
>> Ie in this example, the value for 2022-06-20 should also be 0.855, and
>> the values for 2022-06-24/5/6/7/8 should all be 0.8582.
>>
>
> for $key ( sort keys %exchangerate ) {
>       if ($key =~ /(\d+)\-(\d+)\-(\d+)/) {
>           my $date = $3;
>           if ($date <= 30){
>             my $c = 1

$c = spritf "%02d", $date+$c; # let's try to fix things

            until (exists $exchagerate{"$1-$2-$c"}){
$exchangerate{"$1-$2-$c"} =$exchangerate{$key};
>                 $c++;

$c = sprintf "%02d", $c; #

>             }
>            }
>       }
> }
>
> Try this. Untested.
>

Still untested. Performance issues I suppose are not a case.

JLM

unread,

Jul 5, 2022, 7:22:18 AM7/5/22

to

El 5/7/22 a las 7:12, JLM escribió:

>> for $key ( sort keys %exchangerate ) {
>>        if ($key =~ /(\d+)\-(\d+)\-(\d+)/) {
>>            my $date = $3;
>>            if ($date <= 30){
>>              my $c = 1
>
> $c = spritf "%02d", $date+$c;   # let's try to fix things
>
>             until (exists $exchagerate{"$1-$2-$c"}){
>          $exchangerate{"$1-$2-$c"} =$exchangerate{$key};
>>                  $c++;

last if ($c > 31);

>
> $c = sprintf "%02d", $c;       #
>
>>              }
>>             }
>>        }
>> }
>>
>> Try this. Untested.
>>

A question arises here. For how many days have sense to have
no data? Is it possible to happen in the edge of one month
to another? Then, similar arrangements should be done to the
number of month, and so on.

Best.

Bob Nichols

unread,

Jul 5, 2022, 9:00:48 AM7/5/22

to

#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw/mktime strftime/;

# This is just to get some values into %Rate for the demo:
# You should populate %Rate properly with your database
# selection code (as %exchangrate in your post).
my %Rate = (
'2022-06-19' => 0.855,
'2022-06-21' => 0.8601,
'2022-06-22' => 0.8589,
'2022-06-23' => 0.8582,
'2022-06-29' => 0.8646,
);
my @Dates = sort keys %Rate;

sub timestamp {
# Convert an ISO8601 date string into a UNIX timestamp
$_[0] =~ /^(\d\d\d\d)-?(\d\d)-?(\d\d)$/a;
return mktime(0, 0, 12, $3+0, $2-1, $1-1900);
}

sub iso8601 {
# Convert a UNIX timestamp into an ISO8601 date string
return strftime('%F', localtime($_[0]));
}

sub previous_date {
my $date = $_[0];
return $date if exists $Rate{$date};
return (grep $_ lt $date, @Dates)[-1];
}

sub nearest_date {
my $date = $_[0];
return $date if exists $Rate{$date};
my $prev_date = (grep $_ lt $date, @Dates)[-1];
my $next_date = (grep $_ gt $date, @Dates)[0];
my $prev_gap = timestamp($date) - timestamp($prev_date);
my $next_gap = timestamp($next_date) - timestamp($date);
return $prev_gap <= $next_gap ? $prev_date : $next_date;
}

# Now we have all we need to add missing values to %Rate.

printf "Original %%Rate contains %d values:\n",
scalar keys %Rate;
printf "$_\t$Rate{$_}\n" foreach sort keys %Rate;

# Iterate through every date between the first value and
# the last value in @Dates:
for (
my $date = $Dates[0];
$date le $Dates[-1];
$date = iso8601( timestamp($date)+86400 )
) {
next if exists $Rate{$date};
# If you want a use the previous date for a missing value:
my $previous = previous_date($date);
$Rate{$date} = $Rate{$previous};
# or, if you prefer the nearest date instead:
my $nearest = nearest_date($date);
$Rate{$date} = $Rate{$nearest};
}

printf "\nFilled %%Rate contains %d values:\n",
scalar keys %Rate;
printf "$_\t$Rate{$_}\n" foreach sort keys %Rate;

Dr Eberhard Lisse

unread,

Jul 5, 2022, 10:28:59 AM7/5/22

to

Bob

Thanks, that looks very cool :-)-O

On 05/07/2022 15:00, Bob Nichols wrote:
> #!/usr/bin/perl
[...]

JLM,

Thanks.

It's an exchange rate coming from

http://www.ecb.europa.eu/stats/policy_and_exchange_rates/euro_reference_exchange_rates/html/index.en.html

That implies it doesn't work on the weekends and sometimes my system is
off.

Exactness doesn't really matter for the exchange rate differences in the
book keeping system where this goes, as they are written off once the
financial year has closed and have no tex implications.

Currently I look up the exchange rate for each transaction (between 2
and 100, last week, last month up to the whole of a calendar year) with
an SQL statement which gives me the last one before the date if the date
doesn't have one.

This is reasonably fast, but sending a single SQL statement from Perl
and then doing that with a hash would be obviously faster and some fun
figuring out

Bob's solution seems to be putting me onto the right track, anyway.

greetings, el

On 05/07/2022 13:22, JLM wrote:
[..]

> A question arises here. For how many days have sense to have
> no data? Is it possible to happen in the edge of one month
> to another? Then, similar arrangements should be done to the
> number of month, and so on.

[...]
--
To email me replace 'nospam' with 'el'

Rainer Weikusat

unread,

Jul 5, 2022, 12:29:40 PM7/5/22

to

Dr Eberhard W Lisse <nos...@lisse.NA> writes:

[...]

> [...]
> 2022-06-19: 0.855
> 2022-06-21: 0.8601
> 2022-06-22: 0.8589
> 2022-06-23: 0.8582
> 2022-06-29: 0.8646
> [...]
>
> which is non-continuous mainly because of weekends and so on.
>
>
> How do I fill up the empty key-value pairs by way of the last existing
> pair?

Not overly complicated way of doing that:

------
#* date generator
#
my @m_lens = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

sub m_len
{
my ($y, $m) = @_;
my $ml;

$ml = $m_lens[$m - 1];
++$ml if $ml < 30 && !($y & 3) && !($y % 100 == 0 && $y % 400);
return $ml;
}

sub date_counter
{
my ($y, $m, $d) = split('-', $_[0]);

return sub {
++$d;

if ($d > m_len($y, $m)) {
$d = 1;

if ($m < 12) {
++$m;
} else {
$m = 1;
++$y;
}
}

return sprintf('%u-%02u-%02u', $y, $m, $d);
}
}

#* sample input data
#
my %data = qw(2022-06-19 0.855

2022-06-21 0.8601
2022-06-22 0.8589
2022-06-23 0.8582

2022-06-29 0.8646);

#* main
#
my @dates = sort(keys(%data));
my $dc = date_counter($dates[0]);
my ($cur_v, $n_d);

$cur_v = $data{$dates[0]};

for (@dates[1 .. $#dates]) {
$data{$n_d} = $cur_v while ($n_d = $dc->()) ne $_;
$cur_v = $data{$_};
}

print($_, "\t", $data{$_}, "\n") for sort(keys(%data));

Rainer Weikusat

unread,

Jul 5, 2022, 1:33:41 PM7/5/22

to

Bob Nichols <Bob.N...@ulex.org.invalid> writes:
> # This is just to get some values into %Rate for the demo:
> # You should populate %Rate properly with your database
> # selection code (as %exchangrate in your post).
> my %Rate = (
> '2022-06-19' => 0.855,
> '2022-06-21' => 0.8601,
> '2022-06-22' => 0.8589,
> '2022-06-23' => 0.8582,
> '2022-06-29' => 0.8646,
> );
> my @Dates = sort keys %Rate;

[...]

> sub previous_date {
> my $date = $_[0];
> return $date if exists $Rate{$date};
> return (grep $_ lt $date, @Dates)[-1];
> }
>

[...]

> # Iterate through every date between the first value and
> # the last value in @Dates:
> for (
> my $date = $Dates[0];
> $date le $Dates[-1];
> $date = iso8601( timestamp($date)+86400 )
> ) {
> next if exists $Rate{$date};

> my $previous = previous_date($date);
> $Rate{$date} = $Rate{$previous};

[...]

> }

With doing a benchmark, the exact effects of this can't be determined
(and they might not matter much, anyway) but this is a bad algorithm: It
iterates over an array of a certain length and scans the complete array
once for each iteration, hence, it's running time is proportional to the
square of the array length (ie, the algorithm is of quadratic
complexity).

Dr Eberhard Lisse

unread,

Jul 6, 2022, 8:18:10 AM7/6/22

to

Rainer,

thank you for your code (in a separate message).

I have modified both to produce identical output (4489 "completed' pairs
from 3897 returned by my SQL statement and then a few identical lines
to measure the time the actual code (without the SQL) takes

Your's takes indeed around 6 milliseconds whereas Bob's takes some 136
milliseconds (on my iMac) without the (print(f) statement(s)).

In practical terms (of my use case) this does not matter as you
suspected quite correctly, because the PostgreSQL server sits on another
continent and hence the execution time of the SQL statement to populate
the hash is the limiting factor, but I will of course now use your's as
speed was the aim of the exercise.

Learned something here.

Thanks, el

On 05/07/2022 19:33, Rainer Weikusat wrote:

> Bob Nichols <Bob.N...@ulex.org.invalid> writes:
[...]
> With doing a benchmark, the exact effects of this can't be determined
> (and they might not matter much, anyway) but this is a bad algorithm:
> It iterates over an array of a certain length and scans the complete
> array once for each iteration, hence, it's running time is
> proportional to the square of the array length (ie, the algorithm is
> of quadratic complexity).

On 05/07/2022 18:29, Rainer Weikusat wrote:
[...]

> Not overly complicated way of doing that:

Bo Lindbergh

unread,

Jul 12, 2022, 5:25:38 AM7/12/22

to

In article <jij055...@mid.individual.net>,

Dr Eberhard Lisse <nos...@lisse.NA> wrote:
> Currently I look up the exchange rate for each transaction (between 2
> and 100, last week, last month up to the whole of a calendar year) with
> an SQL statement which gives me the last one before the date if the date
> doesn't have one.
>
> This is reasonably fast, but sending a single SQL statement from Perl
> and then doing that with a hash would be obviously faster and some fun
> figuring out

You could use a temporary SQLite database to cache an appropriate range
of data from the remote server. Looking up individual dates in this cache
would be very fast, possibly faster than any fill-in-the-holes code written
in Perl.

/Bo Lindbergh

Rainer Weikusat

unread,

Jul 12, 2022, 2:44:04 PM7/12/22

to

Dr Eberhard Lisse <nos...@lisse.NA> writes:
> I have modified both to produce identical output (4489 "completed' pairs
> from 3897 returned by my SQL statement and then a few identical lines
> to measure the time the actual code (without the SQL) takes
>
> Your's takes indeed around 6 milliseconds whereas Bob's takes some 136
> milliseconds (on my iMac) without the (print(f) statement(s)).

That's interesting. I actually expected the opposite result. While I was
using a linear algorithm, the implementation (with a date enumeration
closure) wasn't exactly low-overhead and did a lot of stuff in
Perl. This usually means that the theoretical scalability advantage
doesn't turn into a pratical advantage unless the amount of input data
is insanely large.

Dr Eberhard W Lisse

unread,

Jul 15, 2022, 2:49:30 AM7/15/22

to

Indeed, the limiting factor is the SQL Query (the server is on another
continent). And at the most 365 rates per annum doesn't bother
PostgreSQL nor Perl :-)-O

Hmm maybe I should measure on my M1 :-)-O

el

> advantage doesn't turn into a practical advantage unless the amount of

Dr Eberhard Lisse

unread,

Jul 15, 2022, 5:01:24 AM7/15/22

to

Thanks, but besides being more complicated I really doubt that creating
a (temp) SQLite3 table, writing to it and then reading from it before
even doing any comparison will be faster that Rainer's fill-in-the-hole.

el

Dr Eberhard Lisse

unread,

Jul 15, 2022, 5:01:30 AM7/15/22

to

Thanks, but besides being more complicated I really doubt that creating
a (temp) SQLite3 table, writing to it and then reading from it before
even doing any comparison will be faster that Rainer's fill-in-the-hole.

el

On 12/07/2022 11:29, Bo Lindbergh wrote: