CASE WHEN date-string is valid THEN TO_DATE(date-string) ELSE NULL
END.
Is it possible to test a string for a valid date in Oracle 9i/10g/11g
without resorting to PL/SQL? If I use a homemade PL/SQL function, this
is performed once for each record, which increases the load time
considerably due to the millions of SQL - PL/SQL context switches back
and forth.
Regards,
Jaap.
case when regexp_like(date_string,format_regexp) then
to_date(date_string,format_mask) else null
Is it possible to test with format_regexp if ('31112009', 'ddmmyyyy')
is a valid date, or ('29022000', 'ddmmyyyy')?
Regards,
Jaap.
> Is it possible to test with format_regexp if ('31112009', 'ddmmyyyy')
> is a valid date, or ('29022000', 'ddmmyyyy')?
Hi Jaap,
imho you need some program logic for testing if a varchar is an
existing date. The example of yossarian shows you the way in perl.
What about an extra table with all days from the last 200 years ? That
will be 70.000 entries. One varchar colum and the corresponding date
column. Then you can outer join your table against it and you get a
real date or not.
hth
Kay
> The example of yossarian shows you the way in perl.
perl?
Thanks for the pointer. I will use a case statement similar to the
perl alternative on this page. It is less complicated than I thought
it would be.
Regards,
Jaap.
ok, the example in the link. i suppose it is perl.
sub isvaliddate {
my $input = shift;
if ($input =~ m!^((?:19|20)\d\d)[- /.](0[1-9]|1[012])[-
/.](0[1-9]|[12][0-9]|3[01])$!) {
# At this point, $1 holds the year, $2 the month and $3 the day of
the date entered
if ($3 == 31 and ($2 == 4 or $2 == 6 or $2 == 9 or $2 == 11)) {
return 0; # 31st of a month with 30 days
} elsif ($3 >= 30 and $2 == 2) {
return 0; # February 30th or 31st
} elsif ($2 == 2 and $3 == 29 and not ($1 % 4 == 0 and ($1 % 100 !=
0 or $1 % 400 == 0))) {
return 0; # February 29th outside a leap year
} else {
return 1; # Valid date
}
} else {
return 0; # Not a date
}
}
Kay
Zoals toen we jong waren:
select
ds.*,
case
when maand in (1,3,5,7,8,10,12) and dag between 1 and 31
or maand in (4,6,9,11) and dag between 1 and 30
or maand = 2 and dag between 1 and 28
or maand = 2 and dag = 29
and mod(jaar,4) = 0 and (mod(jaar,100) != 0 or mod(jaar,400) = 0)
then to_date(dag||maand||jaar,'DDMMYYYY')
else to_date('01011999','DDMMYYYY')
end
from (
select
substr(datum,1,2) dag,
substr(datum,3,2) maand,
substr(datum,5,4) jaar
from (
select
case when regexp_like(d.datum,'[0-9]{8}')
then d.datum
else '31122009'
end datum
from (
select '29022009' datum from dual
union all
select '31042009' datum from dual
union all
select '29022008' datum from dual
union all
select '29022000' datum from dual
union all
select '29021900' datum from dual
union all
select '00022000' datum from dual
union all
select '01022009' datum from dual
) d
)
) ds
/
How about checking if to_char(to_date(varcharstring)) = varcharstring?
(Did not test it, just an idea)
Shakespeare
Better stick to plays and sonnets, S.
Did you actually measure that or is that an assumption? Not sure
whether the regexp approach does suffer from the very same problem.
Here's another sketch of an idea for the case that you actually have
much less dates than rows: load all the dates with DISTINCT into a table
with two columns: original string, converted date. Do an UPDATE using
the custom PL/SQL function and finally do the import with a join against
that table. Maybe you can even pull that off with a WITH clause. Of
course, there are many variables in there so the approach might not be
feasible at all in your case (e.g. index on the original text column helps).
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Context switches are very measurable, as is the number of times a stored procedure or function
is called. A table lookup will never beat a couple of lines of sql "logic".
Btw I did not ask whether they are measurable as such but whether OP had
_verified_ that these context switches are actually an issue in his
situation. While a context switch does have a price it may turn out
that other approaches do have higher prices in particular situations.
The only way to _know_ it is usually to test and measure.
> A table lookup will never beat a
> couple of lines of sql "logic".
When it comes to using words like "never" and "always" I am very
cautious. Next someone might turn up with an example where a table
lookup beats what you call "SQL logic" by an order of magnitude.
Cheers
I'd love that, to get to know the feeling to be wrong for once.
;-)
But you are right to be cautious.
Well, you can get scalability problems which can arise, for example,
Jonathan shows how at some data volume sorts can spill to disk and
slow things down: http://jonathanlewis.wordpress.com/2009/09/07/analytic-agony/
Of course, doing a four-pass operation when Oracle should have done an
optimal sort was eventually considered a bug, but before it was
considered a bug, it would quite possibly be a candidate for table
lookup coding to beat it.
I think this might generalize to a postulate that at "normal" memory
settings, there's going to be some situation where there's enough
complexity in a join aggregation forces multipath sorts that could be
avoided by table lookup coding. "Normal" is hard to define since part
of tuning is changing memory settings, so I won't go there, but I hope
I'm getting across that scaling volume and complexity can lead to all
"sorts" of examples.
None of this is to say analytics don't rock and roll, just that it is
important to delineate any limitations. One limitation is that it is
putting old 3GL style ordered control break logic into a non-
procedural language, possibly obscuring other implicit limitations.
>
> > Cheers
>
> > robert
>
> I'd love that, to get to know the feeling to be wrong for once.
On usenet, you don't need to be wrong to get the feeling! :-)
>
> ;-)
>
> But you are right to be cautious.
jg
--
@home.com is bogus.
Why don't we ever see SP2-00750?
I replaced the function with the same logic in a case statement. Tje
function and therefore the case statement are called more than once
for each record. I processed some 9 million records. Due to the change
the runtime dropped with a factor of 2.8!
Regards,
Jaap.