Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

how to test a string for a valid date in sql

3 views
Skip to first unread message

Jaap W. van Dijk

unread,
Dec 11, 2009, 5:01:07 AM12/11/09
to
I'm loading millions of records in a transformation step in a
datawarehouse with an INSERT INTO ... SELECT ... FROM ... Part of the
transformation is converting a date-string in a VARCHAR2 column into a
date in a DATE column, something like

CASE WHEN date-string is valid THEN TO_DATE(date-string) ELSE NULL
END.

Is it possible to test a string for a valid date in Oracle 9i/10g/11g
without resorting to PL/SQL? If I use a homemade PL/SQL function, this
is performed once for each record, which increases the load time
considerably due to the millions of SQL - PL/SQL context switches back
and forth.

Regards,
Jaap.

yossarian

unread,
Dec 11, 2009, 5:05:22 AM12/11/09
to

case when regexp_like(date_string,format_regexp) then
to_date(date_string,format_mask) else null

Jaap W. van Dijk

unread,
Dec 11, 2009, 5:25:05 AM12/11/09
to

Is it possible to test with format_regexp if ('31112009', 'ddmmyyyy')
is a valid date, or ('29022000', 'ddmmyyyy')?

Regards,
Jaap.

yossarian

unread,
Dec 11, 2009, 5:30:03 AM12/11/09
to
Jaap W. van Dijk wrote:

> Is it possible to test with format_regexp if ('31112009', 'ddmmyyyy')
> is a valid date, or ('29022000', 'ddmmyyyy')?

http://www.regular-expressions.info/dates.html

Kay Kanekowski

unread,
Dec 11, 2009, 9:01:27 AM12/11/09
to

Hi Jaap,
imho you need some program logic for testing if a varchar is an
existing date. The example of yossarian shows you the way in perl.

What about an extra table with all days from the last 200 years ? That
will be 70.000 entries. One varchar colum and the corresponding date
column. Then you can outer join your table against it and you get a
real date or not.

hth
Kay

yossarian

unread,
Dec 11, 2009, 9:04:37 AM12/11/09
to
Kay Kanekowski wrote:

> The example of yossarian shows you the way in perl.

perl?

Jaap W. van Dijk

unread,
Dec 11, 2009, 9:50:45 AM12/11/09
to

Thanks for the pointer. I will use a case statement similar to the
perl alternative on this page. It is less complicated than I thought
it would be.

Regards,
Jaap.

Kay Kanekowski

unread,
Dec 11, 2009, 11:24:06 AM12/11/09
to
yossarian schrieb:

> Kay Kanekowski wrote:
>
>> The example of yossarian shows you the way in perl.
>
> perl?

ok, the example in the link. i suppose it is perl.

sub isvaliddate {
my $input = shift;
if ($input =~ m!^((?:19|20)\d\d)[- /.](0[1-9]|1[012])[-
/.](0[1-9]|[12][0-9]|3[01])$!) {
# At this point, $1 holds the year, $2 the month and $3 the day of
the date entered
if ($3 == 31 and ($2 == 4 or $2 == 6 or $2 == 9 or $2 == 11)) {
return 0; # 31st of a month with 30 days
} elsif ($3 >= 30 and $2 == 2) {
return 0; # February 30th or 31st
} elsif ($2 == 2 and $3 == 29 and not ($1 % 4 == 0 and ($1 % 100 !=
0 or $1 % 400 == 0))) {
return 0; # February 29th outside a leap year
} else {
return 1; # Valid date
}
} else {
return 0; # Not a date
}
}

Kay

Gerard H. Pille

unread,
Dec 12, 2009, 8:16:55 AM12/12/09
to
Jaap W. van Dijk schreef:

Zoals toen we jong waren:

select
ds.*,
case
when maand in (1,3,5,7,8,10,12) and dag between 1 and 31
or maand in (4,6,9,11) and dag between 1 and 30
or maand = 2 and dag between 1 and 28
or maand = 2 and dag = 29
and mod(jaar,4) = 0 and (mod(jaar,100) != 0 or mod(jaar,400) = 0)
then to_date(dag||maand||jaar,'DDMMYYYY')
else to_date('01011999','DDMMYYYY')
end
from (
select
substr(datum,1,2) dag,
substr(datum,3,2) maand,
substr(datum,5,4) jaar
from (
select
case when regexp_like(d.datum,'[0-9]{8}')
then d.datum
else '31122009'
end datum
from (
select '29022009' datum from dual
union all
select '31042009' datum from dual
union all
select '29022008' datum from dual
union all
select '29022000' datum from dual
union all
select '29021900' datum from dual
union all
select '00022000' datum from dual
union all
select '01022009' datum from dual
) d
)
) ds
/

Shakespeare

unread,
Dec 13, 2009, 4:58:52 AM12/13/09
to
Jaap W. van Dijk schreef:

How about checking if to_char(to_date(varcharstring)) = varcharstring?
(Did not test it, just an idea)

Shakespeare

Gerard H. Pille

unread,
Dec 13, 2009, 6:02:50 AM12/13/09
to
Shakespeare wrote:
>
> How about checking if to_char(to_date(varcharstring)) = varcharstring?
> (Did not test it, just an idea)
>
> Shakespeare


Better stick to plays and sonnets, S.

Robert Klemme

unread,
Dec 13, 2009, 6:36:47 AM12/13/09
to

Did you actually measure that or is that an assumption? Not sure
whether the regexp approach does suffer from the very same problem.

Here's another sketch of an idea for the case that you actually have
much less dates than rows: load all the dates with DISTINCT into a table
with two columns: original string, converted date. Do an UPDATE using
the custom PL/SQL function and finally do the import with a join against
that table. Maybe you can even pull that off with a WITH clause. Of
course, there are many variables in there so the approach might not be
feasible at all in your case (e.g. index on the original text column helps).

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Gerard H. Pille

unread,
Dec 13, 2009, 8:21:50 AM12/13/09
to

Context switches are very measurable, as is the number of times a stored procedure or function
is called. A table lookup will never beat a couple of lines of sql "logic".

Robert Klemme

unread,
Dec 13, 2009, 9:39:46 AM12/13/09
to
On 13.12.2009 14:21, Gerard H. Pille wrote:
> Robert Klemme wrote:
>> On 11.12.2009 11:01, Jaap W. van Dijk wrote:
>>> I'm loading millions of records in a transformation step in a
>>> datawarehouse with an INSERT INTO ... SELECT ... FROM ... Part of the
>>> transformation is converting a date-string in a VARCHAR2 column into a
>>> date in a DATE column, something like
>>>
>>> CASE WHEN date-string is valid THEN TO_DATE(date-string) ELSE NULL
>>> END.
>>>
>>> Is it possible to test a string for a valid date in Oracle 9i/10g/11g
>>> without resorting to PL/SQL? If I use a homemade PL/SQL function, this
>>> is performed once for each record, which increases the load time
>>> considerably due to the millions of SQL - PL/SQL context switches back
>>> and forth.
>>
>> Did you actually measure that or is that an assumption? Not sure whether
>> the regexp approach does suffer from the very same problem.
>>
>> Here's another sketch of an idea for the case that you actually have
>> much less dates than rows: load all the dates with DISTINCT into a table
>> with two columns: original string, converted date. Do an UPDATE using
>> the custom PL/SQL function and finally do the import with a join against
>> that table. Maybe you can even pull that off with a WITH clause. Of
>> course, there are many variables in there so the approach might not be
>> feasible at all in your case (e.g. index on the original text column
>> helps).
>
> Context switches are very measurable, as is the number of times a stored
> procedure or function is called.

Btw I did not ask whether they are measurable as such but whether OP had
_verified_ that these context switches are actually an issue in his
situation. While a context switch does have a price it may turn out
that other approaches do have higher prices in particular situations.
The only way to _know_ it is usually to test and measure.

> A table lookup will never beat a
> couple of lines of sql "logic".

When it comes to using words like "never" and "always" I am very
cautious. Next someone might turn up with an example where a table
lookup beats what you call "SQL logic" by an order of magnitude.

Cheers

Gerard H. Pille

unread,
Dec 13, 2009, 9:50:57 AM12/13/09
to
Robert Klemme wrote:
>> A table lookup will never beat a couple of lines of sql "logic".
>
> When it comes to using words like "never" and "always" I am very
> cautious. Next someone might turn up with an example where a table
> lookup beats what you call "SQL logic" by an order of magnitude.
>
> Cheers
>
> robert
>

I'd love that, to get to know the feeling to be wrong for once.

;-)


But you are right to be cautious.

joel garry

unread,
Dec 14, 2009, 12:14:37 PM12/14/09
to
On Dec 13, 6:50 am, "Gerard H. Pille" <g...@skynet.be> wrote:
> Robert Klemme wrote:
> >> A table lookup will never beat a couple of lines of sql "logic".
>
> > When it comes to using words like "never" and "always" I am very
> > cautious. Next someone might turn up with an example where a table
> > lookup beats what you call "SQL logic" by an order of magnitude.

Well, you can get scalability problems which can arise, for example,
Jonathan shows how at some data volume sorts can spill to disk and
slow things down: http://jonathanlewis.wordpress.com/2009/09/07/analytic-agony/
Of course, doing a four-pass operation when Oracle should have done an
optimal sort was eventually considered a bug, but before it was
considered a bug, it would quite possibly be a candidate for table
lookup coding to beat it.

I think this might generalize to a postulate that at "normal" memory
settings, there's going to be some situation where there's enough
complexity in a join aggregation forces multipath sorts that could be
avoided by table lookup coding. "Normal" is hard to define since part
of tuning is changing memory settings, so I won't go there, but I hope
I'm getting across that scaling volume and complexity can lead to all
"sorts" of examples.

None of this is to say analytics don't rock and roll, just that it is
important to delineate any limitations. One limitation is that it is
putting old 3GL style ordered control break logic into a non-
procedural language, possibly obscuring other implicit limitations.

>
> > Cheers
>
> > robert
>
> I'd love that, to get to know the feeling to be wrong for once.

On usenet, you don't need to be wrong to get the feeling! :-)

>
> ;-)
>
> But you are right to be cautious.

jg
--
@home.com is bogus.
Why don't we ever see SP2-00750?

Jaap W. van Dijk

unread,
Dec 14, 2009, 1:02:48 PM12/14/09
to

I replaced the function with the same logic in a case statement. Tje
function and therefore the case statement are called more than once
for each record. I processed some 9 million records. Due to the change
the runtime dropped with a factor of 2.8!

Regards,
Jaap.

0 new messages