Message from discussion data cleansing: externally or internally?
joel garry  
Newsgroups: comp.databases.oracle.misc
From: joel garry <joel-ga...@home.com>
Date: Mon, 14 Nov 2011 09:43:42 -0800 (PST)
Local: Mon, Nov 14 2011 12:43 pm
Subject: Re: data cleansing: externally or internally?
On Nov 11, 12:51 pm, Robert Klemme <shortcut...@googlemail.com> wrote:

> On 11/10/2011 02:48 PM, Ramon F Herrera wrote:

> > On Nov 4, 12:51 am, geos<g...@nowhere.invalid>  wrote:
> >> there is a big text file with dirty data. a company wants it to be
> >> clean. there are some known patterns expressed as like or regexp. I
> >> first thought about two approaches:
> >> 1) do this on the system level
> >> 2) or in a database
> >> for the latter case it looks to me that I could use external tables or
> >> load data into temporary table and then do the cleaning.

> >> I am looking for pros and cons of each variant. my intuition tells me
> >> that loading into temporary table would give the most flexibility but
> >> also take additional space. I am not sure about the other methods. I
> >> would appreciate your opinion about what I should pay attention to when
> >> choosing the other methods. how are they restricted in terms of
> >> performance, flexibility and capabilities (eg. multitable loading)? I am
> >> also interested in good practices and your experience in similar cases
> >> you can share.

> You still did not disclose the type of processing you want to do.
> Without that information advice cannot be targeted at your scenario.

> > After more than a decade of experience my advice to you is: Use Oracle
> > as little as possible.

> > I wrote all my business logic in C/C++ making calls to the database
> > only as needed, and now my applications run much, much, much, much
> > faster. Not to mention the improved development and debug (can use a
> > debugger, not sure whether Oracle has something similar).

> > In essence, the only commands that I run in the database are basic
> > ones such as SELECT and UPDATE. No IFs or BUTs.

> This cannot be generalized as advice!  You do not even mention the type
> of application(s) you are talking about.  What may work well for the
> application types you work on may not work at all for other application
> types.

> Kind regards

>         robert

I agree.  All the answers added together are pretty good advice.  I
could have qualified mine better by saying which database
limitations I was referring to.

When I first switched from VMS to the Unix world, I took over
maintenance on a system like the one Ramon described, mostly because
of the limitations of the db engine (Unify, with things like not being
able to join 5 tables).  That is when I came to the conclusion that
whatever else you like about C, using it directly as an application
language is just a huge mistake.  I can't judge C++, as I never
learned any good theoretical foundation for it (which could be my
shortcoming, or not).  The only rational rationalization for these
languages is the large number of CS students that get trained in them,
and that could be a mistake too.  And these mistakes keep getting made
over and over.  People tend to favor what they know, and favor new
code or "paradigms" over long term maintenance.

jg
--
@home.com is bogus.
Talk about dirty data left behind...
http://www.signonsandiego.com/news/2011/nov/14/buyer-of-minivan-finds...


 