Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Filling in "gaps" in data

34 views
Skip to first unread message

Jeremy

unread,
Oct 31, 2008, 10:45:59 AM10/31/08
to
Oracle 10g R2 10.2.0.1.0

Hi

I posted (couple of weeks ago) on this (as a sub-thread) but wanted to
ask it here with some clarity.

Say we have a table with date/time related data

create table hits
(date_time date,
ip_address varchar2(240));

And our data look like this:

date_time ip_address
--------- ----------
31-oct-2008 13:08 192.168.0.1
31-oct-2008 13:08 192.168.0.1
31-oct-2008 13:11 192.168.0.1
31-oct-2008 13:12 192.168.0.1
31-oct-2008 13:15 192.168.0.1
31-oct-2008 13:15 192.168.0.1

I want to produce a report that shows the number of hits per minute but
include the "gaps" where there is no data recorded.

So I would like to see:

date_time hits
--------- ----
31-oct-2008 13:08 2
31-oct-2008 13:09 0
31-oct-2008 13:10 0
31-oct-2008 13:11 1
31-oct-2008 13:12 1
31-oct-2008 13:13 0
31-oct-2008 13:14 0
31-oct-2008 13:15 2

Is it possible without outer-joining to another table with "time" data
in it?

I was previously pointed to http://download.oracle.com/docs/cd/B19306_
01/server.102/b14223/analys... which relates to data densification -
but this method involves getting the "blanks" from another table.

--
jeremy

Charles Hooper

unread,
Oct 31, 2008, 11:22:12 AM10/31/08
to
> jeremy

You need a way to generate rows with the missing time elements, for
instance by creating an inline view with a simple counter:
SELECT
LEVEL COUNTER
FROM
DUAL
CONNECT BY
LEVEL<=1000;
COUNTER
==========
1
2
3
4
5
6
7
8
9
10
...

With a slight modification to the above:
SELECT
DT.DATE_TIME
FROM
(SELECT
TO_DATE('31-OCT-2008','DD-MON-YYYY') + (LEVEL/1440) DATE_TIME
FROM
DUAL
CONNECT BY
LEVEL<=1000) DT;

====================
31-OCT-2008 00:01:00
31-OCT-2008 00:02:00
31-OCT-2008 00:03:00
31-OCT-2008 00:04:00
31-OCT-2008 00:05:00
31-OCT-2008 00:06:00
31-OCT-2008 00:07:00
31-OCT-2008 00:08:00
31-OCT-2008 00:09:00
31-OCT-2008 00:10:00

Once the rows with the missing time elements are available, you could
outer join your table to this inline view, something like this:
SELECT
DT.DATE_TIME,
NVL(COUNT(*)) HIT_COUNT
FROM
(SELECT
TO_DATE('31-OCT-2008','DD-MON-YYYY') + (LEVEL/1440) DATE_TIME
FROM
DUAL
CONNECT BY
LEVEL<=1000) DT,
HITS
WHERE
DT.DATE_TIME=HITS.DATE_TIME(+)
GROUP BY
DT.DATE_TIME;

There are of course more than 1000 minutes in a day (right around
1440), so some adjustment will be necessary.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.

Jeremy

unread,
Oct 31, 2008, 11:51:50 AM10/31/08
to
In article <37d61e27-c8a2-4724-8b0b-55f925d5b167
@t39g2000prh.googlegroups.com>, hoope...@yahoo.com says...>
Thanks Charles, exactly the sort of thing I was looking for.


--
jeremy

Mark D Powell

unread,
Oct 31, 2008, 4:37:51 PM10/31/08
to
> K&M Machine-Fabricating, Inc.- Hide quoted text -
>
> - Show quoted text -

Charles have you ever found an article or any form of documentation
that explains how connect by level can be used on dual with 10g+ to
create a series. This does not work on 9.2 and below and I have never
found any useful information on this technique.

-- Mark D Powell --

Charles Hooper

unread,
Oct 31, 2008, 8:56:16 PM10/31/08
to
On Oct 31, 4:37 pm, Mark D Powell <Mark.Pow...@eds.com> wrote:
> On Oct 31, 11:22 am, Charles Hooper <hooperc2...@yahoo.com> wrote:
> > Once the rows with the missing time elements are available, you could
> > outer join your table to this inline view, something like this:
> > SELECT
> >   DT.DATE_TIME,
> >   NVL(COUNT(*)) HIT_COUNT
> > FROM
> >   (SELECT
> >     TO_DATE('31-OCT-2008','DD-MON-YYYY') + (LEVEL/1440) DATE_TIME
> >   FROM
> >     DUAL
> >   CONNECT BY
> >     LEVEL<=1000) DT,
> >   HITS
> > WHERE
> >   DT.DATE_TIME=HITS.DATE_TIME(+)
> > GROUP BY
> >   DT.DATE_TIME;
>
> > There are of course more than 1000 minutes in a day (right around
> > 1440), so some adjustment will be necessary.
>
> > Charles Hooper
> > IT Manager/Oracle DBA
> > K&M Machine-Fabricating, Inc.

> Charles have you ever found an article or any form of documentation


> that explains how connect by level can be used on dual with 10g+ to
> create a series.  This does not work on 9.2 and below and I have never
> found any useful information on this technique.
>
> -- Mark D Powell

Hi Mark,

It is a bit interesting that a search of the Oracle documentation for
"connect by level" returns 0 results. A Metalink search for the same
phrase only returns a couple hits, most of which report bugs. I first
saw the method of using connect by level used in this Usenet forum,
and I recall having difficulty understanding why it worked If we are
able to consider that what Tom Kyte writes is an extension of the
Oracle documentation, the method is documented to an extent:
http://www.oracle.com/technology/oramag/oracle/07-mar/o27asktom.html
"If you are in Oracle9i Release 2 and above. You can skip the function
altogether and just use DUAL to generate rows and parse the string."

http://tkyte.blogspot.com/2007/02/what-is-your-fizzbuzz-factor.html

http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:906593600346091624

There are cautions with using connect by level, as it may lead to
excessive CPU consumption. I noticed this problem during testing some
time ago, and attempted to reproduce a test case which demonstrates
the problem. In my test, the excessive CPU consumption was not as
severe as I expected.

Test case, first with connect by level, and then by using a narrow
table containing 1,000,000 rows:
CREATE TABLE T1 (C1 NUMBER) PCTFREE 0;

INSERT INTO
T1
SELECT
ROWNUM
FROM
INVENTORY_TRANS
WHERE
ROWNUM<=1000000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1')

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT
ROWNUM COUNTER
FROM
DUAL
CONNECT BY
LEVEL<=1000000;

SELECT
ROWNUM
FROM
INVENTORY_TRANS
WHERE
ROWNUM<=1000000;

SELECT
COUNT(*)
FROM
(SELECT
ROWNUM
FROM
T1
WHERE
ROWNUM<=1000000);

SELECT
SYSDATE
FROM
DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

I used a custom program to execute the queries fetching 100,000 rows
at a time while generating a typical DBMS Xplan and a 10046 trace at
level 8. The highlights of the test results:
SQL_ID 85b6yxsrp1c74, child number 0
-------------------------------------
SELECT ROWNUM COUNTER FROM DUAL CONNECT BY LEVEL<=1000000

Plan hash value: 1731520519

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 2
(100)| |
| 1 | COUNT | | |
| |
| 2 | CONNECT BY WITHOUT FILTERING| | |
| |
| 3 | FAST DUAL | | 1 | 2 (0)|
00:00:01 |
------------------------------------------------------------------------------

PARSE #8:c=0,e=798,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=173195416371
EXEC #8:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=173195421821
FETCH
#8:c=982806,e=979081,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173196401970
FETCH
#8:c=904806,e=914328,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173197888861
FETCH
#8:c=889206,e=943805,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173199421758
FETCH
#8:c=889206,e=1037772,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173201068951
FETCH
#8:c=920405,e=1393510,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173203052512
FETCH
#8:c=889206,e=930261,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173204571047
FETCH
#8:c=904806,e=1326808,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173206485772
FETCH
#8:c=920406,e=920646,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173208784741
FETCH
#8:c=904806,e=917949,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173210289328
FETCH
#8:c=889205,e=915164,p=0,cr=0,cu=0,mis=0,r=100000,dep=0,og=1,tim=173211790691
FETCH
#8:c=140401,e=138735,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=173212516501

17.10013 seconds elapsed start to end
9.235259 seconds CPU during fetch

-----------------------------------
SQL_ID c2c36dfayx56p, child number 0
-------------------------------------
SELECT ROWNUM FROM T1 WHERE ROWNUM<=1000000

Plan hash value: 3836375644

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
Inst |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 313 (100)|
| |
|* 1 | COUNT STOPKEY | | | |
| |
| 2 | TABLE ACCESS FULL| T1 | 1003K| 313 (13)| 00:00:02
| LT |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=1000000)

PARSE
#8:c=0,e=22870,p=0,cr=11,cu=0,mis=1,r=0,dep=0,og=1,tim=174586284770
EXEC #8:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=174586289856
FETCH
#8:c=920406,e=921052,p=0,cr=141,cu=0,mis=0,r=100000,dep=0,og=1,tim=174587212022
FETCH
#8:c=889206,e=1522366,p=1,cr=137,cu=0,mis=0,r=100000,dep=0,og=1,tim=174589309337
FETCH
#8:c=889205,e=996714,p=0,cr=137,cu=0,mis=0,r=100000,dep=0,og=1,tim=174590895840
FETCH
#8:c=842406,e=1032087,p=0,cr=137,cu=0,mis=0,r=100000,dep=0,og=1,tim=174592519029
FETCH
#8:c=873605,e=934658,p=1,cr=139,cu=0,mis=0,r=100000,dep=0,og=1,tim=174594040940
FETCH
#8:c=889206,e=2998915,p=60,cr=137,cu=0,mis=0,r=100000,dep=0,og=1,tim=174597624729
FETCH
#8:c=904806,e=2258202,p=101,cr=137,cu=0,mis=0,r=100000,dep=0,og=1,tim=174600471249
FETCH
#8:c=904806,e=2037804,p=103,cr=137,cu=0,mis=0,r=100000,dep=0,og=1,tim=174603098647
FETCH
#8:c=873605,e=1493508,p=19,cr=138,cu=0,mis=0,r=100000,dep=0,og=1,tim=174605180433
FETCH
#8:c=842406,e=895755,p=0,cr=137,cu=0,mis=0,r=100000,dep=0,og=1,tim=174606667248
FETCH #8:c=0,e=18,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=174607254191

20.969421 seconds elapsed start to end
8.829657 seconds CPU during fetch

It is a bit interesting that the connect by level method reported no
consistent reads on 10.2.0.3 and 11.1.0.7. It is also a bit
interesting to look at the CPU consumption when 0 rows were returned.

alex...@gmail.com

unread,
Nov 2, 2008, 7:07:26 PM11/2/08
to
On 31 окт, 10:51, Jeremy <jeremy0...@gmail.com> wrote:
> In article <37d61e27-c8a2-4724-8b0b-55f925d5b167
> @t39g2000prh.googlegroups.com>, hooperc2...@yahoo.com says...>

You can make counter without CONNECT BY clause using any big table for
example DBA_OBJECTS

select ROUND(sysdate, 'MI')+t.i/1440
from (select rownum i from dba_objects where rownum <1440) t, dual;

COUNTER

02.11.2008 19:07:00
02.11.2008 19:08:00
02.11.2008 19:09:00
02.11.2008 19:10:00
02.11.2008 19:11:00
02.11.2008 19:12:00
02.11.2008 19:13:00
02.11.2008 19:14:00
02.11.2008 19:15:00

then You can join this table with you query.

Alex

Jeremy

unread,
Nov 3, 2008, 6:53:35 AM11/3/08
to
In article <93b470f9-5000-46d1-9d2c-481123a835c4
@w1g2000prk.googlegroups.com>, alex...@gmail.com says...>


> You can make counter without CONNECT BY clause using any big table for
> example DBA_OBJECTS
>
> select ROUND(sysdate, 'MI')+t.i/1440
> from (select rownum i from dba_objects where rownum <1440) t, dual;
>
> COUNTER
>
> 02.11.2008 19:07:00
> 02.11.2008 19:08:00
> 02.11.2008 19:09:00
> 02.11.2008 19:10:00
> 02.11.2008 19:11:00
> 02.11.2008 19:12:00
> 02.11.2008 19:13:00
> 02.11.2008 19:14:00
> 02.11.2008 19:15:00
>
> then You can join this table with you query.
>

Thanks Alex though I am reluctant to join to a table that *may* have
sufficient rows.... even if it always does, somehow it seems to be a
kludge to introduce a dependency on a table that is not within the
control of the application. I knowthe answer to this is to create a
table in the application with simply 1000 rows (or however many we need)
but that then seems inelegant when compared to the method suggested by
Charles which seems to do away with any need to refer to a separate
table.

--
jeremy

William Robertson

unread,
Nov 4, 2008, 2:27:01 AM11/4/08
to
On Nov 1, 12:56 am, Charles Hooper <hooperc2...@yahoo.com> wrote:
> It is a bit interesting that a search of the Oracle documentation for
> "connect by level" returns 0 results.  A Metalink search for the same
> phrase only returns a couple hits, most of which report bugs.  I first
> saw the method of using connect by level used in this Usenet forum,
> and I recall having difficulty understanding why it worked

There is no special CONNECT BY LEVEL syntax, which is why this exact
phrase is not documented. There is just "CONNECT BY [condition]". The
documentation then links to "Conditions" ("A condition specifies a
combination of one or more expressions and logical (Boolean) operators
and returns a value of TRUE, FALSE, or UNKNOWN.") I personally prefer
ROWNUM <= n, but it could be any condition you like that is true until
you want it to be false, e.g.

SELECT ROWNUM FROM dual CONNECT BY DBMS_RANDOM.VALUE(0,1) > 0.5

Charles Hooper

unread,
Nov 5, 2008, 11:00:08 AM11/5/08
to
On Nov 4, 2:27 am, William Robertson <williamr2...@googlemail.com>
wrote:

William,

Your explanation and example make perfect sense, if you read the above
as connect child rows to parent rows until the specified condition is
false. The documentation does not seem to be quite as clear:
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/queries003.htm
"CONNECT BY specifies the relationship between parent rows and child
rows of the hierarchy."

Note the term "relationship", which seems to indicate that the
relationship between child rows and parent rows must be defined.

The samples in the documentation all include the "PRIOR" syntax, which
might be part of the source of the confusion (at least it was for me
as I was first trying to understand this syntax). From the same link:
"PRIOR is most commonly used when comparing column values with the
equality operator. (The PRIOR keyword can be on either side of the
operator.) PRIOR causes Oracle to use the value of the parent row in
the column. Operators other than the equal sign (=) are theoretically
possible in CONNECT BY clauses. However, the conditions created by
these other operators can result in an infinite loop through the
possible combinations. In this case Oracle detects the loop at run
time and returns an error."

The above hints at the possibility of using something other than an
equal sign in the connect by syntax.

Regardless of how clear the documentation is, your explanation
describes how it actually works. Thanks for the reply.

Correcting a couple editing problems within my posts of this thread:
NVL(COUNT(*)) HIT_COUNT
Should be:
NVL(COUNT(*),0) HIT_COUNT

---------


SELECT
COUNT(*)
FROM
(SELECT
ROWNUM
FROM
T1
WHERE
ROWNUM<=1000000);

Actual SQL statement which was executed during testing:


SELECT
ROWNUM
FROM
T1
WHERE

ROWNUM<=1000000;

---------


SELECT
ROWNUM
FROM
INVENTORY_TRANS
WHERE
ROWNUM<=1000000;

The above was for a timed test using an existing table in the database
with at least 1,000,000 rows. The time required for that test to
complete was so significantly long, that I decided not to post the
results.

William Robertson

unread,
Nov 6, 2008, 2:25:52 AM11/6/08
to
On Nov 5, 4:00 pm, Charles Hooper <hooperc2...@yahoo.com> wrote:
> On Nov 4, 2:27 am, William Robertson <williamr2...@googlemail.com>
> wrote:
>
>
>
> > On Nov 1, 12:56 am, Charles Hooper <hooperc2...@yahoo.com> wrote:
>
> > > It is a bit interesting that a search of the Oracle documentation for
> > > "connect by level" returns 0 results.  A Metalink search for the same
> > > phrase only returns a couple hits, most of which report bugs.  I first
> > > saw the method of using connect by level used in this Usenet forum,
> > > and I recall having difficulty understanding why it worked
>
> > There is no special CONNECT BY LEVEL syntax, which is why this exact
> > phrase is not documented. There is just "CONNECT BY [condition]". The
> > documentation then links to "Conditions" ("A condition specifies a
> > combination of one or more expressions and logical (Boolean) operators
> > and returns a value of TRUE, FALSE, or UNKNOWN.") I personally prefer
> > ROWNUM <= n, but it could be any condition you like that is true until
> > you want it to be false, e.g.
>
> > SELECT ROWNUM FROM dual CONNECT BY DBMS_RANDOM.VALUE(0,1) > 0.5
>
> William,
>
> Your explanation and example make perfect sense, if you read the above
> as connect child rows to parent rows until the specified condition is
> false.  The documentation does not seem to be quite as clear:http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/querie...

> "CONNECT BY specifies the relationship between parent rows and child
> rows of the hierarchy."
>
> Note the term "relationship", which seems to indicate that the
> relationship between child rows and parent rows must be defined.
>
> The samples in the documentation all include the "PRIOR" syntax, which
> might be part of the source of the confusion (at least it was for me
> as I was first trying to understand this syntax).  From the same link:
> "PRIOR is most commonly used when comparing column values with the
> equality operator. (The PRIOR keyword can be on either side of the
> operator.) PRIOR causes Oracle to use the value of the parent row in
> the column. Operators other than the equal sign (=) are theoretically
> possible in CONNECT BY clauses. However, the conditions created by
> these other operators can result in an infinite loop through the
> possible combinations. In this case Oracle detects the loop at run
> time and returns an error."
>
> The above hints at the possibility of using something other than an
> equal sign in the connect by syntax.
>
> Regardless of how clear the documentation is, your explanation
> describes how it actually works.  Thanks for the reply.

Since that section of the documentation is about hierarchical queries,
perhaps the author didn't think it was worth going into the mechanism
and just described the use of CONNECT BY and PRIOR in that context.

I notice none of the examples show PRIOR used in the SELECT list
either.

0 new messages