A good, but perhaps unintended, feature of LSD

Branimir Sesar

Sep 1, 2012, 7:11:04 PM
to lsd-...@googlegroups.com
Dear LSD users,

I have found something that is a good feature of LSD for me, but may be
an unintended feature or even a bug.

To summarize: if a static table has several snapshots, then for rows
that share the same primary key LSD returns only the row from the
latest snapshot. For example, let's say a static table called
"test_stats" has snapshot 1 and snapshot 2. Snapshot 1 contains the
following rows:

primary_key  value
1            5
2            6
3            1

Snapshot 2 contains the following rows:

primary_key  value
1            10
2            7
4            2

Running a query on the table "test_stats" returns

primary_key  value
1            10
2            7
3            1
4            2

The rows with primary_key 1 and 2 appear in both snapshots, but only the
rows from the latest snapshot are returned. This "feature" is useful to
me because now I do not need to rebuild some static table from scratch
every time I add more observations to my detection tables.
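
To make the observed behaviour concrete, here is a minimal sketch that
reproduces the result above. It is only a toy model: plain Python dicts
stand in for the two snapshots, and the function name `query` is made up
for illustration. It does not use or describe the actual LSD API.

```python
# Toy model only: each dict maps primary_key -> value for one snapshot.
# These structures are illustrative assumptions, not LSD internals.
snapshot_1 = {1: 5, 2: 6, 3: 1}
snapshot_2 = {1: 10, 2: 7, 4: 2}


def query(snapshots):
    """Merge snapshots oldest-first so that later rows override earlier ones."""
    merged = {}
    for snap in snapshots:
        merged.update(snap)  # a newer row replaces an older row with the same key
    return dict(sorted(merged.items()))


result = query([snapshot_1, snapshot_2])
for primary_key, value in result.items():
    print(primary_key, value)
```

Keys 1 and 2 take their values from snapshot 2, key 3 survives from
snapshot 1, and key 4 is new in snapshot 2.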

Comments?

Brani

Mario Juric

Sep 2, 2012, 9:08:53 PM
to lsd-...@googlegroups.com
On 9/1/12 18:11 , Branimir Sesar wrote:
> Dear LSD users,
>
> I have found something that is a good feature of LSD for me, but may be
> an unintended feature or even a bug.
>

...[snip]...

>
> The rows with primary_key 1 and 2 appear in both snapshots, but only the
> rows from the latest snapshot are returned. This "feature" is useful to
> me because now I do not need to rebuild some static table from scratch
> every time I add more observations to my detection tables.
>

Hi Brani,
If I understand correctly what you're describing, this *is* a feature.
The contents of the "snapshot/XXXXX" directory are _not_ the whole (logical)
snapshot -- it only contains the rows (actually, cells) that were
updated since the last snapshot. To get the full logical snapshot, you
have to make a union with the data files in all older snapshots. This is
what lets LSD be space-efficient: files of cells that haven't changed
are shared by all logical snapshots that include them.
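
A rough sketch of that idea, under stated assumptions: the mapping below
(snapshot id -> cell id -> file path), the cell names, the paths and the
function `logical_snapshot` are all hypothetical, chosen only to
illustrate incremental snapshots. They are not LSD's real on-disk layout
or API.

```python
# Hypothetical layout: each stored snapshot lists only the cell files
# that changed since the previous snapshot. All names are illustrative.
stored = {
    1: {"cell_a": "snapshot/00001/cell_a", "cell_b": "snapshot/00001/cell_b"},
    2: {"cell_a": "snapshot/00002/cell_a"},  # only cell_a was updated
}


def logical_snapshot(stored, snap_id):
    """Resolve each cell to its newest file at or before snap_id."""
    files = {}
    for sid in sorted(stored):
        if sid > snap_id:
            break
        files.update(stored[sid])  # newer files shadow older ones per cell
    return files


view_1 = logical_snapshot(stored, 1)
view_2 = logical_snapshot(stored, 2)
print(view_2)
```

In this toy model, cell_b was never rewritten, so both logical snapshots
point at the same file for it, which is where the space saving comes
from.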

Cheers,
--
Mario Juric,
Data Mgmt. Project Scientist, Large Synoptic Survey Telescope
Web : http://www.cfa.harvard.edu/~mjuric/
Phone : +1 617 744 9003 PGP: ~mjuric/crypto/public.key

Mario Juric

Sep 2, 2012, 9:17:51 PM
to lsd-...@googlegroups.com
On 9/2/12 20:08 , Mario Juric wrote:
> On 9/1/12 18:11 , Branimir Sesar wrote:
>> Dear LSD users,
>>
>> I have found something that is a good feature of LSD for me, but may be
>> an unintended feature or even a bug.
>>
>
> ...[snip]...
>
>>
>> The rows with primary_key 1 and 2 appear in both snapshots, but only the
>> rows from the latest snapshot are returned. This "feature" is useful to
>> me because now I do not need to rebuild some static table from scratch
>> every time I add more observations to my detection tables.
>>
>
> Hi Brani,
> If I understand correctly what you're describing, this *is* a feature.
> The contents of the "snapshot/XXXXX" directory are _not_ the whole (logical)
> snapshot -- it only contains the rows (actually, cells) that were
> updated since the last snapshot. To get the full logical snapshot, you
> have to make a union with the data files in all older snapshots.

Let me clarify this sentence: "..., you have to imagine LSD making a