Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

fc-scan.sh: find(1) and record stat(2)'s within a SQLite3 database

7 views
Skip to first unread message

Ivan Shmakov

unread,
Nov 13, 2011, 12:35:17 AM11/13/11
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Archive-name: fc-scan-sh-is
Submitted-by: onei...@gmail.com
Last-modified: 2011-10-18 +00:00
Copyright-Notice: both the README and the code are in the public domain

README.fc-scan-2011-10-18-is -*- Text -*-

Synopsis

$ fc-scan-2011-10-18 [--omit-schema] [FIND-ARGUMENTS...] | sqlite3 DATABASE

Requirements

find(1), sqlite3(1)

Description

This version of the fc-scan tool uses find(1) to walk through
the filesystem and produce (thanks to the -printf option) the
SQL code to record the filenames and the results of the
respective stat(2) calls into a SQLite database.

The only command line argument processed is the leading
--omit-schema option, which causes the tool to omit the SQL
schema definition from its output. (The default is to include
the schema.)

All the other arguments are not processed by this version of the
tool at all, and are simply passed to find(1).

For space efficiency reasons, the filenames are split in two
components: the leading directories, and the base name (similar
to dirname(1) and basename(1).) Each distinct value of these
components is then recorded only once.

Examples

$ fc-scan-2011-10-18 --omit-schema ~/private/ -xdev | sqlite3 ~/.fc/priv.db

Bugs

This version of the tool is only intended as a demonstration of
the respective database schema.

The primary reason to record stat(2) results in the first place
was to be able to detect file change and perform re-computation
of the message digests (SHA-1, SHA256) of the regular files'
contents. This version of the tool does not compute message
digests at all.

The ASCII 0x27 (APOSTROPHE, ') code was deemed more likely to
appear in filenames. Thus, the SQL code generated uses ASCII
0x22 (QUOTATION MARK, ") to quote filename components instead,
exploiting a particular SQL extension offered by SQLite. This
ASCII 0x22 code is not escaped, thus the INSERT SQL statement
generated for a filename containing such a code will fail.

Filename components containing codes out of the safe ASCII range
(0x20 through 0x7E) should be recorded as BLOB's instead of
TEXT.

As SQLite doesn't seem to support octal numerals, access modes
are recorded as 4-character TEXT values.

Command line interface is non-existent. All the options are
simply passed to find(1).

README.fc-scan-2011-10-18-is ends here

#!/bin/bash
### fc-scan-2011-10-18.sh --- fc scan in Shell -*- Sh -*-

### Ivan Shmakov, 2011
## This code is in the public domain

### Code:

schema () {
## .
cat <<EOF
- -- Sessions

CREATE TABLE "session" (
id INTEGER PRIMARY KEY,
start INTEGER NOT NULL);

CREATE TABLE "session-closed" (
id INTEGER PRIMARY KEY REFERENCES "session" ("id"),
stop INTEGER NOT NULL);

- -- Filenames

CREATE TABLE "filename" (
id INTEGER PRIMARY KEY,
filename TEXT NOT NULL UNIQUE);

CREATE UNIQUE INDEX "filename-unique"
ON "filename" ("filename");

CREATE TABLE "wholename" (
id INTEGER PRIMARY KEY,
dirname INTEGER NOT NULL REFERENCES "filename" ("id"),
basename INTEGER NOT NULL REFERENCES "filename" ("id"));

CREATE UNIQUE INDEX "wholename-unique"
ON "wholename" ("dirname", "basename");

CREATE VIEW "wholename-text"
AS SELECT wn."id" AS "wholename-id",
wn."dirname" AS "dirname-id",
dn."filename" AS "dirname",
wn."basename" AS "basename-id",
bn."filename" AS "basename",
dn."filename" || '/' || bn."filename" AS "wholename"
FROM "wholename" wn
INNER JOIN "filename" dn
ON (dn."id" = wn."dirname")
INNER JOIN "filename" bn
ON (bn."id" = wn."basename");

- -- File status

- -- FIXME: mode is currently TEXT, to store a number in octal
CREATE TABLE "stat" (
id INTEGER PRIMARY KEY,
device INTEGER NOT NULL,
inode INTEGER NOT NULL,
links INTEGER NOT NULL,
type INTEGER NOT NULL,
atime INTEGER NOT NULL,
ctime INTEGER NOT NULL,
mtime INTEGER NOT NULL,
uid INTEGER NOT NULL,
gid INTEGER NOT NULL,
mode TEXT NOT NULL,
size INTEGER NOT NULL,
blocks INTEGER NOT NULL);

CREATE UNIQUE INDEX "stat-unique"
ON "stat" ("device", "inode", "links", "type",
"atime", "ctime", "mtime",
"uid", "gid", "mode",
"size", "blocks");

- -- Tie the wholenames to status

CREATE TABLE "wholename-stat" (
id INTEGER PRIMARY KEY,
session INTEGER NOT NULL REFERENCES "session" ("id"),
wholename INTEGER NOT NULL REFERENCES "wholename" ("id"),
stat INTEGER NOT NULL REFERENCES "stat" ("id"),
timestamp INTEGER NOT NULL);

- -- FIXME: currently allowing multiple stat's per wholename per session
- -- CREATE UNIQUE INDEX "wholename-stat-unique"
- -- ON "wholename-stat" ("session", "wholename");

CREATE VIEW "wholename-stat-text"
AS SELECT ws."session",
wt."wholename-id",
wt."dirname-id", wt."dirname",
wt."basename-id", wt."basename",
wt."wholename",
ws."timestamp",
ws."stat" AS "stat-id",
st."device", st."inode", st."links", st."type",
st."atime", st."ctime", st."mtime",
st."uid", st."gid", st."mode",
st."size", st."blocks"
FROM "wholename-stat" ws
INNER JOIN "wholename-text" wt
ON (wt."wholename-id" = ws."wholename")
INNER JOIN "stat" st
ON (st."id" = ws."stat");
EOF
}

scan () {
## .
cat <<EOF
- -- Temporary tables

INSERT INTO "session" ("start")
SELECT strftime ('%s', 'now')
LIMIT 1;
CREATE TEMP TABLE "temp.sid"
AS SELECT last_insert_rowid () AS "session";

CREATE TEMP TABLE "temp.scan" (
id INTEGER PRIMARY KEY,
timestamp INTEGER NOT NULL,
dirname TEXT NOT NULL,
basename TEXT NOT NULL,
dnid INTEGER REFERENCES "filename" ("id"),
bnid INTEGER REFERENCES "filename" ("id"),
wnid INTEGER REFERENCES "wholename" ("id"),
stid INTEGER REFERENCES "stat" ("id"),
device INTEGER NOT NULL,
inode INTEGER NOT NULL,
links INTEGER NOT NULL,
type INTEGER NOT NULL,
atime INTEGER NOT NULL,
ctime INTEGER NOT NULL,
mtime INTEGER NOT NULL,
uid INTEGER NOT NULL,
gid INTEGER NOT NULL,
mode TEXT NOT NULL,
size INTEGER NOT NULL,
blocks INTEGER NOT NULL);

- -- Record the data
EOF
## FIXME: allow filenames with " (ASCII 0x22)
## NB: use non-standard "" for a string, as ' is common in filenames
find "$@" \
-printf 'INSERT INTO "temp.scan" ("timestamp", "dirname", "basename", "device", "inode", "links", "type", "atime", "ctime", "mtime", "uid", "gid", "mode", "size", "blocks") VALUES (strftime ('\''%%s'\'', '\''now'\''), "%h", "%f", %D, %i, %n, '\''%y'\'', %A@, %C@, %T@, %U, %G, '\''%#m'\'', %s, %k);\n'

cat <<EOF
- -- Close the session
- -- NB: the actual scan is over by this time
INSERT INTO "session-closed" ("id", "stop")
SELECT s."session", strftime ('%s', 'now')
FROM "temp.sid" s
LIMIT 1;

- -- Copy the data into permanent tables

UPDATE "temp.scan"
SET "dnid" = (SELECT dn."id"
FROM "filename" dn
WHERE (dn."filename" = "temp.scan"."dirname")
LIMIT 1)
WHERE "dnid" IS NULL;
UPDATE "temp.scan"
SET "bnid" = (SELECT bn."id"
FROM "filename" bn
WHERE (bn."filename" = "temp.scan"."basename")
LIMIT 1)
WHERE "bnid" IS NULL;
INSERT INTO "filename" ("filename")
SELECT DISTINCT sf."filename"
FROM (SELECT "basename" AS "filename"
FROM "temp.scan" sc
WHERE sc."bnid" IS NULL
UNION SELECT "dirname" AS "filename"
FROM "temp.scan" sc
WHERE sc."dnid" IS NULL) sf;
UPDATE "temp.scan"
SET "dnid" = (SELECT dn."id"
FROM "filename" dn
WHERE (dn."filename" = "temp.scan"."dirname")
LIMIT 1)
WHERE "dnid" IS NULL;
UPDATE "temp.scan"
SET "bnid" = (SELECT bn."id"
FROM "filename" bn
WHERE (bn."filename" = "temp.scan"."basename")
LIMIT 1)
WHERE "bnid" IS NULL;
- -- update the wholename table
UPDATE "temp.scan"
SET "wnid" = (SELECT wn."id"
FROM "wholename" wn
WHERE (wn."dirname" = "temp.scan"."dnid"
AND wn."basename" = "temp.scan"."bnid")
LIMIT 1)
WHERE "wnid" IS NULL;
INSERT INTO "wholename" ("dirname", "basename")
SELECT DISTINCT sc."dnid", sc."bnid"
FROM "temp.scan" sc
WHERE sc."wnid" IS NULL;
UPDATE "temp.scan"
SET "wnid" = (SELECT wn."id"
FROM "wholename" wn
WHERE (wn."dirname" = "temp.scan"."dnid"
AND wn."basename" = "temp.scan"."bnid")
LIMIT 1)
WHERE "wnid" IS NULL;
- -- update the stat table
UPDATE "temp.scan"
SET "stid" = (SELECT st."id"
FROM "stat" st
WHERE (st."device" = "temp.scan"."device"
AND st."inode" = "temp.scan"."inode"
AND st."links" = "temp.scan"."links"
AND st."type" = "temp.scan"."type"
AND st."atime" = "temp.scan"."atime"
AND st."ctime" = "temp.scan"."ctime"
AND st."mtime" = "temp.scan"."mtime"
AND st."uid" = "temp.scan"."uid"
AND st."gid" = "temp.scan"."gid"
AND st."mode" = "temp.scan"."mode"
AND st."size" = "temp.scan"."size"
AND st."blocks" = "temp.scan"."blocks")
LIMIT 1)
WHERE "stid" IS NULL;
INSERT INTO "stat" (
"device", "inode", "links", "type",
"atime", "ctime", "mtime",
"uid", "gid", "mode", "size", "blocks")
SELECT DISTINCT "device", "inode", "links", "type",
"atime", "ctime", "mtime",
"uid", "gid", "mode", "size", "blocks"
FROM "temp.scan" sc
WHERE "stid" IS NULL;
UPDATE "temp.scan"
SET "stid" = (SELECT st."id"
FROM "stat" st
WHERE (st."device" = "temp.scan"."device"
AND st."inode" = "temp.scan"."inode"
AND st."links" = "temp.scan"."links"
AND st."type" = "temp.scan"."type"
AND st."atime" = "temp.scan"."atime"
AND st."ctime" = "temp.scan"."ctime"
AND st."mtime" = "temp.scan"."mtime"
AND st."uid" = "temp.scan"."uid"
AND st."gid" = "temp.scan"."gid"
AND st."mode" = "temp.scan"."mode"
AND st."size" = "temp.scan"."size"
AND st."blocks" = "temp.scan"."blocks")
LIMIT 1)
WHERE "stid" IS NULL;
- -- update the wholename-stat table
INSERT INTO "wholename-stat" ("wholename", "stat", "timestamp", "session")
SELECT sc."wnid", sc."stid", sc."timestamp",
(SELECT "session" FROM "temp.sid" LIMIT 1)
FROM "temp.scan" sc;
EOF
}

printf %s\\n\\n BEGIN\;
if [ "$1" = --omit-schema ] ; then
shift
else
schema
printf \\n
fi
scan "$@"
printf \\n%s\\n END\;

### Emacs trailer
## Local variables:
## coding: us-ascii
## fill-column: 72
## indent-tabs-mode: nil
## ispell-local-dictionary: "american"
## End:
### fc-scan-2011-10-18.sh ends here

- --
FSF associate member #7257
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIcBAEBAgAGBQJOv1cVAAoJEOhSCjIbp0Kh8WMP/iL/36fYvMkE5CKBIQCSk5/r
qNIh+i0VSq65DRT45qDjhSxBjVBdP4M5DXTxl7yb8VqsaTKztv1Cesld/+LRoesi
A9KUtAfFtkFeZmMIUxuR31xx2ja1ppxZl+SPsLQgK3hJK4LaSZtGKN3PEU50ua49
wn62NN3bvYn+cXBspaV/VAc2sqGg26zI6Y/z4MdMUY7VB2DVXcrL3XMAKKSKBYKC
UOSgYuyVepY9fEYT1qzGDQD+H3IbsQcQ6n5wUkPZ8TiJ0rvqzU2TTS3pTtL5KMVH
ahQcRp4/3GoYLPEixjUDPeR5shSq9K2wg+LSJLT1HBT1Iyc5B9qeN3AxUn5R11IL
aA0lFrC+BNJZfuDRAg6INh8WuizX0hjG4w6CicK1TwGEzXRnqcNQd1hJkiMquULa
1YgoDw+1diqR9P6BBl5sbKt1BYRCgtWgwBqtIQl21eVXizFXkSYqgvyOPZsV34fm
FXQ6suVqntWyvYMQjPC7UtG0eyk0T+gpSWIePZxmroXdJLDjKdD052Dy7tq8hJ4f
AIWinh63Z6Cyk6P0haCGjdM668r5rrwoD1yxttc7zOR9dCg1ru9OuqxnGh3LiBIs
DryWodpHRo5uf4pzD9JlY/MbgzGhwjb8+z/jXkFdbHQ4+QkCj3iDwlZbeZpr1ZVW
JZ5Y99AUhsCl9VDU5nNx
=EBFj
-----END PGP SIGNATURE-----

Ivan Shmakov

unread,
Dec 7, 2011, 12:57:19 PM12/7/11
to
Archive-name: fc-scan-perl-is
Submitted-by: onei...@gmail.com
Last-modified: 2011-12-07 +00:00
Copyright-Notice: both the README and the code are in the public domain

README.fc-scan-2011-12-07-is -*- Text -*-

Synopsis

$ sqlite3 DATABASE < fc-create.sql
$ find [FIND-ARGUMENTS...] -print0 | fc-scan-2011-12-07 DATABASE

Requirements

DBD::SQLite, DBI, Digest::SHA, File::Spec, IO::Handle,
Sys::Hostname, UUID.

Description

This version of the fc-scan tool reads a list of NUL-separated
filenames from standard input (as could be produced by, e. g.,
the -print0 find(1) command) and records them, along with the
results of the lstat(2) calls on these names, into a SQLite
database.

The one and only accepted command line argument is the filename
of the SQLite database file.

For space efficiency reasons, the filenames are split in two
components: the leading directories, and the base name (similar
to dirname(1) and basename(1).) Each distinct value of these
components is then recorded only once.

The schema used allows a single SQLite database file to hold the
information about multiple (consequent) runs of the tool. Thus,
it becomes possible to track changes to the filesystem state.

This Perl-based version of the tool fixes several bugs and
misfeatures of the original Shell-based one, and is intended to
be a basis for the future version supporting SHA digest
computation for the files changed between the runs.

In particular, filenames are now allowed to contain the octets
outside of the safe ASCII range (codes 32 through 126), and are
recorded as BLOB's if containing these. The NUL character isn't
allowed, but such filenames aren't possible with POSIX anyway.

The st_atim and st_dev fields of the stat structure (as returned
by stat(2)) are supposed to change often than most of the other
fields and, for space-efficiency reasons, are recorded in the
separate stat-fluid table (as opposed to the stat-solid one.)

Examples

$ sqlite3 fc.db < fc-create.sql
$ find ~/private/ -xdev -print0 | fc-scan-2011-12-07 fc.db

Bugs

This is an alpha version of both the tool and the data schema.

The primary reason to record stat(2) results in the first place
was to be able to detect file change and perform re-computation
of the message digests (SHA-1, SHA256) of the regular files'
contents. This version of the tool does not compute message
digests at all.

The Digest::SHA dependency of the code is thus completely bogus.

The st_rdev, st_blksize, and st_blocks fields are not recorded.

The st_mode field is split into access mode (which is assumed to
occupy the 12 least-significant bits) and file type (the rest)
fields, which are then recorded as is, which may lead to
portability problems.

SQLite doesn't seem to support conversion of integers to ASCII
octal-base strings. Thus, file access modes may look unfamiliar
(e. g., 420 instead of 0644.)

Explicit SQL transactions are to be used. The current version
of the code allows for race conditions.

Command line interface is non-existent.

README.fc-scan-2011-12-07-is ends here

#!/usr/bin/perl
### fc-scan-2011-12-07.perl -*- Perl -*-

### Copyright (C) 2011 Ivan Shmakov

## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or (at
## your option) any later version.

## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.

## You should have received a copy of the GNU General Public License
## along with this program. If not, see <http://www.gnu.org/licenses/>.

## Based on: fc-scan-2011-10-18.sh (2011-10-18 13:05:51+07:00.)
## Depends on: fc-create.sql (2011-11-30 00:41:12+07:00.)

### Code:

use strict;
use warnings;

use English qw (-no_match_vars);

require DBD::SQLite;
require DBI;
# require Data::Dump;
require Digest::SHA;
# require DateTime;
# require Getopt::Mixed "nextOption";
require IO::Handle;

my $progname;
$progname = $PROGRAM_NAME;
$progname =~ s/^.*\///;

# ## NB: a la GNU Autotools
# my ($PACKAGE, $PACKAGE_BUGREPORT);
# my ($PACKAGE_NAME, $PACKAGE_VERSION);
# $PACKAGE = "fccs";
# $PACKAGE_NAME = "FCCS";
# $PACKAGE_VERSION = "0.1";
# $PACKAGE_BUGREPORT = 'iv...@theory.asu.ru';

my $warnings_p = 1;
BEGIN {
$SIG{"__DIE__"} = sub {
die ("$progname: ", @_);
};
$SIG{"__WARN__"} = sub {
warn ("$progname: Warning: ", @_)
if ($warnings_p);
};
}

### Package: fcscan::dbuniq
package fcscan::dbuniq;

use strict;
use warnings;

sub bind1 {
my ($self, $st, @a) = @_;
## FIXME: ->bind_param () return value isn't checked
for (my $i = ($#a << 1); $i >= 0; $i -= 2) {
$st->bind_param (1 + ($i >> 1), $a[$i], $a[1 + $i]);
}
}

sub query {
my ($self, @args) = @_;
my $listref
= $self->{"listref1"};
my $row
= undef;
foreach my $st (@$listref) {
## FIXME: ->bind1 () return value isn't checked
$self->bind1 ($st, @args);
$st->execute ()
or die ($st->errstr ());
$row
= $st->fetchrow_hashref ();
## .
return undef
unless (defined ($row));
}
## .
return $row;
}

sub identifier {
my ($self, @args) = @_;
{
my $row
= $self->query (@args);
## .
return $row
if (defined ($row));
}
my $listref
= $self->{"listref2"};
foreach my $st (@$listref) {
## FIXME: ->bind1 () return value isn't checked
$self->bind1 ($st, @args);
$st->execute ()
or die ($st->errstr ());
}
my $lii
= $self->{"db"}->last_insert_id (undef, undef, undef, undef);
## .
return { $self->{"key"} => $lii };
}

sub new {
my ($class, $db, $listref1, $listref2) = @_;
my @list1 = map {
## .
$db->prepare ($_)
or die ($db->errstr ());
} @$listref1;
my @list2 = map {
## .
$db->prepare ($_)
or die ($db->errstr ());
} @$listref2;
my $self = {
"db" => $db,
"key" => "last_insert_id",
"listref1" => \@list1,
"listref2" => \@list2
};
# print STDERR (Data::Dump::dump ($self), "\n");
bless ($self, $class);

## .
$self;
}

### Package: fcscan::dbq
package fcscan::dbq;

use strict;
use warnings;

require File::Spec;
require Sys::Hostname;
require UUID;

## Utility

sub either {
for my $a (@_) {
## .
return $a
if (defined ($a));
};
## .
return undef;
}

sub hostname {
## .
return ($ENV{"HOSTNAME"}
// Sys::Hostname::hostname ());
}

sub hostuuid {
my $hostuuid_s
= $ENV{"HOSTUUID"};
my $hostuuid
= undef;
if (defined ($hostuuid_s)) {
0 <= UUID::parse ($hostuuid_s, $hostuuid)
or die ("Cannot parse host UUID: ", $hostuuid_s);
}
## .
return $hostuuid;
}

## Methods

sub init_environment {
my ($self) = @_;
my $o
= $self->{"options"};
my $hostname
= $o->{"hostname"} // hostname ();
my $hostuuid
= $o->{"hostuuid"} // hostuuid ();
my $q
= $self->{"dbuniq"}->{"environment"};
## NB: assuming that hostname is safe ASCII
my $row
= $q->identifier ($hostname, DBI::SQL_CHAR (),
$hostuuid, DBI::SQL_BLOB ())
or die ($!);
## .
return ($self->{"environment"}
= either (@$row{qw (id last_insert_id)}));
}

sub init_session {
my ($self) = @_;
$self->init_environment ()
unless (exists ($self->{"environment"}));
die ("Session already initialized")
if (exists ($self->{"session"}));
my $s
= $self->{"session"}
= { };
my $environ
= $self->{"environment"};
my $uuid;
UUID::generate ($uuid);
defined ($uuid)
or die ();
$s->{"uuid"}
= $uuid;
## NB: assuming POSIX Epoch, as per time(2)
my $start
= $s->{"start"}
= time ();
my $q
= $self->{"dbuniq"}->{"session"};
my $row
= $q->identifier ($environ, DBI::SQL_INTEGER (),
$uuid, DBI::SQL_BLOB (),
$start, DBI::SQL_INTEGER ())
or die ($!);
## .
return ($s->{"id"}
= either (@$row{qw (id last_insert_id)}));
}

sub close_session {
my ($self) = @_;
die ("Session not initialized")
unless (exists ($self->{"session"}));
my $s
= $self->{"session"};
## NB: assuming POSIX Epoch, as per time(2)
my $stop
= time ();
## FIXME: dbuniq not quite fits here
my $q
= $self->{"dbuniq"}->{"session-closed"};
my $row
= $q->identifier ($stop, DBI::SQL_INTEGER ())
or die ($!);
## .
return ($s->{"id"}
= either (@$row{qw (id last_insert_id)}));
}

sub filename_id {
my ($self, $filename) = @_;
my $q
= $self->{"dbuniq"}->{"filename"};
## count the "unsafe" octets (outside of the [32, 126] range)
my $unsafe
= grep { $_ < 32 || $_ > 126 } (unpack ("C*", $filename));
my $sql_type
= ($unsafe == 0 ? DBI::SQL_CHAR () : DBI::SQL_BLOB ());
my $row
= $q->identifier ($filename, $sql_type)
or die ($!);
## .
return either (@$row{qw (id last_insert_id)});
}

sub wholename_id {
my ($self, $wholename_or_bn_id, $dn_id) = @_;
my $bn_id
= undef;
if (defined ($dn_id)) {
$bn_id
= $wholename_or_bn_id;
} else {
my ($vol, $dn, $bn)
= File::Spec->splitpath ($wholename_or_bn_id);
$bn_id
= $self->filename_id ($bn);
$dn_id
= $self->filename_id ($dn);
}
my $q
= $self->{"dbuniq"}->{"wholename"};
my $row
= $q->identifier ($dn_id, DBI::SQL_INTEGER (),
$bn_id, DBI::SQL_INTEGER ())
or die ($!);
## .
return either (@$row{qw (id last_insert_id)});
}

sub stat_solid_id {
my ($self, $st) = @_;
my $q
= $self->{"dbuniq"}->{"stat-solid"};
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## stat () => dev ino Mod lnk uid gid Xdv sz at mt ct bsz blk
## solid => ino^1 lnk^3 typ ct^10 mt^9 uid^4 gid^5 mod sz^7
## FIXME: is it sensible to encode file type as an integer?
my $typ
= ($$st[2] >> 12);
my $mod
= ($$st[2] & ((1 << 12) - 1));
my $row
= $q->identifier ($$st[1], DBI::SQL_INTEGER (),
$$st[3], DBI::SQL_INTEGER (),
$typ, DBI::SQL_INTEGER (),
$$st[10], DBI::SQL_INTEGER (),
$$st[9], DBI::SQL_INTEGER (),
$$st[4], DBI::SQL_INTEGER (),
$$st[5], DBI::SQL_INTEGER (),
$mod, DBI::SQL_INTEGER (),
$$st[7], DBI::SQL_INTEGER ())
or die ($!);
## .
return either (@$row{qw (id last_insert_id)});
}

sub stat_fluid_id {
my ($self, $stat) = @_;
my $q
= $self->{"dbuniq"}->{"stat-fluid"};
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## stat () => dev ino Mod lnk uid gid Xdv sz at mt ct bsz blk
## fluid => dev^0 at^8
my $row
= $q->identifier ($$stat[0], DBI::SQL_INTEGER (),
$$stat[8], DBI::SQL_INTEGER ())
or die ($!);
## .
return either (@$row{qw (id last_insert_id)});
}

sub fs_stat_id {
my ($self, $fields) = @_;
my $session
= $self->{"session"}->{"id"};
my $w_id
= ($fields->{"wholename-id"}
// $self->wholename_id ($fields->{"wholename"}))
or die ($!);
## FIXME: obtain (guess?) filesystem id if not given
my $fs_id
= $fields->{"filesystem-id"};
my $stat
= $fields->{"stat"};
my $so_id
= ($fields->{"stat-solid-id"}
// $self->stat_solid_id ($stat))
or die ($!);
## NB: can be undef (NULL)
my $fl_id
= ($fields->{"stat-fluid-id"}
// (exists ($fields->{"stat"})
? ($self->stat_fluid_id ($stat)
or die ($!))
: undef));
## FIXME: obtain (guess?) idpair id if not given
my $id2_id
= $fields->{"idpair-id"};
## FIXME: shouldn't a sub-second resolution time be used here?
## FIXME: assumes POSIX Epoch, as per time(2)
my $time
= ($fields->{"timestamp"}
// time ());
my $q
= $self->{"dbuniq"}->{"fs-stat"};
my $row
= $q->identifier ($session, DBI::SQL_INTEGER (),
$w_id, DBI::SQL_INTEGER (),
$fs_id, DBI::SQL_INTEGER (),
$so_id, DBI::SQL_INTEGER (),
$fl_id, DBI::SQL_INTEGER (),
$id2_id, DBI::SQL_INTEGER (),
$time, DBI::SQL_INTEGER ())
or die ($!);
## .
return either (@$row{qw (id last_insert_id)});
}

sub new {
my ($class, $db, $options) = @_;

## Prepare queries
my @qenv
= ([("SELECT \"id\" FROM \"environment\""
. " WHERE (\"hostname\" = ?1"
. " AND (?2 IS NULL OR \"uuid\" = ?2))")],
[("INSERT INTO \"environment\""
. " (\"hostname\", \"uuid\")"
. " VALUES (?1, ?2)")]);
my @qses
= ([],
[("INSERT INTO \"session\""
. " (\"environment\", \"uuid\", \"start\")"
. " VALUES (?1, ?2, ?3)")]);
my @qsover
= ([],
[("INSERT INTO \"session-closed\""
. " (\"stop\")"
. " VALUES (?1)")]);
my @qfname
= ([("SELECT \"id\" FROM \"filename\""
. " WHERE (\"filename\" = ?1)")],
[("INSERT INTO \"filename\""
. " (\"filename\")"
. " VALUES (?1)")]);
my @qwname
= ([("SELECT \"id\" FROM \"wholename\""
. " WHERE (\"dirname\" = ?1"
. " AND \"basename\" = ?2)")],
[("INSERT INTO \"wholename\""
. " (\"dirname\", \"basename\")"
. " VALUES (?1, ?2)")]);
my @qssol
= ([("SELECT \"id\" FROM \"stat-solid\""
. " WHERE (\"inode\" = ?1"
. " AND \"links\" = ?2"
. " AND \"type\" = ?3"
. " AND \"ctime\" = ?4"
. " AND \"mtime\" = ?5"
. " AND \"uid\" = ?6"
. " AND \"gid\" = ?7"
. " AND \"mode\" = ?8"
. " AND \"size\" = ?9)")],
[("INSERT INTO \"stat-solid\""
. " (\"inode\", \"links\", \"type\","
. " \"ctime\", \"mtime\", \"uid\", \"gid\","
. " \"mode\", \"size\")"
. " VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)")]);
my @qsflu
= ([("SELECT \"id\" FROM \"stat-fluid\""
. " WHERE (\"device\" = ?1"
. " AND \"atime\" = ?2)")],
[("INSERT INTO \"stat-fluid\""
. " (\"device\", \"atime\")"
. " VALUES (?1, ?2)")]);
my @qstat
= ([("SELECT \"id\" FROM \"fs-stat\""
. " WHERE (\"session\" = ?1"
. " AND \"wholename\" = ?2"
. " AND \"filesystem\" IS ?3"
. " AND \"solid\" = ?4"
. " AND \"fluid\" IS ?5"
. " AND \"idpair\" IS ?6"
. " AND \"timestamp\" = ?7)")],
[("INSERT INTO \"fs-stat\""
. " (\"session\", \"wholename\", \"filesystem\","
. " \"solid\", \"fluid\", \"idpair\", \"timestamp\")"
. " VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)")]);

my $self = {
"db" => $db,
"dbuniq" => {
"environment" => fcscan::dbuniq->new ($db, @qenv),
"session" => fcscan::dbuniq->new ($db, @qses),
"session-closed" => fcscan::dbuniq->new ($db, @qsover),
"filename" => fcscan::dbuniq->new ($db, @qfname),
"wholename" => fcscan::dbuniq->new ($db, @qwname),
"stat-solid" => fcscan::dbuniq->new ($db, @qssol),
"stat-fluid" => fcscan::dbuniq->new ($db, @qsflu),
"fs-stat" => fcscan::dbuniq->new ($db, @qstat)
},
"options" => $options
};
# print STDERR Data::Dump::dump ($self);
bless ($self, $class);

## .
$self;
}

### Back to the main package
package main;

sub process_files_from {
my ($dbq, $in, $options) = @_;
local $/
= ($options->{"nul_p"} ? "\000" : "\n");
## NB: still allow newline (LF) translation unless nul_p
if ($options->{"nul_p"}) {
binmode ($in)
or die ($!);
}
while (my $filename = <$in>) {
## NB: a paranoid sanity check
$/ eq chop ($filename)
or die ();
print STDERR ("I: processing=", $filename, "\n");
if (my @s = lstat ($filename)) {
my $s_id
= $dbq->fs_stat_id ({
"wholename" => $filename,
"stat" => \@s
})
or die ($!);
## FIXME: capture the filehandle and compute the digests
} else {
warn ($filename, ": Cannot lstat(2): ", $!);
}
}

## .
return 1;
}

## Parse command line arguments
unless ((1 + $#ARGV == 1)) {
print STDERR ($progname . ": Exactly one argument must be given\n"
# . "Try `$progname --help' for more information.\n"
);
exit 1;
}
my $dbfile
= $ARGV[0];

## Open database
my $db
= DBI->connect ("dbi:SQLite:dbname=${dbfile}", "", "")
or die ($!);

## Initialize the engine
my $dbq
= fcscan::dbq->new ($db);

## Initiate the session
$dbq->init_session ()
or die ("Failed to initiate session");

## Process the files
my $in
= IO::Handle->new_from_fd (fileno ("STDIN"), "r");
process_files_from ($dbq, $in, { "nul_p" => 1 })
or die ("Failed to process the files");

## Close the session
$dbq->close_session ()
or die ("Failed to close session");

### Emacs trailer
## Local variables:
## coding: us-ascii
## fill-column: 72
## indent-tabs-mode: nil
## ispell-local-dictionary: "american"
## End:
### fc-scan-2011-12-07.perl ends here

--- fc-create.sql --- SQLite schema for FC -*- sql -*-

--- Ivan Shmakov, 2011
-- This schema is in the public domain

--- Schema:

-- Digest types

CREATE TABLE "digest" (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
uuid BLOB NOT NULL);

CREATE UNIQUE INDEX "digest-name-unique"
ON "digest" ("name");

CREATE UNIQUE INDEX "digest-uuid-unique"
ON "digest" ("uuid");

-- Chunks

CREATE TABLE "chunk" (
id INTEGER PRIMARY KEY,
length INTEGER NOT NULL);

CREATE TABLE "chunk-digest" (
id INTEGER PRIMARY KEY,
chunk INTEGER NOT NULL REFERENCES "chunk" ("id"),
kind INTEGER NOT NULL REFERENCES "digest" ("id"),
value BLOB NOT NULL);

CREATE UNIQUE INDEX "chunk-digest-unique"
ON "chunk-digest" ("chunk", "kind");

-- NB: there's no way to enforce the uniqueness of the set of
-- (kind, value) pairs.

-- Filenames

CREATE TABLE "filename" (
id INTEGER PRIMARY KEY,
filename TEXT NOT NULL UNIQUE);
-- NB: filename may also be a BLOB; must be if unsafe

CREATE UNIQUE INDEX "filename-unique"
ON "filename" ("filename");

CREATE TABLE "wholename" (
id INTEGER PRIMARY KEY,
dirname INTEGER NOT NULL REFERENCES "filename" ("id"),
basename INTEGER NOT NULL REFERENCES "filename" ("id"));

CREATE UNIQUE INDEX "wholename-unique"
ON "wholename" ("dirname", "basename");

CREATE VIEW "wholename-text"
AS SELECT wn."id" AS "wholename-id",
wn."dirname" AS "dirname-id",
dn."filename" AS "dirname",
wn."basename" AS "basename-id",
bn."filename" AS "basename",
dn."filename" || bn."filename" AS "wholename"
FROM "wholename" wn
INNER JOIN "filename" dn
ON (dn."id" = wn."dirname")
INNER JOIN "filename" bn
ON (dn."id" = wn."basename");

-- Sessions

CREATE TABLE "environment" (
id INTEGER PRIMARY KEY,
uuid BLOB,
hostname TEXT NOT NULL);

CREATE UNIQUE INDEX "environment-unique"
ON "environment" ("uuid", "hostname");

CREATE TABLE "session" (
id INTEGER PRIMARY KEY,
environment INTEGER NOT NULL REFERENCES "environment" ("id"),
uuid BLOB NOT NULL,
start INTEGER NOT NULL);

CREATE UNIQUE INDEX "session-uuid-unique"
ON "session" ("uuid");

CREATE TABLE "session-closed" (
id INTEGER PRIMARY KEY REFERENCES "session" ("id"),
stop INTEGER NOT NULL);

CREATE TABLE "session-previous" (
id INTEGER PRIMARY KEY REFERENCES "session" ("id"),
previous INTEGER NOT NULL REFERENCES "session" ("id"));

-- File status

-- The stat-solid relation contains the values tied to file status
-- change timestamp (ctime), while stat-fluid contains the rest.

CREATE TABLE "stat-solid" (
id INTEGER PRIMARY KEY,
inode INTEGER NOT NULL,
links INTEGER NOT NULL,
type INTEGER NOT NULL,
ctime INTEGER NOT NULL,
mtime INTEGER NOT NULL,
uid INTEGER NOT NULL,
gid INTEGER NOT NULL,
mode INTEGER NOT NULL,
size INTEGER NOT NULL);

CREATE UNIQUE INDEX "stat-solid-unique"
ON "stat-solid" ("inode", "links", "type",
"ctime", "mtime",
"uid", "gid", "mode",
"size");

CREATE TABLE "stat-fluid" (
id INTEGER PRIMARY KEY,
device INTEGER NOT NULL,
atime INTEGER NOT NULL);

CREATE UNIQUE INDEX "stat-fluid-unique"
ON "stat-fluid" ("device", "atime");

CREATE TABLE "filesystem" (
id INTEGER PRIMARY KEY,
environment INTEGER NOT NULL REFERENCES "environment" ("id"),
directory TEXT NOT NULL,
uuid BLOB);
-- NB: directory is the mount point

CREATE TABLE "idname" (
id INTEGER PRIMARY KEY,
idname TEXT NOT NULL);

CREATE UNIQUE INDEX "idname-unique"
ON "idname" ("idname");

CREATE TABLE "idpair" (
id INTEGER PRIMARY KEY,
uidname INTEGER NOT NULL REFERENCES "idname" ("id"),
gidname INTEGER NOT NULL REFERENCES "idname" ("id"));

CREATE UNIQUE INDEX "idpair-unique"
ON "idpair" ("uidname", "gidname");

-- Filesystem

CREATE TABLE "fs-stat" (
id INTEGER PRIMARY KEY,
session INTEGER NOT NULL REFERENCES "session" ("id"),
wholename INTEGER NOT NULL REFERENCES "wholename" ("id"),
filesystem INTEGER REFERENCES "filesystem" ("id"),
solid INTEGER NOT NULL REFERENCES "stat-solid" ("id"),
fluid INTEGER REFERENCES "stat-fluid" ("id"),
idpair INTEGER REFERENCES "idpair" ("id"),
timestamp INTEGER NOT NULL);

CREATE UNIQUE INDEX "fs-stat-unique"
ON "fs-stat" ("wholename", "session");
-- NB: every wholename is only tried once

CREATE TABLE "fs-chunk" (
id INTEGER PRIMARY KEY,
session INTEGER NOT NULL REFERENCES "session" ("id"),
wholename INTEGER NOT NULL REFERENCES "wholename" ("id"),
chunk INTEGER NOT NULL REFERENCES "chunk" ("id"));

-- for newly generated digests
CREATE TABLE "fs-chunk-timestamp" (
id INTEGER PRIMARY KEY REFERENCES "fs-chunk",
timestamp INTEGER NOT NULL);

-- for digests assumed to have remained the same
CREATE TABLE "fs-chunk-inferred" (
id INTEGER PRIMARY KEY REFERENCES "fs-chunk" ("id"),
inferred INTEGER NOT NULL REFERENCES "fs-chunk" ("id"));

--- Emacs trailer
-- Local variables:
-- coding: us-ascii
-- mode: outline-minor
-- fill-column: 72
-- indent-tabs-mode: nil
-- outline-regexp: "---"
-- sql-product: sqlite
-- End:
--- fc-create.sql ends here
0 new messages