Hi Eric,
The file you want to look at is lib_common.c
unsigned char *thash(unsigned char *buf, int size)
222 {
223 MHASH td;
224 unsigned char *hash;
225
226 td = mhash_init(config->selected_hash);
227 if (td == MHASH_FAILED)
228 exit(1);
229
230 mhash(td, buf, size);
231 hash = mhash_end(td);
232 return hash;
233 }
selected_hash is determined in lessfs.cfg, the configuration file.
Lessfs can therefore use almost any hash that is supported by
mhash.
This routine is called thash, since in the beginning lessfs did
not rely on mhash and included native support for the tiger hash
only.
The mhash library is starting to show it's age.
I should switch to something else but have been reluctant to do
so, since I would like to spend time on a deduplicating and thin
kernel blockdevice (device-mapper) instead.
The actual work is done in lib_common.c here:
2242 unsigned int db_commit_block(unsigned char *dbdata,
2243 INOBNO inobno, unsigned long
dsize)
2244 {
2245 unsigned char *stiger = NULL;
2246 DAT *compressed;
2247 unsigned long long inuse;
2248 unsigned int ret = 0;
2249
2250 FUNC;
2251 LDEBUG("db_commit_block");
2252 stiger = thash(dbdata, dsize);
2253 create_hash_note(stiger);
2254 inuse = getInUse(stiger);
2255 if (0 == inuse) {
2256 compressed = lfscompress((unsigned char *) dbdata,
dsize);
2257 ret = compressed->size;
2258 bin_write_dbdata(DBDTA, stiger, config->hashlen,
compressed->data,
2259 compressed->size);
2260 DATfree(compressed);
2261 } else {
2262 loghash("commit_block : only updated inuse for hash
", stiger);
2263 }
2264 inuse++;
2265 update_inuse(stiger, inuse);
2266 LDEBUG("db_commit_block : dbb %llu-%llu", inobno.inode,
2267 inobno.blocknr);
2268 bin_write_dbdata(DBB, (char *) &inobno,
sizeof(INOBNO), stiger,
2269 config->hashlen);
2270 delete_hash_note(stiger);
2271 s_free(stiger);
2272 return (ret);
2273 }
So line 2252 calculates the hash of the block that is about to be
stored.
2254 then checks if an identical chunk of data is already present.
When present, inuse >0 then we do not store the block.
The reference counter (inuse) is incremented and the database DBU
is updated by the call : update_inuse
Then the files inode->block number table is updated so that the
current block number points to the hash that we calculated
I hope this provides you with enough information to get an idea.
Please let me know should you have additional questions.
Mark.
------
Op 3 jul. 2014 om 00:48 heeft Eric <> het volgende
geschreven:
Hello,
From
what I have read on online posts, lessfs
does not implement byte-by-byte comparisons
(as most dedup file
systems); however, do
you think you could let me know which part
of the files deal with the mhash files? I
would like to implement lessfs's hashing
algorithms to create a more efficient
way of checking for collisions and/or and
ensuring the data block are in fact redundant
data
(My
technique will replace the need of
byte-by-byte comparison,
and eliminate
the amount of overhead)
Yes,
I understand that the probability
of collisions occurring
is very low, but this is a project I am
very interested in