Removing invalid files


Noiz2

Aug 31, 2009, 7:36:50 PM
to Soundminer Users Collective
Maybe I'm missing something, but in V3 it was not a big deal to get rid
of records with invalid paths (it's been a while, but I remember a
command for this - or am I bonkers?). With V4 I can "validate paths"
and get a nice list that I can ??? Frame for viewing? I'm certainly
not going to go one by one and delete them. If SM can find them, then
why can't it mark them? Then I could get rid of them. I probably
haven't hit this in V4 because I haven't been doing a lot of
housekeeping till now (did a lot in V3). But it seems that the
"solution" for a lot of these kinds of issues now in V4 is "just
rescan the library," and that is getting less and less practical as my
library is getting larger. Plus I tend to have a bunch of specialized
DBs that I keep for handiness.

So if I haven't just overlooked the obvious, I would like to see some
library maintenance features added (or added back) in the next version.
I want to be able to identify duplicates (or flag them), and the same
for invalid files and invalid file paths. I think flagging them is
enough; I can then do what I want with them. But right now, other than
knowing there are a bunch of junk records, I have no option other than
rescanning the whole library, and that can take a few days, well
nights, because I, for instance, don't want music files mixed with the
FX files, so they have to be scanned in separate passes, and any of my
specialized libraries will also need their own pass. Last time I did
the whole library it took over five hours. So if I mess around a
bunch it might take a week (assuming I need to be using SM in the day
and only scan after work) before all the DBs are back up to date.
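
To make the "flag, don't delete" request concrete, here is a rough
Python sketch. It is only an illustration, not anything Soundminer
does: the record layout (a dict with a "path" key) and the function
name are assumptions.

```python
# Hypothetical sketch of "flag, don't delete": mark records whose path
# no longer exists and records that share a file name. The record
# layout (a dict with a "path" key) is invented for illustration.
import os
from collections import Counter

def flag_records(records):
    """Add 'invalid_path' and 'duplicate_name' flags to each record."""
    names = Counter(os.path.basename(r["path"]) for r in records)
    for r in records:
        r["invalid_path"] = not os.path.exists(r["path"])
        r["duplicate_name"] = names[os.path.basename(r["path"])] > 1
    return records
```

Once the flags exist, sorting or filtering on them is enough to review
the junk records in one batch instead of one by one.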

stevep

Sep 2, 2009, 7:29:52 AM
to Soundminer Users Collective
Scott

I know that JD was working on a duplicates function, but it seems to
me that you just want to delete records, which can be done easily in
the contextual menus. You can batch select files or browsed records,
control-click and hit REMOVE or even DELETE.

The problem with the two functions is that to find that info, you have
to iterate over every record to see if it is valid. This can take as
much time as scanning. And while we are there, why does it take so long
to scan your records? I can scan 250k records in less than 4 hours.

Are they mp3s with no overviews? This will greatly reduce the speed
of scanning. Are you scanning over a network?

steve

Noiz2

Sep 2, 2009, 11:14:22 AM
to Soundminer Users Collective
Well, first let me say that I hadn't scanned in a while and just
rescanned the all-inclusive Db last night, and it was only a couple of
hours, so I missed on the length. I had remembered scanning overnight,
but I was wrong. SM is a bunch faster; maybe it's the move from the G4
to the G5? So it's not quite the nightmare I portrayed.

Going through the Db by hand one record at a time is not a workaround.
And as an FYI, the DELETE function doesn't work (at least on my system)
on files with an invalid path; you have to REMOVE.

Even if it took as long as a scan it would still be VERY useful. Now
one has to first embed the whole Db before rescanning to make sure one
didn't inadvertently redo meta data and miss embedding at the time
(I'm probably being anal but I hate finding that I lost info out of
stupidity). So rescanning actually would take significantly longer
(in combo).

Out of curiosity what can one do with the invalid path list? Seems
like there should be something useful one could do with it but I can't
see what.

Thanks
SK

DF

Sep 2, 2009, 12:43:04 PM
to Soundminer Users Collective
I'm with you. Now I'm a big fan of SM, don't get me wrong here, but
by far its weakest link (IMO) is mass database management. I've not
been using SoundLog for several years, but the thing I still miss
constantly, is the ability to relink records. (Actually I'm just
recently getting back into using Soundlog to generate names again.)

Since there's no current way to tell what data has been embedded or
not, Scott's point about embedding everything again, or losing data
because it wasn't embedded, is valid. Couple that with no way to
remove duplicates or broken links and it's a recipe for lost data, or
a database that only partially works.

Due to these issues, I'm always afraid of moving anything, for fear
it'll break a link in SM. I have numerous databases, for things like
old files, commercial, things that are restricted, etc.. And these
databases get compiled from different locations in the Finder.
(Example - new commercial files will be on a different drive due to
space, but gets imported into the Commercial database). And because
I'm afraid to ever move anything, for fear of having to wipe the
databases & start over, this makes my Finder-level management very
disorganized.

At one point I thought that the preference "Check if file exists in
database" would solve this, but it checks the path not the file name.
So if you move a file in the Finder & scan, rather than relink this,
it'll give you another record. So you're left with a broken link in
the original, and the only way to get rid of it is manually, or to
wipe the db & rescan.

>>>>>>>>>>>>>>>>>>
"With V4 I can "validate paths"
and get a nice list that I can ??? Frame for viewing? I'm certainly
not going to go one by one and delete them. If SM can find them then
why cant it mark them? "
<<<<<<<<<<<<<<<<

+1 How do I use this list? Marking those broken records would be a
huge help.

Best,

- Dave

Noiz2

Sep 2, 2009, 3:24:34 PM
to Soundminer Users Collective
Dave,

Does SLP work in 10.4, 10.5 & PT 7.x? I switched to SM and haven't
opened SLP in a while. I thought it was broken with the later PT & OS,
but? If not, I should dust it off, because it's a nice complement to SM
and having mirrored info would be a nice reassurance.

And let me add that while there are a few things that bug me about SM
V4 (the list is a lot shorter than PT's) I am still very happy with
it. I get a bit frustrated by things that V3 could do that V4 can't
or that worked more to my liking in V3. Removing orphans was one, the
whole Admin window I liked better (I'm still often searching for
functions that I know I used in V3 but are either missing or someplace
I didn't expect) and the ability to record inside SM was something
that I used semi regularly. I also liked that projects could be setup
with export options. I find myself having to go to Prefs a lot to
change output options (interleaved VS dual stereo being the most
common switch needed).
But I assume I was in the minority on some of those preferences so...

Cheers
SK

DF

Sep 2, 2009, 3:59:00 PM
to Soundminer Users Collective
On Sep 2, 12:24 pm, Noiz2 <sc...@askinc.net> wrote:
> Dave,
>
> Does SLP work in 10.4, 10.5 & PT 7.x?  I switched to SM and haven't
> opened SLP in awhile.  I thought it was broken with the later PT & OS
> but?  If not I should dust it off because it's a nice compliment to SM
> and having mirrored info would be a nice reassurance.
>

Well, I'm not sure really. I had to license a few things. The
version I'm using, I took a LOT of things out. I just used what I'd
built into SLP as a starting point, and mutilated it to do just a
handful of things.

But I was unable to continue support on it when those licensed things
stopped doing things like - long file names, installing components
correctly, etc.. There were no sales to support additional
investments in keeping it running, and thankfully there is SM which
I'd much rather use myself.

- Dave

Jeremiah Moore

Sep 2, 2009, 4:10:48 PM
to Soundminer Users Collective, Justin Drury, DF
DF -

You can re-link from the SM browser. Select any number of records and
control-click(right-click) / Relink Selected.


I relocate things in the Finder frequently by this method:

1) Move folders / files in the Finder.

2) Isolate the contents/subcontents of the folders in question to the
SM browser.

First, activate Metabrowser, and go /Summary/Pathname/etc... to drill
down to the folder(s) you're moving. This is looking at the Pathname
field in the database, so no worries that the files are already moved.
Cmd-click to select multiples (they have to be on the same level of
hierarchy to multiple-select). Now hit "Return" on your keyboard, and
all items in your SM database matching that(those) path(s) are
displayed as the found set. (**see note)

3) select-all and right-click/Relink Selected.

4) Wait.... Wait.... Test a few records to be sure it worked.

5) Run "check filepaths" to be sure nothing was missed. And do a
"rebuild" on MetaBrowser, as its index will now be out-of-date.


** BEWARE a potentially serious bug here. If you have two folders,
one of whose names is in its entirety a subset of another (as in a
folder called "Hamster Claps" and another called "Hamster Claps
Funnier"), selecting the one-which-is-a-subset (Hamster Claps) will
ALSO select files from the one-which-is-a-superset (Hamster Claps
Funnier). I assume this is because a query is being run which says
"match records which contain this" and, sure enough, it does.

I've discussed the bug with Justin, but no telling what level of
priority this is for a fix / or what performance hit would be taken by
changing the query to be more precise.

(perhaps it could be solved by the db / query seeing the closing "/"
of the path?)
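
The closing-"/" idea can be sketched in a few lines of Python. This is
only an illustration of the matching logic, not Soundminer's actual
query; the paths and function names are made up.

```python
# Hypothetical sketch: why a "contains" match on a folder path also
# catches sibling folders, and how anchoring on the trailing "/" avoids it.
# The paths and function names are invented for illustration.

def contains_match(paths, folder):
    """Loose match: any record whose path contains the folder string."""
    return [p for p in paths if folder in p]

def folder_match(paths, folder):
    """Stricter match: the folder string must end at a path separator."""
    prefix = folder.rstrip("/") + "/"
    return [p for p in paths if p.startswith(prefix)]

paths = [
    "/Volumes/FX/Hamster Claps/clap01.wav",
    "/Volumes/FX/Hamster Claps Funnier/clap02.wav",
]

loose = contains_match(paths, "/Volumes/FX/Hamster Claps")
strict = folder_match(paths, "/Volumes/FX/Hamster Claps")
```

Here the loose match returns both records, while the stricter one
returns only the file that really lives in "Hamster Claps".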

-jeremiah
--
-----------------------------------------------------------
jeremiah moore | SOUND | jmo...@northstation.net
http://www.jeremiahmoore.com/
http://babyjane.com/timeweb/

DF

Sep 2, 2009, 4:26:39 PM
to Soundminer Users Collective
I just did a quick test of this - the parts I could find, and it did
work. It's pretty early to tell how convenient this is going to be.
Step 2 will work when you know exactly what's been moved, and if you
do it in chunks. This is nowhere near as convenient as simply being
in a database & choosing a folder or disk to relink. But it did work,
so I'll have to test it more. Maybe there's a way to avoid the
Metabrowser step.

> 5) Run "check filepaths" to be sure nothing was missed. And do a
> "rebuild" on MetaBrowser, as its index will now be out-of-date.

I couldn't find how to rebuild. I never use the Metabrowser.

Thanks for this tip though - it warrants some investigation for sure.

- Dave

DF

Sep 2, 2009, 4:34:46 PM
to Soundminer Users Collective
OK test 2 did what I wanted.

1) Scanned a small folder of files, which also had subfolders, into a
test DB.
2) Renamed the top-level folder in the Finder, to break the links in
the DB.
3) Selected ALL records in that DB.
4) Control-click to relink.
5) Selected the top-level folder again.

The relink worked!

Do we know if this will work well on a DB of many thousands of records
(say 200,000)?

If this continues to work well, this feature needs to be made more
public! None of my colleagues knew it was in there!

- Dave

Jeremiah Moore

Sep 2, 2009, 4:43:28 PM
to Soundminer Users Collective, DF, Justin Drury
DF - You can also re-link an entire database. If stuff is spread
across drives, you have to do multiple passes - the highest level you
can select is a Volume.

As steve mentioned, it's not quicker than re-scanning. But it can be
good for peace of mind. (I too wonder if I've failed to stamp
metadata. I would prefer an iTunes-like mode, where metadata is
stamped every time it's edited.)

The metabrowser step is just a shortcut for selecting a folder of
sounds (as represented in the database) and therefore not having to
relink the whole DB.

Metabrowser rebuild button is on lower-right of metabrowser pane when
it's open. Metabrowser is cool, and warrants a little exploration.
Especially the "Pathname" section, and the "Field Tagger" section.

-jeremiah


p.s. I've successfully re-lunk databases with 200,000+ records. I
just run check filepaths after.

The one bugaboo - and potentially a showstopper - w/ relink is if you
have duplicate filenames anywhere, you're hosed because Filename is
the unique ID for relinking. I once had a bunch of phone buttons I'd
inherited, with filenames like "1" and "2". Bad news for relink.

Justin and Steve are big proponents of "metadata always in the files /
rescan if in doubt" but a lot of us small creative editors, who
constantly "work" our metadata, have an ethic of "the database is
golden, don't touch it!"

Even the order in which you've scanned has value.... recent stuff will
show up at the bottom if unsorted. And maybe that helps you locate
something you can't recall the exact name of. And this goes away if
you trash the DB and rescan. Unless maybe you've got some dated or
numbered folder scheme in the Finder.

DF

Sep 2, 2009, 5:07:26 PM
to Soundminer Users Collective


On Sep 2, 1:43 pm, Jeremiah Moore <jeremiahmo...@gmail.com> wrote:
> DF -  You can also re-link an entire database.  If stuff is spread
> across drives, you have to do multiple passes - the highest level you
> can select is a Volume.

Plenty good! :)


>
> p.s.  I've successfully re-lunk databases with 200,000+ records.  I
> just run check filepaths after.

Good to know!

>
> The one bugaboo - and potentially a showstopper - w/ relink is if you
> have duplicate filenames anywhere, you're hosed because Filename is
> the unique ID for relinking.  I once had a bunch of phone buttons I'd
> inherited, with filenames like "1" and "2".  Bad news for relink.

Anyone who knows me knows my fixation for unique names. It was the
basis of the SLP naming scheme, which I still adhere to. Sadly very
few people listen to me when I'm on my soapbox about it, but this is a
pure case for why it's necessary.

>
> Justin and Steve are big proponents of "metadata always in the files /
> rescan if in doubt" but a lot of us small creative editors, who
> constantly "work" our metadata, have an ethic of "the database is
> golden, don't touch it!"

I agree, strangely with both. I was recently taken aback though, that
with this philosophy of "Metadata in the files", that the "Save"
button in the info editor, was NOT saving to the files but just to the
database. I was TRYING to have Metadata in the files, but that @#$%
"Save" button wasn't saving to the files. I still have strong
feelings that those buttons need to be more specific in what they're
doing.

Again - thanks for this info on relinking! If this works as it
appears to, I must retract a large chunk of my earlier responses. I
just had no idea it was there.

- Dave

Nathaniel Reichman

Sep 2, 2009, 10:14:56 PM
to Soundminer Users Collective
On September 2, 2009, at 5:07 PM, DF wrote:

> > Justin and Steve are big proponents of "metadata always in the files /
> > rescan if in doubt" but a lot of us small creative editors, who
> > constantly "work" our metadata, have an ethic of "the database is
> > golden, don't touch it!"
>
> I agree, strangely with both.  I was recently taken aback though, that
> with this philosophy of "Metadata in the files", that the "Save"
> button in the info editor, was NOT saving to the files but just to the
> database. I was TRYING to have Metadata in the files, but that @#$%
> "Save" button wasn't saving to the files.  I still have strong
> feelings that those buttons need to be more specific in what they're
> doing.


I find this really interesting, too.  From the music editorial perspective, we change or update the description fields frequently while we work, and I've always been a little bit annoyed by the way Retrospect is forced to re-backup large amounts of audio data just because we wrote in a little tweak to the metadata of a file and embedded it.  So I DON'T write the metadata into the files, and DO treat my databases as golden.  And, I never rescan.

There are some misconceptions about the way SM works out there, especially surrounding this issue (read my Daw-Mac posts about SM).

I'm also amazed by the number of editors out there suffering with Digibase.  The SM team should consider making a demo of the software available.  Maybe via time-limited iLok asset, I don't know.  What I do know is that demo videos of SM in action should be more apparent on the website (are they only available via FTP?).


Best,

Nathaniel Reichman



Noiz2

Sep 2, 2009, 11:12:15 PM
to Soundminer Users Collective
Well...

I managed to miss that one also. Just when you thought it was safe to
complain!

So many of my "delete invalid paths" issues are mitigated by this, since
if I re-link I won't have invalid paths, probably. I also inherited a
bunch of files and have been renaming as I find issues, but I'm certain
there are still errant ones floating around. But that's not SM's
fault.

There is probably a manual I should have read in more detail!

Cheers
SK

Justin

Sep 3, 2009, 9:39:40 AM
to Soundminer Users Collective
Re: Relinking.

We have a bit of a conundrum here as a LOT of our users don't have
unique filenames... So when it relinks it needs to do it on a bunch of
criteria.

For what it's worth, we have plotted out new relink code that will be
multi-threaded and will optionally link on demand (as opposed to having
to update the database). This'll mean that if you simply move a folder
of sounds from one drive to another then it'll remember that move.
There are just so many variants (moving a file into different folders).

Ideally I guess we could say... move the files in Soundminer (via a
virtual finder) and that way we'd know how the files were moved and
update accordingly. But as Dave mentioned there can be multiple
databases, and multiple spotting lists all referencing this now old
path. So it needs to be like a ProTools relink (but better, we hope).

Anyway...we are thinking about it, and whatever we do will be an
improvement, but additional thoughts/discussions are welcome.

Noiz2

Sep 3, 2009, 12:01:02 PM
to Soundminer Users Collective
It seems like a natural for a non-user-editable metadata field, a
unique ID field. One would have to do one big embed at the start, but
after that every scanned file gets a unique ID that doesn't necessarily
have anything to do with its file name.

Cheers
SK

DF

Sep 3, 2009, 12:22:57 PM
to Soundminer Users Collective
On Sep 3, 6:39 am, Justin <jaydee1...@gmail.com> wrote:
> Re: Relinking.
>
> We have a bit of a conundrum here as a LOT of our users don't have
> unique filenames... So when it relinks it need to do it on a bunch of
> criteria.
>

It's not for a lack of "Soapbox Preaching" :) I even wrote an article
about the need for this. But even among my close friends, all I ever
got was a casual, "Yeah makes sense, but I never had any problems."
Of course these were the same people that had PT screw up a relink at
some point & call me to try to figure out how to straighten things
out. If they had Unique file names it wouldn't be an issue.

Which reminds me - I really miss the unique number for transfers you
had in SM3. I used that religiously. I do have it adding suffixes,
but without an incrementing number, there's no way to prevent
duplicates from being on the drives somewhere. All I have to do is
forget to change it in transfer, and spot the same file into another
session and WHAM - duplicate filename. So if I may request that go
back in.

Also, while you're reworking whatever relink scheme is final, I BEG
you not to take out what you have now! It'd be a shame to have this
thing working just like I'd like it to, and then lose it. Please make
it another routine & leave what you have as an option!

- Dave

Jeremiah Moore

Sep 3, 2009, 1:08:32 PM
to Soundminer Users Collective, DF, Justin Drury
Thanks for chiming in, Justin. Yes, relink is a potentially critical
function...in a world where the user can move and rename files at
will.

Unique IDs of some sort seem essential to any true hard-linking,
since there's just no way you can be assured people will have unique
filenames. (Many people are not so disciplined or even aware as DF.)

just riffing on uniquity:
I wonder if a checksum on the audio data itself could serve as a
unique ID of some sort. (AUID? audio unique ID?) (would it be unique
enough?) Or a hash of audio checksum, track duration, sample rate,
bit depth? This would probably take a lot of processing to generate,
akin to computing Overviews. Could be stored in DB and stamped into
files during Embed. Benefit is that it can be regenerated from the
file if it's not embedded. Of course, if you have a use-case where
somebody opens/edits/saves the file externally, it breaks. Ditto for
corrupted files. But then, you might argue that it's no longer the
same file. Also, checking relinks by this method would be very slow
when the AUID is not embedded. Would want to be optional for sure.
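
A rough Python sketch of the "AUID" idea, just to show the shape of it.
It uses only the standard library and only reads plain PCM WAV files;
the function name and the choice of SHA-256 are my assumptions, not an
existing Soundminer feature.

```python
# Hypothetical "AUID" sketch: hash the audio samples plus a few format
# fields. Uses only the Python standard library and only reads plain
# PCM WAV files; the function name and ID layout are assumptions.
import hashlib
import wave

def audio_uid(path, chunk_frames=65536):
    """Return a hex digest of format info plus audio data for a WAV file."""
    h = hashlib.sha256()
    with wave.open(path, "rb") as w:
        # Mix in sample rate, bit depth, channel count and length in frames.
        header = (w.getframerate(), w.getsampwidth() * 8,
                  w.getnchannels(), w.getnframes())
        h.update(repr(header).encode("ascii"))
        # Hash the sample data in chunks so large files need little memory.
        while True:
            frames = w.readframes(chunk_frames)
            if not frames:
                break
            h.update(frames)
    return h.hexdigest()
```

Because the file name never enters the hash, a renamed or moved copy
gets the same ID, while any edit to the audio changes it. Reading every
sample is the slow part, which is the cost mentioned above.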

Another thought... this one mac-centric: Don't files and folders in
HFS+ have unique IDs already, per instance of a file/folder on a
particular disk? I recall the terms "filespec" and "folderspec".
First step on a relink could simply query the filesystem and get a new
path. Simple and quick, yes? Is this not how the OS points aliases
to a file even when you move the file? Breaks if the files are
duplicated or moved to a different disk.
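
As a loose illustration of the filesystem-ID idea, here is a Python
sketch that uses the device and file numbers from os.stat() as the
identity. That is my stand-in for whatever HFS+ exposes, and the
function names are made up.

```python
# Hypothetical sketch: remember a file's filesystem identity at scan
# time and use it to find the file again after a move or rename on the
# same volume. Relies on os.stat(); names and layout are assumptions.
import os

def file_identity(path):
    """(device, file number) pair; stable across renames on one volume."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def find_by_identity(root, identity):
    """Walk `root` and return the path whose identity matches, if any."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            candidate = os.path.join(folder, name)
            if file_identity(candidate) == identity:
                return candidate
    return None
```

Walking a whole volume like this is not the quick lookup an alias
gives you, and a copy or a move to another disk gets a new identity,
so the same limits apply.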

UIDs really make it work for Pro Tools. It's slow to relink by them,
but it works reliably if you need it.

-jeremiah

charles deenen

Sep 3, 2009, 1:17:36 PM
to jmo...@northstation.net, Soundminer Users Collective, DF, Justin Drury

(justin's going to hit me on the head for chiming in too :)

In a database with a massive amount of files, I do have thousands of
duplicate file-names and files. Yet I'm faced with a database that
also has thousands of unlinked/missing files.
I've been going through them by batch-renaming the paths etc., but it's
been taking days of my time, and I'm not even 1/10th done.
I can't do a rescan, since I have thousands of files (or more) which
don't have embedded meta-data, and no way to "back up" the meta-data to
them. I'd have to convert them all first to BWAV, AIFF or something.
And here lies another problem. I haven't found a single program yet that
does this reliably without crashing (and I've tried many)...

So that brings me to the point that I still agree that a smarter, more
robust relink feature is needed. Something where the user can specify
certain criteria:
- file-name only
- file-name and last X entries in the path
etc. etc.
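
A quick Python sketch of what such criteria could look like. It is
purely illustrative: the function names, the "parent_levels" parameter
and the sample paths are invented, and it refuses to guess when more
than one file matches.

```python
# Hypothetical sketch of user-selectable relink criteria: match on the
# file name alone, or on the file name plus the last N folders of the
# old path. Function names and sample paths are invented.
from collections import defaultdict
from pathlib import PurePosixPath

def match_key(path, parent_levels=0):
    """Key made of the file name and the last `parent_levels` folders."""
    parts = PurePosixPath(path).parts
    return parts[-(parent_levels + 1):]

def relink(old_paths, found_paths, parent_levels=0):
    """Map each old path to a found path, only when the match is unambiguous."""
    index = defaultdict(list)
    for p in found_paths:
        index[match_key(p, parent_levels)].append(p)
    result = {}
    for old in old_paths:
        candidates = index.get(match_key(old, parent_levels), [])
        # Leave the record unlinked (None) rather than guess among duplicates.
        result[old] = candidates[0] if len(candidates) == 1 else None
    return result

old = ["/Volumes/Old/Phones/Nokia/1.wav"]
found = ["/Volumes/New/Lib/Phones/Nokia/1.wav",
         "/Volumes/New/Lib/Phones/Motorola/1.wav"]

by_name = relink(old, found, parent_levels=0)
by_name_and_folder = relink(old, found, parent_levels=1)
```

With file name only, "1.wav" is ambiguous and stays unlinked; adding
one folder level of the old path is enough to pick the right file.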

Then maybe someday I'll have a lib which actually will play every
single file :)

cheers

-cd

Noiz2

Sep 3, 2009, 3:09:26 PM
to Soundminer Users Collective
...
> It's not for a lack of "Soapbox Preaching" :)  I even wrote an article
> about the need for this.  But even among my close friends, all I ever
> got was a casual, "Yeah makes sense, but I never had any problems."....

Well you trained me!, or maybe I should say SLP did.

> Which reminds me - I really miss the unique number for transfers you
> had in SM3.  I used that religiously.  I do have it adding suffixes,
> but without an incrementing number, there's no way to prevent
> duplicates from being on the drives somewhere.  All I have to do is
> forget to change it in transfer, and spot the same file into another
> session and WHAM - duplicate filename.  So if I may request that go
> back in.
>

To sort of pile on, I would really like to see the option for
project-specific naming, i.e. if a file is transferred out of SM with
any processing, VST speed changes etc., I could have those files get
the project ID and some incrementing numbering scheme appended to the
file name. If you're working on a big film it's not that big a deal,
but some of us do a lot of smaller freelance jobs, and I often find
myself on three or four projects at the same time. Right now I'm
working on a feature, three shorts, a trailer and seven one-act plays.
When it's all over and after I archive a PT session, I go through and
toss any library sounds and add the rest to the library. I don't
necessarily keep anything that had a minor change, but it would be
nice to have project-specific naming so A) I could easily tell what to
immediately toss and B) all the altered files had new unique names and
I would know what project they were created for. Most of this is
possible now, but it requires going into general prefs every time you
change projects and making sure you type the same options you did last
time, etc., and using a renaming app after the fact to add incremental
numbering. It's OK when things are slow, but when they get crazy it's
easy to slip.
Cheers
SK

> - Dave

DF

Sep 4, 2009, 12:40:25 PM
to Soundminer Users Collective
These could prevent many many problems in the future. It won't help
past issues, but like unique file names, this has some real merit.

1) Unique number option on transfers like we had in SM3. NOTHING
beats a Finder-level unique name.
2) Unique number generated by hardware and an internal counter (the
user could also set up some ID info to use here), and embedded into a
"parent" field. This field could be a future relink option.
3) Unique number field (same as #2) SM generates numbers for "Selected
Record" that have this field empty.
4) A "child" field, with the parent file's parent ID. With this, you
can find every file that has ever been spotted from a file, or get
back to the original file. I had this in SLP, if a user created a
record FROM another record, the 2 would be linked (or 3 or 10, or
100......)

The way the parent/child worked is like this:

1) I spot a section of an explo file.
2) The unique ID of the "parent" file, goes into the "child" field of
the new file, and the spotted file gets a unique parent number like in
the above step 2.
3) If something is ever spotted from a "child" file, the "child" value
gets transferred to the spotted file's "child" field. (Basically,
when spotting it's an "If child value exists use child value, else use
parent value" rule.)
4) To find all files/records from this parent/child relationship, we
search both the parent & child field for the child value. (If no child
value present, we use the parent value to find all the children)

Example - first "record" is a parent, the rest are children (files
spotted directly from the parent file):

Source File name - Parent ID - Child ID
Explosion DF1000 - MYHARDWAREID12345 - (Child ID empty)

Explosion DF1001 - MYHARDWAREID12346 - MYHARDWAREID12345
Explosion DF1002 - MYHARDWAREID12347 - MYHARDWAREID12345
Explosion DF1003 - MYHARDWAREID12348 - MYHARDWAREID12345

Now if we spot from that last "child" file the new "child" is:
Explosion DF1004 - MYHARDWAREID12349 - MYHARDWAREID12345

Now to find all of these "related" files we search the parent & child
fields for - MYHARDWAREID12345

Simplified:
The Parent ID ALWAYS is a unique new ID, the Child ID takes on the
Child ID of the file it came from (or the parent ID if the source file
has no Child ID).

In this way we can always find the parent files only as well, by
searching for empty child fields, etc..
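
The rules above can be sketched in Python roughly like this. It is only
a sketch of the idea: the field names follow this post ("parent" is the
file's own unique ID, "child" carries the ID of the original source),
while the ID format, the counter and the function names are made up.

```python
# Hypothetical sketch of the parent/child ID scheme described above.
# Field names follow the post: "parent" is the file's own unique ID,
# "child" carries the ID of the original source. The ID format, counter
# and function names are made up for illustration.
import itertools

_counter = itertools.count(12345)

def new_id(hardware_id="MYHARDWAREID"):
    """Unique ID built from a hardware string and an internal counter."""
    return f"{hardware_id}{next(_counter)}"

def scan(name):
    """A freshly scanned original: new parent ID, empty child field."""
    return {"name": name, "parent": new_id(), "child": None}

def spot(source, name):
    """A file spotted from `source`: inherits the child value if present,
    otherwise takes the source's parent ID."""
    return {"name": name, "parent": new_id(),
            "child": source["child"] or source["parent"]}

def related(records, lineage_id):
    """Every record whose parent or child field holds the lineage ID."""
    return [r for r in records if lineage_id in (r["parent"], r["child"])]

original = scan("Explosion DF1000")
kids = [spot(original, f"Explosion DF100{i}") for i in (1, 2, 3)]
grandkid = spot(kids[-1], "Explosion DF1004")
records = [original] + kids + [grandkid]

family = related(records, original["parent"])
originals_only = [r for r in records if r["child"] is None]
```

Searching both fields for the original's ID returns all five records
from the example table, and the empty-child search returns only the
original.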

This is more forward thinking than trying to recover past issues. But
as our libraries get larger, it actually gets harder & harder to find
things with more & more records of duplicate material. We need to
implement some ideas like this NOW to get control of our data down the
line.

- Dave

Jeremiah Moore

Sep 11, 2009, 4:17:50 PM
to david.f...@gmail.com, Soundminer Users Collective
Thanks DF!

Your point is quite salient: preserving parent/child relationships in
metadata will provide richer ways to navigate / manage / organize our
sounds in the future.

It becomes possible to imagine a "tree browser" in which one can
navigate the "lineage" of files. (Of course, there's still the issue
of Pro Tools AudioSuite, for instance, not playing along.)


Terminology is a point of minor confusion to me... It seems logical
(to me) that the field attached to a file which contains the ID of
its source would be called "Parent," as in "This file's Parent is
__xxx_file__" It seems you're calling it "Child" as in "this file is
a child of _xxx file__"


-jeremiah