It took me entirely too much time to write this (simple?) Ruby script.
Warning to the TDD folks, there are NO tests although it would NOT be
difficult to write them. I chose (wisely or otherwise) to rely on
numbers (missing and total) to tell me whether or not I've missed
anything.
So, what does this behemoth do? It simply iterates through a single
directory looking for files named with a particular, sequential format
and tracks any files missing from the sequence. My need is to identify
large sequences of missing files so I can attempt to locate them
elsewhere.
Where do I think this script needs work? Range boundary checking. (What
if the first few files in the sequence are missing?) Also, I hate the
definition of the current, previous, next, etc. files although their
names do have a certain DSL feel! I would think there should be a
regular expression that could do the trick but me and regex are not on
the same planet.
Of course the script could be parameterized... And I noticed a
performance hit when I added the calls to File.basename but it was
minimal and I liked the look of that call over "DSC_" + sprintf("%04d",
i) + ".JPG".
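
One way the five near-identical filename definitions might be tidied up, sketched under the assumption that DSC_nnnn.JPG (the pattern used in the script in this post) is the only format in play. The names photo_name and photo_number are made up for illustration: one formats a number into a name, the other uses a regular expression to go the other way.

```ruby
# Hypothetical helpers; only the DSC_/.JPG pattern comes from the script.
PHOTO_PATTERN = /\ADSC_(\d{4})\.JPG\z/

# Build the file name for sequence number i, e.g. 7 -> "DSC_0007.JPG".
def photo_name(i)
  format("DSC_%04d.JPG", i)
end

# Recover the sequence number from a file name, or nil if it doesn't match.
def photo_number(name)
  m = PHOTO_PATTERN.match(name)
  m && m[1].to_i
end

puts photo_name(7)                 # DSC_0007.JPG
puts photo_number("DSC_0042.JPG")  # 42
p photo_number("notes.txt")        # nil
```

With a helper like this, the neighbours of file i are simply photo_name(i - 1) and photo_name(i + 1), and the pattern lives in one place if it ever needs to be parameterized.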
Any teachers/professors out there? I could not help but think that this
would make a really cool programming exercise. But then, who teaches
Ruby in high school or college?!
Anyway, the list has been a little quiet so I thought I would toss you
all some red meat. Have at it.
Bill
#!/usr/bin/env ruby

directory = '/some/directory/where/you/keep/photos/'
start_range = 1
end_range = 9999
missing = 0
total = 0

for i in start_range..end_range do
  # I know we're treading on thin ice here concerning range bounds but I'm a
  # "live life on the edge" kind of guy!
  the_file_before_that = directory + "DSC_" + sprintf("%04d", i - 2) + ".JPG"
  prev_file = directory + "DSC_" + sprintf("%04d", i - 1) + ".JPG"
  current_file = directory + "DSC_" + sprintf("%04d", i) + ".JPG"
  next_file = directory + "DSC_" + sprintf("%04d", i + 1) + ".JPG"
  the_file_after_that = directory + "DSC_" + sprintf("%04d", i + 2) + ".JPG"

  # Let's keep track of the total missing.
  # (File.exist? replaces File.exists?, which newer Rubies no longer provide.)
  if !File.exist?(current_file) then
    total += 1
  end

  # Singletons - Current does not exist but previous and next do.
  if File.exist?(prev_file) &&
     !File.exist?(current_file) &&
     File.exist?(next_file) then
    print File.basename(current_file) + "\n"
  # Start of a series of missing files.
  elsif File.exist?(prev_file) &&
        !File.exist?(current_file) &&
        !File.exist?(next_file) then
    # Exclude cases where only two sequential files are missing!
    if File.exist?(the_file_after_that) then
      print File.basename(current_file) + "\n"
    else
      print File.basename(current_file)
    end
  # End of a series of missing files.
  elsif !File.exist?(prev_file) &&
        !File.exist?(current_file) &&
        File.exist?(next_file) then
    # Exclude the count when the series included only two files since other
    # script logic prevents series of two files from being printed on a
    # single line.
    if File.exist?(the_file_before_that) then
      print File.basename(current_file) + "\n"
    else
      missing += 2
      print File.basename(current_file) + " (" + missing.to_s + ")\n"
    end
    missing = 0
  # In the middle of a series of missing files.
  elsif !File.exist?(current_file) then
    unless (missing > 0) then print "-" end
    missing += 1
  end
end

print "Total missing from this sequence: " + total.to_s + "\n"
If I've read my own code correctly, 11 separate calls to File.exists
are made for each iteration in the worst case and a best case of 4
calls per iteration. Ehren's makes one call per iteration as the
best/worst case; not to mention about a 50% reduction in code. The
difference is clearly attributable to design. I'm humbled. In my
defense, I only loop once whereas Ehren's code loops three times,
meaning that for large ranges with many missing files my version may
begin to approach Ehren's design in performance (perhaps...).
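
For comparison, here is a rough sketch of how the existence checks could be cut to one cheap lookup per iteration: read the directory once, collect the sequence numbers that are present, and walk the range against that set. This is not Ehren's attached missing.rb, which isn't reproduced in this thread. The names missing_runs and present_numbers are hypothetical, and the DSC_nnnn.JPG pattern and 1..9999 range are borrowed from the original script. The demo uses an in-memory set rather than a real directory.

```ruby
require 'set'

# Hypothetical helper: given the sequence numbers that ARE present, return the
# runs of missing numbers in first..last as [low, high] pairs.
def missing_runs(present, first, last)
  runs = []
  run_start = nil
  (first..last).each do |i|
    if present.include?(i)
      # A present file closes any run of missing numbers in progress.
      runs << [run_start, i - 1] if run_start
      run_start = nil
    else
      run_start ||= i
    end
  end
  # A run that reaches the end of the range still has to be recorded.
  runs << [run_start, last] if run_start
  runs
end

# One directory read instead of several existence checks per number.
def present_numbers(directory)
  Dir.entries(directory)
     .map { |name| name[/\ADSC_(\d{4})\.JPG\z/, 1] }
     .compact
     .map(&:to_i)
     .to_set
end

# Demo with made-up data: only these six files exist.
present = Set[3, 7, 99, 100, 101, 989]
missing_runs(present, 1, 9999).each do |lo, hi|
  puts format("DSC_%04d.JPG to DSC_%04d.JPG : %d missing", lo, hi, hi - lo + 1)
end
```

Because a run is opened whenever a missing number is seen and closed explicitly at the end of the range, gaps at the very start or very end of the sequence need no special casing.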
BB
On Nov 14, 4:02 pm, "ehren murdick" <ehren.murd...@gmail.com> wrote:
> i've attached my solution.
> here's what the output looks like:
>
> ehren@laptop:~/sandbox$ ruby missing.rb
> DSC_0001.JPG to DSC_0002.JPG : 2 missing in this series
> DSC_0004.JPG to DSC_0006.JPG : 3 missing in this series
> DSC_0008.JPG to DSC_0098.JPG : 91 missing in this series
> DSC_0102.JPG to DSC_0988.JPG : 887 missing in this series
> DSC_0990.JPG to DSC_9999.JPG : 9010 missing in this series
> total series:5
> total missing files:9993
>
Well, I went a different way and paid more attention to the spirit of
the spec than to Bill's code, but still skipped the tests ;-)
rab:Data $ cd 2004/2004-09-19
rab:2004-09-19 $ ~/code/ruby/file_ranges.rb | grep -e missing
2 missing: 100_1319.jpg .. 100_1320.jpg
1 missing: 100_1369.jpg
1 missing: 100_1376.jpg
1 missing: 100_1384.jpg
2 missing: 100_1388.jpg .. 100_1389.jpg
rab:2004-09-19 $ ~/code/ruby/file_ranges.rb
100_1317.jpg .. 100_1318.jpg
2 missing: 100_1319.jpg .. 100_1320.jpg
100_1321.jpg .. 100_1368.jpg
1 missing: 100_1369.jpg
100_1370.jpg .. 100_1375.jpg
1 missing: 100_1376.jpg
100_1377.jpg .. 100_1383.jpg
1 missing: 100_1384.jpg
100_1385.jpg .. 100_1387.jpg
2 missing: 100_1388.jpg .. 100_1389.jpg
100_1390.jpg .. 100_1394.jpg
rab:2004-09-19 $ cd ../../2006/Roll\ 41
rab:Roll 41 $ ~/code/ruby/file_ranges.rb
Photo_101906_001.jpg .. Photo_101906_015.jpg
rab:Roll 41 $
I don't try to have a fixed range or a fixed file name format. I only
expect that the variable number portion is the last set of contiguous
digits before any extension.
I'm also printing both the existing ranges and the missing ranges
(counts for the missing). As you can see from my output above, you
can easily grep for just the missing lines or leave all the output to
see the low and high ends of the existing files. In my picture
directories, the numbers rarely began at 1 or ended anywhere near all
9's.
Oh, and I didn't have even a single call against File.
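
The "last set of contiguous digits before any extension" rule can be captured with a regular expression. What follows is only a guess at the shape of the idea, not the actual file_ranges.rb (which isn't included in this thread). The helpers split_name and missing_numbers are hypothetical, and the extension is assumed to be whatever follows the final dot.

```ruby
# Hypothetical helper: split a name into [prefix, number, suffix], where the
# number is the last run of digits ahead of the extension. Returns nil when
# the name has no digits before its extension.
def split_name(name)
  ext  = name[/\.[^.]*\z/] || ""            # final ".xyz", if any
  stem = name[0, name.length - ext.length]  # everything before it
  m = /\A(.*?)(\d+)(\D*)\z/.match(stem)     # lazy prefix, then the LAST digit run
  m && [m[1], m[2].to_i, m[3] + ext]
end

# Numbers absent between the lowest and highest numbers actually seen.
def missing_numbers(names)
  numbers = names.map { |n| split_name(n) }.compact.map { |parts| parts[1] }.sort
  return [] if numbers.empty?
  (numbers.first..numbers.last).to_a - numbers
end

p split_name("Photo_101906_001.jpg")
p missing_numbers(%w[100_1317.jpg 100_1318.jpg 100_1321.jpg])
```

Working from the lowest and highest numbers found, rather than a fixed 1..9999, is what keeps the report from opening and closing with huge runs of "missing" files.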
-Rob
--
Rob Biedenharn
Personal: Rob_Bie...@alum.mit.edu
Professional: R...@AgileConsultingLLC.com
The "What the?" Award goes to Rob B. Two lines in (OK, three lines in
minus the comments and whitespace) and I was lost but you can't beat
the versatility. Just don't ask me to maintain it!
The "Damn! That's some beautiful code!!!" Award goes to Scott B.
Practical. Succinct. Legible. Functional.
Both these guys make me want to hand my PowerBook to the next homeless
guy I see and check in to the nearest Tibetan monastery!
"Very Honorable Mentions" to Ehren and Matt who caused me to question
my approach which is exactly what I was looking for in submitting my
code originally.
I just hope Jim Weirich doesn't post a solution or I am certain it is
off to the Air Tibet ticket counter for me!
BB
rab:ruby $ ./fr_benchmark.sh
file_ranges_bill.rb
min / avg / max
user 0.395/0.396/0.403
real 1.153/1.173/1.713
sys 0.757/0.759/0.768
file_ranges_ehren.rb
min / avg / max
user 0.812/0.823/0.842
real 0.827/0.842/0.868
sys 0.014/0.016/0.019
file_ranges_matt.rb
min / avg / max
user 0.083/0.084/0.085
real 0.207/0.208/0.222
sys 0.123/0.124/0.126
file_ranges_rob.rb
min / avg / max
user 0.018/0.018/0.018
real 0.025/0.025/0.027
sys 0.007/0.007/0.008
file_ranges_scott.rb
min / avg / max
user 0.028/0.028/0.029
real 0.035/0.036/0.057
sys 0.007/0.007/0.009
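
fr_benchmark.sh itself isn't shown in this thread. As a rough idea of how min / avg / max figures like these can be gathered, here is a sketch using Ruby's standard Benchmark module. The names time_runs and summarize are made up, and the sketch measures only wall-clock ("real") time for a block, not user/sys time for a separate process.

```ruby
require 'benchmark'

# Hypothetical helper: run the block `runs` times, collecting wall-clock seconds.
def time_runs(runs)
  Array.new(runs) { Benchmark.realtime { yield } }
end

# Reduce a list of timings to [min, avg, max].
def summarize(samples)
  [samples.min, samples.sum / samples.size, samples.max]
end

# Stand-in workload: build the 9999 candidate names ten times over.
samples = time_runs(10) { (1..9999).map { |i| format("DSC_%04d.JPG", i) } }
puts format("real %.3f/%.3f/%.3f", *summarize(samples))
```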
We're going to have a lot to talk about Monday. (Do you want to go, Bill?)
-Rob
--
Rob Biedenharn
Personal: Rob_Bie...@alum.mit.edu
(C) 513-295-4739
OK, here's a new set of benchmark numbers that includes all 4 of Matt's
solutions (matt and matt_v1 should be very close since they're nearly
identical -- I had to make a few little changes to get these to run
on the pictures in my directory). Looks like the use of Fixnums for
most of the work is a big win.
-Rob
file_ranges_bill.rb
min / avg / max
user 0.395/0.397/0.401
real 1.157/1.169/1.256
sys 0.761/0.764/0.769
file_ranges_ehren.rb
min / avg / max
user 0.821/0.835/0.903
real 0.841/0.879/1.159
sys 0.016/0.019/0.031
file_ranges_matt.rb
min / avg / max
user 0.084/0.084/0.086
real 0.210/0.214/0.322
sys 0.125/0.125/0.128
file_ranges_matt_v1.rb
min / avg / max
user 0.082/0.083/0.087
real 0.206/0.224/0.486
sys 0.123/0.124/0.130
file_ranges_matt_v2.rb
min / avg / max
user 0.083/0.084/0.087
real 0.209/0.217/0.506
sys 0.125/0.125/0.130
file_ranges_matt_v3.rb
min / avg / max
user 0.009/0.009/0.012
real 0.017/0.033/0.325
sys 0.007/0.007/0.011
file_ranges_rob.rb
min / avg / max
user 0.018/0.018/0.019
real 0.026/0.032/0.275
sys 0.007/0.007/0.009
file_ranges_scott.rb
min / avg / max
user 0.028/0.028/0.031
real 0.036/0.045/0.374
sys 0.007/0.007/0.010
Except for the variety of output formats, it might be instructive to
use part of the CodeFest to discuss (and BUILD) some acceptance tests
for this.
-Rob
Anyway, I think the solution is pretty flexible to other types of
string ranges and the Array#ranges extension could come in handy in
the future.
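
The attachment isn't reproduced in this thread, so the following is only a guess at what an Array#ranges-style helper might do: collapse a list of values into runs of consecutive items. It is written as a standalone method named consecutive_ranges (a made-up name) rather than as an Array extension, and it relies on succ, so it works for Integers and for Strings alike.

```ruby
# Hypothetical helper: group sorted, de-duplicated values into Ranges of
# consecutive items, where "consecutive" means b == a.succ.
def consecutive_ranges(values)
  values.sort.uniq
        .slice_when { |a, b| b != a.succ }
        .map { |run| run.first..run.last }
end

p consecutive_ranges([3, 1, 2, 7, 8, 10])
p consecutive_ranges(%w[IMG_0001 IMG_0002 IMG_0004])
```

The gaps between successive ranges are then the missing stretches of the sequence.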
I don't expect it to be particularly speedy, but at least there are
tests ;)
Anyway, comments welcome. I'm new to the group and hope to make it
to the meeting on Monday ;p
Brian
Also, you can have it set to send a digest. In which case it might take
awhile to get the posts in the email.
ed
Email (Approximately 4 emails per day)
Send each message to me as it arrives
> On 11/15/06, Bill <booksma...@gmail.com> wrote:
> > I just hope Jim Weirich doesn't post a solution or I am certain it is
> > off the Air Tibet ticket counter for me!
>
> Dang ... I wasn't going to respond, but now how can I resist. Sorry
> about the late response, but I'm at the Rails Edge conference and am
> behind on mail.
>
> Notes: I used "IMG_xxxx.jpg" as the pattern (because that's how files
> are stored on my system). Unit tests are attached. The solution is a
> bit overly "golfed" (14 lines currently).
>
> Here is sample output on my system:
>
> $ ruby seq.rb
> Missing IMG_0001.jpg ... IMG_0601.jpg (601 files)
> Missing IMG_0607.jpg ... IMG_0618.jpg (12 files)
> Missing IMG_0642.jpg ... IMG_0643.jpg (2 files)
> Missing IMG_0646.jpg ... IMG_0646.jpg (1 files)
> Missing IMG_0692.jpg ... IMG_0692.jpg (1 files)
> Missing IMG_0696.jpg ... IMG_0696.jpg (1 files)
> Missing IMG_0917.jpg ... IMG_9999.jpg (9083 files)
>
> --
> -- Jim Weirich    j...@weirichhouse.org    http://onestepback.org
> -----------------------------------------------------------------
> "Beware of bugs in the above code; I have only proved it correct,
> not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)
--- /Users/rab/Desktop/testseq.rb 2006-11-18 16:27:25.000000000 -0500
+++ test_file_ranges_jim.rb 2006-11-18 17:47:06.000000000 -0500
@@ -1,9 +1,12 @@
 #!/usr/bin/env ruby
 # -*- ruby -*-
+# Jim Weirich
+
 require 'test/unit'
+require 'rubygems'
 require 'flexmock'
-require 'seq'
+require 'file_ranges_jim'
 class TestMissing < Test::Unit::TestCase
   include FlexMock::TestCase
@@ -49,4 +52,16 @@
       check_missing(n) { |lo, hi| @mock.check(lo, hi) }
     end
   end
+
+  def test_file_ordering
+    @mock.should_receive(:check).with(   1, 2000).once
+    @mock.should_receive(:check).with(2004, 9999).once
+    %w[ IMG_2001.jpg
+        IMG_2002.jpg
+        IMG_2003.jpg
+        IMG_10000.jpg ].sort.uniq.each do |fn|
+      n = fn.match(/(\d+)/).captures.first.to_i
+      check_missing(n) { |lo, hi| @mock.check(lo, hi) }
+    end
+  end
 end
rab:ruby $ ./test_file_ranges_jim.rb
Loaded suite ./test_file_ranges_jim
Started
..F....
Finished in 0.002301 seconds.
1) Failure:
test_file_ordering(TestMissing)
[/usr/local/lib/ruby/gems/1.8/gems/flexmock-0.4.3/lib/flexmock.rb:239:in `check'
/usr/local/lib/ruby/gems/1.8/gems/flexmock-0.4.3/lib/flexmock.rb:301:in `call'
/usr/local/lib/ruby/gems/1.8/gems/flexmock-0.4.3/lib/flexmock.rb:111:in `method_missing'
/usr/local/lib/ruby/gems/1.8/gems/flexmock-0.4.3/lib/flexmock.rb:248:in `mock_wrap'
/usr/local/lib/ruby/gems/1.8/gems/flexmock-0.4.3/lib/flexmock.rb:108:in `method_missing'
./test_file_ranges_jim.rb:64:in `test_file_ordering'
./file_ranges_jim.rb:8:in `check_missing'
./test_file_ranges_jim.rb:64:in `test_file_ordering'
./test_file_ranges_jim.rb:60:in `test_file_ordering']:
in mock 'checker': no matching handler found for check(1, 9999)
7 tests, 1 assertions, 1 failures, 0 errors
This "test" shows the defect (which Jim would've surely seen if he had
a few more files ;-). However, I couldn't see how to write a test that
exposed the bug in seq.rb since it's not actually in check_missing.
To fix the defect, I also have to then fix the test ;-( I'd love to
see how this should be addressed.
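
For what it's worth, the ordering problem can be seen without any mocks: a plain string sort puts IMG_10000.jpg ahead of IMG_2001.jpg, whereas sorting on the extracted number keeps the sequence intact. A minimal sketch, assuming the number is the first run of digits in the name (as in the test above). sequence_number is a hypothetical helper, not part of seq.rb.

```ruby
names = %w[IMG_2001.jpg IMG_2002.jpg IMG_2003.jpg IMG_10000.jpg]

# Hypothetical helper: the first run of digits in the name, as an Integer.
def sequence_number(name)
  name[/\d+/].to_i
end

by_string = names.sort
by_number = names.sort_by { |fn| sequence_number(fn) }

puts by_string.first   # IMG_10000.jpg, because "1" sorts before "2" as a string
puts by_number.first   # IMG_2001.jpg
```

Whether the sort belongs in the caller or inside check_missing's driver is a separate design question. The sketch only shows where the two orderings diverge.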
-Rob