Parsing file paths with regular expressions

jayh...@gmail.com

Sep 14, 2005, 2:37:17 PM

I'm trying to write a method that strips excess file separator
characters from a file path, so it would get something like this as
input: C:\\\Documents and Settings\\My Documents\\\\\Work\\\ and return
something more like this: C:\Documents and Settings\My Documents\Work\

My code looks like this:

private String getCorrectPath(String input) {
    char fs = File.separatorChar;
    String newInput;
    try {
        String regex = (Character.isLetter(fs) ? String.valueOf(fs) : "\\"
                + fs) + "{2,}";
        newInput = input.replaceAll(regex, String.valueOf(fs));
    }
    catch(Exception e) {
        return null;
    }
    return newInput;
}

This seems to work fine for something like Unix, but whenever I run it
on Windows, where the file separator is the same as the quoting
character, I get the following exception:

java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(Unknown Source)
at java.util.regex.Matcher.appendReplacement(Unknown Source)
at java.util.regex.Matcher.replaceAll(Unknown Source)
at java.lang.String.replaceAll(Unknown Source)

Can anybody help me figure out what's going on?

Thanks,

Jay

jan V

Sep 14, 2005, 2:49:30 PM

> My code looks like this:
>

> catch(Exception e) {
> return null;
> }

This style will hurt you big time if you continue using it.. you should
always catch exception types that are as specific (tight) as possible for
the logic contained by the try block.
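
A sketch of what a tighter catch could look like here, assuming the only failure worth handling locally is a malformed pattern (the one exception String.replaceAll documents, PatternSyntaxException). replaceAllOrNull is a hypothetical name, not part of any library.

```java
import java.util.regex.PatternSyntaxException;

class TightCatch {
    /**
     * Hypothetical helper: applies String.replaceAll and returns null only
     * when the regex itself is malformed, the one failure it knows how to
     * handle. Anything else (for example the exception thrown for a bad
     * replacement string) propagates with its stack trace intact.
     */
    static String replaceAllOrNull(String input, String regex, String replacement) {
        try {
            return input.replaceAll(regex, replacement);
        } catch (PatternSyntaxException e) {
            // getDescription() reports what is wrong with the pattern
            System.err.println("Invalid pattern: " + e.getDescription());
            return null;
        }
    }
}
```

With this version, the bad-replacement bug from the original post would surface as an uncaught exception pointing at the replaceAll call instead of a silent null.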

Roedy Green

Sep 14, 2005, 3:09:39 PM

On 14 Sep 2005 11:37:17 -0700, jayh...@gmail.com wrote or quoted :

>This seems to work fine for something like Unix, but whenever I run it
>on Windows, where the file separator is the same as the quoting
>character, I get the following exception:

There your splitter looks like this: "\\\\" to represent the
literal \. You have to double once for Java and once for regex. See
http://mindprod.com/jgloss/regex.html
--
Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.
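
A sketch of the doubling Roedy describes, applied to collapsing runs of backslashes. BackslashDemo and its methods are hypothetical names. The replacement string needs the same treatment, because replaceAll treats \ (and $) specially there too, which is where the original code goes wrong.

```java
class BackslashDemo {
    // The Java literal "\\\\{2,}" is the regex \\{2,} : an escaped, literal
    // backslash repeated two or more times.
    static final String RUN_OF_BACKSLASHES = "\\\\{2,}";

    // The replacement is parsed as well: the Java literal "\\\\" is the
    // two-character string \\ , which replaceAll turns into one backslash.
    static String collapseBackslashes(String path) {
        return path.replaceAll(RUN_OF_BACKSLASHES, "\\\\");
    }

    // The mistake from the original post: a lone backslash ("\\" in Java
    // source) as the replacement. It only fails once a match is found.
    static boolean loneBackslashReplacementFails(String path) {
        try {
            path.replaceAll(RUN_OF_BACKSLASHES, "\\");
            return false;
        } catch (RuntimeException e) {
            // StringIndexOutOfBoundsException on the JRE in this thread;
            // newer JDKs throw IllegalArgumentException instead.
            return true;
        }
    }
}
```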

Joan

Sep 14, 2005, 3:14:55 PM

<jayh...@gmail.com> wrote in message
news:1126723037.8...@g44g2000cwa.googlegroups.com...

> I'm trying to write a method that strips excess file separator
> characters from a file path, so it would get something like this as
> input: C:\\\Documents and Settings\\My Documents\\\\\Work\\\ and return
> something more like this: C:\Documents and Settings\My Documents\Work\

I use this and it works great.
String pp = (new File(filename)).getCanonicalPath();
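
getCanonicalPath does more than strip duplicate separators: it makes the path absolute, resolves "." and "..", and may follow symbolic links, and it declares IOException. A sketch contrasting it with the string-only normalization that java.io.File applies in its constructor; both method names are hypothetical. Note that both drop the trailing separator the original post wanted to keep.

```java
import java.io.File;
import java.io.IOException;

class CanonicalDemo {
    /** Full canonicalization; may consult the file system. */
    static String canonical(String path) throws IOException {
        // Absolute, no redundant separators, "." and ".." resolved, and
        // symbolic links possibly followed, so the result depends on the
        // machine and current directory it runs on.
        return new File(path).getCanonicalPath();
    }

    /** String-only cleanup done by the File constructor itself. */
    static String normalizedOnly(String path) {
        // java.io.File normalizes its path string on construction, which
        // removes duplicate separators (and a trailing one) without
        // touching the disk.
        return new File(path).getPath();
    }
}
```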

Thomas Hawtin

Sep 14, 2005, 3:18:11 PM

jayh...@gmail.com wrote:
>
> private String getCorrectPath(String input) {
> char fs = File.separatorChar;
> String newInput;
> try {
> String regex = (Character.isLetter(fs) ? String.valueOf(fs) : "\\"
> + fs) + "{2,}";
> newInput = input.replaceAll(regex, String.valueOf(fs))
> }

> This seems to work fine for something like Unix, but whenever I run it
> on Windows, where the file separator is the same as the quoting
> character, I get the following exception:
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: 1
> at java.lang.String.charAt(Unknown Source)
> at java.util.regex.Matcher.appendReplacement(Unknown Source)
> at java.util.regex.Matcher.replaceAll(Unknown Source)
> at java.lang.String.replaceAll(Unknown Source)

Matcher.quoteReplacement is your friend. Use it like you would
PreparedStatement in JDBC (assuming JRE 5.0 or later). \ is significant
in replacement strings as well as in the regex itself. It needs to be
escaped.

Perhaps a better API design would be to throw an
IllegalArgumentException whenever the replacement text is illegal.

Tom Hawtin
--
Unemployed English Java programmer
http://jroller.com/page/tackline/
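
A minimal sketch of getCorrectPath with both sides escaped, assuming Java 5 or later for Pattern.quote and Matcher.quoteReplacement. collapseSeparators is a hypothetical helper; taking the separator as a parameter lets the Windows case be exercised on any platform.

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathCleaner {
    /**
     * Collapses every run of two or more sep characters into a single one.
     * Pattern.quote escapes the separator for the regex, and
     * Matcher.quoteReplacement escapes it for the replacement string.
     */
    static String collapseSeparators(String input, char sep) {
        String s = String.valueOf(sep);
        // Non-capturing group so {2,} applies to the whole quoted separator.
        String regex = "(?:" + Pattern.quote(s) + "){2,}";
        return input.replaceAll(regex, Matcher.quoteReplacement(s));
    }

    /** The original getCorrectPath, using the platform separator. */
    static String getCorrectPath(String input) {
        return collapseSeparators(input, File.separatorChar);
    }
}
```

Unlike getCanonicalPath, this keeps the trailing separator, so the path from the original post comes back as C:\Documents and Settings\My Documents\Work\ and nothing touches the file system.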

David Segall

Sep 15, 2005, 9:43:59 AM

"jan V" <n...@nul.be> wrote:

>> My code looks like this:
>>
>> catch(Exception e) {
>> return null;
>> }
>
>This style will hurt you big time if you continue using it..

How? I can see that he should produce some diagnostic output, and
possibly terminate the program, but why is it important to cater for
each exception individually?

jan V

Sep 15, 2005, 10:05:50 AM

"David Segall" <da...@nowhere.net> wrote in message
news:dguii1he3bjmmo3d6...@4ax.com...

> "jan V" <n...@nul.be> wrote:
>
> >> My code looks like this:
> >>
> >> catch(Exception e) {
> >> return null;
> >> }
> >
> >This style will hurt you big time if you continue using it..

> How? why is it important to cater for each exception individually?

[The following text is taken from "Mastering Javabeans", Copyright (c) 1997
Sybex]

When a method throws a lot of different exceptions, it is tempting to
succumb to laziness and simply catch the root Exception type once
instead of laboriously specifying a catch clause for every possible
Exception subclass declared in the method's throws clause. As usual
when programming, the lazy "trick" can come back to haunt you when
you hit a bug. The problem with using a blanket catchall is that you will
stop the JVM from throwing those all-important subclasses of
RuntimeException at the spot where they occur. These include, among others,
the following tell-tale bug detectors par excellence:

· ArithmeticException
· ArrayIndexOutOfBoundsException
· ClassCastException
· ClassNotFoundException
· CloneNotSupportedException
· IllegalArgumentException
· IllegalMonitorStateException
· IndexOutOfBoundsException
· NullPointerException
· NumberFormatException
· SecurityException

Any Java programmer with a modicum of Java experience knows these
exceptions well, as they can be thrown by code involving daily
bread-and-butter things like math, arrays, casting, cloning objects, invoking
methods, parameter passing, and threads. The problem with blanket
catchalls is that these usually have the side effect of throwing away
important information. Here is an example of a problematic catchall:

try {
    // lots of code here
    // more code here, can throw a whole mix of Exceptions
} catch (Exception allOfThem) { // LAZY !!
    System.out.println("Oh dear, our XYZ step failed!");
}

If a common, and often bug-related, exception like a NullPointerException
happened anywhere within the try block, then you would
not even have a clue that it happened because you would probably
think that some method threw a checked exception instead, and not that
an even more important unchecked exception occurred (unchecked
exceptions are all instances of classes RuntimeException and Error,
and their subclasses).

So, the lesson should be clear. Always explicitly include a catch clause
for every checked exception, so that run-time exceptions will halt your
program during development. For example, when you need to invoke
Constructor.newInstance(), you should use the following code template:

try {
    object = someConstructor.newInstance(...);
} catch (InstantiationException x) {
    // suitable code
} catch (IllegalAccessException x) {
    // suitable code
} catch (IllegalArgumentException x) {
    // suitable code
} catch (InvocationTargetException x) {
    // suitable code
}

This verbosity will repay itself hundredfold in debugging hours saved.
So, in effect, this is the true lazy approach! (A good programmer is lazy,
but uses defensive coding techniques to be able to afford this laziness.)
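
A runnable variant of the template above, assuming the goal is to call a public (String) constructor; create is a hypothetical helper and StringBuilder is just a convenient target. getConstructor adds NoSuchMethodException to the list of checked exceptions. Unlike the template, this sketch leaves out a catch for the unchecked IllegalArgumentException, so a genuine argument bug still propagates.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

class ReflectiveCreate {
    /**
     * Hypothetical helper: calls type's public (String) constructor via
     * reflection. Each checked exception gets its own catch clause, so an
     * unchecked bug (e.g. a NullPointerException) is never swallowed here.
     */
    static Object create(Class<?> type, String arg) {
        try {
            Constructor<?> ctor = type.getConstructor(String.class);
            return ctor.newInstance(arg);
        } catch (NoSuchMethodException x) {
            throw new IllegalArgumentException(type.getName() + " has no public (String) constructor", x);
        } catch (InstantiationException x) {
            throw new IllegalArgumentException(type.getName() + " is abstract or an interface", x);
        } catch (IllegalAccessException x) {
            throw new IllegalStateException("constructor not accessible", x);
        } catch (InvocationTargetException x) {
            // The constructor itself threw; report its exception as the cause.
            throw new IllegalStateException("constructor failed", x.getCause());
        }
    }
}
```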

kempshall

Sep 15, 2005, 11:39:28 AM

No kidding, but I'm not all that familiar with the regex API so I'm not
really sure what exceptions the String.replaceAll method can throw. The
StringOutOfBoundsException, for example, isn't even listed in the API
-- only the PatternSyntaxException is. I can always make the code
tighter once I have some idea of what's going on.

Oliver Wong

Sep 15, 2005, 11:54:55 AM

"kempshall" <jayh...@gmail.com> wrote in message
news:1126798768.5...@g49g2000cwa.googlegroups.com...

Assuming you mean StringIndexOutOfBoundsException, this is a
RuntimeException, and those frequently aren't documented.

- Oliver

jan V

Sep 15, 2005, 2:16:13 PM

"kempshall" <jayh...@gmail.com> wrote in message
news:1126798768.5...@g49g2000cwa.googlegroups.com...

StringIndexOutOfBoundsException is an unchecked exception.. and you need
these kinds of exceptions to halt your program (at the *very* least during
development) so that you can plug the cause of the exception. Your program
ending due to a StringIndexOutOfBoundsException is normally a sign that
you've got a bug somewhere..

"I can always make the code tighter once I have some idea of what's going
on." ... this is a very dangerous technique, because it's so easy to forget
to tighten things later.. mind you, there are a number of code quality
analysis tools which will flag catch Exception in your code.
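
A small sketch of jan V's point, with all names hypothetical: step declares a checked IOException but also contains a bug that throws NullPointerException for a null argument. The blanket catch reports the bug as an ordinary failure; the tight catch handles only the IOException and lets the NullPointerException halt the program with a stack trace pointing at the bug.

```java
import java.io.IOException;

class CatchAllDemo {
    // Hypothetical step: throws a checked IOException for an empty name,
    // and (the bug) a NullPointerException when name is null.
    static int step(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("empty name");
        }
        return name.length();
    }

    // LAZY: catches everything, so the NPE looks like any other failure.
    static String lazy(String name) {
        try {
            return "ok " + step(name);
        } catch (Exception allOfThem) {
            return "Oh dear, our XYZ step failed!";
        }
    }

    // Tight: handles only the checked exception; the NPE propagates.
    static String tight(String name) {
        try {
            return "ok " + step(name);
        } catch (IOException e) {
            return "step failed: " + e.getMessage();
        }
    }
}
```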

Thomas Hawtin

Sep 15, 2005, 2:33:13 PM

Oliver Wong wrote:
> "kempshall" <jayh...@gmail.com> wrote in message
> news:1126798768.5...@g49g2000cwa.googlegroups.com...
>
>>No kidding, but I'm not all that familiar with the regex API so I'm not
>>really sure what exceptions the String.replaceAll method can throw. The
>>StringOutOfBoundsException, for example, isn't even listed in the API
>>-- only the PatternSyntaxException is. I can always make the code
>>tighter once I have some idea of what's going on.

I looked up the "not a bug" for this:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4689750

> Assuming you mean StringIndexOutOfBoundsException, this is a
> RuntimeException, and those frequently aren't documented.

The advice in Effective Java, IIRC, is to always document runtime
exceptions.

Often NPEs aren't documented, nor do the docs hint as to whether null is
acceptable. In fact you'll quite often get an NPE from a subsequent
method. That indicates to me that nulls and RuntimeExceptions should
have attention paid to them.
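
A sketch of documenting a runtime exception with @throws, in the style Tom attributes to Effective Java. The method is hypothetical; the explicit null check makes the NullPointerException happen at the documented place rather than inside some later call.

```java
class SeparatorCounter {
    /**
     * Hypothetical example: counts occurrences of sep in path.
     *
     * @param path the path to scan
     * @param sep  the separator character to count
     * @return how many times sep occurs in path
     * @throws NullPointerException if path is null
     */
    static int countSeparators(String path, char sep) {
        if (path == null) {
            // Fail fast, where the Javadoc says it will, with a clear message.
            throw new NullPointerException("path must not be null");
        }
        int count = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == sep) {
                count++;
            }
        }
        return count;
    }
}
```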

Oliver Wong

Sep 15, 2005, 2:57:33 PM

"Thomas Hawtin" <use...@tackline.plus.com> wrote in message
news:4329bf43$0$17465$ed2e...@ptn-nntp-reader04.plus.net...

> Oliver Wong wrote:
>> Assuming you mean StringIndexOutOfBoundsException, this is a
>> RuntimeException, and those frequently aren't documented.

When I said this, I meant it as "Here's how it is in practice; learn to
deal with it." not "Here's the best practice that I recommend everyone
follows."

> The advice in Effective Java, IIRC, is to always document runtime
> exceptions.

I disagree with "always". More details below.

> Often NPEs aren't documented, nor do the docs hint as to whether null is
> acceptable. In fact you'll quite often get an NPE from a subsequent
> method. That indicates to me that nulls and RuntimeExceptions should have
> attention paid to them.

I think the majority of NullPointerExceptions come from unintentional
bugs in the code. Given that these bugs are unintentional, you can hardly
expect them to be documented. HOWEVER, in my opinion, if you're writing a
method, and you require that the parameters not be null, and you check them
and find out that they are indeed null, I recommend throwing
IllegalArgumentException instead of NullPointerException. I think the former
is much more descriptive of what the problem was, when seen in a stack
trace. Similarly, if you're writing a method which reads instance fields
which you require to not be null, but they do turn out to be null, I
recommend throwing IllegalStateException.

In both these situations, I think it's a bit overboard to put @throws
javadoc tags that explicitly say that IllegalArgumentException may be
thrown. I think it would suffice to put a comment stating your requirements
for the parameters, either in the main body of the JavaDoc comment, or in
the @param tags.

- Oliver
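
A sketch of Oliver's suggestion, with hypothetical class and method names: IllegalArgumentException for a bad argument, IllegalStateException for an object used before it is ready, and the non-null requirement stated in the @param tags rather than in a @throws tag.

```java
class PathHolder {
    private String root; // hypothetical field; must be set before resolve()

    /** @param root the base directory; must not be null */
    void setRoot(String root) {
        if (root == null) throw new IllegalArgumentException("root must not be null");
        this.root = root;
    }

    /** @param child a name to append to the root; must not be null */
    String resolve(String child) {
        if (child == null) throw new IllegalArgumentException("child must not be null");
        if (root == null) throw new IllegalStateException("setRoot() has not been called");
        return root + "/" + child;
    }
}
```

For comparison, java.util.Objects.requireNonNull (Java 7 and later) follows the other convention and throws NullPointerException.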

jan V

Sep 15, 2005, 4:00:44 PM

> I think the majority of NullPointerExceptions come from unintentional
> bugs in the code.

"Unintentional bugs"...?! For all those years I thought that was the only
type of bug... ;-)

> in my opinion, if you're writing a
> method, and you require that the parameters not be null, and you check them
> and find out that they are indeed null, I recommend throwing
> IllegalArgumentException instead of NullPointerException. I think the former
> is much more descriptive of what the problem was, when seen in a stack
> trace.

Totally with you on this one. When I'm bored, I browse my library methods to
try and find places where I could insert

if (arg == null) throw new IllegalArgumentException("...........");

> In both these situations, I think it's a bit overboard to put @throws
> javadoc tags that explicitly say that IllegalArgumentException may be
> thrown.

I don't think so... I like my docs to be as explicit as possible.