Replace multiple backslashes and forward slashes with File.separator
I'm looking for a way of replacing groups of backslashes and forward slashes with the File.separator value.
would be converted to
I tried splitting it into two passes, one for backslashes and one for forward
slashes, but no joy.
String path = "com\\\\string/some///other////package ";
String pass1= path.replaceAll( "\\*" , File.separator);
String pass2= pass1.replaceAll( "/" , File.separator);
but the forward slash pass throws an out of index exception
And by removing the second pass and just working with the first pass
instead of removing groups of backslashes it just replaces all the
backslashes in the string with backslashes i.e. no visible effect.
If anyone has any idea as to how to fix this that would be great.
I've got it working with a StringTokenizer and the delimiter string "\\/"
but I'd like to figure out why the regex didn't work..
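For reference, the StringTokenizer workaround mentioned above might look roughly like the sketch below. The class and method names (TokenizerPathSanitizer, sanitize) are made up for illustration, and the separator is passed in as a parameter so the behaviour can be checked on any platform.

```java
import java.io.File;
import java.util.StringTokenizer;

class TokenizerPathSanitizer {

    // Rebuilds the path from its non-empty segments, joined by the given separator.
    static String sanitize(String path, String separator) {
        // The delimiter string "\\/" is two characters: a backslash and a forward slash.
        // Runs of delimiters produce no empty tokens, so groups of slashes collapse.
        StringTokenizer tokens = new StringTokenizer(path, "\\/");
        StringBuilder result = new StringBuilder();
        while (tokens.hasMoreTokens()) {
            if (result.length() > 0) {
                result.append(separator);
            }
            result.append(tokens.nextToken());
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String path = "com\\\\string/some///other////package";
        System.out.println(sanitize(path, File.separator));
    }
}
```

One thing to watch with this approach: leading and trailing separators are dropped, so an absolute path such as "/usr//lib/" comes back as a relative one.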
- 9 Comments
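"\\*" should be "\\\\+" (one or more backslashes). As written, "\\*" reaches the regex engine as \*, which matches a literal asterisk, so nothing in the path gets replaced. You need 4 (yes, 4) backslashes in a Java string literal to get a single literal backslash in a regex.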
File.separator could contain \, which needs to be escaped before replaceAll() sees it as the replacement string, so you'd probably need to do something like
String sep = System.getProperty("file.separator").replaceAll("\\\\", "\\\\\\\\");
and then use sep where you're using File.separator.
Or something like that.
Why do you need to do this though?
#1; Mon, 16 Jul 2007 02:02:00 GMT
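Putting the two suggestions from this reply together, a corrected two-pass version could look something like the sketch below. TwoPassSanitizer and sanitize are hypothetical names, and the separator is a parameter (rather than File.separator directly) only so that both the Windows and Unix cases can be tried on one machine.

```java
import java.io.File;

class TwoPassSanitizer {

    static String sanitize(String path, String separator) {
        // Double every backslash in the separator so that replaceAll()
        // inserts it literally instead of treating it as an escape.
        String sep = separator.replaceAll("\\\\", "\\\\\\\\");
        String pass1 = path.replaceAll("\\\\+", sep); // runs of backslashes
        return pass1.replaceAll("/+", sep);           // runs of forward slashes
    }

    public static void main(String[] args) {
        String path = "com\\\\string/some///other////package";
        System.out.println(sanitize(path, File.separator));
    }
}
```

A limitation of doing it in two passes: when the separator is a backslash, a mixed run such as \/ is not collapsed into one separator, because each pass only sees its own kind of slash.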
The correct regex for this is "[/\\\\]+". That's a character class which matches either a forward slash or a backslash, and the plus sign causes it to match one or more. Using this regex, you only have to make one pass. As targaryen pointed out, backslashes are also special in the replacement string, so they have to be escaped. Here's another way to do that:
path = path.replaceAll("[/\\\\]+", "\\" + File.separator);
#2; Mon, 16 Jul 2007 02:02:00 GMT
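As a variation on the one-liner above, here is a minimal sketch that precompiles the pattern and lets Matcher.quoteReplacement() (available since Java 5) do the escaping of the replacement string. SlashNormalizer and normalize are hypothetical names, and the separator is passed in as a parameter purely so both platforms' behaviour can be checked.

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlashNormalizer {

    // One or more characters, each either a forward slash or a backslash.
    private static final Pattern SLASH_RUN = Pattern.compile("[/\\\\]+");

    static String normalize(String path, String separator) {
        // quoteReplacement() escapes any '\' or '$' in the separator,
        // so it is inserted literally whatever the platform.
        return SLASH_RUN.matcher(path).replaceAll(Matcher.quoteReplacement(separator));
    }

    public static void main(String[] args) {
        String path = "com\\\\string/some///other////package";
        System.out.println(normalize(path, File.separator));
    }
}
```

Because a single character class covers both kinds of slash, mixed runs like \/ collapse to one separator, and a leading separator is kept (as one separator) rather than dropped.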
- Thanks for the refinement uncle_alice.
#3; Mon, 16 Jul 2007 02:02:00 GMT
Thanks a million for this. I can't believe it was quite so hard to find?
I'm attempting to write a static path sanitizer method that will take in paths
that may look something like this
and convert it to
I'll give that a bash now.
Mark.
#4; Mon, 16 Jul 2007 02:02:00 GMT
- Hi uncle_alice, thanks also for your help. I'd never have figured that out in a million years. Why is it so complicated? Mark.
#5; Mon, 16 Jul 2007 02:02:00 GMT
Here are a couple of interesting sites about regular expressions
and the split method is quite handy.
To answer your question about why it's so complicated:
it seems that way to us, but it's simple for uncle_alice.
Walken16
#6; Mon, 16 Jul 2007 02:02:00 GMT
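Since the split method came up, here is a rough sketch of how String.split() could be used with the same character-class regex. SplitExample, segments and rejoin are made-up names, and String.join() assumes Java 8 or later.

```java
import java.io.File;
import java.util.Arrays;

class SplitExample {

    // Splits on runs of forward slashes and/or backslashes.
    static String[] segments(String path) {
        return path.split("[/\\\\]+");
    }

    // Joins the segments back together with the chosen separator.
    static String rejoin(String path, String separator) {
        return String.join(separator, segments(path));
    }

    public static void main(String[] args) {
        String path = "com\\\\string/some///other////package";
        System.out.println(Arrays.toString(segments(path)));
        System.out.println(rejoin(path, File.separator));
    }
}
```

split() keeps an empty first element when the path starts with a separator, so a leading slash survives the round trip, while trailing empty elements are discarded.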
I hope this slash conversion voodoo is actually necessary. It's only necessary if the paths are being written into a script or otherwise externalized to the OS. If you're doing this just to construct a FileInputStream, for example, you're doing too much work. Just use forward slashes in your Java code and Java will translate them into the platform-specific path names for you.
#7; Mon, 16 Jul 2007 02:02:00 GMT
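A small sketch of that point, using only java.io.File: a path written with forward slashes comes back from getPath() using the platform's own separator. ForwardSlashDemo and viaFile are hypothetical names.

```java
import java.io.File;

class ForwardSlashDemo {

    // Lets java.io.File convert the path to the platform's separator.
    static String viaFile(String path) {
        return new File(path).getPath();
    }

    public static void main(String[] args) {
        // Uses File.separator between the names on the current platform.
        System.out.println(viaFile("com/string/some/other/package"));
    }
}
```

This only covers forward slashes. On Unix-like systems a backslash is an ordinary file name character, not a separator, so File will not convert it; input with mixed slashes still needs the regex treatment.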
> Why is it so complicated?
If you mean why so many backslashes, it's because regexes and String literals both use the backslash as an escape character. If you want the regex compiler to see one backslash, you have to put two in the String literal. But if you want to match a backslash, you have to escape it with another backslash, which means putting four in the String literal.
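The two layers of escaping can be checked directly. This is just an illustrative sketch; BackslashLayers and its members are made-up names.

```java
import java.util.regex.Pattern;

class BackslashLayers {

    // Four backslashes in the source become two characters in the String,
    // which the regex compiler reads as "match one literal backslash".
    static final String REGEX = "\\\\";

    // Two backslashes in the source become one actual backslash character.
    static final String ONE_BACKSLASH = "\\";

    static boolean regexMatchesOneBackslash() {
        return Pattern.matches(REGEX, ONE_BACKSLASH);
    }

    public static void main(String[] args) {
        System.out.println(REGEX.length());             // 2
        System.out.println(ONE_BACKSLASH.length());     // 1
        System.out.println(regexMatchesOneBackslash()); // true
    }
}
```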
In the second argument to replaceAll(), dollar signs are special because you can embed group references like $0, $1, etc. in the string. To insert a literal '$' in the result, you have to escape it with a backslash. That means literal backslashes also have to be escaped, so it's four for one in the second argument, too.
#8; Mon, 16 Jul 2007 02:02:00 GMT
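Here is a short sketch of those replacement-string rules in action. ReplacementEscapes and its method names are hypothetical; Matcher.quoteReplacement() is the standard helper that applies the escaping for you.

```java
import java.util.regex.Matcher;

class ReplacementEscapes {

    // $1 and $2 in the replacement refer to the captured groups.
    static String swapWords(String s) {
        return s.replaceAll("(\\w+) (\\w+)", "$2 $1");
    }

    // A literal '$' must be escaped with a backslash: "\\$" in source.
    static String usdToDollarSign(String s) {
        return s.replaceAll("USD", "\\$");
    }

    // A literal '\' must be escaped too: four backslashes in source.
    static String slashToBackslash(String s) {
        return s.replaceAll("/", "\\\\");
    }

    // quoteReplacement() escapes '$' and '\' so the text is inserted as-is.
    static String insertLiterally(String s, String literal) {
        return s.replaceAll("/", Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swapWords("hello world"));      // world hello
        System.out.println(usdToDollarSign("5 USD"));      // 5 $
        System.out.println(slashToBackslash("a/b"));       // a\b
        System.out.println(insertLiterally("a/b", "$1"));  // a$1b
    }
}
```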
- Thanks a million Uncle_alice. This works perfectly for me now. Thanks for the explanation as to the use of all the backslashes in the replace string. Mark.
#9; Mon, 16 Jul 2007 02:02:00 GMT