About Question enthuware.ocpjp.v7.2.1425 :
Posted: Wed Mar 27, 2013 6:52 am
by aleksey232
Q: Which of the following patterns will correctly capture all Hex numbers that are delimited by at least one whitespace at either end in an input text?
A: (\s|\b)0[xX][0-9a-fA-F]+(\s|\b)
"0x22" does not contain any spaces, but the number will still be captured.
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Wed Mar 27, 2013 8:47 am
by admin
You need the delimiter only if there are multiple numbers in the string. In your example, there is only one number, and it matches the pattern, so the question of a delimiter does not arise.
In other words, the question does not ask you to match white space. It asks you to use white space as a delimiter.
HTH,
Paul.
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Thu Mar 28, 2013 5:09 am
by aleksey232
Then I think the question should clarify that the input string must contain whitespace delimiters.
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Thu Mar 28, 2013 7:04 am
by admin
Hi Aleksey,
I am not sure what you mean because the question does say, "...are delimited by at least one whitespace...". So it is clear that whitespace is to be used as a delimiter.
HTH,
Paul.
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Fri Sep 20, 2013 9:06 am
by The_Nick
Would it make sense to use the & operator in a regex? Or, more to the point, is there a chance of getting a question on it? :)
The_Nick.
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Sat Mar 15, 2014 5:08 pm
by accurate_guy
"0x1+0x2" contains two hex numbers delimited by "+" rather than by a space. Nevertheless, both are matched by the pattern.
The problem with the pattern is that some characters are not whitespace but still form a word boundary. This applies to all non-word characters (e.g. "0x1@0x2").
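A minimal Java sketch of this point, using the pattern from the original answer. The class and method names (WordBoundaryDemo, findAll) are hypothetical and exist only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Pattern from the original answer: whitespace OR a word boundary on each side.
    static final Pattern HEX = Pattern.compile("(\\s|\\b)0[xX][0-9a-fA-F]+(\\s|\\b)");

    // Collects every match exactly as Matcher.find() reports it.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = HEX.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("0x22"));    // [0x22]      no whitespace anywhere
        System.out.println(findAll("0x1+0x2")); // [0x1, 0x2]  '+' acts as a delimiter
        System.out.println(findAll("0x1@0x2")); // [0x1, 0x2]  '@' acts as a delimiter
    }
}
```

In each case \b succeeds between a hex digit and the neighbouring non-word character (or the end of the input), so no whitespace is needed for a match.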
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Sat Mar 15, 2014 9:22 pm
by admin
You are right. The pattern should be: (\s|^)0[xX][0-9a-fA-F]+(\s|$)
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Sun Mar 16, 2014 4:20 am
by accurate_guy
Thanks for your fast replies.
One problem with the new pattern is that it doesn't match two hex numbers separated by just one space, e.g. "0x1 0x2". Only the first hex number is matched, because the space between them is already consumed by the first match.
To fix this I've added \G (the end of the previous match):
(^|\s|\G)0[xX][0-9a-fA-F]+(\s|$)
I am not sure whether the \G operator needs to be known for the exam.
In practice I would put the hex number itself inside parentheses (as a capturing group) to exclude the spaces from the match.
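The behaviour described above can be sketched as follows. This is only an illustration: the names ConsumedDelimiterDemo, ANCHORED, WITH_G and findAll are hypothetical, and WITH_G combines the \G variant with the capturing group suggested above (the number is group 2).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumedDelimiterDemo {
    // Pattern proposed earlier in the thread: the delimiters are part of the match.
    static final Pattern ANCHORED =
            Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");

    // \G variant, with the hex number wrapped in its own capturing group (group 2).
    static final Pattern WITH_G =
            Pattern.compile("(^|\\s|\\G)(0[xX][0-9a-fA-F]+)(\\s|$)");

    // Returns the requested group of every match found in the input.
    static List<String> findAll(Pattern p, int group, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group(group));
        }
        return found;
    }

    public static void main(String[] args) {
        // The single space is consumed by the first match, so "0x2" is never found.
        System.out.println(findAll(ANCHORED, 0, "0x1 0x2").size()); // 1
        // \G lets the second match start where the first one ended.
        System.out.println(findAll(WITH_G, 2, "0x1 0x2"));          // [0x1, 0x2]
    }
}
```

Reading group 2 instead of the whole match also keeps the surrounding whitespace out of the result.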
Regards
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Sun Mar 16, 2014 5:24 am
by admin
Are you sure? I just tried it and it matches correctly. Here is the code:
Pattern pattern = Pattern.compile("(\\s||^)0[xX][0-9a-fA-F]+(\\s||$)");
Matcher matcher = pattern.matcher("0x22 0x44");
while (matcher.find()) {
    System.out.println("Found the text " + matcher.group() + " starting at " + matcher.start() + " and ending at index " + matcher.end());
}
Output:
Found the text 0x22 starting at 0 and ending at index 5
Found the text 0x44 starting at 5 and ending at index 9
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Sun Mar 16, 2014 6:53 am
by accurate_guy
Yes, I am sure. Somehow double-pipes (||) came into your code.
Try using
Pattern pattern = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Sun Mar 16, 2014 9:09 pm
by admin
You are right. Not sure why || works. Must be fixed.
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Tue Mar 18, 2014 3:56 am
by accurate_guy
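(\\s||^) "works" because it means
* a whitespace character
* or nothing (the empty alternative between the two pipes)
* or beginning of the line
The problem with it is that, since "nothing" always matches, it accepts delimiters other than whitespace.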
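As a rough illustration of the list above, the following sketch runs the double-pipe pattern against a few inputs. The class and method names (EmptyAlternativeDemo, findAll) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyAlternativeDemo {
    // The accidental pattern: "||" introduces an empty alternative in each group.
    static final Pattern DOUBLE_PIPE =
            Pattern.compile("(\\s||^)0[xX][0-9a-fA-F]+(\\s||$)");

    // Collects every match exactly as Matcher.find() reports it.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = DOUBLE_PIPE.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Both numbers are found; the first match includes the trailing space.
        System.out.println(findAll("0x22 0x44").size()); // 2
        // The empty alternative means no delimiter is required at all.
        System.out.println(findAll("0x1+0x2"));          // [0x1, 0x2]
        System.out.println(findAll("abc0x1fzz"));        // [0x1f]
    }
}
```

The last input shows the downside: a hex-looking fragment embedded in other text is matched even though nothing delimits it.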
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Mon Jul 21, 2014 2:43 am
by bptoth
In the example for the first option, it seems that 0x1a is in fact captured, not just 0x1.
In the description of the second option, is the underscore in "[a-zA-Z_0-9]" needed?
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Tue Mar 31, 2015 1:11 am
by pfilaretov
Regex is hard for me to understand, so please clarify one question.
The regex "[\s\b]0[xX][0-9a-fA-F]+[\s\b]" won't compile. The error is "Illegal/unsupported escape sequence near index 4".
Square brackets ('[' and ']') mean "OR" or "RANGE", don't they? So why is "[abc]" OK, while "[\s\b]" is not?
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Tue Mar 31, 2015 7:27 am
by admin
Not really sure why
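One likely explanation, offered as a sketch rather than an official answer: a character class lists characters, whereas \b is a zero-width position check (a word boundary), not a character, so java.util.regex rejects it inside square brackets. Index 4 in the error message is the 'b' of \b. The class and method names below (CharClassBoundaryDemo, compiles) are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CharClassBoundaryDemo {
    // Returns true if the regex compiles, false if Pattern rejects its syntax.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A class of plain characters is fine.
        System.out.println(compiles("[abc]"));
        // \b inside a character class is rejected by Pattern.compile.
        System.out.println(compiles("[\\s\\b]0[xX][0-9a-fA-F]+[\\s\\b]"));
        // The same idea written as an alternation compiles.
        System.out.println(compiles("(\\s|\\b)0[xX][0-9a-fA-F]+(\\s|\\b)"));
    }
}
```

So "either whitespace or a word boundary" has to be written as the group (\s|\b), not as the class [\s\b].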

Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Tue Jan 03, 2017 3:00 pm
by jagoneye
The explanation should use matcher.end()-1 to report the correct ending index, since end() returns the index just past the last matched character.
Re: About Question enthuware.ocpjp.v7.2.1425 :
Posted: Tue Jan 03, 2017 10:24 pm
by admin
In Java, the ending index is almost always one past the last element. For example, substring(1, 3) returns the characters at index 1 and 2 (not 3). The ending character (or element, in the case of a list) is excluded. The explanation just prints the value of matcher.end() from that perspective. Changing it to end()-1 would just cause confusion.
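A small sketch of that convention, with hypothetical names (EndIndexDemo, firstMatchViaSubstring): because end() is exclusive, start() and end() can be passed straight to substring.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndIndexDemo {
    // Rebuilds the first match from start()/end(); returns null if nothing matches.
    static String firstMatchViaSubstring(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        // end() is exclusive, exactly like the second argument of substring.
        return input.substring(m.start(), m.end());
    }

    public static void main(String[] args) {
        System.out.println("abcdef".substring(1, 3)); // bc
        System.out.println(firstMatchViaSubstring("0x22 0x44", "0[xX][0-9a-fA-F]+")); // 0x22
    }
}
```

With an exclusive end, the length of a match is simply end() - start().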