Enthuware

Posted: **Wed Mar 27, 2013 6:52 am**

Q: Which of the following patterns will correctly capture all Hex numbers that are delimited by at least one whitespace at either end in an input text?
A: (\s|\b)0[xX][0-9a-fA-F]+(\s|\b)

"0x22" does not contain any spaces, but the number will still be captured.
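A minimal sketch that runs the answer's pattern on the input from the post, plus a two-number variant. The class name OriginalAnswerDemo and the helper findAll are hypothetical. It collects the full matched text, so any whitespace the pattern consumes shows up in the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalAnswerDemo {
    // The pattern from the answer; in a Java string literal, \s and \b need a doubled backslash
    static final Pattern HEX = Pattern.compile("(\\s|\\b)0[xX][0-9a-fA-F]+(\\s|\\b)");

    // Collects the full text of every match (group 0), including any whitespace it consumed
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = HEX.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("0x22"));      // [0x22]: no whitespace needed at all
        System.out.println(findAll("0x22 0x44")); // two matches; the first one includes the space
    }
}
```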

Posted: **Wed Mar 27, 2013 8:47 am**

You need the delimiter if there are multiple numbers in the string. In your example, there is only one number, which matches the pattern, so the question of a delimiter does not arise.

In other words, the question does not ask you to match white space. It asks you to use white space as a delimiter.

HTH,
Paul.

Posted: **Thu Mar 28, 2013 5:09 am**

Then I think the question should clarify that the input string must contain whitespace delimiters.

Posted: **Thu Mar 28, 2013 7:04 am**

Hi Aleksey,
I am not sure what you mean because the question does say, "...are delimited by at least one whitespace...". So it is clear that whitespace is to be used as a delimiter.

HTH,
Paul.

Posted: **Fri Sep 20, 2013 9:06 am**

Would it make sense to use the & operator in a regex? Or rather, is there a chance of getting a question on it? :)
The_Nick.
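For context, assuming the question is about java.util.regex: a single & has no special meaning there and simply matches a literal ampersand. The closest thing to an "and" operator is &&, which works only inside a character class, where it intersects two sets. A sketch with hypothetical names (AmpersandDemo, isUpperCaseHex, matchesLiteralAmpersand) that restricts the hex digits to 0-9 and A-F:

```java
import java.util.regex.Pattern;

public class AmpersandDemo {
    // Inside [...], && intersects two sets: hex digits AND NOT a-f, i.e. effectively [0-9A-F]
    static final Pattern UPPER_HEX = Pattern.compile("0[xX][0-9a-fA-F&&[^a-f]]+");

    static boolean isUpperCaseHex(String s) {
        return UPPER_HEX.matcher(s).matches();
    }

    // Outside a character class, '&' is an ordinary literal character
    static boolean matchesLiteralAmpersand(String s) {
        return Pattern.matches("a&b", s);
    }

    public static void main(String[] args) {
        System.out.println(isUpperCaseHex("0x1F"));         // true
        System.out.println(isUpperCaseHex("0x1f"));         // false: 'f' is removed by the intersection
        System.out.println(matchesLiteralAmpersand("a&b")); // true
    }
}
```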

Posted: **Sat Mar 15, 2014 5:08 pm**

"0x1+0x2" contains two hex numbers delimited by "+", not by a space. Nevertheless, both are matched by the pattern.
The problem with the pattern is that \b matches at any boundary between a word character and a non-word character, so every non-word character (e.g. "0x1@0x2"), not just whitespace, acts as a delimiter.
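The behaviour described above can be reproduced with a short sketch that counts how often the original pattern is found. WordBoundaryDemo and countMatches are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // The original answer's pattern
    static final Pattern HEX = Pattern.compile("(\\s|\\b)0[xX][0-9a-fA-F]+(\\s|\\b)");

    // Counts how many times the pattern is found in the input
    static int countMatches(String input) {
        Matcher m = HEX.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // '+' and '@' are non-word characters, so \b matches right next to them
        System.out.println(countMatches("0x1+0x2")); // 2
        System.out.println(countMatches("0x1@0x2")); // 2
    }
}
```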

Posted: **Sat Mar 15, 2014 9:22 pm**

You are right. The pattern should be: (\s|^)0[xX][0-9a-fA-F]+(\s|$)
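A sketch checking the corrected pattern against the inputs from the previous post (AnchoredPatternDemo and countMatches are hypothetical names). Without the MULTILINE flag, ^ and $ match only at the start and end of the whole input, so the only accepted delimiters are whitespace and the input boundaries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredPatternDemo {
    // The corrected pattern: whitespace or start of input before, whitespace or end of input after
    static final Pattern HEX = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");

    static int countMatches(String input) {
        Matcher m = HEX.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("0x1+0x2"));  // 0: '+' is not whitespace
        System.out.println(countMatches("0x1@0x2"));  // 0
        System.out.println(countMatches("0x22"));     // 1: start and end of input act as delimiters
        System.out.println(countMatches("a 0x22 b")); // 1
    }
}
```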

Posted: **Sun Mar 16, 2014 4:20 am**

Thanks for your fast replies.

A problem with the new pattern is that it doesn't match two hex numbers separated by just one space, e.g. "0x1 0x2". Only the first hex number is matched, because the space has already been consumed by the first match.
To fix this, I've added \G (which matches at the end of the previous match):
(^|\s|\G)0[xX][0-9a-fA-F]+(\s|$)
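A sketch comparing both patterns on the input from this post. PreviousMatchEndDemo and countMatches are hypothetical names; the two patterns are the ones quoted in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreviousMatchEndDemo {
    static final Pattern WITHOUT_G = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");
    // \G lets a match start exactly where the previous match ended
    static final Pattern WITH_G = Pattern.compile("(^|\\s|\\G)0[xX][0-9a-fA-F]+(\\s|$)");

    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The single space is consumed by the first match, so the second number has no leading delimiter
        System.out.println(countMatches(WITHOUT_G, "0x1 0x2")); // 1
        // With \G, the second match may begin where the first one stopped
        System.out.println(countMatches(WITH_G, "0x1 0x2"));    // 2
    }
}
```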

I am not sure whether \G needs to be known for the exam.
In practice, I would put the hex number itself inside parentheses (as a capturing group) so that the surrounding spaces are excluded from the captured value.

Regards
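Following the capturing-group suggestion above, a sketch that wraps the hex number in its own group (group 2 here) so the delimiters stay out of the extracted value. CapturingGroupDemo and numbers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturingGroupDemo {
    // Group 1: leading delimiter, group 2: the hex number, group 3: trailing delimiter
    static final Pattern HEX = Pattern.compile("(^|\\s|\\G)(0[xX][0-9a-fA-F]+)(\\s|$)");

    static List<String> numbers(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = HEX.matcher(input);
        while (m.find()) {
            found.add(m.group(2)); // only the number, without surrounding whitespace
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(numbers("0x1 0xA2 0Xff")); // [0x1, 0xA2, 0Xff]
    }
}
```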

Posted: **Sun Mar 16, 2014 5:24 am**

Are you sure? I just tried it, and it matches correctly. Here is the code:

Code:

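    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class HexTest {
        public static void main(String[] args) {
            Pattern pattern = Pattern.compile("(\\s||^)0[xX][0-9a-fA-F]+(\\s||$)");
            Matcher matcher = pattern.matcher("0x22 0x44");
            // Print each match with its start index and its (exclusive) end index
            while (matcher.find()) {
                System.out.println("Found the text " + matcher.group() + " starting at "
                        + matcher.start() + " and ending at index " + matcher.end());
            }
        }
    }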

Output:

Code:

Found the text 0x22  starting at 0 and ending at index 5
Found the text 0x44 starting at 5 and ending at index 9

Posted: **Sun Mar 16, 2014 6:53 am**

Yes, I am sure. Somehow double-pipes (||) came into your code.
Try using

Code:

Pattern pattern = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");

Posted: **Sun Mar 16, 2014 9:09 pm**

You are right. Not sure why || works. Must be fixed.

Posted: **Tue Mar 18, 2014 3:56 am**

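(\\s||^) "works" because it means
* whitespace
* or nothing (the empty alternative between the two pipes)
* or the beginning of the input
The problem with it is that, because of the empty alternative, it also matches numbers delimited by characters other than whitespace, or not delimited at all.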
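A sketch contrasting the pattern that has the empty alternative with the one that does not. EmptyAlternativeDemo and countMatches are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyAlternativeDemo {
    // "||" creates an empty alternative, so no delimiter is required at all
    static final Pattern LOOSE = Pattern.compile("(\\s||^)0[xX][0-9a-fA-F]+(\\s||$)");
    static final Pattern STRICT = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");

    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(LOOSE, "0x1+0x2"));  // 2: '+' accepted as a delimiter
        System.out.println(countMatches(LOOSE, "abc0x1"));   // 1: no delimiter at all
        System.out.println(countMatches(STRICT, "0x1+0x2")); // 0
        System.out.println(countMatches(STRICT, "abc0x1"));  // 0
    }
}
```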

Posted: **Mon Jul 21, 2014 2:43 am**

In the first option's example, 0x1a actually seems to be captured, not only 0x1.
In the second option's description, is the underscore in "[a-zA-Z_0-9]" needed?
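On the underscore: the java.util.regex.Pattern documentation defines \w as [a-zA-Z_0-9], so the underscore belongs there, and \b therefore treats "_" as a word character. A sketch (UnderscoreDemo, isWordChar and countMatches are hypothetical names) showing the effect on the original answer's pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnderscoreDemo {
    static final Pattern HEX = Pattern.compile("(\\s|\\b)0[xX][0-9a-fA-F]+(\\s|\\b)");

    // True if the single-character string is a word character (\w)
    static boolean isWordChar(String s) {
        return s.matches("\\w");
    }

    static int countMatches(String input) {
        Matcher m = HEX.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isWordChar("_"));         // true
        System.out.println(countMatches("0x1+0x2")); // 2: '+' is a non-word character, so \b matches
        System.out.println(countMatches("0x1_0x2")); // 0: no word boundary between '1' and '_'
    }
}
```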

Posted: **Tue Mar 31, 2015 1:11 am**

Regex is hard for me to understand, so please clarify one thing.

regex "[\s\b]0[xX][0-9a-fA-F]+[\s\b]" won't compile. The error is "Illegal/unsupported escape sequence near index 4".
Square brackets ('[' and ']') mean "OR" or "RANGE", don't they? So why is "[abc]" OK, but "[\s\b]" is not?

Posted: **Tue Mar 31, 2015 7:27 am**

Not really sure why
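One likely explanation: a character class such as [abc] is a set of characters, and \s also stands for a set of characters, but \b is a zero-width assertion (a position between characters, not a character), so java.util.regex rejects it inside [...]. Alternation with | accepts assertions, which is why (\s|\b) compiles. A sketch with hypothetical names (CharClassBoundaryDemo, compiles):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharClassBoundaryDemo {
    // Returns true if the regex compiles, false if java.util.regex rejects it
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("[abc]"));                               // true: a set of characters
        System.out.println(compiles("[\\s\\b]0[xX][0-9a-fA-F]+[\\s\\b]"));   // false: \b is not a character
        System.out.println(compiles("(\\s|\\b)0[xX][0-9a-fA-F]+(\\s|\\b)")); // true: alternation accepts assertions
    }
}
```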

Posted: **Tue Jan 03, 2017 3:00 pm**

The explanation should use matcher.end()-1 to print the correct ending index, since end() returns the index one past the last matched character.

Posted: **Tue Jan 03, 2017 10:24 pm**

In Java, the ending index is almost always one past the last element. For example, substring(1, 3) returns the characters at indexes 1 and 2 (not 3): the ending character (or element, in the case of a list) is excluded. The explanation simply prints the value of matcher.end() from that perspective. Changing it to end()-1 would just cause confusion.
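A sketch showing that Matcher.end() follows the same exclusive convention as String.substring, so the two can be combined directly. ExclusiveEndDemo and firstMatchBounds are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExclusiveEndDemo {
    static final Pattern HEX = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");

    // Returns {start, end} of the first match, or null if there is none; end is exclusive
    static int[] firstMatchBounds(String text) {
        Matcher m = HEX.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new int[] {m.start(), m.end()};
    }

    public static void main(String[] args) {
        String text = "0x22 0x44";
        int[] b = firstMatchBounds(text);
        System.out.println(b[0] + " " + b[1]);                      // 0 5
        // Because end() is exclusive, it can be passed straight to substring
        System.out.println("[" + text.substring(b[0], b[1]) + "]"); // [0x22 ]
        System.out.println("abcde".substring(1, 3));                // bc: indexes 1 and 2, not 3
    }
}
```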


About Question enthuware.ocpjp.v7.2.1425 :
