Working with Text
In many languages the sentence terminator is a period. English also uses the period as a decimal separator, in ellipsis marks, and to terminate abbreviations. Because the period has more than one purpose, the BreakIterator class cannot always determine sentence boundaries accurately.

First, let's look at a case where sentence boundary analysis does work. You start by creating a BreakIterator with the getSentenceInstance method:

    BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(currentLocale);

To show the sentence boundaries, use the markBoundaries method, which the previous section discussed. The markBoundaries method prints carets ('^') beneath a string to indicate boundary positions. In the following example, the sentence boundaries are properly identified:

    She stopped. She said, "Hello there," and then went on.
    ^            ^                                         ^

You can also locate the boundaries of sentences that end with question marks and exclamation points:

    He's vanished! What will we do? It's up to us.
    ^              ^                ^             ^

Using the period as a decimal point does not cause an error:

    Please add 1.5 liters to the tank.
    ^                                 ^

An ellipsis mark (three spaced periods) indicates the omission of text within a quoted passage. In the next example, the ellipses erroneously generate sentence boundaries:

    "No man is an island . . . every man . . . "
    ^ ^ ^ ^ ^ ^^

Abbreviations might also cause errors. If a period is followed by whitespace and an uppercase letter, the BreakIterator detects a bogus sentence boundary:

    My friend, Mr. Jones, has a new dog. The dog's name is Spot.
    ^              ^                     ^                      ^
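To try these cases yourself, the following is a minimal, self-contained sketch that collects the boundary offsets from a sentence BreakIterator and prints a caret line beneath each sample. The class name SentenceBoundarySketch and the helpers sentenceBoundaries and caretLine are hypothetical names chosen for this illustration; caretLine is only a stand-in in the spirit of markBoundaries, not necessarily that method's actual implementation, and Locale.US is an assumed locale.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SentenceBoundarySketch {

    // Collects every boundary offset that the sentence iterator reports for the text.
    static List<Integer> sentenceBoundaries(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int b = iterator.first(); b != BreakIterator.DONE; b = iterator.next()) {
            offsets.add(b);
        }
        return offsets;
    }

    // Hypothetical stand-in for markBoundaries: builds a line of spaces with a
    // caret at each offset. One extra slot holds the end-of-text boundary.
    static String caretLine(String text, List<Integer> offsets) {
        char[] markers = new char[text.length() + 1];
        Arrays.fill(markers, ' ');
        for (int offset : offsets) {
            markers[offset] = '^';
        }
        return new String(markers);
    }

    public static void main(String[] args) {
        String[] samples = {
            "She stopped. She said, \"Hello there,\" and then went on.",
            "Please add 1.5 liters to the tank.",
            "\"No man is an island . . . every man . . . \"",
            "My friend, Mr. Jones, has a new dog. The dog's name is Spot."
        };
        for (String sample : samples) {
            List<Integer> offsets = sentenceBoundaries(sample, Locale.US);
            System.out.println(sample);
            System.out.println(caretLine(sample, offsets));
            System.out.println(offsets);
        }
    }
}
```

The exact boundaries reported for the ellipsis and abbreviation samples depend on the break rules shipped with your Java runtime, so compare the printed offsets against the text rather than assuming a particular result.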