StringUtil (The Adobe AEM Quickstart and Web Application.)

java.lang.Object
- opennlp.tools.util.StringUtil

public class StringUtil
extends java.lang.Object

Constructor Summary

Constructors
Constructor and Description

StringUtil()


Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `static void` | `computeShortestEditScript(java.lang.String wordForm, java.lang.String lemma, int[][] distance, java.lang.StringBuffer permutations)` Computes the Shortest Edit Script (SES) to convert a word into its lemma. |
| `static java.lang.String` | `decodeShortestEditScript(java.lang.String wordForm, java.lang.String permutations)` Reads the SES predicted by the lemmatizer model and applies its permutations to the word form to obtain the lemma. |
| `static java.lang.String` | `getShortestEditScript(java.lang.String wordForm, java.lang.String lemma)` Gets the SES required to transform a word form into its lemma. |
| `static boolean` | `isEmpty(java.lang.CharSequence theString)` Returns `true` if `CharSequence.length()` is `0` or the sequence is `null`. |
| `static boolean` | `isWhitespace(char charCode)` Determines whether the specified character is whitespace. |
| `static boolean` | `isWhitespace(int charCode)` Determines whether the specified character is whitespace. |
| `static int[][]` | `levenshteinDistance(java.lang.String wordForm, java.lang.String lemma)` Computes the Levenshtein distance matrix for two strings. |
| `static java.lang.String` | `toLowerCase(java.lang.CharSequence string)` Converts to lower case independent of the current locale via `Character.toLowerCase(int)`, which uses mapping information from the UnicodeData file. |
| `static java.lang.String` | `toUpperCase(java.lang.CharSequence string)` Converts to upper case independent of the current locale via `Character.toUpperCase(char)`, which uses mapping information from the UnicodeData file. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - StringUtil
```
public StringUtil()
```
- Method Detail
  - isWhitespace
```
public static boolean isWhitespace(char charCode)
```
    Determines whether the specified character is whitespace. A character is considered whitespace when one of the following conditions is met:
    - It is whitespace according to Character.isWhitespace(int).
    - It is part of the Unicode Zs category (Character.SPACE_SEPARATOR).
    Character.isWhitespace(int) does not include no-break spaces; in OpenNLP, no-break spaces are also considered whitespace.
    Parameters:
    
    charCode - the character to test
    
    Returns:
    
    true if the character is whitespace, otherwise false
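    As a minimal sketch of the rule above (not OpenNLP's source; the class name WhitespaceCheck is hypothetical), the two conditions reduce to one boolean expression over the JDK's Character API. The example shows the practical difference for U+00A0 NO-BREAK SPACE, which Character.isWhitespace rejects but which belongs to category Zs.

```java
/**
 * Minimal sketch of the documented whitespace rule
 * (hypothetical class, not OpenNLP's source).
 */
final class WhitespaceCheck {

    /** True if the JDK treats c as whitespace or c is in Unicode category Zs. */
    static boolean isWhitespace(char c) {
        return Character.isWhitespace(c)
                || Character.getType(c) == Character.SPACE_SEPARATOR;
    }

    public static void main(String[] args) {
        char nbsp = '\u00A0'; // NO-BREAK SPACE, category Zs
        System.out.println(Character.isWhitespace(nbsp)); // false: the JDK excludes no-break spaces
        System.out.println(isWhitespace(nbsp));           // true: caught by the Zs check
    }
}
```

    The same reasoning applies to U+2007 FIGURE SPACE and U+202F NARROW NO-BREAK SPACE, which the JDK method also excludes.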
  - isWhitespace
```
public static boolean isWhitespace(int charCode)
```
    Determines whether the specified character is whitespace. A character is considered whitespace when one of the following conditions is met:
    - It is whitespace according to Character.isWhitespace(int).
    - It is part of the Unicode Zs category (Character.SPACE_SEPARATOR).
    Character.isWhitespace(int) does not include no-break spaces; in OpenNLP, no-break spaces are also considered whitespace.
    Parameters:
    
    charCode - the character to test
    
    Returns:
    
    true if the character is whitespace, otherwise false
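    The int overload can take a full Unicode code point, which matters when text contains supplementary characters stored as surrogate pairs. The sketch below (hypothetical names CodePointWhitespace and split, not part of OpenNLP) applies the same two conditions per code point while splitting text on whitespace, so a narrow no-break space (U+202F) still separates tokens and an emoji stays intact.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical whitespace splitter built on the int (code point) form of the rule. */
final class CodePointWhitespace {

    static boolean isWhitespace(int codePoint) {
        return Character.isWhitespace(codePoint)
                || Character.getType(codePoint) == Character.SPACE_SEPARATOR;
    }

    /** Splits text into tokens separated by runs of whitespace code points. */
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i); // reads a whole surrogate pair when present
            if (isWhitespace(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A narrow no-break space separates "10" and "km".
        System.out.println(split("10\u202Fkm  away")); // [10, km, away]
    }
}
```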
  - toLowerCase
```
public static java.lang.String toLowerCase(java.lang.CharSequence string)
```
    Converts the given sequence to lower case, independent of the current locale, via Character.toLowerCase(int), which uses mapping information from the UnicodeData file.
    
    Parameters:
    
    string - the character sequence to convert
    
    Returns:
    
    the lower-cased String
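    Locale independence matters because String.toLowerCase() uses the default locale. A minimal sketch of the documented approach (hypothetical class LocaleFreeLower, not OpenNLP's source) maps each code point through Character.toLowerCase(int) and contrasts it with Turkish lower-casing, where "I" becomes a dotless i.

```java
import java.util.Locale;

/** Sketch of locale-independent lower-casing (hypothetical class). */
final class LocaleFreeLower {

    static String toLowerCase(CharSequence s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .map(Character::toLowerCase) // int overload: simple per-code-point mapping, no Locale
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(toLowerCase("TITLE"));        // title, whatever the default locale
        System.out.println("TITLE".toLowerCase(turkish)); // uses dotless i under Turkish rules
    }
}
```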
  - toUpperCase
```
public static java.lang.String toUpperCase(java.lang.CharSequence string)
```
    Converts the given sequence to upper case, independent of the current locale, via Character.toUpperCase(char), which uses mapping information from the UnicodeData file.
    
    Parameters:
    
    string - the character sequence to convert
    
    Returns:
    
    the upper-cased String
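    Because the description names the char overload, a char-by-char mapping is a plausible reading; the sketch below assumes it (hypothetical class LocaleFreeUpper). Under that assumption the output always has the input's length, unlike String.toUpperCase, which applies one-to-many special-casing rules such as sharp s (U+00DF) to "SS".

```java
import java.util.Locale;

/** Sketch of char-by-char, locale-independent upper-casing (hypothetical class). */
final class LocaleFreeUpper {

    static String toUpperCase(CharSequence s) {
        char[] out = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = Character.toUpperCase(s.charAt(i)); // one char in, one char out
        }
        return new String(out);
    }

    public static void main(String[] args) {
        String word = "stra\u00DFe"; // "strasse" spelled with sharp s
        System.out.println(toUpperCase(word));             // sharp s kept: it has no simple uppercase mapping
        System.out.println(word.toUpperCase(Locale.ROOT)); // STRASSE: String applies special casing
    }
}
```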
  - isEmpty
```
public static boolean isEmpty(java.lang.CharSequence theString)
```
    Returns true if CharSequence.length() is 0 or the sequence is null.
    
    Parameters:
    
    theString - the character sequence to check
    
    Returns:
    
    true if CharSequence.length() is 0, otherwise false
    
    Since:
    
    1.5.1
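    The description mentions null, but the detailed Returns clause only covers a length of 0. Callers that cannot rely on null handling can make it explicit; the sketch below is a hypothetical null-safe helper (EmptyCheck.isNullOrEmpty), not OpenNLP's implementation.

```java
/** Hypothetical null-safe emptiness check. */
final class EmptyCheck {

    /** Treats null like an empty sequence; whitespace-only input is NOT empty. */
    static boolean isNullOrEmpty(CharSequence cs) {
        return cs == null || cs.length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(isNullOrEmpty(""));                     // true
        System.out.println(isNullOrEmpty(null));                   // true
        System.out.println(isNullOrEmpty(new StringBuilder("x"))); // false
        System.out.println(isNullOrEmpty(" "));                    // false
    }
}
```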
  - levenshteinDistance
```
public static int[][] levenshteinDistance(java.lang.String wordForm,
                                          java.lang.String lemma)
```
    Computes the Levenshtein distance between two strings as a matrix. Based on the pseudo-code at https://en.wikipedia.org/wiki/Levenshtein_distance#Computing_Levenshtein_distance, which in turn is based on the paper Wagner, Robert A.; Fischer, Michael J. (1974), "The String-to-String Correction Problem", Journal of the ACM 21 (1): 168-173.
    
    Parameters:
    
    wordForm - the word form
    
    lemma - the lemma
    
    Returns:
    
    the distance matrix
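    A minimal sketch of the cited Wagner-Fischer pseudo-code (hypothetical class Levenshtein, not OpenNLP's source), assuming its usual (m+1) by (n+1) layout: cell d[i][j] holds the distance between the first i characters of the word form and the first j characters of the lemma, so the overall distance sits in the bottom-right cell.

```java
/** Sketch of the Wagner-Fischer distance matrix (hypothetical class). */
final class Levenshtein {

    static int[][] distanceMatrix(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // i deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // j insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,          // deletion
                        d[i][j - 1] + 1),         // insertion
                        d[i - 1][j - 1] + cost);  // substitution, or match when cost is 0
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] d = distanceMatrix("kitten", "sitting");
        System.out.println(d[6][7]); // 3
    }
}
```

    Returning the whole matrix rather than a single number is what allows an edit script to be recovered afterwards by walking back from the bottom-right cell.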
  - computeShortestEditScript
```
public static void computeShortestEditScript(java.lang.String wordForm,
                                             java.lang.String lemma,
                                             int[][] distance,
                                             java.lang.StringBuffer permutations)
```
    Computes the Shortest Edit Script (SES) to convert a word into its lemma. This is based on Chrupala's PhD thesis (2008).
    
    Parameters:
    
    wordForm - the token
    
    lemma - the target lemma
    
    distance - the Levenshtein distance matrix for wordForm and lemma
    
    permutations - the buffer to which the computed permutations (edit operations) are appended
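    One common way to derive an edit script from a distance matrix is to backtrace from the bottom-right cell, emitting an operation whenever the path does not follow a zero-cost match. The sketch below does that and, like the documented method, appends to a StringBuffer. Its token format and all names (EditScriptSketch, computeEditScript) are hypothetical and are NOT Chrupala's or OpenNLP's SES encoding: each token is an operation letter (D delete, R replace, I insert), one character, and an index into the original word form, listed right to left.

```java
/**
 * Backtrace sketch (hypothetical names and token format, NOT OpenNLP's SES):
 * D{c}{i} deletes c at i, R{c}{i} replaces the char at i with c,
 * I{c}{i} inserts c at i. Tokens are emitted right to left.
 */
final class EditScriptSketch {

    static int[][] distanceMatrix(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        return d;
    }

    static void computeEditScript(String wordForm, String lemma, int[][] d, StringBuffer out) {
        int i = wordForm.length();
        int j = lemma.length();
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && wordForm.charAt(i - 1) == lemma.charAt(j - 1)
                    && d[i][j] == d[i - 1][j - 1]) {
                i--; j--; // match: no edit needed
                continue;
            }
            String op;
            if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1) {
                op = "R" + lemma.charAt(j - 1) + (i - 1);    // substitution
                i--; j--;
            } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                op = "D" + wordForm.charAt(i - 1) + (i - 1); // deletion
                i--;
            } else {
                op = "I" + lemma.charAt(j - 1) + i;          // insertion at index i
                j--;
            }
            if (out.length() > 0) out.append(' ');
            out.append(op);
        }
    }

    public static void main(String[] args) {
        StringBuffer script = new StringBuffer();
        computeEditScript("walked", "walk", distanceMatrix("walked", "walk"), script);
        System.out.println(script); // Dd5 De4
    }
}
```

    Each emitted token lowers the matrix value by exactly one, so the number of tokens equals the Levenshtein distance.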
  - decodeShortestEditScript
```
public static java.lang.String decodeShortestEditScript(java.lang.String wordForm,
                                                        java.lang.String permutations)
```
    Reads the SES predicted by the lemmatizer model and applies its permutations to the word form to obtain the lemma.
    
    Parameters:
    
    wordForm - the word form to transform
    
    permutations - the permutations predicted by the lemmatizer model
    
    Returns:
    
    the lemma
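    Decoding is the inverse step: replay the operations on the word form. The sketch below (hypothetical class EditScriptDecoder) assumes a made-up, space-separated token format, NOT OpenNLP's: an operation letter (D delete, R replace, I insert), one character, and an index into the original word form. Because the tokens are expected in right-to-left order, edits made further right never shift the indices of later tokens.

```java
/** Sketch of applying a hypothetical right-to-left edit script. */
final class EditScriptDecoder {

    /** Assumes tokens are space-separated and edited characters are not spaces. */
    static String decode(String wordForm, String script) {
        if (script.isEmpty()) {
            return wordForm; // no edits: the word form already is the lemma
        }
        StringBuilder sb = new StringBuilder(wordForm);
        for (String token : script.split(" ")) {
            char op = token.charAt(0);
            char c = token.charAt(1);
            int index = Integer.parseInt(token.substring(2));
            switch (op) {
                case 'D': sb.deleteCharAt(index); break; // c names the deleted char
                case 'R': sb.setCharAt(index, c); break;
                case 'I': sb.insert(index, c); break;
                default: throw new IllegalArgumentException("Unknown edit: " + token);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("walked", "Dd5 De4")); // walk
        System.out.println(decode("cat", "Is3"));        // cats
    }
}
```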
  - getShortestEditScript
```
public static java.lang.String getShortestEditScript(java.lang.String wordForm,
                                                     java.lang.String lemma)
```
    Gets the SES required to transform a word form into its lemma.
    
    Parameters:
    
    wordForm - the word form
    
    lemma - the lemma
    
    Returns:
    
    the shortest edit script
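    The summary pairs this method with decodeShortestEditScript, which suggests a round-trip contract: decoding a word form with its script should yield the lemma. The sketch below checks that property end to end with hypothetical names (EditScriptRoundTrip, editScript, decode) and a made-up token format that is NOT OpenNLP's: an operation letter (D delete, R replace, I insert), one character, and an index into the original word form, emitted right to left by backtracing a Levenshtein matrix.

```java
/** Round-trip sketch with a hypothetical edit-script format (not OpenNLP's SES). */
final class EditScriptRoundTrip {

    static String editScript(String word, String lemma) {
        int m = word.length(), n = lemma.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i;
        for (int j = 0; j <= n; j++) d[0][j] = j;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = word.charAt(i - 1) == lemma.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        // Backtrace right to left so every index refers to the original word.
        StringBuilder script = new StringBuilder();
        int i = m, j = n;
        while (i > 0 || j > 0) {
            String op = null;
            if (i > 0 && j > 0 && word.charAt(i - 1) == lemma.charAt(j - 1) && d[i][j] == d[i - 1][j - 1]) {
                i--; j--;                                             // match
            } else if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1) {
                op = "R" + lemma.charAt(j - 1) + (i - 1); i--; j--;   // substitution
            } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                op = "D" + word.charAt(i - 1) + (i - 1); i--;         // deletion
            } else {
                op = "I" + lemma.charAt(j - 1) + i; j--;              // insertion
            }
            if (op != null) script.append(script.length() > 0 ? " " : "").append(op);
        }
        return script.toString();
    }

    static String decode(String word, String script) {
        StringBuilder sb = new StringBuilder(word);
        if (script.isEmpty()) return word;
        for (String t : script.split(" ")) {
            int index = Integer.parseInt(t.substring(2));
            switch (t.charAt(0)) {
                case 'D': sb.deleteCharAt(index); break;
                case 'R': sb.setCharAt(index, t.charAt(1)); break;
                case 'I': sb.insert(index, t.charAt(1)); break;
                default: throw new IllegalArgumentException("Unknown edit: " + t);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] pairs = {{"walked", "walk"}, {"mice", "mouse"}, {"better", "good"}};
        for (String[] p : pairs) {
            String script = editScript(p[0], p[1]);
            System.out.println(p[0] + " -> " + script + " -> " + decode(p[0], script));
        }
    }
}
```

    Emitting right to left is the design choice that keeps decoding simple: every index in the script stays valid for the original word form, so no offset bookkeeping is needed.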
