public class StringUtil
extends java.lang.Object
| Constructor and Description |
|---|
| StringUtil() |

| Modifier and Type | Method and Description |
|---|---|
| static void | computeShortestEditScript(java.lang.String wordForm, java.lang.String lemma, int[][] distance, java.lang.StringBuffer permutations) Computes the Shortest Edit Script (SES) to convert a word into its lemma. |
| static java.lang.String | decodeShortestEditScript(java.lang.String wordForm, java.lang.String permutations) Reads the SES predicted by the lemmatizer model and applies the permutations to obtain the lemma from the wordForm. |
| static java.lang.String | getShortestEditScript(java.lang.String wordForm, java.lang.String lemma) Gets the SES required to go from a word to a lemma. |
| static boolean | isEmpty(java.lang.CharSequence theString) Returns true if the CharSequence is null or CharSequence.length() is 0. |
| static boolean | isWhitespace(char charCode) Determines if the specified character is a whitespace. |
| static boolean | isWhitespace(int charCode) Determines if the specified character is a whitespace. |
| static int[][] | levenshteinDistance(java.lang.String wordForm, java.lang.String lemma) Computes the Levenshtein distance of two strings in a matrix. |
| static java.lang.String | toLowerCase(java.lang.CharSequence string) Converts to lower case independent of the current locale via Character.toLowerCase(int) which uses mapping information from the UnicodeData file. |
| static java.lang.String | toUpperCase(java.lang.CharSequence string) Converts to upper case independent of the current locale via Character.toUpperCase(char) which uses mapping information from the UnicodeData file. |
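
Several of the behaviors summarized above can be approximated with plain JDK calls. The following is a minimal sketch under stated assumptions, not OpenNLP's implementation: the class name StringUtilSketch is hypothetical, and the method bodies only illustrate the documented ideas (whitespace that also covers no-break spaces, locale-independent lower-casing, and a textbook dynamic-programming Levenshtein matrix). The matrix layout used by the real levenshteinDistance may differ.

```java
// Hypothetical illustration class; it does not call or reproduce opennlp.tools.util.StringUtil.
final class StringUtilSketch {

    // Whitespace in the sense described above: either the JDK's
    // Character.isWhitespace(int) accepts the code point, or it belongs to
    // the Unicode space separator category (Character.SPACE_SEPARATOR),
    // which includes the no-break space U+00A0.
    static boolean isWhitespace(int codePoint) {
        return Character.isWhitespace(codePoint)
                || Character.getType(codePoint) == Character.SPACE_SEPARATOR;
    }

    // Lower-cases one code point at a time with Character.toLowerCase(int),
    // so the result does not depend on the default locale.
    static String toLowerCase(CharSequence text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
        return sb.toString();
    }

    // Textbook Levenshtein matrix (an assumption about layout):
    // d[i][j] is the edit distance between the first i chars of a
    // and the first j chars of b.
    static int[][] levenshteinMatrix(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                int substitution = d[i - 1][j - 1] + cost;
                d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace(0x00A0));           // sketch: no-break space
        System.out.println(Character.isWhitespace(0x00A0)); // JDK check on its own
        System.out.println(toLowerCase("OpenNLP"));
        int[][] d = levenshteinMatrix("kitten", "sitting");
        System.out.println(d[6][7]);                        // bottom-right cell = full distance
    }
}
```

In this sketch the bottom-right cell of the matrix holds the distance between the two full strings. A method such as computeShortestEditScript takes a matrix of this kind as its distance argument, although how it reads the matrix is not specified here.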
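public static boolean isWhitespace(char charCode)
Determines if the specified character is a whitespace. A character is considered a whitespace when one of the following conditions is met:
- It is a Character.isWhitespace(int) whitespace.
- It is part of the Unicode space separator category (Character.SPACE_SEPARATOR).
Character.isWhitespace(int) does not include no-break spaces.
In OpenNLP no-break spaces are also considered as white spaces.
Parameters:
charCode -

public static boolean isWhitespace(int charCode)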
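Determines if the specified character is a whitespace. A character is considered a whitespace when one of the following conditions is met:
- It is a Character.isWhitespace(int) whitespace.
- It is part of the Unicode space separator category (Character.SPACE_SEPARATOR).
Character.isWhitespace(int) does not include no-break spaces.
In OpenNLP no-break spaces are also considered as white spaces.
Parameters:
charCode -

public static java.lang.String toLowerCase(java.lang.CharSequence string)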
Converts to lower case independent of the current locale via Character.toLowerCase(int) which uses mapping information from the UnicodeData file.
Parameters:
string -

public static java.lang.String toUpperCase(java.lang.CharSequence string)
Converts to upper case independent of the current locale via Character.toUpperCase(char) which uses mapping information from the UnicodeData file.
Parameters:
string -

public static boolean isEmpty(java.lang.CharSequence theString)
Returns true if the CharSequence is null or CharSequence.length() is 0.
Returns:
true if CharSequence.length() is 0, otherwise false

public static int[][] levenshteinDistance(java.lang.String wordForm, java.lang.String lemma)
Computes the Levenshtein distance of two strings in a matrix.
Parameters:
wordForm - the form
lemma - the lemma

public static void computeShortestEditScript(java.lang.String wordForm, java.lang.String lemma, int[][] distance, java.lang.StringBuffer permutations)
Computes the Shortest Edit Script (SES) to convert a word into its lemma.
Parameters:
wordForm - the token
lemma - the target lemma
distance - the Levenshtein distance
permutations - the number of permutations

public static java.lang.String decodeShortestEditScript(java.lang.String wordForm, java.lang.String permutations)
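Reads the SES predicted by the lemmatizer model and applies the permutations to obtain the lemma from the wordForm.
Parameters:
wordForm - the wordForm
permutations - the permutations predicted by the lemmatizer model

public static java.lang.String getShortestEditScript(java.lang.String wordForm, java.lang.String lemma)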
Gets the SES required to go from a word to a lemma.
Parameters:
wordForm - the word
lemma - the lemma

Copyright © 2010 - 2023 Adobe. All Rights Reserved