public enum TokenAnnotation extends Enum<TokenAnnotation>
Enum Constant and Description |
---|
ADJECTIVE
Marks a token as an adjective.
|
ADVERB
Marks a token as an adverb.
|
CASE_SENSITIVE
Marks that a token should not be lowercased by the indexer.
|
CHARACTERS
Marks character data.
|
CONJUNCTION
Marks a token as a conjunction.
|
ELEMENT_ATTRIBUTE
Marks an attribute.
|
INTERJECTION
Marks a token as an interjection.
|
LEMMA
Marks a token that is a lemma
|
LOCALE
Marks a token as a locale boundary.
|
MULTIPART
(Experimental) Identifies a token as a multi-part token.
|
NOT_INDEXED
Marks a non-indexable token.
|
NOUN
Marks a token as a noun.
|
PREFIX
Marks a token as a prefix (pre-clitic)
|
PREPOSITION
Marks a token as a preposition.
|
PRONOUN
Marks a token as a pronoun.
|
PROTECTED
Marks a token as protected.
|
SCOPE_END
Marks a token as a scope end marker.
|
SCOPE_START
Marks a token as a scope start marker.
|
STEM
Marks a token as the stemmed or base form of a word.
|
STOPWORD
Marks a token as a stopword
|
SUFFIX
Marks a token as a suffix (post-clitic)
|
TOKENIZED
Marks that a token has been tokenized.
|
VERB
Marks a token as a verb.
|
WILDCARD
Marks a token as a wildcard query term.
|
WILDCARD_COMPONENT
Marks a token as being part of a larger wildcard expression.
|
Modifier and Type | Method and Description |
---|---|
int |
index()
Get the numeric index for this TokenAnnotation.
|
long |
inverseMask()
Get the inverse bit mask for this TokenAnnotation.
|
long |
mask()
Get the bit mask for this TokenAnnotation.
|
static long |
mask(TokenAnnotation... annotations)
Get the union of masks for annotations. |
static TokenAnnotation |
valueOf(String name)
Returns the enum constant of this type with the specified name.
|
static TokenAnnotation[] |
values()
Returns an array containing the constants of this enum type, in
the order they are declared.
|
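The method summary above suggests that each annotation occupies one bit of a `long`. The following is a minimal sketch under that assumption. `SketchAnnotation` is a hypothetical stand-in, not the Attivio class, and its bit layout (`1L << index()`, with `index()` equal to declaration order) is assumed rather than documented. The helpers `has` and `clear` are also illustrative names.

```java
final class MaskSketchDemo {
    /** True if the annotation's bit is set in the given flags. */
    static boolean has(long flags, SketchAnnotation a) {
        return (flags & a.mask()) != 0;
    }

    /** Clears one annotation's bit by ANDing with its inverse mask. */
    static long clear(long flags, SketchAnnotation a) {
        return flags & a.inverseMask();
    }

    public static void main(String[] args) {
        long flags = SketchAnnotation.mask(SketchAnnotation.SCOPE_START, SketchAnnotation.TOKENIZED);
        System.out.println(has(flags, SketchAnnotation.TOKENIZED));
        flags = clear(flags, SketchAnnotation.TOKENIZED);
        System.out.println(has(flags, SketchAnnotation.TOKENIZED));
    }
}

/** Hypothetical stand-in for TokenAnnotation; the real bit layout is an assumption. */
enum SketchAnnotation {
    NOT_INDEXED, TOKENIZED, SCOPE_START, SCOPE_END;

    /** Assumption: the numeric index is the declaration order. */
    int index() { return ordinal(); }

    /** Assumption: one bit per annotation. */
    long mask() { return 1L << index(); }

    /** Every bit set except this annotation's own bit. */
    long inverseMask() { return ~mask(); }

    /** Union (bitwise OR) of the masks of the given annotations. */
    static long mask(SketchAnnotation... annotations) {
        long union = 0L;
        for (SketchAnnotation a : annotations) {
            union |= a.mask();
        }
        return union;
    }
}
```

Under these assumptions, a set of annotations fits in a single `long`, so testing or clearing a flag is one bitwise operation.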
public static final TokenAnnotation NOT_INDEXED
public static final TokenAnnotation TOKENIZED
public static final TokenAnnotation LOCALE
Locale tokens are used to annotate language regions in a token list.
The token's text will be a locale, encoded as a language tag.
A LOCALE token in a token list indicates a change in locale for all tokens that follow.
If the token's text is an empty string, this implies a null locale,
indicating that the locale set on the field value should be used.
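The rules for LOCALE tokens can be sketched as a single pass over a token list. This is an illustration only: `Token`, `effectiveLocales`, and `fieldLocale` are hypothetical names, not part of the Attivio API, and the sketch assumes that tokens preceding the first LOCALE token use the field value's locale.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LocaleTokenSketch {
    /** Hypothetical minimal token: its text and whether it carries the LOCALE annotation. */
    static final class Token {
        final String text;
        final boolean isLocale;

        Token(String text, boolean isLocale) {
            this.text = text;
            this.isLocale = isLocale;
        }
    }

    /**
     * Returns the effective locale of each non-LOCALE token, in order.
     * fieldLocale stands in for the locale set on the field value.
     */
    static List<Locale> effectiveLocales(List<Token> tokens, Locale fieldLocale) {
        List<Locale> result = new ArrayList<>();
        Locale current = fieldLocale; // assumption: the field locale applies until a LOCALE token appears
        for (Token t : tokens) {
            if (t.isLocale) {
                // Empty text means a null locale, so fall back to the field value's locale.
                current = t.text.isEmpty() ? fieldLocale : Locale.forLanguageTag(t.text);
            } else {
                result.add(current);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("fr-CA", true), new Token("bonjour", false),
                new Token("", true), new Token("hello", false));
        System.out.println(effectiveLocales(tokens, Locale.ENGLISH));
    }
}
```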
public static final TokenAnnotation LEMMA
public static final TokenAnnotation STEM
public static final TokenAnnotation NOUN
public static final TokenAnnotation PRONOUN
public static final TokenAnnotation VERB
public static final TokenAnnotation ADJECTIVE
public static final TokenAnnotation ADVERB
public static final TokenAnnotation PREPOSITION
public static final TokenAnnotation CONJUNCTION
public static final TokenAnnotation INTERJECTION
public static final TokenAnnotation STOPWORD
public static final TokenAnnotation PREFIX
public static final TokenAnnotation SUFFIX
public static final TokenAnnotation WILDCARD
public static final TokenAnnotation WILDCARD_COMPONENT
For instance, when the token "abc*def" is being processed for wildcard tokenization, this will temporarily generate three tokens: "abc", "*", and "def". The "abc" and "def" tokens will be annotated as WILDCARD_COMPONENT. The "*" token will be annotated as WILDCARD.
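The "abc*def" example can be reproduced with a small splitter. This is a sketch, not the indexer's actual tokenizer: `WildcardSplitSketch`, `Part`, and `Kind` are hypothetical names, and only `*` is treated as a wildcard character because it is the only one the description mentions.

```java
import java.util.ArrayList;
import java.util.List;

final class WildcardSplitSketch {
    /** Stand-ins for the two annotations discussed above. */
    enum Kind { WILDCARD, WILDCARD_COMPONENT }

    /** Hypothetical temporary token produced during wildcard tokenization. */
    static final class Part {
        final String text;
        final Kind kind;

        Part(String text, Kind kind) {
            this.text = text;
            this.kind = kind;
        }
    }

    /** Splits a term into literal runs (WILDCARD_COMPONENT) and '*' characters (WILDCARD). */
    static List<Part> split(String term) {
        List<Part> parts = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '*') {
                if (literal.length() > 0) {
                    parts.add(new Part(literal.toString(), Kind.WILDCARD_COMPONENT));
                    literal.setLength(0);
                }
                parts.add(new Part("*", Kind.WILDCARD));
            } else {
                literal.append(c);
            }
        }
        // Flush the trailing literal run, if any.
        if (literal.length() > 0) {
            parts.add(new Part(literal.toString(), Kind.WILDCARD_COMPONENT));
        }
        return parts;
    }

    public static void main(String[] args) {
        for (Part p : split("abc*def")) {
            System.out.println(p.text + " -> " + p.kind);
        }
    }
}
```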
public static final TokenAnnotation CASE_SENSITIVE
public static final TokenAnnotation SCOPE_START
Tokens annotated with SCOPE_START should also be annotated as TOKENIZED to avoid being retokenized.
public static final TokenAnnotation SCOPE_END
Tokens annotated with SCOPE_END should also be annotated as TOKENIZED to avoid being retokenized.
public static final TokenAnnotation ELEMENT_ATTRIBUTE
public static final TokenAnnotation CHARACTERS
public static final TokenAnnotation PROTECTED
Protected tokens will not be matched by wildcard or range queries.
public static final TokenAnnotation MULTIPART
Valid separator characters are: '.', ',', '-', ':', '/'
Token must not begin or end with a separator character.
Token must not contain 2 separator characters in a row.
Token's position increment must not be 0.
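The four MULTIPART rules above translate directly into a validation check. The following is a sketch under stated assumptions: `MultipartRuleSketch` and `satisfiesRules` are hypothetical names, the position increment is passed in as a plain `int`, and rejecting empty text is an added assumption that the rules do not state.

```java
final class MultipartRuleSketch {
    /** The separator characters listed above. */
    private static final String SEPARATORS = ".,-:/";

    static boolean isSeparator(char c) {
        return SEPARATORS.indexOf(c) >= 0;
    }

    /** Checks a token's text and position increment against the four rules. */
    static boolean satisfiesRules(String text, int positionIncrement) {
        // Assumption: empty text is rejected. The position increment must not be 0.
        if (text.isEmpty() || positionIncrement == 0) {
            return false;
        }
        // Must not begin or end with a separator.
        if (isSeparator(text.charAt(0)) || isSeparator(text.charAt(text.length() - 1))) {
            return false;
        }
        // Must not contain two separators in a row.
        for (int i = 1; i < text.length(); i++) {
            if (isSeparator(text.charAt(i)) && isSeparator(text.charAt(i - 1))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesRules("192.168.0.1", 1));
        System.out.println(satisfiesRules("a--b", 1));
    }
}
```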
public static TokenAnnotation[] values()
for (TokenAnnotation c : TokenAnnotation.values()) System.out.println(c);
public static TokenAnnotation valueOf(String name)
name - the name of the enum constant to be returned.
IllegalArgumentException - if this enum type has no constant with the specified name
NullPointerException - if the argument is null
public int index()
public long mask()
public long inverseMask()
public static long mask(TokenAnnotation... annotations)
Get the union of masks for annotations.
Copyright © 2018 Attivio, Inc. All Rights Reserved.