Enum TokenAnnotation
- java.lang.Object
-
- java.lang.Enum<TokenAnnotation>
-
- com.attivio.sdk.token.TokenAnnotation
-
- All Implemented Interfaces:
- java.io.Serializable, java.lang.Comparable<TokenAnnotation>
public enum TokenAnnotation extends java.lang.Enum<TokenAnnotation> implements java.io.Serializable
Enumeration of annotations that can be placed on tokens.
-
-
Enum Constant Summary
Enum Constant        Description
ADJECTIVE            Marks a token as an adjective.
ADVERB               Marks a token as an adverb.
CASE_SENSITIVE       Marks that a token should not be lowercased by the indexer.
CHARACTERS           Marks character data.
CONJUNCTION          Marks a token as a conjunction.
ELEMENT_ATTRIBUTE    Marks an attribute.
ENTITY_OUTPUT        Experimental: Marks a token as being the output for a matching entity.
FUZZY                Experimental: Marks a token as being generated due to a fuzzy/inexact match.
INTERJECTION         Marks a token as an interjection.
LEMMA                Marks a token that is a lemma.
LOCALE               Marks a token as a locale boundary.
MULTIPART            (Experimental) Identifies a token as a multi-part token.
NOT_INDEXED          Marks a non-indexable token.
NOUN                 Marks a token as a noun.
PREFIX               Marks a token as a prefix (pre-clitic).
PREPOSITION          Marks a token as a preposition.
PRONOUN              Marks a token as a pronoun.
PROTECTED            Marks a token as protected.
SCOPE_END            Marks a token as a scope end marker.
SCOPE_START          Marks a token as a scope start marker.
STEM                 Marks a token as the stemmed or base form of a word.
STOPWORD             Marks a token as a stopword.
SUFFIX               Marks a token as a suffix (post-clitic).
TOKENIZED            Marks that a token has been tokenized.
VERB                 Marks a token as a verb.
WILDCARD             Marks a token as a wildcard query term.
WILDCARD_COMPONENT   Marks a token as being part of a larger wildcard expression.
-
Method Summary
Modifier and Type          Method                                 Description
int                        index()                                Get the numeric index for this TokenAnnotation.
long                       inverseMask()                          Get the inverse bit mask for this TokenAnnotation.
long                       mask()                                 Get the bit mask for this TokenAnnotation.
static long                mask(TokenAnnotation... annotations)   Get the union of masks for annotations.
static TokenAnnotation     valueOf(java.lang.String name)         Returns the enum constant of this type with the specified name.
static TokenAnnotation[]   values()                               Returns an array containing the constants of this enum type, in the order they are declared.
-
-
-
Enum Constant Detail
-
NOT_INDEXED
public static final TokenAnnotation NOT_INDEXED
Marks a non-indexable token.
-
TOKENIZED
public static final TokenAnnotation TOKENIZED
Marks that a token has been tokenized.
-
LOCALE
public static final TokenAnnotation LOCALE
Marks a token as a locale boundary.
Locale tokens are used to annotate language regions in a token list. The token's text will be a locale, encoded as a language tag. A LOCALE token in a token list indicates a change in locale for all tokens that follow. If the token's text is an empty string, this implies a null locale, indicating that the locale set on the field value should be used.
- Since:
- 5.5.0 patch 95
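The locale-region behaviour described above can be illustrated with a small sketch. It does not use the Attivio SDK: `Tok` is a hypothetical stand-in for a token (text plus a flag saying whether it carries the LOCALE annotation), and `effectiveLocales` is a hypothetical helper. The sketch also assumes that tokens appearing before the first LOCALE token use the locale set on the field value, which the description above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LocaleRegionDemo {
    // Hypothetical stand-in for a token; not an SDK type.
    static final class Tok {
        final String text;
        final boolean localeMarker; // true if the token carries the LOCALE annotation

        Tok(String text, boolean localeMarker) {
            this.text = text;
            this.localeMarker = localeMarker;
        }
    }

    // Returns the locale in effect for each non-marker token, in order.
    static List<Locale> effectiveLocales(List<Tok> tokens, Locale fieldLocale) {
        List<Locale> result = new ArrayList<>();
        Locale current = fieldLocale; // assumption: field locale applies until the first marker
        for (Tok t : tokens) {
            if (t.localeMarker) {
                // Empty text means a null locale: fall back to the field value's locale.
                current = t.text.isEmpty() ? fieldLocale : Locale.forLanguageTag(t.text);
            } else {
                result.add(current);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
            new Tok("hello", false),
            new Tok("fr", true),      // switch to French for the tokens that follow
            new Tok("bonjour", false),
            new Tok("", true),        // revert to the field value's locale
            new Tok("bye", false));
        System.out.println(effectiveLocales(tokens, Locale.ENGLISH)); // prints [en, fr, en]
    }
}
```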
-
LEMMA
public static final TokenAnnotation LEMMA
Marks a token that is a lemma
-
STEM
public static final TokenAnnotation STEM
Marks a token as the stemmed or base form of a word.
-
FUZZY
public static final TokenAnnotation FUZZY
Experimental: Marks a token as being generated due to a fuzzy/inexact match.
NOTE: The FUZZY token annotation should never be placed on a surface token.
-
ENTITY_OUTPUT
public static final TokenAnnotation ENTITY_OUTPUT
Experimental: Marks a token as being the output for a matching entity.
Entity Output tokens will be stacked on the position for the start of the entity match. Offsets for Entity Output tokens will reflect the start/end of the entity match.
-
NOUN
public static final TokenAnnotation NOUN
Marks a token as a noun.
-
PRONOUN
public static final TokenAnnotation PRONOUN
Marks a token as a pronoun.
-
VERB
public static final TokenAnnotation VERB
Marks a token as a verb.
-
ADJECTIVE
public static final TokenAnnotation ADJECTIVE
Marks a token as an adjective.
-
ADVERB
public static final TokenAnnotation ADVERB
Marks a token as an adverb.
-
PREPOSITION
public static final TokenAnnotation PREPOSITION
Marks a token as a preposition.
-
CONJUNCTION
public static final TokenAnnotation CONJUNCTION
Marks a token as a conjunction.
-
INTERJECTION
public static final TokenAnnotation INTERJECTION
Marks a token as an interjection.
-
STOPWORD
public static final TokenAnnotation STOPWORD
Marks a token as a stopword
-
PREFIX
public static final TokenAnnotation PREFIX
Marks a token as a prefix (pre-clitic)
-
SUFFIX
public static final TokenAnnotation SUFFIX
Marks a token as a suffix (post-clitic)
-
WILDCARD
public static final TokenAnnotation WILDCARD
Marks a token as a wildcard query term.
-
WILDCARD_COMPONENT
public static final TokenAnnotation WILDCARD_COMPONENT
Marks a token as being part of a larger wildcard expression.
For instance, when the token "abc*def" is being processed for wildcard tokenization, this will temporarily generate 3 tokens: "abc", "*", "def". The "abc" and "def" tokens will be annotated as WILDCARD_COMPONENT. The "*" token will be annotated as WILDCARD.
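The "abc*def" example can be sketched in a few lines. This is an illustration only, not the SDK's wildcard tokenizer: `Piece` and `split` are hypothetical names, annotations are represented as plain strings, and only '*' is treated as a wildcard character because it is the only one the description mentions.

```java
import java.util.ArrayList;
import java.util.List;

final class WildcardSplitDemo {
    // Hypothetical stand-in for an annotated token; not an SDK type.
    static final class Piece {
        final String text;
        final String annotation;

        Piece(String text, String annotation) {
            this.text = text;
            this.annotation = annotation;
        }

        @Override
        public String toString() {
            return text + "/" + annotation;
        }
    }

    // Splits a term on '*' and labels each piece as in the example above.
    static List<Piece> split(String term) {
        List<Piece> pieces = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '*') {
                if (literal.length() > 0) {
                    pieces.add(new Piece(literal.toString(), "WILDCARD_COMPONENT"));
                    literal.setLength(0);
                }
                pieces.add(new Piece("*", "WILDCARD"));
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            pieces.add(new Piece(literal.toString(), "WILDCARD_COMPONENT"));
        }
        return pieces;
    }

    public static void main(String[] args) {
        // prints [abc/WILDCARD_COMPONENT, */WILDCARD, def/WILDCARD_COMPONENT]
        System.out.println(split("abc*def"));
    }
}
```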
-
CASE_SENSITIVE
public static final TokenAnnotation CASE_SENSITIVE
Marks that a token should not be lowercased by the indexer.
-
SCOPE_START
public static final TokenAnnotation SCOPE_START
Marks a token as a scope start marker.
Tokens annotated with SCOPE_START should also be annotated as TOKENIZED to avoid being retokenized.
-
SCOPE_END
public static final TokenAnnotation SCOPE_END
Marks a token as a scope end marker.
Tokens annotated with SCOPE_END should also be annotated as TOKENIZED to avoid being retokenized.
-
ELEMENT_ATTRIBUTE
public static final TokenAnnotation ELEMENT_ATTRIBUTE
Marks an attribute.
-
CHARACTERS
public static final TokenAnnotation CHARACTERS
Marks character data.
-
PROTECTED
public static final TokenAnnotation PROTECTED
Marks a token as protected.
Protected tokens will not be matched by wildcard or range queries.
-
MULTIPART
public static final TokenAnnotation MULTIPART
(Experimental) Identifies a token as a multi-part token.
Valid separator characters are: '.', ',', '-', ':', '/'
Token must not begin or end with a separator character.
Token must not contain 2 separator characters in a row.
Token's position increment must not be 0.
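The four MULTIPART constraints listed above can be checked with a short routine. This is a sketch, not SDK code: `MultipartCheck` and `satisfiesConstraints` are hypothetical names, the token is modelled as a plain string plus its position increment, and rejecting an empty string is an assumption that the description does not spell out.

```java
final class MultipartCheck {
    // Separator characters listed in the MULTIPART description.
    private static final String SEPARATORS = ".,-:/";

    static boolean isSeparator(char c) {
        return SEPARATORS.indexOf(c) >= 0;
    }

    // Returns true if the token text and position increment satisfy the listed constraints.
    static boolean satisfiesConstraints(String text, int positionIncrement) {
        if (positionIncrement == 0) {
            return false; // position increment must not be 0
        }
        if (text.isEmpty()) {
            return false; // assumption: an empty token is not a multi-part token
        }
        if (isSeparator(text.charAt(0)) || isSeparator(text.charAt(text.length() - 1))) {
            return false; // must not begin or end with a separator
        }
        for (int i = 1; i < text.length(); i++) {
            if (isSeparator(text.charAt(i)) && isSeparator(text.charAt(i - 1))) {
                return false; // must not contain 2 separators in a row
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesConstraints("192.168.0.1", 1)); // prints true
        System.out.println(satisfiesConstraints("a..b", 1));        // prints false
    }
}
```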
-
-
Method Detail
-
values
public static TokenAnnotation[] values()
Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
    for (TokenAnnotation c : TokenAnnotation.values())
        System.out.println(c);
- Returns:
- an array containing the constants of this enum type, in the order they are declared
-
valueOf
public static TokenAnnotation valueOf(java.lang.String name)
Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
- Parameters:
name
- the name of the enum constant to be returned.
- Returns:
- the enum constant with the specified name
- Throws:
java.lang.IllegalArgumentException
- if this enum type has no constant with the specified name
java.lang.NullPointerException
- if the argument is null
-
index
public int index()
Get the numeric index for this TokenAnnotation.
-
mask
public long mask()
Get the bit mask for this TokenAnnotation.
-
inverseMask
public long inverseMask()
Get the inverse bit mask for this TokenAnnotation.
-
mask
public static long mask(TokenAnnotation... annotations)
Get the union of masks for annotations.
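The method names index(), mask(), inverseMask() and mask(TokenAnnotation...) suggest a conventional bit-flag layout. The sketch below shows how such methods typically fit together, using a hypothetical stand-in enum `DemoAnnotation` rather than the real TokenAnnotation. It assumes that index() is the ordinal, that mask() is `1L << index()`, and that inverseMask() is the bitwise complement of mask(); the SDK may assign indices and masks differently.

```java
// Hypothetical stand-in for TokenAnnotation; NOT the Attivio SDK class.
// Assumptions: index() is the ordinal, mask() is 1L << index().
enum DemoAnnotation {
    NOT_INDEXED, TOKENIZED, LOCALE, LEMMA, STEM;

    int index() {
        return ordinal();
    }

    long mask() {
        return 1L << index();
    }

    long inverseMask() {
        return ~mask();
    }

    // Union (bitwise OR) of the masks of all given annotations.
    static long mask(DemoAnnotation... annotations) {
        long union = 0L;
        for (DemoAnnotation a : annotations) {
            union |= a.mask();
        }
        return union;
    }
}

final class MaskDemo {
    // True if the flag word contains the annotation's bit.
    static boolean has(long flags, DemoAnnotation a) {
        return (flags & a.mask()) != 0L;
    }

    // Clears the annotation's bit by AND-ing with its inverse mask.
    static long clear(long flags, DemoAnnotation a) {
        return flags & a.inverseMask();
    }

    public static void main(String[] args) {
        long flags = DemoAnnotation.mask(DemoAnnotation.TOKENIZED, DemoAnnotation.STEM);
        System.out.println(has(flags, DemoAnnotation.STEM)); // prints true
        flags = clear(flags, DemoAnnotation.STEM);
        System.out.println(has(flags, DemoAnnotation.STEM)); // prints false
    }
}
```

Under these assumptions, a single long can carry a token's full annotation set, with membership tests and removals done by one bitwise operation each.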
-
-