Enum TokenAnnotation

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Comparable<TokenAnnotation>

    public enum TokenAnnotation
    extends java.lang.Enum<TokenAnnotation>
    implements java.io.Serializable
    Enumeration of annotations that can be placed on tokens.
    • Enum Constant Summary

      Enum Constants 
      Enum Constant Description
      ADJECTIVE
      Marks a token as an adjective.
      ADVERB
      Marks a token as an adverb.
      CASE_SENSITIVE
      Marks that a token should not be lowercased by the indexer.
      CHARACTERS
      Marks character data.
      CONJUNCTION
      Marks a token as a conjunction.
      ELEMENT_ATTRIBUTE
      Marks an attribute.
      ENTITY_OUTPUT
      Experimental: Marks a token as being the output for a matching entity.
      FUZZY
      Experimental: Marks a token as being generated due to a fuzzy/inexact match.
      INTERJECTION
      Marks a token as an interjection.
      LEMMA
      Marks a token as a lemma.
      LOCALE
      Marks a token as a locale boundary.
      MULTIPART
      (Experimental) Identifies a token as a multi-part token.
      NOT_INDEXED
      Marks a non-indexable token.
      NOUN
      Marks a token as a noun.
      PREFIX
      Marks a token as a prefix (pre-clitic).
      PREPOSITION
      Marks a token as a preposition.
      PRONOUN
      Marks a token as a pronoun.
      PROTECTED
      Marks a token as protected.
      SCOPE_END
      Marks a token as a scope end marker.
      SCOPE_START
      Marks a token as a scope start marker.
      STEM
      Marks a token as the stemmed or base form of a word.
      STOPWORD
      Marks a token as a stopword.
      SUFFIX
      Marks a token as a suffix (post-clitic).
      TOKENIZED
      Marks that a token has been tokenized.
      VERB
      Marks a token as a verb.
      WILDCARD
      Marks a token as a wildcard query term.
      WILDCARD_COMPONENT
      Marks a token as being part of a larger wildcard expression.
    • Method Summary

      Modifier and Type Method Description
      int index()
      Get the numeric index for this TokenAnnotation.
      long inverseMask()
      Get the inverse bit mask for this TokenAnnotation.
      long mask()
      Get the bit mask for this TokenAnnotation.
      static long mask(TokenAnnotation... annotations)
      Get the union (bitwise OR) of the masks for the given annotations.
      static TokenAnnotation valueOf(java.lang.String name)
      Returns the enum constant of this type with the specified name.
      static TokenAnnotation[] values()
      Returns an array containing the constants of this enum type, in the order they are declared.
      • Methods inherited from class java.lang.Enum

        clone, compareTo, equals, finalize, getDeclaringClass, hashCode, name, ordinal, toString, valueOf
      • Methods inherited from class java.lang.Object

        getClass, notify, notifyAll, wait, wait, wait
    • Enum Constant Detail

      • NOT_INDEXED

        public static final TokenAnnotation NOT_INDEXED
        Marks a non-indexable token.
      • TOKENIZED

        public static final TokenAnnotation TOKENIZED
        Marks that a token has been tokenized.
      • LOCALE

        public static final TokenAnnotation LOCALE
        Marks a token as a locale boundary.

        Locale tokens annotate language regions in a token list. The token's text is a locale encoded as a language tag. A LOCALE token indicates a change of locale for all tokens that follow it. If the token's text is an empty string, this implies a null locale, meaning that the locale set on the field value should be used.

        Since:
        5.5.0 patch 95
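A minimal sketch of how a consumer might track the active locale while walking a token list, applying the rules above: a LOCALE token's text is a language tag, it changes the locale for every following token, and an empty string means a null locale (defer to the field value's locale). The enum LocAnn, the token class LocTok, and the class LocaleTracking are hypothetical stand-ins, not this API's classes; treating the start of the list as a null locale is also an assumption.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the annotations this sketch needs.
enum LocAnn { LOCALE, TOKENIZED }

// Hypothetical minimal token (text + annotations); not the real token class.
final class LocTok {
    final String text;
    final Set<LocAnn> annotations;

    LocTok(String text, Set<LocAnn> annotations) {
        this.text = text;
        this.annotations = annotations;
    }
}

final class LocaleTracking {
    /**
     * Returns the locale in effect for each non-LOCALE token, in order.
     * A null entry means "use the locale set on the field value".
     */
    static List<Locale> effectiveLocales(List<LocTok> tokens) {
        List<Locale> result = new ArrayList<>();
        // Assumption: before any LOCALE token, the field value's locale applies (null).
        Locale current = null;
        for (LocTok t : tokens) {
            if (t.annotations.contains(LocAnn.LOCALE)) {
                // The text is a language tag; an empty string resets to a null locale.
                current = t.text.isEmpty() ? null : Locale.forLanguageTag(t.text);
            } else {
                result.add(current);
            }
        }
        return result;
    }
}
```

For a list such as "hello", LOCALE("fr-FR"), "bonjour", LOCALE(""), "ciao", this yields null, fr-FR, null.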
      • LEMMA

        public static final TokenAnnotation LEMMA
        Marks a token as a lemma.
      • STEM

        public static final TokenAnnotation STEM
        Marks a token as the stemmed or base form of a word.
      • FUZZY

        public static final TokenAnnotation FUZZY
        Experimental: Marks a token as being generated due to a fuzzy/inexact match.

        NOTE: The FUZZY token annotation should never be placed on a surface token.

      • ENTITY_OUTPUT

        public static final TokenAnnotation ENTITY_OUTPUT
        Experimental: Marks a token as being the output for a matching entity.

        Entity Output tokens are stacked at the position where the entity match starts, and their offsets reflect the start and end of the entity match.
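A minimal sketch of the position and offset rule above, assuming a hypothetical token view (PosTok) that carries an absolute position and character offsets; the real token class and how an entity match is represented are not described on this page. The output token takes the position of the first matched token, its start offset from the first token, and its end offset from the last.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical token view: text, absolute position, and character offsets.
final class PosTok {
    final String text;
    final int position;
    final int startOffset;
    final int endOffset;

    PosTok(String text, int position, int startOffset, int endOffset) {
        this.text = text;
        this.position = position;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class EntityOutputs {
    /**
     * Builds the output token for an entity match spanning the given tokens
     * (non-empty, in document order).
     */
    static PosTok outputFor(List<PosTok> match, String outputText) {
        if (match.isEmpty()) {
            throw new IllegalArgumentException("entity match must contain at least one token");
        }
        PosTok first = match.get(0);
        PosTok last = match.get(match.size() - 1);
        // Stack the output on the position where the match starts,
        // with offsets covering the whole match.
        return new PosTok(outputText, first.position, first.startOffset, last.endOffset);
    }
}
```

For "new york city" at positions 3 to 5 and offsets 10 to 23, the output token sits at position 3 with offsets 10 to 23.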

      • PRONOUN

        public static final TokenAnnotation PRONOUN
        Marks a token as a pronoun.
      • ADJECTIVE

        public static final TokenAnnotation ADJECTIVE
        Marks a token as an adjective.
      • ADVERB

        public static final TokenAnnotation ADVERB
        Marks a token as an adverb.
      • PREPOSITION

        public static final TokenAnnotation PREPOSITION
        Marks a token as a preposition.
      • CONJUNCTION

        public static final TokenAnnotation CONJUNCTION
        Marks a token as a conjunction.
      • INTERJECTION

        public static final TokenAnnotation INTERJECTION
        Marks a token as an interjection.
      • STOPWORD

        public static final TokenAnnotation STOPWORD
        Marks a token as a stopword.
      • PREFIX

        public static final TokenAnnotation PREFIX
        Marks a token as a prefix (pre-clitic).
      • SUFFIX

        public static final TokenAnnotation SUFFIX
        Marks a token as a suffix (post-clitic).
      • WILDCARD

        public static final TokenAnnotation WILDCARD
        Marks a token as a wildcard query term.
      • WILDCARD_COMPONENT

        public static final TokenAnnotation WILDCARD_COMPONENT
        Marks a token as being part of a larger wildcard expression.

        For instance, when the token "abc*def" is processed for wildcard tokenization, it is temporarily split into three tokens: "abc", "*", and "def". The "abc" and "def" tokens are annotated as WILDCARD_COMPONENT, and the "*" token is annotated as WILDCARD.
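The "abc*def" example above can be sketched as a small splitter. This assumes '*' is the only wildcard character (the real tokenizer may recognize others) and uses hypothetical types WcAnn, WcTok, and WildcardSplit rather than this API's classes. Literal runs become WILDCARD_COMPONENT tokens and each '*' becomes a WILDCARD token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two wildcard annotations.
enum WcAnn { WILDCARD, WILDCARD_COMPONENT }

// Hypothetical token: text plus a single annotation.
final class WcTok {
    final String text;
    final WcAnn annotation;

    WcTok(String text, WcAnn annotation) {
        this.text = text;
        this.annotation = annotation;
    }

    @Override
    public String toString() {
        return text + "/" + annotation;
    }
}

final class WildcardSplit {
    /** Splits a wildcard term into literal components and '*' markers. */
    static List<WcTok> split(String term) {
        List<WcTok> out = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*') {
                // Flush any pending literal text as a component.
                if (literal.length() > 0) {
                    out.add(new WcTok(literal.toString(), WcAnn.WILDCARD_COMPONENT));
                    literal.setLength(0);
                }
                out.add(new WcTok("*", WcAnn.WILDCARD));
            } else {
                literal.append(c);
            }
        }
        // Trailing literal text after the last '*'.
        if (literal.length() > 0) {
            out.add(new WcTok(literal.toString(), WcAnn.WILDCARD_COMPONENT));
        }
        return out;
    }
}
```

`WildcardSplit.split("abc*def")` prints as `[abc/WILDCARD_COMPONENT, */WILDCARD, def/WILDCARD_COMPONENT]`.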

      • CASE_SENSITIVE

        public static final TokenAnnotation CASE_SENSITIVE
        Marks that a token should not be lowercased by the indexer.
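A minimal sketch of the rule above from the indexer's side: text is lowercased unless the token carries CASE_SENSITIVE. The enum CaseAnn and the method IndexNormalizer.indexForm are hypothetical, and lowercasing with Locale.ROOT is an assumption; the real indexer's case-folding rules are not documented on this page.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the annotations this sketch needs.
enum CaseAnn { CASE_SENSITIVE, TOKENIZED }

final class IndexNormalizer {
    /** Returns the form to index: unchanged if CASE_SENSITIVE, otherwise lowercased. */
    static String indexForm(String text, Set<CaseAnn> annotations) {
        if (annotations.contains(CaseAnn.CASE_SENSITIVE)) {
            return text; // keep original casing
        }
        // Assumption: locale-neutral lowercasing.
        return text.toLowerCase(Locale.ROOT);
    }
}
```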
      • SCOPE_START

        public static final TokenAnnotation SCOPE_START
        Marks a token as a scope start marker.

        Tokens annotated with SCOPE_START should also be annotated as TOKENIZED to avoid being retokenized.

      • SCOPE_END

        public static final TokenAnnotation SCOPE_END
        Marks a token as a scope end marker.

        Tokens annotated with SCOPE_END should also be annotated as TOKENIZED to avoid being retokenized.
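Both scope markers carry the same requirement: they must also be annotated as TOKENIZED. A minimal sketch, using a hypothetical enum ScopeAnn and an EnumSet as a stand-in for however the real token stores its annotations, builds well-formed markers and checks the rule.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the annotations this sketch needs.
enum ScopeAnn { TOKENIZED, SCOPE_START, SCOPE_END }

final class ScopeMarkers {
    /** Annotations for a scope start marker, including TOKENIZED so it is not retokenized. */
    static EnumSet<ScopeAnn> startMarker() {
        return EnumSet.of(ScopeAnn.SCOPE_START, ScopeAnn.TOKENIZED);
    }

    /** Annotations for a scope end marker, including TOKENIZED so it is not retokenized. */
    static EnumSet<ScopeAnn> endMarker() {
        return EnumSet.of(ScopeAnn.SCOPE_END, ScopeAnn.TOKENIZED);
    }

    /** True unless the annotations describe a scope marker that lacks TOKENIZED. */
    static boolean isWellFormed(Set<ScopeAnn> annotations) {
        boolean isMarker = annotations.contains(ScopeAnn.SCOPE_START)
                || annotations.contains(ScopeAnn.SCOPE_END);
        return !isMarker || annotations.contains(ScopeAnn.TOKENIZED);
    }
}
```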

      • ELEMENT_ATTRIBUTE

        public static final TokenAnnotation ELEMENT_ATTRIBUTE
        Marks an attribute.
      • CHARACTERS

        public static final TokenAnnotation CHARACTERS
        Marks character data.
      • PROTECTED

        public static final TokenAnnotation PROTECTED
        Marks a token as protected.

        Protected tokens will not be matched by wildcard or range queries.

      • MULTIPART

        public static final TokenAnnotation MULTIPART
        (Experimental) Identifies a token as a multi-part token.

        Valid separator characters are: '.', ',', '-', ':', '/'

        The token must not begin or end with a separator character.

        The token must not contain two consecutive separator characters.

        The token's position increment must not be 0.
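The MULTIPART constraints above can be checked with a short validator. This is a sketch with a hypothetical class MultipartCheck, not the API's own validation. Rejecting empty text is an added assumption, and because the rules do not say a multi-part token must contain a separator, none is required here.

```java
final class MultipartCheck {
    // Separator characters listed in the MULTIPART description.
    private static final String SEPARATORS = ".,-:/";

    static boolean isSeparator(char c) {
        return SEPARATORS.indexOf(c) >= 0;
    }

    /** Checks the documented constraints for a MULTIPART token. */
    static boolean isValidMultipart(String text, int positionIncrement) {
        // Position increment must not be 0; empty text is rejected (assumption).
        if (positionIncrement == 0 || text.isEmpty()) {
            return false;
        }
        // Must not begin or end with a separator.
        if (isSeparator(text.charAt(0)) || isSeparator(text.charAt(text.length() - 1))) {
            return false;
        }
        // Must not contain two separators in a row.
        for (int i = 1; i < text.length(); i++) {
            if (isSeparator(text.charAt(i)) && isSeparator(text.charAt(i - 1))) {
                return false;
            }
        }
        return true;
    }
}
```

For example, "10.0.0.1" passes, while "a--b", "-ab", and "ab/" fail.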

    • Method Detail

      • values

        public static TokenAnnotation[] values()
        Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
        for (TokenAnnotation c : TokenAnnotation.values())
            System.out.println(c);
        
        Returns:
        an array containing the constants of this enum type, in the order they are declared
      • valueOf

        public static TokenAnnotation valueOf(java.lang.String name)
        Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
        Parameters:
        name - the name of the enum constant to be returned.
        Returns:
        the enum constant with the specified name
        Throws:
        java.lang.IllegalArgumentException - if this enum type has no constant with the specified name
        java.lang.NullPointerException - if the argument is null
      • index

        public int index()
        Get the numeric index for this TokenAnnotation.
      • mask

        public long mask()
        Get the bit mask for this TokenAnnotation.
      • inverseMask

        public long inverseMask()
        Get the inverse bit mask for this TokenAnnotation.
      • mask

        public static long mask(TokenAnnotation... annotations)
        Get the union (bitwise OR) of the masks for the given annotations.
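This page does not specify how mask() relates to index(). A common convention, assumed here, is one bit per annotation: mask() is 1L shifted left by index(), inverseMask() is its bitwise complement, and the static mask(...) ORs the individual masks together. The stand-in enum Annot (a few constants in the detail-section order, with index() assumed equal to ordinal()) and the helper class MaskOps are hypothetical; the sketch only shows how such masks are typically used to add, remove, and test annotations stored in a single long.

```java
// Hypothetical stand-in; the real index() values are not documented here.
enum Annot {
    NOT_INDEXED, TOKENIZED, LOCALE, LEMMA, STEM;

    // Assumption: the index is the declaration ordinal.
    int index() {
        return ordinal();
    }

    // Assumption: one bit per annotation.
    long mask() {
        return 1L << index();
    }

    // All bits set except this annotation's bit.
    long inverseMask() {
        return ~mask();
    }

    // Union (bitwise OR) of the given annotations' masks.
    static long mask(Annot... annotations) {
        long m = 0L;
        for (Annot a : annotations) {
            m |= a.mask();
        }
        return m;
    }
}

// Hypothetical helpers operating on a long that holds a token's annotation bits.
final class MaskOps {
    static long add(long bits, Annot a) {
        return bits | a.mask();
    }

    static long remove(long bits, Annot a) {
        return bits & a.inverseMask(); // clear only this annotation's bit
    }

    static boolean has(long bits, Annot a) {
        return (bits & a.mask()) != 0;
    }

    static boolean hasAll(long bits, Annot... annotations) {
        long m = Annot.mask(annotations);
        return (bits & m) == m;
    }
}
```

Under these assumptions, `Annot.mask(Annot.TOKENIZED, Annot.STEM)` is 18 (bits 1 and 4), and removing STEM from it leaves 2.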