Class Token

  • All Implemented Interfaces:
    TokenAnnotationSet, java.io.Serializable, java.lang.CharSequence, java.lang.Cloneable, java.lang.Comparable<Token>

    public class Token
    extends java.lang.Object
    implements java.lang.Comparable<Token>, java.lang.Cloneable, java.lang.CharSequence, java.io.Serializable, TokenAnnotationSet
    Represents a Token.
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected int endOffset
      The end offset in the source text
      protected int startOffset
      The start offset in the source text
      protected java.lang.String text
      The character data for this token
    • Constructor Summary

      Constructors 
      Constructor Description
      Token​(char[] text)
      Construct a new Token with text.
      Token​(java.lang.CharSequence text)
      Construct a new Token with text.
      Token​(java.lang.String text)
      Construct a new Token with text.
      Token​(java.lang.String text, TokenAnnotation annotation)
      Construct a new annotated token with text.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void append​(java.lang.CharSequence value)
      Append value to the end of this token.
      boolean bufferEquals​(Token token)
      Return true if this Token's buffer contains the same text as token.
      boolean bufferEquals​(java.lang.String token)
      Return true if this Token's buffer contains the same text as token.
      char charAt​(int index)
      Returns the char value at the specified index.
      Token clone()  
      int compareTo​(Token other)
      int compareToIgnoreCase​(Token other)
      Case insensitive equivalent to compareTo.
      boolean contains​(char c)
      Returns true if this Token contains c.
      boolean containsAnnotation​(TokenAnnotation annotation)
      Returns true if the specified annotation is set.
      boolean containsWhitespace()
      Returns true if this Token contains any whitespace.
      boolean containsWildcard()
      Returns true if token contains a wildcard character.
      static Token createElementAttributeToken​(java.lang.String key, java.lang.String value)
      Creates an element attribute token.
      static Token createEndScopeToken​(java.lang.String text)
      Creates an end scope token.
      static Token createStartScopeToken​(java.lang.String text)
      Creates a start scope token.
      protected boolean equals​(Token other, long annotationMask)
      Compare this token to other.
      boolean equals​(java.lang.Object other)
      Compares this Token to another object.
      long getAnnotations()
      Get all set token annotations as a bit mask.
      int getEndOffset()
      Get the end offset from the source text for this token.
      int getStartOffset()
      Get the start offset from the source text for this token.
      java.lang.String getText()
      Get the token text.
      int hashCode()
      int indexOf​(char ch)
      Returns the index within this token of the first occurrence of the specified character.
      boolean isMatchAll()
      Returns true if this token is a single character wildcard '*' query.
      boolean isScopeEnd()
      Returns true if this token designates the end of a scope.
      boolean isScopeStart()
      Returns true if this token designates the start of a scope.
      boolean isScopeToken()
      Returns true if this token designates the start or end of a scope.
      boolean isSearchTerm()
      Returns true if this token is a standard atomic search term.
      boolean isSurfaceForm()
      Returns true if this token is a surface token.
      int length()
      Returns the length of this token's text.
      int offsetGap​(Token previous)
      Get the offset gap between previous and this token.
      void setAnnotation​(TokenAnnotation annotation)
      Set a TokenAnnotation.
      void setAnnotations​(long mask)
      Set the token annotations from a bit mask.
      void setEndOffset​(int offset)
      Set the end offset from the source text for this token.
      void setLength​(int length)
      Truncate token text to length.
      void setStartOffset​(int offset)
      Set the start offset from the source text for this token.
      void setText​(java.lang.String value)
      Set the token text.
      void setValue​(char[] value)
      Set the token text to value.
      void setValue​(char[] value, int start, int end)
      Sets the token text.
      void setValue​(java.lang.CharSequence value)
      Set the token text to value.
      void setValue​(java.lang.String value)
      Set the token text to value.
      void setValue​(java.lang.String value, int start, int end)
      Sets the token text to be a substring of value.
      Token subSequence​(int start, int end)
      Returns a new character sequence that is a subsequence of this sequence.
      Token toLowerCase()
      Convert all characters in this Token to lower case.
      Phrase toPhrase()
      Convert this token into a suitable Phrase query.
      Phrase toPhrase​(int offsetBase)
      Convert this token into a suitable Phrase query.
      void toQueryString​(java.lang.StringBuilder buffer)
      Encode this Token into buffer suitable for parsing inside a phrase() operator.
      java.lang.String toString()
      Returns the String value representation of this Token.
      Token toUpperCase()
      Convert all characters in this Token to upper case.
      void unsetAnnotation​(TokenAnnotation annotation)
      Unset a TokenAnnotation.
      static Token valueOf​(java.lang.Object value)
      Return the token representation of an arbitrary value.
      void write​(java.lang.StringBuilder buffer)
      Append this Token's text to the StringBuilder.
      void writeTo​(java.lang.StringBuilder buffer)
      Serialize this Token to buffer.
      • Methods inherited from class java.lang.Object

        finalize, getClass, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.lang.CharSequence

        chars, codePoints
    • Field Detail

      • text

        protected java.lang.String text
        The character data for this token
      • startOffset

        protected int startOffset
        The start offset in the source text
      • endOffset

        protected int endOffset
        The end offset in the source text
    • Constructor Detail

      • Token

        public Token​(java.lang.String text)
        Construct a new Token with text.
      • Token

        public Token​(char[] text)
        Construct a new Token with text.
      • Token

        public Token​(java.lang.CharSequence text)
        Construct a new Token with text.
      • Token

        public Token​(java.lang.String text,
                     TokenAnnotation annotation)
        Construct a new annotated token with text.
    • Method Detail

      • createStartScopeToken

        public static Token createStartScopeToken​(java.lang.String text)
        Creates a start scope token.
        Parameters:
        text - the scope name
        Returns:
        the created start scope token
      • createEndScopeToken

        public static Token createEndScopeToken​(java.lang.String text)
        Creates an end scope token.
        Parameters:
        text - the scope name
        Returns:
        the created end scope token
      • createElementAttributeToken

        public static Token createElementAttributeToken​(java.lang.String key,
                                                        java.lang.String value)
        Creates an element attribute token.
        Parameters:
        key - the attribute name
        value - the attribute value
        Returns:
        the created element attribute token
      • toPhrase

        public Phrase toPhrase()
        Convert this token into a suitable Phrase query.
      • toPhrase

        public Phrase toPhrase​(int offsetBase)
        Convert this token into a suitable Phrase query.
      • isSearchTerm

        public boolean isSearchTerm()
        Returns true if this token is a standard atomic search term.

        Returns false if this is a special term, such as a wildcard, regex, or fuzzy search term.

      • isSurfaceForm

        public boolean isSurfaceForm()
        Returns true if this token is a surface token.
      • offsetGap

        public int offsetGap​(Token previous)
        Get the offset gap between previous and this token.

        NOTE: Returns 1 if this Token or the previous token does not contain offsets.

        NOTE: Returns 0 if previous is null.
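
        The two NOTE cases can be sketched as follows. This is a minimal illustration, not the library's implementation: the class OffsetGapSketch, its Span stand-in, the use of -1 to mean "no offset", and the gap formula (this token's start offset minus the previous token's end offset) are all assumptions.

```java
final class OffsetGapSketch {
    /** Hypothetical marker for "no offset recorded"; the real sentinel is not documented here. */
    static final int NO_OFFSET = -1;

    /** Minimal stand-in for a token that carries only its offsets. */
    static final class Span {
        final int start;
        final int end; // exclusive

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean hasOffsets() {
            return start != NO_OFFSET && end != NO_OFFSET;
        }
    }

    static int offsetGap(Span previous, Span current) {
        if (previous == null) {
            return 0; // documented: 0 when previous is null
        }
        if (!previous.hasOffsets() || !current.hasOffsets()) {
            return 1; // documented: 1 when either token lacks offsets
        }
        return current.start - previous.end; // assumed formula
    }

    public static void main(String[] args) {
        Span first = new Span(4, 9);
        Span second = new Span(12, 17);
        System.out.println(offsetGap(null, first));
        System.out.println(offsetGap(first, second));
        System.out.println(offsetGap(first, new Span(NO_OFFSET, NO_OFFSET)));
    }
}
```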

      • setAnnotations

        public void setAnnotations​(long mask)
        Set the token annotations from a bit mask.
        Specified by:
        setAnnotations in interface TokenAnnotationSet
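
        A bit mask packs one boolean flag per annotation into a single long. The sketch below shows how set, unset, contains, and whole-mask copy could work. The Annotation enum and the "1L << ordinal()" bit assignment are hypothetical; this page does not list the real TokenAnnotation values or how they map to bits.

```java
final class AnnotationMaskSketch {
    /** Hypothetical annotations standing in for TokenAnnotation. */
    enum Annotation {
        FIRST, SECOND, THIRD;

        /** Assumed mapping: one bit per enum constant. */
        long bit() {
            return 1L << ordinal();
        }
    }

    private long mask;

    void setAnnotation(Annotation a) {
        mask |= a.bit(); // turn the bit on
    }

    void unsetAnnotation(Annotation a) {
        mask &= ~a.bit(); // turn the bit off
    }

    boolean containsAnnotation(Annotation a) {
        return (mask & a.bit()) != 0;
    }

    long getAnnotations() {
        return mask;
    }

    void setAnnotations(long newMask) {
        mask = newMask; // replace every flag at once
    }

    public static void main(String[] args) {
        AnnotationMaskSketch source = new AnnotationMaskSketch();
        source.setAnnotation(Annotation.FIRST);
        source.setAnnotation(Annotation.THIRD);

        // Copy all flags from one holder to another in a single call.
        AnnotationMaskSketch copy = new AnnotationMaskSketch();
        copy.setAnnotations(source.getAnnotations());

        System.out.println(Long.toBinaryString(copy.getAnnotations()));
        System.out.println(copy.containsAnnotation(Annotation.SECOND));
    }
}
```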
      • isScopeToken

        public boolean isScopeToken()
        Returns true if this token designates the start or end of a scope.
      • isScopeStart

        public boolean isScopeStart()
        Returns true if this token designates the start of a scope.
      • isScopeEnd

        public boolean isScopeEnd()
        Returns true if this token designates the end of a scope.
      • containsWildcard

        public boolean containsWildcard()
        Returns true if this token contains a wildcard character.

        Wildcard Characters:

        • asterisk - '*' - match 0 or more characters.
        • question mark - '?' - match any single character.
        • full width asterisk - '＊' (U+FF0A) - treated same as asterisk.
        Returns:
        true if token contains a wildcard character
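
        A check for the three listed wildcard characters could look like the sketch below. The class WildcardSketch is hypothetical, and the sketch ignores any escaping rules the real method may apply, since none are documented here.

```java
final class WildcardSketch {
    /** The three wildcard characters listed above. */
    static boolean isWildcard(char c) {
        return c == '*'          // asterisk: 0 or more characters
            || c == '?'          // question mark: any single character
            || c == '\uFF0A';    // full width asterisk: treated as asterisk
    }

    static boolean containsWildcard(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (isWildcard(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsWildcard("comput*"));  // true
        System.out.println(containsWildcard("wom?n"));    // true
        System.out.println(containsWildcard("computer")); // false
    }
}
```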
      • isMatchAll

        public boolean isMatchAll()
        Returns true if this token is a single character wildcard '*' query.
      • containsWhitespace

        public boolean containsWhitespace()
        Returns true if this Token contains any whitespace.
      • contains

        public boolean contains​(char c)
        Returns true if this Token contains c.
      • getStartOffset

        public int getStartOffset()
        Get the start offset from the source text for this token.
      • setStartOffset

        public void setStartOffset​(int offset)
        Set the start offset from the source text for this token.
      • getEndOffset

        public int getEndOffset()
        Get the end offset from the source text for this token.

        The end offset is exclusive: it is 1 past the final character of this token in the source text. To recover the token's source text, do the following:
        String tokenSource = sourceText.substring(token.getStartOffset(), token.getEndOffset());
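
        The same half-open convention can be tried without a Token instance. In the sketch below, the class OffsetSubstringSketch and the sample text are illustrative only; the offsets play the role of getStartOffset() and getEndOffset().

```java
final class OffsetSubstringSketch {
    /** Extracts the text covered by a half-open [startOffset, endOffset) range. */
    static String slice(String sourceText, int startOffset, int endOffset) {
        return sourceText.substring(startOffset, endOffset);
    }

    public static void main(String[] args) {
        String sourceText = "The quick brown fox";
        int startOffset = sourceText.indexOf("quick");    // 4
        int endOffset = startOffset + "quick".length();   // 9, one past the final 'k'

        System.out.println(slice(sourceText, startOffset, endOffset)); // quick
        // With an exclusive end offset, the token length is simply end - start.
        System.out.println(endOffset - startOffset);                   // 5
    }
}
```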

      • setEndOffset

        public void setEndOffset​(int offset)
        Set the end offset from the source text for this token.
      • indexOf

        public int indexOf​(char ch)
        Returns the index within this token of the first occurrence of the specified character.
        Parameters:
        ch - a character (Unicode code point).
        Returns:
        the index of the first occurrence of the character in this token, or -1 if the character does not occur.
      • setValue

        public void setValue​(char[] value)
        Set the token text to value.
      • setValue

        public void setValue​(java.lang.String value)
        Set the token text to value.
      • setLength

        public void setLength​(int length)
        Truncate token text to length.
      • setValue

        public void setValue​(java.lang.CharSequence value)
        Set the token text to value.
      • append

        public void append​(java.lang.CharSequence value)
        Append value to the end of this token.
      • setValue

        public void setValue​(char[] value,
                             int start,
                             int end)
        Sets the token text.
        Parameters:
        value - source array containing token text
        start - the start index in value to start copying from
        end - the end index in value to stop copying at (not inclusive)
      • setValue

        public void setValue​(java.lang.String value,
                             int start,
                             int end)
        Sets the token text to be a substring of value.
        Parameters:
        value - source value containing token text.
        start - the start index in value to start copying from
        end - the end index in value to stop copying at (not inclusive)
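
        Both three-argument setValue overloads take a start index and an exclusive end index. When mapping such a range onto standard Java calls, note that String.substring also takes (start, end), while the String(char[], int, int) constructor takes (offset, count). The helper class RangeCopySketch below is hypothetical and only illustrates the index arithmetic.

```java
final class RangeCopySketch {
    /** Copies value[start..end) from a char array; end is exclusive. */
    static String fromChars(char[] value, int start, int end) {
        // The String constructor expects a count, not an end index.
        return new String(value, start, end - start);
    }

    /** Copies value[start..end) from a String; end is exclusive. */
    static String fromString(String value, int start, int end) {
        return value.substring(start, end);
    }

    public static void main(String[] args) {
        String source = "hello world";
        System.out.println(fromChars(source.toCharArray(), 6, 11)); // world
        System.out.println(fromString(source, 6, 11));              // world
    }
}
```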
      • getText

        public java.lang.String getText()
        Get the token text.
      • setText

        public void setText​(java.lang.String value)
        Set the token text.
      • subSequence

        public Token subSequence​(int start,
                                 int end)
        Returns a new character sequence that is a subsequence of this sequence.
        Specified by:
        subSequence in interface java.lang.CharSequence
        Parameters:
        start - the begin index, inclusive.
        end - the end index, exclusive.
        Returns:
        the specified subsequence.
      • charAt

        public char charAt​(int index)
        Returns the char value at the specified index.

        An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on, as for array indexing.

        Specified by:
        charAt in interface java.lang.CharSequence
        Parameters:
        index - the index of the char value to be returned
        Returns:
        the specified char value
      • length

        public int length()
        Returns the length of this token's text.
        Specified by:
        length in interface java.lang.CharSequence
      • toString

        public java.lang.String toString()
        Returns the String value representation of this Token.

        NOTE: token.equals( token.toString() ) always returns true.

        Specified by:
        toString in interface java.lang.CharSequence
        Overrides:
        toString in class java.lang.Object
        Returns:
        the string representation of this Token
      • bufferEquals

        public boolean bufferEquals​(Token token)
        Return true if this Token's buffer contains the same text as token.
      • bufferEquals

        public boolean bufferEquals​(java.lang.String token)
        Return true if this Token's buffer contains the same text as token.
      • equals

        public boolean equals​(java.lang.Object other)
        Compares this Token to another object.
        Overrides:
        equals in class java.lang.Object
        Parameters:
        other - The object to compare this Token against.
        Returns:
        True if the given object represents a Token that is equal to this Token, or a String that is lexicographically equivalent to this Token.
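
        Accepting a String in equals makes the comparison asymmetric, because java.lang.String.equals returns true only for another String. The sketch below demonstrates this with a hypothetical Word class that mimics the documented behavior; it ignores annotations and is not the Token implementation.

```java
final class EqualsAsymmetrySketch {
    /** Hypothetical stand-in whose equals accepts both Word and String. */
    static final class Word {
        private final String text;

        Word(String text) {
            this.text = text;
        }

        @Override
        public boolean equals(Object other) {
            if (other instanceof Word) {
                return text.equals(((Word) other).text);
            }
            if (other instanceof String) {
                return text.equals(other);
            }
            return false;
        }

        @Override
        public int hashCode() {
            return text.hashCode(); // matches String.hashCode for the same text
        }

        @Override
        public String toString() {
            return text;
        }
    }

    public static void main(String[] args) {
        Word word = new Word("abc");
        System.out.println(word.equals("abc"));           // true
        System.out.println("abc".equals(word));           // false: String only equals String
        System.out.println(word.equals(word.toString())); // true
    }
}
```

        When the direction of the comparison matters, call equals on the token-like object, or compare toString() values on both sides.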
      • equals

        protected boolean equals​(Token other,
                                 long annotationMask)
        Compare this token to other.

        Only compare token annotations set on annotationMask.
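
        Comparing only the annotations selected by a mask amounts to AND-ing both annotation sets with the mask before comparing them. The sketch below assumes that the token text must also match; the class MaskedEqualsSketch and its parameter layout are hypothetical.

```java
final class MaskedEqualsSketch {
    /**
     * Returns true if the texts match and the two annotation sets agree
     * on every bit that is set in annotationMask.
     */
    static boolean equalsMasked(String textA, long annotationsA,
                                String textB, long annotationsB,
                                long annotationMask) {
        return textA.equals(textB)
            && (annotationsA & annotationMask) == (annotationsB & annotationMask);
    }

    public static void main(String[] args) {
        long a = 0b011L; // bits 0 and 1 set
        long b = 0b001L; // only bit 0 set

        System.out.println(equalsMasked("fox", a, "fox", b, 0b001L)); // true: bit 0 agrees
        System.out.println(equalsMasked("fox", a, "fox", b, 0b011L)); // false: bit 1 differs
        System.out.println(equalsMasked("fox", a, "fox", b, 0L));     // true: no bits compared
    }
}
```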

      • compareTo

        public int compareTo​(Token other)
        Specified by:
        compareTo in interface java.lang.Comparable<Token>
      • compareToIgnoreCase

        public int compareToIgnoreCase​(Token other)
        Case insensitive equivalent to compareTo.

        NOTE: this method does not take Locale into account. For a truly locale-aware comparison, call toString() on each Token and compare the results with a java.text.Collator.
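
        A locale-aware, case-insensitive comparison with java.text.Collator might look like the sketch below. The helper class CollatorCompareSketch is hypothetical; it accepts any CharSequence, so a Token could be passed directly, and it uses SECONDARY strength so that case differences are ignored.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorCompareSketch {
    /** Compares two character sequences using the collation rules of the given locale. */
    static int compareIgnoreCase(CharSequence a, CharSequence b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // SECONDARY strength distinguishes letters and accents but not case.
        collator.setStrength(Collator.SECONDARY);
        return collator.compare(a.toString(), b.toString());
    }

    public static void main(String[] args) {
        System.out.println(compareIgnoreCase("apple", "APPLE", Locale.ENGLISH) == 0); // true
        System.out.println(compareIgnoreCase("apple", "banana", Locale.ENGLISH) < 0); // true
    }
}
```

        Creating a Collator has some cost, so code that compares many tokens would normally create one instance and reuse it.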

      • toLowerCase

        public Token toLowerCase()
        Convert all characters in this Token to lower case.

        WARNING: this method is NOT Locale-aware.

        Returns:
        a reference to this Token.
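
        The warning matters for languages whose case mappings differ from the default, Turkish being the usual example. The sketch below contrasts a per-character conversion, which is one plausible locale-unaware approach and an assumption about how such a method might behave, with String.toLowerCase(Locale) applied to the token's string value. The class LocaleCaseSketch is hypothetical.

```java
import java.util.Locale;

final class LocaleCaseSketch {
    /** Locale-aware lower-casing of a token's text. */
    static String lower(CharSequence token, Locale locale) {
        return token.toString().toLowerCase(locale);
    }

    /** Locale-unaware lower-casing, one char at a time. */
    static String perCharLower(CharSequence token) {
        StringBuilder out = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            out.append(Character.toLowerCase(token.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        System.out.println(perCharLower("TITLE"));       // title
        System.out.println(lower("TITLE", Locale.ROOT)); // title
        // In Turkish, upper case 'I' lower-cases to dotless 'ı' (U+0131).
        System.out.println(lower("TITLE", turkish).equals("t\u0131tle")); // true
    }
}
```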
      • toUpperCase

        public Token toUpperCase()
        Convert all characters in this Token to upper case.

        WARNING: this method is NOT Locale-aware.

        Returns:
        a reference to this Token.
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • clone

        public Token clone()
        Overrides:
        clone in class java.lang.Object
      • write

        public void write​(java.lang.StringBuilder buffer)
        Append this Token's text to the StringBuilder.
      • toQueryString

        public void toQueryString​(java.lang.StringBuilder buffer)
        Encode this Token into buffer suitable for parsing inside a phrase() operator.
      • writeTo

        public void writeTo​(java.lang.StringBuilder buffer)
        Serialize this Token to buffer.
      • valueOf

        public static Token valueOf​(java.lang.Object value)
        Return the token representation of an arbitrary value.