public class Token extends Object implements Comparable<Token>, Cloneable, CharSequence, Serializable, TokenAnnotationSet
Modifier and Type | Field and Description |
---|---|
protected int | endOffset: The end offset in the source text |
protected int | startOffset: The start offset in the source text |
protected String | text: The character data for this token |
Constructor and Description |
---|
Token(char[] text): Construct a new Token with text. |
Token(CharSequence text): Construct a new Token with text. |
Token(String text): Construct a new Token with text. |
Token(String text, TokenAnnotation annotation): Construct a new annotated Token with text. |
Modifier and Type | Method and Description |
---|---|
void | append(CharSequence value): Append value to the end of this token. |
boolean | bufferEquals(String token): Return true if this Token's buffer contains the same text as token. |
boolean | bufferEquals(Token token): Return true if this Token's buffer contains the same text as token. |
char | charAt(int index): Returns the char value at the specified index. |
Token | clone(): Returns a copy of this Token instance. |
int | compareTo(Token other) |
int | compareToIgnoreCase(Token other): Case insensitive equivalent to compareTo. |
boolean | contains(char c): Returns true if this Token contains c. |
boolean | containsAnnotation(TokenAnnotation annotation): Returns true if the specified annotation is set. |
boolean | containsWhitespace(): Returns true if this Token contains any whitespace. |
boolean | containsWildcard(): Returns true if this token contains a wildcard character. |
static Token | createElementAttributeToken(String key, String value): Creates an element attribute token. |
static Token | createEndScopeToken(String text): Creates an end scope token. |
static Token | createStartScopeToken(String text): Creates a start scope token. |
boolean | equals(Object other): Compares this Token to another object. |
protected boolean | equals(Token other, long annotationMask): Compare this token to other. |
long | getAnnotations(): Get all set token annotations as a bit mask. |
int | getEndOffset(): Get the end offset from the source text for this token. |
int | getStartOffset(): Get the start offset from the source text for this token. |
String | getText(): Get the token text. |
int | hashCode() |
int | indexOf(char ch): Returns the index within this token of the first occurrence of the specified character. |
boolean | isMatchAll(): Returns true if this token is a single character wildcard '*' query. |
boolean | isScopeEnd(): Returns true if this token designates the end of a scope. |
boolean | isScopeStart(): Returns true if this token designates the start of a scope. |
boolean | isScopeToken(): Returns true if this token designates the start or end of a scope. |
boolean | isSearchTerm(): Returns true if this token is a standard atomic search term. |
boolean | isSurfaceForm(): Returns true if this token is a surface token. |
int | length(): Returns the length of this token's text. |
int | offsetGap(Token previous): Get the offset gap between previous and this token. |
void | setAnnotation(TokenAnnotation annotation): Set a TokenAnnotation. |
void | setAnnotations(long mask): Set the token annotations from a bit mask. |
void | setEndOffset(int offset): Set the end offset from the source text for this token. |
void | setLength(int length): Truncate token text to length. |
void | setStartOffset(int offset): Set the start offset from the source text for this token. |
void | setText(String value): Set the token text. |
void | setValue(char[] value): Set the token text to value. |
void | setValue(char[] value, int start, int end): Sets the token text. |
void | setValue(CharSequence value): Set the token text to value. |
void | setValue(String value): Set the token text to value. |
void | setValue(String value, int start, int end): Sets the token text to be a substring of value. |
Token | subSequence(int start, int end): Returns a new character sequence that is a subsequence of this sequence. |
Token | toLowerCase(): Convert all characters in this Token to lower case. |
Phrase | toPhrase(): Convert this token into a suitable Phrase query. |
Phrase | toPhrase(int offsetBase): Convert this token into a suitable Phrase query. |
void | toQueryString(StringBuilder buffer): Encode this Token into buffer suitable for parsing inside a phrase() operator. |
String | toString(): Returns the String value representation of this Token. |
Token | toUpperCase(): Convert all characters in this Token to upper case. |
void | unsetAnnotation(TokenAnnotation annotation): Unset a TokenAnnotation. |
static Token | valueOf(Object value): Return the token representation of an arbitrary value. |
void | write(StringBuilder buffer): Append this Token's text to the StringBuilder. |
void | writeTo(StringBuilder buffer): Serialize this Token to buffer. |
Methods inherited from class Object: finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface CharSequence: chars, codePoints
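protected String text
The character data for this token
protected int startOffset
The start offset in the source text
protected int endOffset
The end offset in the source text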
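public Token(String text)
Construct a new Token with text.
public Token(char[] text)
Construct a new Token with text.
public Token(CharSequence text)
Construct a new Token with text.
public Token(String text, TokenAnnotation annotation)
Construct a new annotated Token with text.
public static Token createStartScopeToken(String text)
Creates a start scope token.
Parameters:
text - the scope name
public static Token createEndScopeToken(String text)
Creates an end scope token.
Parameters:
text - the scope name
public static Token createElementAttributeToken(String key, String value)
Creates an element attribute token.
Parameters:
key - the attribute name
value - the attribute value
public Phrase toPhrase()
Convert this token into a suitable Phrase query.
public Phrase toPhrase(int offsetBase)
Convert this token into a suitable Phrase query.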
public boolean isSearchTerm()
Returns true if this token is a standard atomic search term. Returns false if this is a special term, such as a wildcard, regex, or fuzzy search term.
public boolean isSurfaceForm()
Returns true if this token is a surface token.
public long getAnnotations()
Get all set token annotations as a bit mask.
Specified by: getAnnotations in interface TokenAnnotationSet
public int offsetGap(Token previous)
Get the offset gap between previous and this token.
NOTE: if this Token or the previous token does not contain offsets, 1 is returned.
NOTE: Returns 0 if previous is null.
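The two notes above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions, not Attivio's implementation: Span is a hypothetical stand-in for a token with offsets, a value of -1 is assumed to mean "no offset set", and the general case is assumed to be the distance from the previous token's end to this token's start. Only the two special cases (0 for a null previous token, 1 when offsets are missing) come from the documentation.

```java
/** Illustrative sketch only; Span is a hypothetical stand-in, NOT the Attivio Token class. */
final class OffsetGapSketch {
    /** Assumption: -1 marks an offset that was never set. */
    static final int UNSET = -1;

    /** Minimal holder for a start offset and an exclusive end offset. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean hasOffsets() {
            return start != UNSET && end != UNSET;
        }
    }

    static int offsetGap(Span current, Span previous) {
        if (previous == null) {
            return 0; // documented: 0 when previous is null
        }
        if (!current.hasOffsets() || !previous.hasOffsets()) {
            return 1; // documented: 1 when either side lacks offsets
        }
        // Assumption: the gap is measured from the previous end to the current start.
        return current.start - previous.end;
    }

    public static void main(String[] args) {
        Span first = new Span(0, 3);
        Span second = new Span(10, 15);
        System.out.println(offsetGap(second, first));
        System.out.println(offsetGap(first, null));
    }
}
```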
public void setAnnotations(long mask)
Set the token annotations from a bit mask.
Specified by: setAnnotations in interface TokenAnnotationSet
public void setAnnotation(TokenAnnotation annotation)
Set a TokenAnnotation.
Specified by: setAnnotation in interface TokenAnnotationSet
public boolean containsAnnotation(TokenAnnotation annotation)
Returns true if the specified annotation is set.
Specified by: containsAnnotation in interface TokenAnnotationSet
public void unsetAnnotation(TokenAnnotation annotation)
Unset a TokenAnnotation.
Specified by: unsetAnnotation in interface TokenAnnotationSet
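Since annotations are exposed as a long bit mask, the set, unset, and contains operations map naturally onto bitwise operators. The sketch below illustrates that general technique only. The Flag enum and its values are hypothetical (this page does not list the real TokenAnnotation constants), and the one-bit-per-annotation layout is an assumption.

```java
/** Illustrative bit-mask annotation set; NOT the Attivio TokenAnnotationSet implementation. */
final class AnnotationMaskSketch {
    /** Hypothetical annotations; assumption: each one occupies a single bit. */
    enum Flag {
        SCOPE_START, SCOPE_END, WILDCARD;

        long bit() {
            return 1L << ordinal();
        }
    }

    private long mask;

    long getAnnotations() {
        return mask;
    }

    void setAnnotations(long newMask) {
        this.mask = newMask;
    }

    void setAnnotation(Flag flag) {
        mask |= flag.bit(); // turn the flag's bit on
    }

    void unsetAnnotation(Flag flag) {
        mask &= ~flag.bit(); // turn the flag's bit off
    }

    boolean containsAnnotation(Flag flag) {
        return (mask & flag.bit()) != 0;
    }

    public static void main(String[] args) {
        AnnotationMaskSketch set = new AnnotationMaskSketch();
        set.setAnnotation(Flag.SCOPE_START);
        set.setAnnotation(Flag.WILDCARD);
        System.out.println(Long.toBinaryString(set.getAnnotations()));
    }
}
```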
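public boolean isScopeToken()
Returns true if this token designates the start or end of a scope.
public boolean isScopeStart()
Returns true if this token designates the start of a scope.
public boolean isScopeEnd()
Returns true if this token designates the end of a scope.
public boolean containsWildcard()
Returns true if this token contains a wildcard character.
public boolean isMatchAll()
Returns true if this token is a single character wildcard '*' query.
public boolean containsWhitespace()
Returns true if this Token contains any whitespace.
public boolean contains(char c)
Returns true if this Token contains c.
public int getStartOffset()
Get the start offset from the source text for this token.
public void setStartOffset(int offset)
Set the start offset from the source text for this token.
public int getEndOffset()
Get the end offset from the source text for this token.
The end offset is exclusive: it is 1 past the final column of the token's source text.
To get the source text, you should do the following: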
String tokenSource = sourceText.substring(token.getStartOffset(), token.getEndOffset());
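Because the end offset is exclusive, it can be passed straight to String.substring, and the token's source length is simply end minus start. The sketch below demonstrates this with plain Strings and literal offsets. The tokenSource helper and the sample sentence are illustrative assumptions, and no Attivio classes are used.

```java
/** Illustrative half-open offset handling using plain Strings; no Attivio classes involved. */
final class OffsetSubstringSketch {
    /** Extracts the text covered by [startOffset, endOffset) from the source. */
    static String tokenSource(String sourceText, int startOffset, int endOffset) {
        return sourceText.substring(startOffset, endOffset);
    }

    public static void main(String[] args) {
        String sourceText = "The quick brown fox";
        int startOffset = 4; // index of 'q'
        int endOffset = 9;   // one past the 'k' in "quick"
        System.out.println(tokenSource(sourceText, startOffset, endOffset));
        System.out.println(endOffset - startOffset); // length of the covered text
    }
}
```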
public void setEndOffset(int offset)
Set the end offset from the source text for this token.
public int indexOf(char ch)
Returns the index within this token of the first occurrence of the specified character.
Parameters:
ch - a character (Unicode code point).
public void setValue(char[] value)
Set the token text to value.
public void setValue(String value)
Set the token text to value.
public void setLength(int length)
Truncate token text to length.
public void setValue(CharSequence value)
Set the token text to value.
public void append(CharSequence value)
Append value to the end of this token.
public void setValue(char[] value, int start, int end)
Sets the token text.
Parameters:
value - source array containing token text
start - the start index in value to start copying from
end - the end index in value to stop copying at (not inclusive)
public void setValue(String value, int start, int end)
Sets the token text to be a substring of value.
Parameters:
value - source value containing token text.
start - the start index in value to start copying from
end - the end index in value to stop copying at (not inclusive)
public String getText()
Get the token text.
public void setText(String value)
Set the token text.
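public Token subSequence(int start, int end)
Returns a new character sequence that is a subsequence of this sequence.
Specified by: subSequence in interface CharSequence
Parameters:
start - the begin index, inclusive.
end - the end index, exclusive.
public char charAt(int index)
Returns the char value at the specified index. An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on, as for array indexing.
Specified by: charAt in interface CharSequence
Parameters:
index - the index of the char value to be returned
public int length()
Returns the length of this token's text.
Specified by: length in interface CharSequence
public String toString()
Returns the String value representation of this Token.
NOTE: token.equals(token.toString()) always returns true.
Specified by: toString in interface CharSequence
Overrides: toString in class Object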
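public boolean bufferEquals(Token token)
Return true if this Token's buffer contains the same text as token.
public boolean bufferEquals(String token)
Return true if this Token's buffer contains the same text as token.
public boolean equals(Object other)
Compares this Token to another object.
protected boolean equals(Token other, long annotationMask)
Compare this token to other. Only the token annotations set in annotationMask are compared.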
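One way to read "only compare annotations set on annotationMask" is that both annotation masks are ANDed with annotationMask before being compared, so bits outside the mask are ignored. The sketch below shows that reading. It is an assumption about the semantics, the equalsMasked helper is hypothetical, and it also assumes the token text itself must match.

```java
/** Illustrative masked comparison; the helper and its semantics are assumptions, not Attivio code. */
final class MaskedEqualsSketch {
    static boolean equalsMasked(String textA, long annotationsA,
                                String textB, long annotationsB,
                                long annotationMask) {
        // Assumption: the token text must match regardless of the mask.
        if (!textA.equals(textB)) {
            return false;
        }
        // Only bits selected by annotationMask take part in the comparison.
        return (annotationsA & annotationMask) == (annotationsB & annotationMask);
    }

    public static void main(String[] args) {
        // The two masks differ only in the third bit.
        System.out.println(equalsMasked("fox", 0b101L, "fox", 0b001L, 0b001L));
        System.out.println(equalsMasked("fox", 0b101L, "fox", 0b001L, 0b100L));
    }
}
```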
public int compareTo(Token other)
Specified by: compareTo in interface Comparable<Token>
public int compareToIgnoreCase(Token other)
Case insensitive equivalent to compareTo.
NOTE: this method does not take Locale into account. To be truly Locale-aware, you should toString() this Token and use a java.text.Collator.
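The Collator route suggested in the note might look like the sketch below. It accepts any CharSequence (Token implements CharSequence, so a Token could be passed, although plain Strings are used here to keep the example self-contained). The compareIgnoreCase helper name and the choice of SECONDARY strength, which treats case differences as insignificant while keeping accent differences significant, are assumptions made for illustration.

```java
import java.text.Collator;
import java.util.Locale;

/** Illustrative locale-aware, case-insensitive comparison using java.text.Collator. */
final class CollatorCompareSketch {
    static int compareIgnoreCase(CharSequence a, CharSequence b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // SECONDARY strength ignores case (a tertiary difference) but not accents.
        collator.setStrength(Collator.SECONDARY);
        return collator.compare(a.toString(), b.toString());
    }

    public static void main(String[] args) {
        System.out.println(compareIgnoreCase("Apple", "apple", Locale.ENGLISH));
        System.out.println(compareIgnoreCase("apple", "banana", Locale.ENGLISH) < 0);
    }
}
```

When many comparisons are needed, creating the Collator once and reusing it avoids repeated lookups.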
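public Token toLowerCase()
Convert all characters in this Token to lower case.
WARNING: this method is NOT Locale-aware.
public Token toUpperCase()
Convert all characters in this Token to upper case.
WARNING: this method is NOT Locale-aware.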
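To see why the warning matters, the sketch below applies Java's locale-sensitive String case conversion to the String form of a character sequence. In Turkish, for example, the upper-case I lower-cases to a dotless ı. The lower and upper helpers are hypothetical and operate on plain CharSequence values. How Token's own conversion treats such characters is not documented here, so this only illustrates the locale-aware alternative.

```java
import java.util.Locale;

/** Illustrative locale-aware case conversion on the String form of a character sequence. */
final class LocaleCaseSketch {
    static String lower(CharSequence text, Locale locale) {
        return text.toString().toLowerCase(locale);
    }

    static String upper(CharSequence text, Locale locale) {
        return text.toString().toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(lower("TITLE", Locale.ROOT)); // locale-neutral rules
        System.out.println(lower("TITLE", turkish));     // Turkish rules: dotless i
        System.out.println(upper("title", turkish));     // Turkish rules: dotted capital I
    }
}
```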
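public Token clone()
Returns a copy of this Token instance.
public void write(StringBuilder buffer)
Append this Token's text to the StringBuilder.
public void toQueryString(StringBuilder buffer)
Encode this Token into buffer suitable for parsing inside a phrase() operator.
public void writeTo(StringBuilder buffer)
Serialize this Token to buffer.
Copyright © 2018 Attivio, Inc. All Rights Reserved.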
PATENT NOTICE: Attivio, Inc. Software Related Patents. With respect to the Attivio software product(s) being used, the following patents apply: Querying Joined Data Within A Search Engine Index: United States Patent No.(s): 8,073,840. Ordered Processing of Groups of Messages: U.S. Patent No.(s) 8,495,656. Signal processing approach to sentiment analysis for entities in documents: U.S. Patent No.(s) 8,725,494. Other U.S. and International Patents Pending.