Interface Tokenizer


  • public interface Tokenizer
    An interface for objects that take a String and produce a TokenList.
    • Field Detail

      • TOKENIZER_DEFAULT

        static final java.lang.String TOKENIZER_DEFAULT
        The name of the default system tokenizer.
        See Also:
        Constant Field Values
    • Method Detail

      • getIngestTokenizer

        default Tokenizer getIngestTokenizer​(SchemaField field,
                                             java.util.Locale locale)
                                      throws AttivioException
        Get the underlying tokenizer to use for tokenizing fields in the ingest workflow.

        In general, this method should return this. Tokenizers that route to sub-tokenizers for handling different fields/locales should return the actual tokenizer that will be used.

        Throws:
        AttivioException
      • getQueryTokenizer

        default Tokenizer getQueryTokenizer​(SchemaField field,
                                            java.util.Locale locale)
                                     throws AttivioException
        Get the underlying tokenizer to use for tokenizing fields in the query workflow.

        In general, this method should return this. Tokenizers that route to sub-tokenizers for handling different fields/locales should return the actual tokenizer that will be used.

        Throws:
        AttivioException
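
        The routing behavior described for getIngestTokenizer and getQueryTokenizer can be sketched as follows. This is a minimal illustration, not the library's implementation: SchemaField, AttivioException, and Tokenizer are reduced stand-ins declared here so the example compiles on its own, and SimpleTokenizer, LocaleRoutingTokenizer, register, and route are hypothetical names. It also assumes that the default methods return this and that a null locale should fall back to a default tokenizer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for the real types, so the sketch is self-contained.
class SchemaField {
    final String name;
    SchemaField(String name) { this.name = name; }
}

class AttivioException extends Exception {
    AttivioException(String message) { super(message); }
}

// Reduced stand-in for Tokenizer: only the two lookup methods are modeled.
interface Tokenizer {
    // Assumption: the default simply returns this tokenizer.
    default Tokenizer getIngestTokenizer(SchemaField field, Locale locale) throws AttivioException {
        return this;
    }

    default Tokenizer getQueryTokenizer(SchemaField field, Locale locale) throws AttivioException {
        return this;
    }
}

// A leaf tokenizer: inherits the defaults, so both lookups return the object itself.
class SimpleTokenizer implements Tokenizer {
    final String label;
    SimpleTokenizer(String label) { this.label = label; }
}

// A routing tokenizer: picks a sub-tokenizer by the locale's language code
// and reports that sub-tokenizer instead of itself.
class LocaleRoutingTokenizer implements Tokenizer {
    private final Map<String, Tokenizer> byLanguage = new HashMap<>();
    private final Tokenizer fallback;

    LocaleRoutingTokenizer(Tokenizer fallback) { this.fallback = fallback; }

    void register(String language, Tokenizer tokenizer) {
        byLanguage.put(language, tokenizer);
    }

    // A real router might also inspect the field; this sketch uses the locale only.
    private Tokenizer route(Locale locale) {
        if (locale == null) {
            return fallback;
        }
        return byLanguage.getOrDefault(locale.getLanguage(), fallback);
    }

    @Override
    public Tokenizer getIngestTokenizer(SchemaField field, Locale locale) throws AttivioException {
        // Delegate once more so that nested routers also resolve to a leaf tokenizer.
        return route(locale).getIngestTokenizer(field, locale);
    }

    @Override
    public Tokenizer getQueryTokenizer(SchemaField field, Locale locale) throws AttivioException {
        return route(locale).getQueryTokenizer(field, locale);
    }
}
```

        With this shape, a caller that asks the router for the ingest or query tokenizer receives the leaf tokenizer that will actually do the work, while a leaf tokenizer asked the same question returns itself.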
      • tokenize

        void tokenize​(SchemaField field,
                      java.util.Locale locale,
                      TokenList tokens)
               throws AttivioException
        Tokenizes each token in the tokens list.
        Parameters:
        field - the schema field being tokenized (may be null)
        locale - the Locale of the tokens (may be null)
        tokens - the token list
        Throws:
        AttivioException - on an unrecoverable error
      • tokenize

        default TokenList tokenize​(SchemaField field,
                                   java.util.Locale locale,
                                   java.lang.String value)
                            throws AttivioException
        Tokenizes value into a TokenList.
        Parameters:
        field - the schema field being tokenized (may be null)
        locale - the Locale of the tokens (may be null)
        value - the string to tokenize
        Throws:
        AttivioException - on an unrecoverable error
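
        One plausible relationship between the TokenList overload and the String overload is sketched below. Everything in it is an assumption made for illustration: the real TokenList is not documented here, so a simplified list of strings stands in for it; the default String overload is assumed to wrap the value and delegate to the abstract TokenList overload; and WhitespaceTokenizer is a hypothetical implementation. The null handling follows the parameter notes above (field and locale may be null).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the real types, so the sketch is self-contained.
class SchemaField {
    final String name;
    SchemaField(String name) { this.name = name; }
}

class AttivioException extends Exception {
    AttivioException(String message) { super(message); }
}

// Simplified stand-in: the real TokenList is richer than a list of strings.
class TokenList {
    final List<String> tokens = new ArrayList<>();
    TokenList(String initial) { tokens.add(initial); }
}

// Reduced stand-in for Tokenizer: only the two ingest-side overloads are modeled.
interface Tokenizer {
    void tokenize(SchemaField field, Locale locale, TokenList tokens) throws AttivioException;

    // Assumed default: wrap the raw value in a TokenList, then delegate.
    default TokenList tokenize(SchemaField field, Locale locale, String value) throws AttivioException {
        TokenList list = new TokenList(value);
        tokenize(field, locale, list);
        return list;
    }
}

// Hypothetical implementation: splits on whitespace and lowercases each piece.
class WhitespaceTokenizer implements Tokenizer {
    @Override
    public void tokenize(SchemaField field, Locale locale, TokenList tokens) {
        // locale may be null, so fall back to a locale-neutral lowercase.
        Locale effective = (locale == null) ? Locale.ROOT : locale;
        List<String> split = new ArrayList<>();
        for (String raw : tokens.tokens) {
            for (String piece : raw.trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    split.add(piece.toLowerCase(effective));
                }
            }
        }
        // Replace the list contents in place; the method returns nothing.
        tokens.tokens.clear();
        tokens.tokens.addAll(split);
    }
}
```

        Under these assumptions an implementer only writes the TokenList overload, and callers holding a plain String use the inherited overload, for example new WhitespaceTokenizer().tokenize(null, null, "Hello  World").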
      • tokenize

        Phrase tokenize​(SchemaField field,
                        java.util.Locale locale,
                        SearchTerm term)
                 throws AttivioException
        Tokenizes term into a Phrase for query processing.
        Parameters:
        field - the schema field being tokenized (may be null)
        locale - the Locale of the tokens (may be null)
        term - the SearchTerm to tokenize
        Throws:
        AttivioException - on an unrecoverable error
      • tokenize

        Phrase tokenize​(SchemaField field,
                        java.util.Locale locale,
                        WildcardTerm term)
                 throws AttivioException
        Tokenizes a wildcard term into a Phrase for query processing.
        Parameters:
        field - the schema field being tokenized (may be null)
        locale - the Locale of the tokens (may be null)
        term - the WildcardTerm to tokenize
        Throws:
        AttivioException - on an unrecoverable error
      • tokenize

        Phrase tokenize​(SchemaField field,
                        java.util.Locale locale,
                        TermRange range)
                 throws AttivioException
        Tokenizes range into a Phrase for query processing.
        Parameters:
        field - the schema field being tokenized (may be null)
        locale - the Locale of the tokens (may be null)
        range - the TermRange to tokenize
        Throws:
        AttivioException - on an unrecoverable error