Interface IngestClient

  • All Superinterfaces:
    java.lang.AutoCloseable, java.io.Closeable, DocumentOutputClient, SecurityFeeder
    All Known Implementing Classes:
    ContentFeeder, MockPublisher

    public interface IngestClient
    extends DocumentOutputClient, java.io.Closeable
    Sends content (documents, deletes, index messages) to an Attivio instance. Typical configurations of AIE do not use "Ordered Messaging" and will process documents and messages in no guaranteed order.

    Message Delivery and Waiting For Completion

    The default implementation of IngestClient uses a Message Result Handler that tracks when documents and messages make it to the index. Calling waitForCompletion() will ensure that this method blocks until all previously sent messages have been acknowledged as getting to the index. With "Ordered Commits" (see below) enabled, a call to commit(String...) automatically calls waitForCompletion() internally before sending the Commit message to ensure all previous documents are in the index before the commit gets to the index. A call to waitForCompletion() after the commit(String...) will then, similarly, block until that Commit message has made it into the index, thus ensuring the commit occurred and those documents will return from future searches.

    Ordered Commits

    When enabled (typically the default), ordered commits ensures previously fed documents, deletes and messages all make it to the AIE index before a Commit or Optimize is sent. This means calls to commit(String...), refresh() or optimize() may take a while to finish while waiting for messages to be processed and make it to the index. To guarantee document visibility on queries immediately after a call to commit(String...) or refresh(), users must call waitForCompletion() to wait for the commit/refresh acknowledgment from the engine.

    If message and commit scope is not critical, then either disable "Ordered Commits" or have a different IngestClient feeding data than the one managing Commit and other index messages.

    Message Groups

    Any messages implementing the DocumentOutputClient interface can be essentially "grouped together" by sending them within the same message group. AIE internals guarantee that no Commit messages will get into the index in the middle of a message group. This is vital for bundling operations that should appear in the index in the same commit scope. Calls to commit(String...) or refresh() will throw an exception if called within a message group. A typical code example follows:
     feeder.startMessageGroup();
     feeder.feed(doc1);
     feeder.feed(doc1_extras);
     feeder.delete(doc1_old_extras);
     feeder.endMessageGroup();
     

    Document Batching

    Document processing occurs in document batches of 1 or more documents that are passed through AIE as single messages. Document batch sizes are controlled via setDocumentBatchSize(int) and setMaxBatchSizeMB(int). Document batches will be flushed whenever one of these limits is exceeded. Default configuration for these settings ensure that large documents will result in smaller batches in order to reduce memory overhead and small documents will result in larger batches to increase throughput. Certain operations, like commit(String...) and optimize(), may cause the batch to be sent before reaching the exact size.

    Please contact Attivio before changing the default batching settings.

    Auto Commit and Auto Optimize

    To automatically send Commit or Optimize messages based on a certain document frequency, call setCommitInterval(int) and setOptimizeInterval(int), respectively. If triggered inside a Message Group, the Commit/Optimize will not be sent until the end of the Message Group.

    Thread Safety

    IngestClient is NOT thread safe and thus, each thread must have its own instance.

    If trying to coordinate a Commit across multiple IngestClients, have each IngestClient thread call waitForCompletion() to ensure all previously sent documents are in the index, and then use something like a CountDownLatch to let the committing thread's IngestClient know it can call commit(String...). After the committing thread then returns from waitForCompletion(), all previously fed documents on all threads are known to be in the index and committed for search retrieval.

    Example of typical IngestClient:

     // Create IngestClient
     // IngestClient implements the Closeable interface and should use the try-with-resources idiom
     try (IngestClient feeder = new IngestClient()) {
       feeder.setIngestWorkflowName("some_workflow_name");
       // Feed documents
       IngestDocument doc1 = new IngestDocument("doc1");
       IngestDocument doc2 = new IngestDocument("doc2");
       IngestDocument doc3 = new IngestDocument("doc3");
       ... // Populate doc1, doc2, and doc3
       feeder.feed(doc1);
       feeder.feed(doc2, doc3); // feed a batch of 2 documents
       feeder.commit();
       feeder.waitForCompletion(); // wait for commit to complete to see all docs in next search
     }
     
    • Method Detail

      • getDocumentsSent

        long getDocumentsSent()
        Returns:
        the number of documents sent from this client to AIE.
      • feed

        void feed​(AttivioPrincipal principal)
           throws AttivioException
        Feeds an AttivioPrincipal to AIE. Any existing data for the principal and its associations will be deleted. This document may be batched up into a queue and not immediately sent to the index if getDocumentBatchSize() > 1.
        Specified by:
        feed in interface SecurityFeeder
        Parameters:
        principal - the principal
        Throws:
        AttivioException - if sending of documents fails.
      • sendIndexMessage

        void sendIndexMessage​(IndexMessage message)
                       throws AttivioException
        Sends an arbitrary IndexMessage into the engine.

        If the feeder is batching documents, this method will send any queued documents before sending the index message.

        If isOrderedCommits() is true and this message is a Commit or Optimize, then this method will first ensure all previously sent documents are in the index, before sending this index message.

        Parameters:
        message - the IndexMessage to send.
        Throws:
        AttivioException - if sending of the message fails.
      • put

        ContentPointer put​(java.lang.String id,
                           java.io.InputStream input)
                    throws AttivioException
        Put a resource into the Content Store.

        If no ContentStore is in configured for this ContentFeeder, then a ByteArrayContentPointer is used and returned.

        If input produces less than getContentStoreThresholdKB(), then a ByteArrayContentPointer will be returned.

        Parameters:
        id - the ID of the resource to put into the Content Store.
        input - the input stream for the resource's data.
        Returns:
        a ContentPointer for the resource.
        Throws:
        AttivioException - if putting the resource into the Content Store fails.
      • put

        ContentPointer put​(java.lang.String id,
                           java.io.File f)
                    throws AttivioException
        Put the contents of a File into the content store.
        Parameters:
        id - the ID of the resource to put into the Content Store.
        f - file with data.
        Returns:
        a ContentPointer for the resource.
        Throws:
        AttivioException - if putting the resource into the Content Store fails.
        See Also:
        put(String, InputStream)
      • put

        ContentPointer put​(java.lang.String id,
                           byte[] bytes)
                    throws AttivioException
        Put the contents of a byte array into the content store.
        Parameters:
        id - the ID of the resource to put into the Content Store.
        bytes - data to put.
        Returns:
        a ContentPointer for the resource.
        Throws:
        AttivioException - if putting the resource into the Content Store fails.
        See Also:
        put(String, InputStream)
      • waitForCompletion

        boolean waitForCompletion()
                           throws AttivioException
        Wait for all documents and messages to be fully processed. If the time is exceeded and not all documents have been processed, then errors will be logged for the missing document results. Note, after a successful waitForCompletion, the ID returned by getClientId() will change to a new random UUID. If changing the ID is not desired behavior, see waitForCompletion(boolean changeClientId)
        Returns:
        true if all outstanding documents and messages were processed before the timeout (or audits are disabled).
        Throws:
        AttivioException
      • waitForCompletion

        boolean waitForCompletion​(boolean changeClientId)
                           throws AttivioException
        Wait the specified number of milliseconds for all MessageResults to be processed. Note, after a successful waitForCompletion, the ID returned by getClientId() will change to a new random UUID if the changeClientId parameter provided to this method is true. Otherwise, the ID will remain consistent.
        Parameters:
        changeClientId - whether to change the clientId or not
        Returns:
        true if all outstanding documents and messages were processed before the timeout (or audits are disabled).
        Throws:
        AttivioException
      • waitForCompletion

        boolean waitForCompletion​(long timeout)
                           throws AttivioException
        Wait the specified number of milliseconds for all MessageResults to be processed. Note, after a successful waitForCompletion, the ID returned by getClientId() will change to a new random UUID. If changing the ID is not desired behavior, see waitForCompletion(long timeout, boolean changeClientId)

        A timeout of <=0 means to wait forever.

        Parameters:
        timeout - max time to wait in milliseconds
        Returns:
        true if all outstanding documents and messages were processed before the timeout (or audits are disabled).
        Throws:
        AttivioException
      • waitForCompletion

        boolean waitForCompletion​(long timeout,
                                  boolean changeClientId)
                           throws AttivioException
        Wait the specified number of milliseconds for all MessageResults to be processed. Note, after a successful waitForCompletion, the ID returned by getClientId() will change to a new random UUID if the changeClientId parameter provided to this method is true. Otherwise, the ID will remain consistent.

        A timeout of <=0 means to wait forever.

        Parameters:
        timeout - max time to wait in milliseconds
        changeClientId - whether to change the clientId or not
        Returns:
        true if all outstanding documents and messages were processed before the timeout (or audits are disabled).
        Throws:
        AttivioException
      • setOrderedCommits

        void setOrderedCommits​(boolean orderedCommits)
        Enable or disable "Ordered Commits". Ordered commits ensure that all content fed to AIE is completed before a commit call is executed. This has the effect of making the commit(String...) a blocking call while a call to waitForCompletion() is called.

        Please refer to the documentation on ordered commits the developer network for more information.

        Parameters:
        orderedCommits - set to true to enable this or false to disable it
      • isOrderedCommits

        boolean isOrderedCommits()
        Returns:
        true if "Ordered Commits" is enabled
      • setDocumentBatchSize

        void setDocumentBatchSize​(int batchSize)
        Sets the maximum number of documents that will ever be sent as a part of a single DocumentList.

        This batch size (specified in number of documents) does not guarantee that all DocumentLists will be exactly this size as calls to commit(String...), optimize() or DocumentOutputClient.startMessageGroup() can cause smaller batches to be sent.

        Parameters:
        batchSize - in number of documents
      • getDocumentBatchSize

        int getDocumentBatchSize()
        Gets current documentBatchSize set.
        Returns:
        the maximum number of documents that will ever be sent as a part of a single DocumentList.
      • getMaxBatchSizeMB

        int getMaxBatchSizeMB()
        Gets the maximum size of documents (in megabytes) that will ever be sent as part of a single DocumentList.

        Document size will be estimated based on field values and the size of any ContentPointers included in the document.

      • setMaxBatchSizeMB

        void setMaxBatchSizeMB​(int value)
        Sets the maximum size of documents (in megabytes) that will ever be sent as part of a single DocumentList.

        Document size will be estimated based on field values and the size of any ContentPointers included in the document.

        This batch size (specified in megabytes) does not guarantee that all DocumentLists will be exactly this size as calls to commit(String...), optimize() or DocumentOutputClient.startMessageGroup() can cause smaller batches to be sent. The specified getDocumentBatchSize() will also be used to limit the size of DocumentLists.

      • getContentStoreThresholdKB

        int getContentStoreThresholdKB()
        Gets the minimum size required (in kilobytes) for content to be stored in the content store.
        See Also:
        put(String, InputStream)
      • setContentStoreThresholdKB

        void setContentStoreThresholdKB​(int value)
        Sets the minimum size required (in kilobytes) for content to be stored in the content store.
        See Also:
        put(String, InputStream)
      • getDocumentsQueued

        long getDocumentsQueued()
        Returns:
        the number of documents that are batched up in the queue to be sent to AIE.
      • setCommitInterval

        void setCommitInterval​(int commitInterval)
        Sets the interval on which an automatic Commit will be sent to AIE.
        Parameters:
        commitInterval - in number of documents
      • getCommitInterval

        int getCommitInterval()
        Returns:
        commit interval in number of documents
      • setOptimizeInterval

        void setOptimizeInterval​(int optimizeInterval)
        Sets the interval on which an automatic Optimize will be sent to AIE.
        Parameters:
        optimizeInterval - in number of documents
      • getOptimizeInterval

        int getOptimizeInterval()
        Returns:
        optimize interval in number of documents
      • setIngestWorkflowName

        void setIngestWorkflowName​(java.lang.String wf)
        Set ingest workflow name
        Parameters:
        wf - workflow name
      • getIngestWorkflowName

        java.lang.String getIngestWorkflowName()
        Returns:
        ingest workflow name
      • getSizeToDomain

        java.util.Map<java.lang.String,​java.lang.String> getSizeToDomain()
        Returns:
        return the size to message domain map for this client
      • setSizeToDomain

        void setSizeToDomain​(java.util.Map<java.lang.String,​java.lang.String> sizeToDomain)
        Sets the size to message domain map for this client. Message domains allow AIE to devote differing amounts of processing resources to messages from different sources. This facility is generally used to restrict the resources devoted to sources of large documents so that other sources can continue to make progress. Message domains are take effect only when the AIE cluster is configured with matching message domain names. The domain choice offered by this map is represented as a map of minimum size to domain. All batches of documents whose total size is greater than or equal to the minimum size of a domain 'd' but less than any other minimum size for other domains will be mapped to domain 'd'. Any batch smaller than this minimum size will have no domain mapping. Megabytes are used as the unit for the size
        Parameters:
        sizeToDomain -
      • commit

        void commit​(java.lang.String... zones)
             throws AttivioException
        Sends internal queue of documents and commits all documents sent to AIE.

        If isOrderedCommits() is true (typically the default "Ordered Commits" mode), then wait for all previously fed documents to make it to the index, before committing, thus ensuring all previously fed documents get committed (this may take a while). Additionally, in such a case, the clientId will change for future messages.

        Parameters:
        zones - an optional list of zones to commit - if no zones are specified, all zones are committed
        Throws:
        AttivioException - on failure
      • refresh

        void refresh()
              throws AttivioException
        Sends internal queue of documents and commits all partial update documents sent to AIE.

        If isOrderedCommits() is true (typically the default "Ordered Commits" mode), then wait for all previously fed documents to make it to the index, before refreshing, thus ensuring all previously fed documents get refreshed. (this may take a while)

        Throws:
        AttivioException - on failure
      • optimize

        void optimize()
               throws AttivioException
        Sends internal queue of documents and optimizes the AIE index and associated data structures.

        If isOrderedCommits() is true (typically the default "Ordered Commits" mode), then wait for all previously fed documents to make it to the index, before refreshing, thus ensuring all previously fed documents get optimized (this may take a while). Additionally, in such a case, the clientId will change for future messages.

        Throws:
        AttivioException - on failure
      • getStaticFields

        java.util.Map<java.lang.String,​java.lang.Object> getStaticFields()
        Returns:
        a map of all fields that will be set on every document sent from this feeder.
      • setStaticFields

        void setStaticFields​(java.util.Map<java.lang.String,​java.lang.Object> fields)
        Set a map of fields to apply to all documents.
        Parameters:
        fields - the fields and values to set
        Throws:
        java.lang.IllegalStateException - if any documents have already been fed.
      • addStaticField

        void addStaticField​(java.lang.String name,
                            java.lang.Object value)
        Add a field to apply to all documents.
        Parameters:
        name - the name of the field
        value - the value of the field
        Throws:
        java.lang.IllegalStateException - if any documents have already been fed.