Interface IngestClient
-
- All Superinterfaces:
java.lang.AutoCloseable, java.io.Closeable, DocumentOutputClient, SecurityFeeder
- All Known Implementing Classes:
ContentFeeder, MockPublisher
public interface IngestClient extends DocumentOutputClient, java.io.Closeable
Sends content (documents, deletes, index messages) to an Attivio instance. Typical configurations of AIE do not use "Ordered Messaging" and will process documents and messages in no guaranteed order.
Message Delivery and Waiting For Completion
The default implementation of IngestClient uses a Message Result Handler that tracks when documents and messages make it to the index. Calling waitForCompletion() blocks until all previously sent messages have been acknowledged as reaching the index. With "Ordered Commits" (see below) enabled, a call to commit(String...) automatically calls waitForCompletion() internally before sending the Commit message, to ensure all previous documents are in the index before the commit gets to the index. A call to waitForCompletion() after commit(String...) will then, similarly, block until that Commit message has made it into the index, thus ensuring the commit occurred and those documents will be returned by future searches.
Ordered Commits
When enabled (typically the default), ordered commits ensure that previously fed documents, deletes, and messages all make it to the AIE index before a Commit or Optimize is sent. This means calls to commit(String...), refresh(), or optimize() may take a while to finish while waiting for messages to be processed and make it to the index. To guarantee document visibility on queries immediately after a call to commit(String...) or refresh(), users must call waitForCompletion() to wait for the commit/refresh acknowledgment from the engine.
If message and commit scope is not critical, then either disable "Ordered Commits" or have a different IngestClient feeding data than the one managing Commit and other index messages.
Message Groups
Any messages sent through the DocumentOutputClient interface can be "grouped together" by sending them within the same message group. AIE internals guarantee that no Commit messages will get into the index in the middle of a message group. This is vital for bundling operations that should appear in the index in the same commit scope. Calls to commit(String...) or refresh() will throw an exception if called within a message group. A typical code example follows:
    feeder.startMessageGroup();
    feeder.feed(doc1);
    feeder.feed(doc1_extras);
    feeder.delete(doc1_old_extras);
    feeder.endMessageGroup();
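One way to make sure a message group is always closed, even if a feed call throws, is to wrap the grouped calls in try/finally. The following is a minimal sketch under stated assumptions, not the SDK's own code: GroupingFeeder and RecordingFeeder are hypothetical stand-ins that mimic only the startMessageGroup(), feed, delete, and endMessageGroup() calls named above, and they take plain string IDs instead of IngestDocument so the snippet compiles without the AIE SDK.

```java
import java.util.ArrayList;
import java.util.List;

class MessageGroupSketch {
    // Hypothetical stand-in for the grouping-related calls; not the real SDK interface.
    interface GroupingFeeder {
        void startMessageGroup();
        void feed(String docId);
        void delete(String docId);
        void endMessageGroup();
    }

    // Fake feeder that records every call so the ordering can be inspected.
    static class RecordingFeeder implements GroupingFeeder {
        final List<String> calls = new ArrayList<>();
        public void startMessageGroup() { calls.add("start"); }
        public void feed(String docId) { calls.add("feed:" + docId); }
        public void delete(String docId) { calls.add("delete:" + docId); }
        public void endMessageGroup() { calls.add("end"); }
    }

    // Replaces a document and its extras inside a single message group.
    static void replaceDocument(GroupingFeeder feeder) {
        feeder.startMessageGroup();
        try {
            feeder.feed("doc1");
            feeder.feed("doc1_extras");
            feeder.delete("doc1_old_extras");
        } finally {
            // Always close the group, even if one of the calls above throws.
            feeder.endMessageGroup();
        }
    }

    public static void main(String[] args) {
        RecordingFeeder feeder = new RecordingFeeder();
        replaceDocument(feeder);
        System.out.println(feeder.calls);
    }
}
```

With a real IngestClient the same shape would apply, with commit(String...) called only after endMessageGroup() has returned.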
Document Batching
Document processing occurs in document batches of one or more documents that are passed through AIE as single messages. Document batch sizes are controlled via setDocumentBatchSize(int) and setMaxBatchSizeMB(int). Document batches will be flushed whenever one of these limits is exceeded. The default configuration of these settings ensures that large documents result in smaller batches, to reduce memory overhead, and small documents result in larger batches, to increase throughput. Certain operations, like commit(String...) and optimize(), may cause the batch to be sent before reaching the exact size.
Please contact Attivio before changing the default batching settings.
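To make the two limits concrete, here is a minimal sketch of a batcher that flushes on either a document-count limit or a size limit. It is not AIE's implementation: the BatchingSketch class, its fields, and the caller-supplied byte estimate are all assumptions made for illustration. It also assumes one particular reading of "exceeded": the batch is flushed before adding a document that would push it past the size limit, and right after the count limit is reached.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {
    private final int maxDocs;    // plays the role of setDocumentBatchSize(int)
    private final long maxBytes;  // plays the role of setMaxBatchSizeMB(int), in bytes here
    private final List<String> queue = new ArrayList<>();
    private long queuedBytes = 0;
    final List<List<String>> sentBatches = new ArrayList<>();

    BatchingSketch(int maxDocs, long maxBytes) {
        this.maxDocs = maxDocs;
        this.maxBytes = maxBytes;
    }

    void feed(String docId, long estimatedBytes) {
        // Flush first if this document would push the batch past the size limit.
        if (!queue.isEmpty() && queuedBytes + estimatedBytes > maxBytes) {
            flush();
        }
        queue.add(docId);
        queuedBytes += estimatedBytes;
        // Flush once the document-count limit is reached.
        if (queue.size() >= maxDocs) {
            flush();
        }
    }

    // Sends whatever is queued, even if the batch is not full (as a commit would).
    void flush() {
        if (queue.isEmpty()) {
            return;
        }
        sentBatches.add(new ArrayList<>(queue));
        queue.clear();
        queuedBytes = 0;
    }

    public static void main(String[] args) {
        BatchingSketch batcher = new BatchingSketch(3, 100);
        batcher.feed("small1", 10);
        batcher.feed("small2", 10);
        batcher.feed("small3", 10); // count limit reached: batch of 3 is sent
        batcher.feed("large1", 90);
        batcher.feed("large2", 90); // size limit would be exceeded: large1 is sent alone
        batcher.flush();            // e.g. triggered by a commit
        System.out.println(batcher.sentBatches);
        // prints [[small1, small2, small3], [large1], [large2]]
    }
}
```

The run shows the effect described above: small documents travel in larger batches, while large documents end up in batches of their own.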
Auto Commit and Auto Optimize
To automatically send Commit or Optimize messages based on a certain document frequency, call setCommitInterval(int) and setOptimizeInterval(int), respectively. If triggered inside a Message Group, the Commit/Optimize will not be sent until the end of the Message Group.
Thread Safety
IngestClient is NOT thread safe, so each thread must have its own instance.
If trying to coordinate a Commit across multiple IngestClients, have each IngestClient thread call waitForCompletion() to ensure all previously sent documents are in the index, and then use something like a CountDownLatch to let the committing thread's IngestClient know it can call commit(String...). After the committing thread then returns from waitForCompletion(), all previously fed documents on all threads are known to be in the index and committed for search retrieval.
Example of typical IngestClient usage:
    // Create IngestClient
    // IngestClient implements the Closeable interface and should use the try-with-resources idiom
    // NOTE: IngestClient is an interface; replace the constructor call below with
    // however your application obtains an implementation (for example ContentFeeder)
    try (IngestClient feeder = new IngestClient()) {
        feeder.setIngestWorkflowName("some_workflow_name");
        // Feed documents
        IngestDocument doc1 = new IngestDocument("doc1");
        IngestDocument doc2 = new IngestDocument("doc2");
        IngestDocument doc3 = new IngestDocument("doc3");
        ... // Populate doc1, doc2, and doc3
        feeder.feed(doc1);
        feeder.feed(doc2, doc3); // feed a batch of 2 documents
        feeder.commit();
        feeder.waitForCompletion(); // wait for commit to complete to see all docs in next search
    }
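The multi-threaded commit coordination described under Thread Safety could be sketched roughly as follows. This is an illustration under stated assumptions rather than SDK code: Client and LoggingClient are hypothetical stand-ins exposing only feed, waitForCompletion(), and commit(), and they log calls instead of talking to AIE. Only the CountDownLatch usage is real JDK API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

class CoordinatedCommitSketch {
    // Hypothetical stand-in for the few IngestClient calls used here; not the SDK type.
    interface Client {
        void feed(String docId);
        boolean waitForCompletion();
        void commit();
    }

    // Fake client that appends each call to a shared, thread-safe log.
    static class LoggingClient implements Client {
        private final String name;
        private final List<String> log;
        LoggingClient(String name, List<String> log) { this.name = name; this.log = log; }
        public void feed(String docId) { log.add(name + " feed " + docId); }
        public boolean waitForCompletion() { log.add(name + " waitForCompletion"); return true; }
        public void commit() { log.add(name + " commit"); }
    }

    static List<String> run(int feederThreads) {
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch allFed = new CountDownLatch(feederThreads);
        Thread[] workers = new Thread[feederThreads];
        for (int i = 0; i < feederThreads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                // One client per thread, since the client is not thread safe.
                Client client = new LoggingClient("feeder" + id, log);
                client.feed("doc" + id);
                client.waitForCompletion(); // this thread's documents are in the index
                allFed.countDown();         // tell the committer this thread is done
            });
            workers[i].start();
        }
        Client committer = new LoggingClient("committer", log);
        try {
            allFed.await();                // block until every feeder has counted down
            committer.commit();
            committer.waitForCompletion(); // commit acknowledged by the index
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while coordinating commit", e);
        }
        return log;
    }

    public static void main(String[] args) {
        run(3).forEach(System.out::println);
    }
}
```

The feeder lines may interleave in any order, but the two committer lines always come last, because each feeder counts down only after its own waitForCompletion() has returned.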
-
-
Method Summary
Modifier and Type | Method | Description
void | addStaticField(java.lang.String name, java.lang.Object value) | Add a field to apply to all documents.
void | commit(java.lang.String... zones) | Sends internal queue of documents and commits all documents sent to AIE.
void | deletePrincipal(AttivioPrincipalKey key) | Delete an AttivioPrincipal from AIE.
void | feed(IngestDocument doc, AttivioAcl acl) | Feeds the document with its ACL to AIE.
void | feed(AttivioPrincipal principal) | Feeds an AttivioPrincipal to AIE.
java.util.UUID | getClientId() | This id will change after any calls to waitForCompletion() or waitForCompletion(long).
int | getCommitInterval() |
int | getContentStoreThresholdKB() | Gets the minimum size required (in kilobytes) for content to be stored in the content store.
int | getDocumentBatchSize() | Gets current documentBatchSize set.
long | getDocumentsFed() | Return the number of documents fed to this feeder since it connected.
long | getDocumentsQueued() |
long | getDocumentsSent() |
java.lang.String | getIngestWorkflowName() |
int | getMaxBatchSizeMB() | Gets the maximum size of documents (in megabytes) that will ever be sent as part of a single DocumentList.
int | getOptimizeInterval() |
java.util.Map<java.lang.String,java.lang.String> | getSizeToDomain() |
java.util.Map<java.lang.String,java.lang.Object> | getStaticFields() |
boolean | isCompleted() | True if all messages sent have been fully processed by AIE.
boolean | isOrderedCommits() |
void | optimize() | Sends internal queue of documents and optimizes the AIE index and associated data structures.
ContentPointer | put(java.lang.String id, byte[] bytes) | Put the contents of a byte array into the content store.
ContentPointer | put(java.lang.String id, InputStreamBuilder input) | Stores binary data in the ContentStore.
ContentPointer | put(java.lang.String id, java.io.File f) | Put the contents of a File into the content store.
ContentPointer | put(java.lang.String id, java.io.InputStream input) | Put a resource into the Content Store.
void | refresh() | Sends internal queue of documents and commits all partial update documents sent to AIE.
void | sendIndexMessage(IndexMessage message) | Sends an arbitrary IndexMessage into the engine.
void | sendQueuedDocuments() | Send the documents in the queue to the engine.
void | setCommitInterval(int commitInterval) | Sets the interval on which an automatic Commit will be sent to AIE.
void | setContentStoreThresholdKB(int value) | Sets the minimum size required (in kilobytes) for content to be stored in the content store.
void | setDocumentBatchSize(int batchSize) | Sets the maximum number of documents that will ever be sent as a part of a single DocumentList.
void | setIngestWorkflowName(java.lang.String wf) | Set ingest workflow name.
void | setMaxBatchSizeMB(int value) | Sets the maximum size of documents (in megabytes) that will ever be sent as part of a single DocumentList.
void | setOptimizeInterval(int optimizeInterval) | Sets the interval on which an automatic Optimize will be sent to AIE.
void | setOrderedCommits(boolean orderedCommits) | Enable or disable "Ordered Commits".
void | setSizeToDomain(java.util.Map<java.lang.String,java.lang.String> sizeToDomain) | Sets the size to message domain map for this client.
void | setStaticFields(java.util.Map<java.lang.String,java.lang.Object> fields) | Set a map of fields to apply to all documents.
boolean | waitForCompletion() | Wait for all documents and messages to be fully processed.
boolean | waitForCompletion(boolean changeClientId) | Wait for all MessageResults to be processed.
boolean | waitForCompletion(long timeout) | Wait the specified number of milliseconds for all MessageResults to be processed.
boolean | waitForCompletion(long timeout, boolean changeClientId) | Wait the specified number of milliseconds for all MessageResults to be processed.
Methods inherited from interface com.attivio.sdk.client.DocumentOutputClient
bulkUpdate, delete, delete, deleteByQuery, endMessageGroup, feed, feed, isMessageGroupInProgress, startMessageGroup
-
-
-
-
Method Detail
-
getClientId
java.util.UUID getClientId()
This id will change after any calls to waitForCompletion() or waitForCompletion(long). Note that waitForCompletion is implicitly called when commits or optimizes occur if isOrderedCommits() is true.
- Returns:
- the client ID used for this feeder
-
getDocumentsFed
long getDocumentsFed()
Return the number of documents fed to this feeder since it connected.
As a general rule: getDocumentsFed() - getDocumentsQueued() will equal getDocumentsSent().
- Returns:
- the number of documents that have been fed.
-
getDocumentsSent
long getDocumentsSent()
- Returns:
- the number of documents sent from this client to AIE.
-
feed
void feed(IngestDocument doc, AttivioAcl acl) throws AttivioException
Feeds the document with its ACL to AIE. This document may be batched up into a queue and not immediately sent to the index if getDocumentBatchSize() > 1.
- Specified by:
feed in interface SecurityFeeder
- Parameters:
doc - IngestDocument to feed
acl - the ACL associated with this document
- Throws:
AttivioException - if sending of documents fails.
-
feed
void feed(AttivioPrincipal principal) throws AttivioException
Feeds an AttivioPrincipal to AIE. Any existing data for the principal and its associations will be deleted. This principal may be batched up into a queue and not immediately sent to the index if getDocumentBatchSize() > 1.
- Specified by:
feed in interface SecurityFeeder
- Parameters:
principal - the principal
- Throws:
AttivioException - if sending of documents fails.
-
deletePrincipal
void deletePrincipal(AttivioPrincipalKey key) throws AttivioException
Delete an AttivioPrincipal from AIE. This operation may be batched up into a queue and not immediately sent to the index if getDocumentBatchSize() > 1.
- Specified by:
deletePrincipal in interface SecurityFeeder
- Parameters:
key - the principal key
- Throws:
AttivioException - if sending of documents fails.
-
sendIndexMessage
void sendIndexMessage(IndexMessage message) throws AttivioException
Sends an arbitrary IndexMessage into the engine.
If the feeder is batching documents, this method will send any queued documents before sending the index message.
If isOrderedCommits() is true and this message is a Commit or Optimize, then this method will first ensure all previously sent documents are in the index before sending this index message.
- Parameters:
message - the IndexMessage to send.
- Throws:
AttivioException - if sending of the message fails.
-
sendQueuedDocuments
void sendQueuedDocuments() throws AttivioException
Send the documents in the queue to the engine.
- Throws:
AttivioException
-
put
ContentPointer put(java.lang.String id, java.io.InputStream input) throws AttivioException
Put a resource into the Content Store.
If no ContentStore is configured for this ContentFeeder, then a ByteArrayContentPointer is used and returned.
If input produces less than getContentStoreThresholdKB() kilobytes, then a ByteArrayContentPointer will be returned.
- Parameters:
id - the ID of the resource to put into the Content Store.
input - the input stream for the resource's data.
- Returns:
- a ContentPointer for the resource.
- Throws:
AttivioException - if putting the resource into the Content Store fails.
-
put
ContentPointer put(java.lang.String id, InputStreamBuilder input) throws AttivioException
Stores binary data in the ContentStore.
- Parameters:
id - the ID of the resource to put into the Content Store.
input - the input stream for the resource's data.
- Returns:
- a ContentPointer for the resource.
- Throws:
AttivioException
- See Also:
put(String, InputStream)
-
put
ContentPointer put(java.lang.String id, java.io.File f) throws AttivioException
Put the contents of a File into the content store.
- Parameters:
id - the ID of the resource to put into the Content Store.
f - file with data.
- Returns:
- a ContentPointer for the resource.
- Throws:
AttivioException - if putting the resource into the Content Store fails.
- See Also:
put(String, InputStream)
-
put
ContentPointer put(java.lang.String id, byte[] bytes) throws AttivioException
Put the contents of a byte array into the content store.
- Parameters:
id - the ID of the resource to put into the Content Store.
bytes - data to put.
- Returns:
- a ContentPointer for the resource.
- Throws:
AttivioException - if putting the resource into the Content Store fails.
- See Also:
put(String, InputStream)
-
waitForCompletion
boolean waitForCompletion() throws AttivioException
Wait for all documents and messages to be fully processed. If the time is exceeded and not all documents have been processed, then errors will be logged for the missing document results. Note, after a successful waitForCompletion, the ID returned by getClientId() will change to a new random UUID. If changing the ID is not desired behavior, see waitForCompletion(boolean changeClientId).
- Returns:
- true if all outstanding documents and messages were processed before the timeout (or audits are disabled).
- Throws:
AttivioException
-
waitForCompletion
boolean waitForCompletion(boolean changeClientId) throws AttivioException
Wait for all MessageResults to be processed. Note, after a successful waitForCompletion, the ID returned by getClientId() will change to a new random UUID if the changeClientId parameter provided to this method is true. Otherwise, the ID will remain unchanged.
- Parameters:
changeClientId - whether to change the clientId or not
- Returns:
- true if all outstanding documents and messages were processed before the timeout (or audits are disabled).
- Throws:
AttivioException
-
waitForCompletion
boolean waitForCompletion(long timeout) throws AttivioException
Wait the specified number of milliseconds for all MessageResults to be processed. Note, after a successful waitForCompletion, the ID returned by getClientId() will change to a new random UUID. If changing the ID is not desired behavior, see waitForCompletion(long timeout, boolean changeClientId).
A timeout of <=0 means to wait forever.
- Parameters:
timeout - max time to wait in milliseconds
- Returns:
- true if all outstanding documents and messages were processed before the timeout (or audits are disabled).
- Throws:
AttivioException
-
waitForCompletion
boolean waitForCompletion(long timeout, boolean changeClientId) throws AttivioException
Wait the specified number of milliseconds for all MessageResults to be processed. Note, after a successful waitForCompletion, the ID returned by getClientId() will change to a new random UUID if the changeClientId parameter provided to this method is true. Otherwise, the ID will remain unchanged.
A timeout of <=0 means to wait forever.
- Parameters:
timeout - max time to wait in milliseconds
changeClientId - whether to change the clientId or not
- Returns:
- true if all outstanding documents and messages were processed before the timeout (or audits are disabled).
- Throws:
AttivioException
-
isCompleted
boolean isCompleted() throws AttivioException
True if all messages sent have been fully processed by AIE.
- Throws:
AttivioException
-
setOrderedCommits
void setOrderedCommits(boolean orderedCommits)
Enable or disable "Ordered Commits". Ordered commits ensure that all content fed to AIE is completed before a commit call is executed. This has the effect of making commit(String...) a blocking call while it internally calls waitForCompletion().
Please refer to the documentation on ordered commits on the developer network for more information.
- Parameters:
orderedCommits
- set to true to enable this or false to disable it
-
isOrderedCommits
boolean isOrderedCommits()
- Returns:
- true if "Ordered Commits" is enabled
-
setDocumentBatchSize
void setDocumentBatchSize(int batchSize)
Sets the maximum number of documents that will ever be sent as a part of a single DocumentList.
This batch size (specified in number of documents) does not guarantee that all DocumentLists will be exactly this size, as calls to commit(String...), optimize(), or DocumentOutputClient.startMessageGroup() can cause smaller batches to be sent.
- Parameters:
batchSize - in number of documents
-
getDocumentBatchSize
int getDocumentBatchSize()
Gets current documentBatchSize set.
- Returns:
- the maximum number of documents that will ever be sent as a part of a single DocumentList.
-
getMaxBatchSizeMB
int getMaxBatchSizeMB()
Gets the maximum size of documents (in megabytes) that will ever be sent as part of a single DocumentList.
Document size will be estimated based on field values and the size of any ContentPointers included in the document.
-
setMaxBatchSizeMB
void setMaxBatchSizeMB(int value)
Sets the maximum size of documents (in megabytes) that will ever be sent as part of a single DocumentList.
Document size will be estimated based on field values and the size of any ContentPointers included in the document.
This batch size (specified in megabytes) does not guarantee that all DocumentLists will be exactly this size, as calls to commit(String...), optimize(), or DocumentOutputClient.startMessageGroup() can cause smaller batches to be sent. The specified getDocumentBatchSize() will also be used to limit the size of DocumentLists.
-
getContentStoreThresholdKB
int getContentStoreThresholdKB()
Gets the minimum size required (in kilobytes) for content to be stored in the content store.
- See Also:
put(String, InputStream)
-
setContentStoreThresholdKB
void setContentStoreThresholdKB(int value)
Sets the minimum size required (in kilobytes) for content to be stored in the content store.
- See Also:
put(String, InputStream)
-
getDocumentsQueued
long getDocumentsQueued()
- Returns:
- the number of documents that are batched up in the queue to be sent to AIE.
-
setCommitInterval
void setCommitInterval(int commitInterval)
Sets the interval on which an automatic Commit will be sent to AIE.
- Parameters:
commitInterval - in number of documents
-
getCommitInterval
int getCommitInterval()
- Returns:
- commit interval in number of documents
-
setOptimizeInterval
void setOptimizeInterval(int optimizeInterval)
Sets the interval on which an automatic Optimize will be sent to AIE.
- Parameters:
optimizeInterval - in number of documents
-
getOptimizeInterval
int getOptimizeInterval()
- Returns:
- optimize interval in number of documents
-
setIngestWorkflowName
void setIngestWorkflowName(java.lang.String wf)
Set ingest workflow name.
- Parameters:
wf - workflow name
-
getIngestWorkflowName
java.lang.String getIngestWorkflowName()
- Returns:
- ingest workflow name
-
getSizeToDomain
java.util.Map<java.lang.String,java.lang.String> getSizeToDomain()
- Returns:
- return the size to message domain map for this client
-
setSizeToDomain
void setSizeToDomain(java.util.Map<java.lang.String,java.lang.String> sizeToDomain)
Sets the size to message domain map for this client. Message domains allow AIE to devote differing amounts of processing resources to messages from different sources. This facility is generally used to restrict the resources devoted to sources of large documents so that other sources can continue to make progress. Message domains take effect only when the AIE cluster is configured with matching message domain names. The domain choice offered by this map is represented as a map of minimum size to domain. All batches of documents whose total size is greater than or equal to the minimum size of a domain 'd' but less than any other minimum size for other domains will be mapped to domain 'd'. Any batch smaller than this minimum size will have no domain mapping. Megabytes are used as the unit for the size.
- Parameters:
sizeToDomain
-
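The size-to-domain rule can be read as a "largest minimum size not exceeding the batch size" lookup. The sketch below shows that reading with a TreeMap; it is an interpretation for illustration, not AIE's code. The SizeToDomainSketch class, the domainFor helper, the domain names "medium" and "large", and the assumption that map keys are whole megabyte counts written as decimal strings are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SizeToDomainSketch {
    // Returns the domain for a batch of the given size, or null if no domain applies.
    static String domainFor(Map<String, String> sizeToDomain, int batchSizeMB) {
        // Assumption: keys are minimum sizes in megabytes written as decimal integers.
        TreeMap<Integer, String> byMinSize = new TreeMap<>();
        for (Map.Entry<String, String> entry : sizeToDomain.entrySet()) {
            byMinSize.put(Integer.parseInt(entry.getKey().trim()), entry.getValue());
        }
        // Pick the largest minimum size that is <= the batch size, if there is one.
        Map.Entry<Integer, String> match = byMinSize.floorEntry(batchSizeMB);
        return match == null ? null : match.getValue();
    }

    public static void main(String[] args) {
        Map<String, String> sizeToDomain = new HashMap<>();
        sizeToDomain.put("10", "medium");  // hypothetical domain names
        sizeToDomain.put("100", "large");
        System.out.println(domainFor(sizeToDomain, 5));    // null (below every minimum)
        System.out.println(domainFor(sizeToDomain, 10));   // medium
        System.out.println(domainFor(sizeToDomain, 50));   // medium
        System.out.println(domainFor(sizeToDomain, 250));  // large
    }
}
```

A map built this way would be passed to setSizeToDomain; the domain names only have an effect if the AIE cluster is configured with matching message domains.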
-
commit
void commit(java.lang.String... zones) throws AttivioException
Sends internal queue of documents and commits all documents sent to AIE.
If isOrderedCommits() is true (typically the default "Ordered Commits" mode), then this method waits for all previously fed documents to make it to the index before committing, thus ensuring all previously fed documents get committed (this may take a while). Additionally, in such a case, the clientId will change for future messages.
- Parameters:
zones - an optional list of zones to commit; if no zones are specified, all zones are committed
- Throws:
AttivioException - on failure
-
refresh
void refresh() throws AttivioException
Sends internal queue of documents and commits all partial update documents sent to AIE.
If isOrderedCommits() is true (typically the default "Ordered Commits" mode), then this method waits for all previously fed documents to make it to the index before refreshing, thus ensuring all previously fed documents get refreshed (this may take a while).
- Throws:
AttivioException - on failure
-
optimize
void optimize() throws AttivioException
Sends internal queue of documents and optimizes the AIE index and associated data structures.
If isOrderedCommits() is true (typically the default "Ordered Commits" mode), then this method waits for all previously fed documents to make it to the index before optimizing, thus ensuring all previously fed documents get optimized (this may take a while). Additionally, in such a case, the clientId will change for future messages.
- Throws:
AttivioException - on failure
-
getStaticFields
java.util.Map<java.lang.String,java.lang.Object> getStaticFields()
- Returns:
- a map of all fields that will be set on every document sent from this feeder.
-
setStaticFields
void setStaticFields(java.util.Map<java.lang.String,java.lang.Object> fields)
Set a map of fields to apply to all documents.
- Parameters:
fields - the fields and values to set
- Throws:
java.lang.IllegalStateException - if any documents have already been fed.
-
addStaticField
void addStaticField(java.lang.String name, java.lang.Object value)
Add a field to apply to all documents.
- Parameters:
name - the name of the field
value - the value of the field
- Throws:
java.lang.IllegalStateException - if any documents have already been fed.
-
-