public interface IngestClient extends DocumentOutputClient, Closeable
IngestClient
uses a Message Result
Handler that tracks when documents and messages make it to the index. Calling waitForCompletion()
will ensure that
this method blocks until all previously sent messages have been acknowledged as getting to the index. With "Ordered Commits"
(see below) enabled, a call to commit(String...)
automatically calls waitForCompletion()
internally before
sending the Commit
message to ensure all previous documents are in the index before the commit gets to the index. A
call to waitForCompletion()
after the commit(String...)
will then, similarly, block until that Commit
message has made it into the index, thus ensuring the commit occurred and those documents will return from future searches.
Commit
or Optimize
is sent. This means calls to
commit(String...)
, refresh()
or optimize()
may take a while to finish while waiting for messages to
be processed and make it to the index. To guarantee document visibility on queries immediately after a call to
commit(String...)
or refresh()
, users must call waitForCompletion()
to wait for the commit/refresh
acknowledgment from the engine.
If message and commit scope is not critical, then either disable "Ordered Commits" or have a different IngestClient
feeding data than the one managing Commit
and other index messages.
DocumentOutputClient
interface can be essentially "grouped
together" by sending them within the same message group. AIE internals guarantee that no Commit
messages will get into
the index in the middle of a message group. This is vital for bundling operations that should appear in the index in the same
commit scope. Calls to commit(String...)
or refresh()
will throw an exception if called within a message
group. A typical code example follows:
feeder.startMessageGroup(); feeder.feed(doc1); feeder.feed(doc1_extras); feeder.delete(doc1_old_extras); feeder.endMessageGroup();
setDocumentBatchSize(int)
and
setMaxBatchSizeMB(int)
. Document batches will be flushed whenever one of these limits is exceeded. Default
configuration for these settings ensure that large documents will result in smaller batches in order to reduce memory overhead
and small documents will result in larger batches to increase throughput. Certain operations, like commit(String...)
and optimize()
, may cause the batch to be sent before reaching the exact size.
Please contact Attivio before changing the default batching settings.
Commit
or Optimize
messages based on a certain
document frequency, call setCommitInterval(int)
and setOptimizeInterval(int)
, respectively. If triggered
inside a Message Group, the Commit
/Optimize
will not be sent until the end of the Message Group.
IngestClient
is NOT thread safe and thus, each thread must have its own instance.
If trying to coordinate a Commit
across multiple IngestClient
s, have each IngestClient
thread call
waitForCompletion()
to ensure all previously sent documents are in the index, and then use something like a
CountDownLatch
to let the committing thread's IngestClient
know it can call commit(String...)
. After
the committing thread then returns from waitForCompletion()
, all previously fed documents on all threads are known to
be in the index and committed for search retrieval.
// Create IngestClient // IngestClient implements the Closeable interface and should use the try-with-resources idiom try (IngestClient feeder = new IngestClient()) { feeder.setIngestWorkflowName("some_workflow_name"); // Feed documents IngestDocument doc1 = new IngestDocument("doc1"); IngestDocument doc2 = new IngestDocument("doc2"); IngestDocument doc3 = new IngestDocument("doc3"); ... // Populate doc1, doc2, and doc3 feeder.feed(doc1); feeder.feed(doc2, doc3); // feed a batch of 2 documents feeder.commit(); feeder.waitForCompletion(); // wait for commit to complete to see all docs in next search }
Modifier and Type | Method and Description |
---|---|
void |
addStaticField(String name,
Object value)
Add a field to apply to all documents.
|
void |
commit(String... zones)
Sends internal queue of documents and commits all documents sent to AIE.
|
void |
deletePrincipal(AttivioPrincipalKey key)
Delete an AttivioPrincipal from AIE.
|
void |
feed(AttivioPrincipal principal)
Feeds an AttivioPrincipal to AIE.
|
void |
feed(IngestDocument doc,
AttivioAcl acl)
Feeds the document with its ACL to AIE.
|
UUID |
getClientId() |
int |
getCommitInterval() |
int |
getContentStoreThresholdKB()
Gets the minimum size required (in kilobytes) for content to be stored in the content store.
|
int |
getDocumentBatchSize()
Gets current documentBatchSize set.
|
long |
getDocumentsFed()
Return the number of documents fed to this feeder since the connect.
|
long |
getDocumentsQueued() |
long |
getDocumentsSent() |
String |
getIngestWorkflowName() |
int |
getMaxBatchSizeMB()
Gets the maximum size of documents (in megabytes) that will ever be sent as part of a single
DocumentList . |
int |
getOptimizeInterval() |
Map<String,String> |
getSizeToDomain() |
Map<String,Object> |
getStaticFields() |
boolean |
isCompleted()
True is all messages sent have been fully processed by AIE.
|
boolean |
isOrderedCommits() |
void |
optimize()
Sends internal queue of documents and optimizes the AIE index and associated data structures.
|
ContentPointer |
put(String id,
byte[] bytes)
Put the contents of a byte array into the content store.
|
ContentPointer |
put(String id,
File f)
Put the contents of a File into the content store.
|
ContentPointer |
put(String id,
InputStream input)
Put a resource into the Content Store.
|
ContentPointer |
put(String id,
InputStreamBuilder input)
Stores binary data in the ContentStore.
|
void |
refresh()
Sends internal queue of documents and commits all partial update documents sent to AIE.
|
void |
sendIndexMessage(IndexMessage message)
Sends an arbitrary IndexMessage into the engine.
|
void |
sendQueuedDocuments()
Send the documents in the queue to the engine
|
void |
setCommitInterval(int commitInterval)
Sets the interval on which an automatic
Commit will be sent to AIE. |
void |
setContentStoreThresholdKB(int value)
Sets the minimum size required (in kilobytes) for content to be stored in the content store.
|
void |
setDocumentBatchSize(int batchSize)
Sets the maximum number of documents that will ever be sent as a part of a single
DocumentList . |
void |
setIngestWorkflowName(String wf)
Set ingest workflow name
|
void |
setMaxBatchSizeMB(int value)
Sets the maximum size of documents (in megabytes) that will ever be sent as part of a single
DocumentList . |
void |
setOptimizeInterval(int optimizeInterval)
Sets the interval on which an automatic
Optimize will be sent to AIE. |
void |
setOrderedCommits(boolean orderedCommits)
Enable or disable "Ordered Commits".
|
void |
setSizeToDomain(Map<String,String> sizeToDomain)
Sets the size to message domain map for this client.
|
void |
setStaticFields(Map<String,Object> fields)
Set a map of fields to apply to all documents.
|
boolean |
waitForCompletion()
Wait for all MessageResults to be processed.
|
boolean |
waitForCompletion(long timeout)
Wait the specified number of milliseconds for all MessageResults to be processed.
|
bulkUpdate, delete, delete, deleteByQuery, endMessageGroup, feed, feed, isMessageGroupInProgress, startMessageGroup
UUID getClientId()
long getDocumentsFed()
As a general rule:
getDocumentsFed()
- getDocumentsQueued()
will equal getDocumentsSent()
.
long getDocumentsSent()
void feed(IngestDocument doc, AttivioAcl acl) throws AttivioException
getDocumentBatchSize()
> 1.feed
in interface SecurityFeeder
doc
- IngestDocument to feedacl
- the ACL associated with this documentAttivioException
- if sending of documents fails.void feed(AttivioPrincipal principal) throws AttivioException
getDocumentBatchSize()
> 1.feed
in interface SecurityFeeder
principal
- the principalAttivioException
- if sending of documents fails.void deletePrincipal(AttivioPrincipalKey key) throws AttivioException
getDocumentBatchSize()
> 1.deletePrincipal
in interface SecurityFeeder
key
- the principal keyAttivioException
- if sending of documents fails.void sendIndexMessage(IndexMessage message) throws AttivioException
If the feeder is batching documents, this method will send any queued documents before sending the index message.
If isOrderedCommits()
is true and this message is
a Commit
or Optimize
, then this method will first
ensure all previously sent documents are in the index, before sending this index message.
message
- the IndexMessage to send.AttivioException
- if sending of the message fails.void sendQueuedDocuments() throws AttivioException
AttivioException
ContentPointer put(String id, InputStream input) throws AttivioException
If no ContentStore is in configured for this ContentFeeder,
then a ByteArrayContentPointer
is used and returned.
If input
produces less than getContentStoreThresholdKB()
, then
a ByteArrayContentPointer
will be returned.
id
- the ID of the resource to put into the Content Store.input
- the input stream for the resource's data.AttivioException
- if putting the resource into the Content Store fails.ContentPointer put(String id, InputStreamBuilder input) throws AttivioException
id
- the ID of the resource to put into the Content Store.input
- the input stream for the resource's data.AttivioException
put(String, InputStream)
ContentPointer put(String id, File f) throws AttivioException
id
- the ID of the resource to put into the Content Store.f
- file with data.AttivioException
- if putting the resource into the Content Store fails.put(String, InputStream)
ContentPointer put(String id, byte[] bytes) throws AttivioException
id
- the ID of the resource to put into the Content Store.bytes
- data to put.AttivioException
- if putting the resource into the Content Store fails.put(String, InputStream)
boolean waitForCompletion() throws AttivioException
AttivioException
boolean waitForCompletion(long timeout) throws AttivioException
A timeout of
<=0means to wait forever.
timeout
- max time to wait in millisecondsAttivioException
boolean isCompleted() throws AttivioException
AttivioException
void setOrderedCommits(boolean orderedCommits)
commit(String...)
a blocking call while a call to
waitForCompletion()
is called.
Please refer to the documentation on ordered commits the developer network for more information.
orderedCommits
- set to true to enable this or false to disable itboolean isOrderedCommits()
void setDocumentBatchSize(int batchSize)
DocumentList
.
This batch size (specified in number of documents) does not guarantee that all DocumentLists will be exactly this size as
calls to commit(String...)
, optimize()
or DocumentOutputClient.startMessageGroup()
can cause smaller batches to be
sent.
batchSize
- in number of documentsint getDocumentBatchSize()
DocumentList
.int getMaxBatchSizeMB()
DocumentList
.
Document size will be estimated based on field values and the size of any ContentPointer
s included in the document.
void setMaxBatchSizeMB(int value)
DocumentList
.
Document size will be estimated based on field values and the size of any ContentPointer
s included in the document.
This batch size (specified in megabytes) does not guarantee that all DocumentLists will be exactly this size as calls to
commit(String...)
, optimize()
or DocumentOutputClient.startMessageGroup()
can cause smaller batches to be sent. The
specified getDocumentBatchSize()
will also be used to limit the size of DocumentLists.
int getContentStoreThresholdKB()
put(String, InputStream)
void setContentStoreThresholdKB(int value)
put(String, InputStream)
long getDocumentsQueued()
void setCommitInterval(int commitInterval)
Commit
will be sent to AIE.commitInterval
- in number of documentsint getCommitInterval()
void setOptimizeInterval(int optimizeInterval)
Optimize
will be sent to AIE.optimizeInterval
- in number of documentsint getOptimizeInterval()
void setIngestWorkflowName(String wf)
wf
- workflow nameString getIngestWorkflowName()
Map<String,String> getSizeToDomain()
void setSizeToDomain(Map<String,String> sizeToDomain)
sizeToDomain
- void commit(String... zones) throws AttivioException
If isOrderedCommits()
is true (typically the default "Ordered Commits" mode),
then wait for all previously fed documents to make it to the index, before committing, thus ensuring all previously
fed documents get committed. (this may take a while)
zones
- an optional list of zones to commit - if no zones are specified, all zones are committedAttivioException
- on failurevoid refresh() throws AttivioException
If isOrderedCommits()
is true (typically the default "Ordered Commits" mode),
then wait for all previously fed documents to make it to the index, before refreshing, thus ensuring all previously
fed documents get refreshed. (this may take a while)
AttivioException
- on failurevoid optimize() throws AttivioException
If isOrderedCommits()
is true (typically the default "Ordered Commits" mode),
then wait for all previously fed documents to make it to the index, before refreshing, thus ensuring all previously
fed documents get optimized. (this may take a while)
AttivioException
- on failureMap<String,Object> getStaticFields()
void setStaticFields(Map<String,Object> fields)
fields
- the fields and values to setIllegalStateException
- if any documents have already been fed.void addStaticField(String name, Object value)
name
- the name of the fieldvalue
- the value of the fieldIllegalStateException
- if any documents have already been fed.Copyright © 2018 Attivio, Inc. All Rights Reserved.
PATENT NOTICE: Attivio, Inc. Software Related Patents. With respect to the Attivio software product(s) being used, the following patents apply: Querying Joined Data Within A Search Engine Index: United States Patent No.(s): 8,073,840. Ordered Processing of Groups of Messages: U.S. Patent No.(s) 8,495,656. Signal processing approach to sentiment analysis for entities in documents: U.S. Patent No.(s) 8,725,494. Other U.S. and International Patents Pending.