public interface IngestionHistoryApi
The IngestionHistoryApi is used to provide incremental capabilities to ingestion clients. The API can be used to track the document IDs last ingested by the API, or signatures such as checksums or message digests. All ingestion history is associated with a namespace, which can be the name of a connector or any other client name associated with a repeating, non-concurrent client. The operations via the API are not thread-safe within the same namespace.
Events tracked by this API are not guaranteed to have occurred unless fault-tolerant ingestion is activated. This is because the API is generally used by ingestion clients to mark what they are sending into the system. If a subsequent system failure occurs and fault tolerance is not activated, the visited keys may not have made it to the index. This situation requires use of the clear(String) method to guarantee that documents are not inadvertently skipped because they are considered already indexed.
This API is designed to handle the case where an input document (such as a CSV or Avro file) may contain multiple sub-documents that constitute the actual output to the Attivio system. In this case, the client can call childCreated(String, String, String) in addition to the visit method for the child. This will associate the child with the parent, allowing this association to be retrieved during subsequent ingests for proper incremental handling.
Modifier and Type | Method and Description |
---|---|
void | childCreated(String namespace, String key, String childKey): Marks the childKey as one created by the record associated with key. |
void | clear(String namespace): Removes all historical information associated with the namespace. |
Iterable<String> | getChildren(String namespace, String key): Returns an Iterable of the children that were marked as created via the childCreated(String, String, String) method. |
Date | getPreviousStartTime(String namespace): Returns the start time for the last session for this namespace. |
byte[] | getSignature(String namespace, String key): Returns the last signature associated with the key or null if no signature is present. |
Date | getStartTime(String namespace): Returns the start time of the current session for this namespace. |
Iterable<String> | getUnvisited(String namespace): Returns an Iterable of the keys that have not been visited in the current session. |
Iterable<String> | getUnvisited(String namespace, Date since): Returns an Iterable of the keys that have not been visited since the time since. |
void | removeDocumentByKey(String namespace, String key): Removes a document using the key. |
Date | startSession(String namespace): Starts a new session for the namespace. |
void | visit(String namespace, String key, byte[] signature): Records the key as having been visited and updates its associated signature. |
Date startSession(String namespace) throws AttivioException
Starts a new session for the namespace. After this call, getStartTime(String) will return the new session start time.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
Throws:
AttivioException
Date getStartTime(String namespace) throws AttivioException
Returns the start time of the current session for this namespace. The return value is the same as the value returned by the last startSession(String) call for this namespace. If no session has ever been started, returns null.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
Throws:
AttivioException
Date getPreviousStartTime(String namespace) throws AttivioException
Returns the start time for the last session for this namespace. If no previous session exists, returns null.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
Throws:
AttivioException
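The relationship between startSession(String), getStartTime(String) and getPreviousStartTime(String) can be pictured with a small in-memory model. This is only a sketch for a single namespace: the class SessionTimesSketch is hypothetical, the start time is passed in explicitly to keep the example deterministic, and it assumes that the "previous" start time is the one that was current before the latest startSession call.

```java
import java.util.Date;

/** Hypothetical in-memory model of one namespace; not the Attivio implementation. */
final class SessionTimesSketch {
    private Date current;   // null until the first session is started
    private Date previous;  // null until a second session is started

    /** Starts a new session; the former current start time becomes the previous one (assumption). */
    Date startSession(Date now) {
        previous = current;
        current = now;
        return current;
    }

    Date getStartTime() {
        return current;
    }

    Date getPreviousStartTime() {
        return previous;
    }

    public static void main(String[] args) {
        SessionTimesSketch history = new SessionTimesSketch();
        System.out.println(history.getStartTime());         // no session yet
        history.startSession(new Date(1000L));
        history.startSession(new Date(2000L));
        System.out.println(history.getStartTime().getTime());
        System.out.println(history.getPreviousStartTime().getTime());
    }
}
```

A connector could use the previous start time, for example, to ask its source system only for records modified since the prior run.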
void visit(String namespace, String key, byte[] signature) throws AttivioException
Records the key as having been visited and updates its associated signature. The signature is commonly used to detect whether an object has changed since the last ingestion (by storing a checksum or message digest of the content). The visit time is also recorded.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
key - a record key or document ID.
signature - an arbitrary value indicating a signature to associate with the key, may be null.
Throws:
AttivioException
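One way a client might produce and compare signatures is sketched below, assuming a SHA-256 message digest of the document content. The class SignatureSketch and its methods digest and hasChanged are hypothetical helpers, not part of this API; the previous argument stands in for whatever getSignature(String, String) returned for the key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for building signatures to pass to visit(...). */
final class SignatureSketch {
    /** Computes a SHA-256 digest of the content to use as a signature. */
    static byte[] digest(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns true when the content should be re-ingested.
     * A null previous signature means no signature is stored for the key.
     */
    static boolean hasChanged(byte[] previous, byte[] content) {
        return previous == null || !MessageDigest.isEqual(previous, digest(content));
    }

    public static void main(String[] args) {
        byte[] v1 = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "hello, world".getBytes(StandardCharsets.UTF_8);
        byte[] stored = digest(v1);                    // what a prior visit would have recorded
        System.out.println(hasChanged(null, v1));      // no stored signature
        System.out.println(hasChanged(stored, v1));    // same content
        System.out.println(hasChanged(stored, v2));    // modified content
    }
}
```

In a real client, the digest would be passed as the signature argument of visit whether or not the content changed, so that the key still counts as visited in the current session.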
byte[] getSignature(String namespace, String key) throws AttivioException
Returns the last signature associated with the key or null if no signature is present.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
key - a record key or document ID.
Throws:
AttivioException
Iterable<String> getUnvisited(String namespace) throws AttivioException
Returns an Iterable of the keys that have not been visited in the current session. The unvisited keys are those for which there has been a visit(String, String, byte[]) call at some point, but not one since the last startSession(String) call. This allows a connector to record visits each time it runs and get a list of documents that are not present on subsequent runs. Since these documents have been removed from the source system, the connector may decide to remove them from the Attivio system.
A call to remove() on the returned iterator will remove the visit and signature information from the history for the associated key. All child associations (see childCreated(String, String, String)) of removed keys are also removed.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
Throws:
AttivioException
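The session-based definition of "unvisited" can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the Attivio implementation: the class UnvisitedSketch and its forget method are hypothetical, it covers a single namespace, and a logical counter stands in for Date values so that ordering is unambiguous.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of one namespace. */
final class UnvisitedSketch {
    private long clock = 0;         // logical clock standing in for Date values
    private long sessionStart = 0;  // tick of the last startSession() call
    private final Map<String, Long> lastVisit = new LinkedHashMap<>();

    long startSession() {
        sessionStart = ++clock;
        return sessionStart;
    }

    void visit(String key) {
        lastVisit.put(key, ++clock);
    }

    /** Keys visited at some point, but not since the last startSession(). */
    List<String> getUnvisited() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastVisit.entrySet()) {
            if (e.getValue() < sessionStart) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Mirrors the effect of remove() on the returned iterator: drops the key's history. */
    void forget(String key) {
        lastVisit.remove(key);
    }

    public static void main(String[] args) {
        UnvisitedSketch history = new UnvisitedSketch();
        history.startSession();      // first run sees both documents
        history.visit("doc-1");
        history.visit("doc-2");
        history.startSession();      // second run: doc-2 is gone from the source
        history.visit("doc-1");
        for (String key : history.getUnvisited()) {
            System.out.println("delete from index: " + key);
            history.forget(key);
        }
    }
}
```

Running main reports only doc-2, since it was visited in the first session but not after the second startSession call.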
Iterable<String> getUnvisited(String namespace, Date since) throws AttivioException
Returns an Iterable of the keys that have not been visited since the time since. The unvisited keys are those for which there has been a visit(String, String, byte[]) call at some point, but not one since the time since. This allows a connector to record visits each time it runs and get a list of documents that are not present on subsequent runs. Since these documents have been removed from the source system, the connector may decide to remove them from the Attivio system.
A call to remove() on the returned iterator will remove the visit and signature information from the history for the associated key. All child associations (see childCreated(String, String, String)) of removed keys are also removed.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
since - the minimum visit date to consider a record as visited.
Throws:
AttivioException
void childCreated(String namespace, String key, String childKey) throws AttivioException
Marks the childKey as one created by the record associated with key. All children associated with key are returned by the Iterable returned by a subsequent call to getChildren(String, String).
The expectation is that the child document has not previously been created. If the signature of the parent document has changed, remove the child documents and add the ones found in the new version of the parent document. Adding a child that already exists has undefined consequences.
Children of children are not supported; only a parent document and its children are. Children of children have undefined consequences.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
key - a record key or document ID.
childKey - a record key or document ID.
Throws:
AttivioException
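The parent/child rule above (when the parent's signature changes, remove all previously recorded children and add the ones found in the new version) might look like the following in-memory sketch. ChildTrackingSketch and ingestParent are hypothetical names used only for illustration; a real client would call getSignature, getChildren, childCreated and visit on the API instead, and would also visit each child.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of parent/child tracking for one namespace. */
final class ChildTrackingSketch {
    private final Map<String, byte[]> signatures = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();

    /**
     * Processes one parent document and returns the child keys that are now stale
     * and should be deleted from the index. An unchanged parent yields no stale keys.
     */
    List<String> ingestParent(String key, byte[] signature, List<String> newChildren) {
        byte[] previous = signatures.get(key);
        if (previous != null && Arrays.equals(previous, signature)) {
            return new ArrayList<>();   // unchanged: keep existing children as they are
        }
        // Changed or new parent: drop every old child, then record the new ones.
        List<String> stale = children.remove(key);
        if (stale == null) {
            stale = new ArrayList<>();
        }
        signatures.put(key, signature.clone());
        children.put(key, new ArrayList<>(newChildren));
        return stale;
    }

    List<String> getChildren(String key) {
        return children.getOrDefault(key, new ArrayList<>());
    }

    public static void main(String[] args) {
        ChildTrackingSketch history = new ChildTrackingSketch();
        history.ingestParent("data.csv", new byte[] {1}, Arrays.asList("data.csv#0", "data.csv#1"));
        // The file shrinks to one row, so its signature changes.
        List<String> stale = history.ingestParent("data.csv", new byte[] {2}, Arrays.asList("data.csv#0"));
        System.out.println("stale children: " + stale);
        System.out.println("current children: " + history.getChildren("data.csv"));
    }
}
```

Removing all old children before adding the new ones avoids re-adding a child that already exists, which the description above says has undefined consequences.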
Iterable<String> getChildren(String namespace, String key) throws AttivioException
Returns an Iterable of the children that were marked as created via the childCreated(String, String, String) method.
When children are removed through the returned Iterable, the expectation is that all of the children are removed.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
key - a record key or document ID.
Throws:
AttivioException
void clear(String namespace) throws AttivioException
Removes all historical information associated with the namespace.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
Throws:
AttivioException
void removeDocumentByKey(String namespace, String key) throws AttivioException
Removes a document using the key.
Parameters:
namespace - a namespace to use (e.g., connector or source name)
key - a record key or document ID.
Throws:
AttivioException
Copyright © 2018 Attivio, Inc. All Rights Reserved.
PATENT NOTICE: Attivio, Inc. Software Related Patents. With respect to the Attivio software product(s) being used, the following patents apply: Querying Joined Data Within A Search Engine Index: United States Patent No.(s): 8,073,840. Ordered Processing of Groups of Messages: U.S. Patent No.(s) 8,495,656. Signal processing approach to sentiment analysis for entities in documents: U.S. Patent No.(s) 8,725,494. Other U.S. and International Patents Pending.