Interface IngestionHistoryApi

  • All Known Subinterfaces:
    IngestionHistoryApiBatchSupport
    All Known Implementing Classes:
    MockUnclusteredIngestionHistory, MockUnclusteredNoopIngestionHistory

    public interface IngestionHistoryApi
    The IngestionHistoryApi is used to provide incremental capabilities to ingestion clients. The API can be used to track the document IDs last ingested by the API, or signatures such as checksums or message digests. All ingestion history is associated with a namespace, which can be the name of a connector or any other client name associated with a repeating, non-concurrent client. Operations performed through the API are not thread-safe within the same namespace.

    Events tracked by this API are not guaranteed to have occurred unless fault-tolerant ingestion is activated, because ingestion clients generally use this API to mark what they are sending into the system. If a system failure occurs afterwards and fault tolerance is not activated, the visited keys may not have made it to the index. In that situation, call clear(String) so that documents are not inadvertently skipped because they are considered already indexed.

    This API is designed to handle the case where an input document (such as a CSV or Avro file) may contain multiple sub-documents that constitute the actual output to the Attivio system. In this case, the client can call childCreated(String, String, String) in addition to the visit method for the child. This will associate the child with the parent, allowing this association to be retrieved during subsequent ingests for proper incremental handling.

    • Method Summary

      Modifier and Type Method Description
      void childCreated​(java.lang.String namespace, java.lang.String key, java.lang.String childKey)
      Marks the childKey as one created by the record associated with key.
      void clear​(java.lang.String namespace)
      Removes all historical information associated with the namespace.
      java.lang.Iterable<java.lang.String> getChildren​(java.lang.String namespace, java.lang.String key)
      Returns an Iterable of the children that were marked as created via the childCreated(String, String, String) method.
      java.util.Date getPreviousStartTime​(java.lang.String namespace)
      Returns the start time for the last session for this namespace.
      byte[] getSignature​(java.lang.String namespace, java.lang.String key)
      Returns the last signature associated with the key or null if no signature is present.
      java.util.Date getStartTime​(java.lang.String namespace)
      Returns the start time of the current session for this namespace.
      java.lang.Iterable<java.lang.String> getUnvisited​(java.lang.String namespace)
      Returns an Iterable of the keys that have not been visited in the current session.
      java.lang.Iterable<java.lang.String> getUnvisited​(java.lang.String namespace, java.util.Date since)
      Returns an Iterable of the keys that have not been visited since the time since.
      void removeDocumentByKey​(java.lang.String namespace, java.lang.String key)
      Removes a document using the key.
      java.util.Date startSession​(java.lang.String namespace)
      Starts a new session for the namespace.
      void visit​(java.lang.String namespace, java.lang.String key, byte[] signature)
      Records the key as having been visited and updates its associated signature.
    • Method Detail

      • startSession

        java.util.Date startSession​(java.lang.String namespace)
                             throws AttivioException
        Starts a new session for the namespace. The current time is used to establish the start time for this session. Subsequent calls to getStartTime(String) will return the new session start time.
        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        Returns:
        the session start time
        Throws:
        AttivioException
      • getStartTime

        java.util.Date getStartTime​(java.lang.String namespace)
                             throws AttivioException
        Returns the start time of the current session for this namespace. The return value is the same as the value returned by the last startSession(String) call for this namespace. If no session has ever been started, returns null.
        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        Returns:
        the start time for this namespace
        Throws:
        AttivioException
      • getPreviousStartTime

        java.util.Date getPreviousStartTime​(java.lang.String namespace)
                                     throws AttivioException
        Returns the start time for the last session for this namespace. If no previous session exists, returns null.
        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        Returns:
        previous start time
        Throws:
        AttivioException
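
The relationship between startSession(String), getStartTime(String) and getPreviousStartTime(String) can be pictured with a small in-memory model. This is only a sketch: SessionSketch and its injected clock are hypothetical names, and it assumes that starting a new session turns the former current start time into the "previous" one, which this page does not spell out.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical in-memory model of session bookkeeping; not the Attivio implementation.
class SessionSketch {
    private final Map<String, Date> current = new HashMap<>();
    private final Map<String, Date> previous = new HashMap<>();
    private final Supplier<Date> clock; // injected so the example is deterministic

    SessionSketch(Supplier<Date> clock) {
        this.clock = clock;
    }

    Date startSession(String namespace) {
        Date now = clock.get();
        Date old = current.put(namespace, now);
        if (old != null) {
            // Assumption: the session that was current becomes the previous one.
            previous.put(namespace, old);
        }
        return now;
    }

    // Null when no session has ever been started for the namespace.
    Date getStartTime(String namespace) {
        return current.get(namespace);
    }

    // Null when there is no earlier session for the namespace.
    Date getPreviousStartTime(String namespace) {
        return previous.get(namespace);
    }

    public static void main(String[] args) {
        long[] tick = {0};
        SessionSketch history = new SessionSketch(() -> new Date(tick[0] += 1000));
        Date first = history.startSession("my-connector");
        Date second = history.startSession("my-connector");
        // A scanner could now restrict itself to records modified since `first`.
        System.out.println(first.equals(history.getPreviousStartTime("my-connector")));
        System.out.println(second.equals(history.getStartTime("my-connector")));
    }
}
```

A connector that scans a source with modification timestamps might use the previous start time as the lower bound of its next scan, under the assumption above.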
      • visit

        void visit​(java.lang.String namespace,
                   java.lang.String key,
                   byte[] signature)
            throws AttivioException
        Records the key as having been visited and updates its associated signature. The signature is commonly used to detect whether an object has changed since the last ingestion (by storing a checksum or message digest of the content). The visit time is also recorded.
        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        key - a record key or document ID.
        signature - an arbitrary value indicating a signature to associate with the key, may be null.
        Throws:
        AttivioException
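
One way a client might use signatures for change detection is sketched below. SignatureSketch, digest and visitAndCheckChanged are hypothetical names, the map is a stand-in for the history store, and SHA-256 is just one possible choice of message digest; the API accepts any byte array.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the signature part of the history;
// it is not the Attivio implementation.
class SignatureSketch {
    // key -> last recorded signature
    private final Map<String, byte[]> signatures = new HashMap<>();

    // Computes a SHA-256 digest of the content to use as a signature.
    static byte[] digest(String content) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Mirrors the intent of getSignature followed by visit: returns true
    // when the content is new or changed and should be sent for ingestion.
    boolean visitAndCheckChanged(String key, String content) {
        byte[] current = digest(content);
        byte[] previous = signatures.get(key); // null if never visited
        signatures.put(key, current);          // record the new signature
        return !Arrays.equals(previous, current);
    }

    public static void main(String[] args) {
        SignatureSketch history = new SignatureSketch();
        System.out.println(history.visitAndCheckChanged("doc-1", "hello"));       // first visit
        System.out.println(history.visitAndCheckChanged("doc-1", "hello"));       // unchanged
        System.out.println(history.visitAndCheckChanged("doc-1", "hello world")); // changed
    }
}
```

With the real API, the same comparison would be made between the result of getSignature and a freshly computed digest before calling visit.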
      • getSignature

        byte[] getSignature​(java.lang.String namespace,
                            java.lang.String key)
                     throws AttivioException
        Returns the last signature associated with the key or null if no signature is present.
        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        key - a record key or document ID.
        Returns:
        the signature associated with the key
        Throws:
        AttivioException
      • getUnvisited

        java.lang.Iterable<java.lang.String> getUnvisited​(java.lang.String namespace)
                                                   throws AttivioException
        Returns an Iterable of the keys that have not been visited in the current session. The unvisited keys are those for which there has been a visit(String, String, byte[]) call at some point, but not one since the last startSession(String) call. This allows a connector to record visits each time it runs and get a list of documents that are not present on subsequent runs. Since these documents have been removed from the source system, the connector may decide to remove them from the Attivio system.

        A call to remove() on the returned iterator will remove the visit and signature information from the history for the associated key. All child associations (see childCreated(String, String, String)) of removed keys are also removed.

        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        Returns:
        an Iterable of unvisited keys
        Throws:
        AttivioException
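
The "visited at some point, but not in the current session" rule and the effect of remove() on the iterator can be illustrated with a small model. UnvisitedSketch is a hypothetical name, it tracks a single namespace, and a logical counter stands in for java.util.Date so that the example is deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one namespace; not the Attivio implementation.
// A logical counter replaces wall-clock time to keep the example deterministic.
class UnvisitedSketch {
    private final Map<String, Long> lastVisit = new LinkedHashMap<>();
    private long clock = 0;
    private long sessionStart = 0;

    long startSession() {
        sessionStart = ++clock;
        return sessionStart;
    }

    void visit(String key) {
        lastVisit.put(key, ++clock);
    }

    // Keys visited at some point, but not since the current session started.
    // Removing through the iterator also drops the key from the history.
    Iterable<String> getUnvisited() {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastVisit.entrySet()) {
            if (e.getValue() < sessionStart) {
                stale.add(e.getKey());
            }
        }
        return () -> new Iterator<String>() {
            private final Iterator<String> it = stale.iterator();
            private String last;

            public boolean hasNext() { return it.hasNext(); }

            public String next() {
                last = it.next();
                return last;
            }

            public void remove() {
                it.remove();            // fails if next() was not called first
                lastVisit.remove(last); // forget the key entirely
            }
        };
    }

    public static void main(String[] args) {
        UnvisitedSketch history = new UnvisitedSketch();
        history.startSession();   // first run sees a and b
        history.visit("a");
        history.visit("b");
        history.startSession();   // second run: b no longer exists in the source
        history.visit("a");
        for (Iterator<String> it = history.getUnvisited().iterator(); it.hasNext();) {
            String key = it.next();
            System.out.println("delete from index: " + key);
            it.remove();
        }
    }
}
```

In a real connector, the loop body would issue the delete to the index before removing the key from the history.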
      • getUnvisited

        java.lang.Iterable<java.lang.String> getUnvisited​(java.lang.String namespace,
                                                          java.util.Date since)
                                                   throws AttivioException
        Returns an Iterable of the keys that have not been visited since the time since. The unvisited keys are those for which there has been a visit(String, String, byte[]) call at some point, but not one since the time since. This allows a connector to record visits each time it runs and get a list of documents that are not present on subsequent runs. Since these documents have been removed from the source system, the connector may decide to remove them from the Attivio system.

        A call to remove() on the returned iterator will remove the visit and signature information from the history for the associated key. All child associations (see childCreated(String, String, String)) of removed keys are also removed.

        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        since - the minimum visit date to consider a record as visited.
        Returns:
        an Iterable of unvisited keys
        Throws:
        AttivioException
      • childCreated

        void childCreated​(java.lang.String namespace,
                          java.lang.String key,
                          java.lang.String childKey)
                   throws AttivioException
        Marks the childKey as one created by the record associated with key. All children associated with key are returned by the Iterable returned by a subsequent call to getChildren(String, String). The expectation is that the child document has not previously been created. If the signature of the parent document has changed, remove the existing child documents and add the ones found in the new version of the parent document. Adding a child that already exists has undefined consequences.

        Children of children are not supported; only a parent document and its direct children are. Registering children of children has undefined consequences.

        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        key - a record key or document ID.
        childKey - a record key or document ID.
        Throws:
        AttivioException
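
The "remove the old children, then add the new ones" rule for a changed parent might look like the following. ChildSketch and ingestParent are hypothetical names, and the two maps stand in for the history store of a single namespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of parent/child tracking; not the Attivio implementation.
class ChildSketch {
    private final Map<String, byte[]> signatures = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();

    void childCreated(String key, String childKey) {
        children.computeIfAbsent(key, k -> new ArrayList<>()).add(childKey);
    }

    List<String> getChildren(String key) {
        return children.getOrDefault(key, new ArrayList<>());
    }

    // Processes one parent record (for example a CSV file) and its rows.
    // Returns the child keys recorded for the previous version of the parent,
    // which the client would delete before feeding the new children.
    List<String> ingestParent(String key, byte[] signature, List<String> newChildKeys) {
        List<String> toDelete = new ArrayList<>();
        boolean changed = !signatures.containsKey(key)
                || !Arrays.equals(signatures.get(key), signature);
        signatures.put(key, signature);
        if (changed) {
            // Drop every previously recorded child first, so that no child
            // is registered twice for the same parent.
            List<String> old = children.remove(key);
            if (old != null) {
                toDelete.addAll(old);
            }
            for (String child : newChildKeys) {
                childCreated(key, child);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        ChildSketch history = new ChildSketch();
        history.ingestParent("file.csv", new byte[] {1}, Arrays.asList("row-1", "row-2"));
        // The file changed and now contains a single row.
        List<String> stale = history.ingestParent("file.csv", new byte[] {2}, Arrays.asList("row-1"));
        System.out.println("children to delete: " + stale);
        System.out.println("children now tracked: " + history.getChildren("file.csv"));
    }
}
```

Clearing all old children before registering the new ones avoids adding a child that already exists, which the description above marks as undefined.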
      • getChildren

        java.lang.Iterable<java.lang.String> getChildren​(java.lang.String namespace,
                                                         java.lang.String key)
                                                  throws AttivioException
        Returns an Iterable of the children that were marked as created via the childCreated(String, String, String) method. When children are removed through the returned Iterable, the expectation is that all of the children are removed.
        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        key - a record key or document ID.
        Returns:
        an Iterable of childKeys for the key
        Throws:
        AttivioException
      • clear

        void clear​(java.lang.String namespace)
            throws AttivioException
        Removes all historical information associated with the namespace.
        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        Throws:
        AttivioException
      • removeDocumentByKey

        void removeDocumentByKey​(java.lang.String namespace,
                                 java.lang.String key)
                          throws AttivioException
        Removes a document using the key.
        Parameters:
        namespace - a namespace to use (e.g., connector or source name)
        key - a record key or document ID.
        Throws:
        AttivioException