public class RecursiveParserWrapperHandler extends AbstractRecursiveParserWrapperHandler
This is the default implementation of AbstractRecursiveParserWrapperHandler.
See its documentation for more details.
This caches a metadata object for each embedded file and for the container file.
It places the extracted content in the metadata object, with this key: AbstractRecursiveParserWrapperHandler.TIKA_CONTENT
If memory is a concern, subclass AbstractRecursiveParserWrapperHandler to handle each
embedded document.
NOTE: This handler must only be used with the RecursiveParserWrapper.
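
The caching behaviour described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions, not the Tika implementation: `CachingHandlerSketch`, `CONTENT_KEY`, and the `Map`-based metadata are hypothetical stand-ins for the handler, for `AbstractRecursiveParserWrapperHandler.TIKA_CONTENT`, and for `Metadata`, and placing the container first in the list is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the caching behaviour; NOT the Tika class itself.
class CachingHandlerSketch {
    // Stand-in for AbstractRecursiveParserWrapperHandler.TIKA_CONTENT (assumed name).
    static final String CONTENT_KEY = "content";

    // One metadata map per embedded file, plus one for the container file.
    private final List<Map<String, String>> metadataList = new ArrayList<>();

    // Called once per embedded file: store the text in its metadata, then cache it.
    void endEmbeddedDocument(Map<String, String> metadata, String extractedText) {
        metadata.put(CONTENT_KEY, extractedText);
        metadataList.add(metadata);
    }

    // Called once for the container; this sketch assumes the container goes first.
    void endDocument(Map<String, String> metadata, String extractedText) {
        metadata.put(CONTENT_KEY, extractedText);
        metadataList.add(0, metadata);
    }

    List<Map<String, String>> getMetadataList() {
        return metadataList;
    }

    // Replays a container that holds two embedded files.
    static List<Map<String, String>> demo() {
        CachingHandlerSketch handler = new CachingHandlerSketch();
        handler.endEmbeddedDocument(named("a.txt"), "text of a");
        handler.endEmbeddedDocument(named("b.txt"), "text of b");
        handler.endDocument(named("container.zip"), "");
        return handler.getMetadataList();
    }

    private static Map<String, String> named(String name) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("name", name);
        return metadata;
    }

    public static void main(String[] args) {
        for (Map<String, String> metadata : demo()) {
            System.out.println(metadata.get("name") + " -> " + metadata.get(CONTENT_KEY));
        }
    }
}
```

Every map stays in the list until the caller reads it, which is why the description suggests subclassing AbstractRecursiveParserWrapperHandler and handling each embedded document as it arrives when memory is a concern.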
Fields inherited from class AbstractRecursiveParserWrapperHandler: CONTAINER_EXCEPTION, EMBEDDED_DEPTH, EMBEDDED_EXCEPTION, EMBEDDED_RESOURCE_LIMIT_REACHED, EMBEDDED_RESOURCE_PATH, PARSE_TIME_MILLIS, TIKA_CONTENT, TIKA_CONTENT_HANDLER, WRITE_LIMIT_REACHED

| Constructor | Description |
|---|---|
| RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory) | Create a handler with no limit on the number of embedded resources. |
| RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources) | Create a handler that limits the number of embedded resources that will be parsed. |
| RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, int maxWriteLimit, org.apache.tika.metadata.filter.MetadataFilter metadataFilter) | |

| Modifier and Type | Method | Description |
|---|---|---|
| void | endDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) | This is called after the full parse has completed. |
| void | endEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) | This is called after parsing an embedded document. |
| java.util.List<Metadata> | getMetadataList() | |
| void | startEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) | This is called before parsing an embedded document. |

Methods inherited from class AbstractRecursiveParserWrapperHandler: getContentHandlerFactory, getNewContentHandler, getNewContentHandler, getTotalWriteLimit, hasHitMaximumEmbeddedResources

Methods inherited from class org.xml.sax.helpers.DefaultHandler: characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning

public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)

public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
Parameters:
maxEmbeddedResources - number of embedded resources that will be parsed

public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources, int maxWriteLimit, org.apache.tika.metadata.filter.MetadataFilter metadataFilter)
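
To make the effect of maxEmbeddedResources concrete, here is a minimal stand-alone sketch of a counter-based limit. It is a model under assumptions rather than Tika code: the class `EmbeddedLimitSketch`, its `offer` method, and the convention that a negative limit means "no limit" are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an embedded-resource limit; NOT the Tika implementation.
class EmbeddedLimitSketch {
    // Assumption of this sketch: a negative value means "no limit".
    private final int maxEmbeddedResources;
    private int embeddedResources = 0;
    private final List<String> parsed = new ArrayList<>();

    // Mirrors the idea of the one-argument constructor: no limit.
    EmbeddedLimitSketch() {
        this(-1);
    }

    // Mirrors the idea of the constructor that takes maxEmbeddedResources.
    EmbeddedLimitSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources >= 0 && embeddedResources >= maxEmbeddedResources;
    }

    // Accepts the resource and returns true unless the limit was already reached.
    boolean offer(String resourceName) {
        if (hasHitMaximumEmbeddedResources()) {
            return false;
        }
        embeddedResources++;
        parsed.add(resourceName);
        return true;
    }

    List<String> parsed() {
        return parsed;
    }

    public static void main(String[] args) {
        EmbeddedLimitSketch handler = new EmbeddedLimitSketch(2);
        for (String name : new String[] {"a.png", "b.png", "c.png"}) {
            System.out.println(name + " parsed: " + handler.offer(name));
        }
        System.out.println("limit reached: " + handler.hasHitMaximumEmbeddedResources());
    }
}
```

In this model the third resource is rejected once two have been accepted, while a handler built with the no-argument constructor never reports the limit as reached.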
public void startEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) throws org.xml.sax.SAXException
Overrides: startEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - local content handler to use on the embedded document
metadata - metadata to use for the embedded document
Throws: org.xml.sax.SAXException

public void endEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) throws org.xml.sax.SAXException
Overrides: endEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - local content handler used on the embedded document
metadata - metadata from the embedded document
Throws: org.xml.sax.SAXException

public void endDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) throws org.xml.sax.SAXException
Description copied from class AbstractRecursiveParserWrapperHandler: This is called after the full parse has completed. Call super.endDocument(...) in subclasses, because this method records in the metadata whether the embedded resource maximum has been hit.
Overrides: endDocument in class AbstractRecursiveParserWrapperHandler
Parameters:
contentHandler - content handler used on the main document
metadata - metadata from the main document
Throws: org.xml.sax.SAXException

public java.util.List<Metadata> getMetadataList()
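
The advice on calling super.endDocument(...) in subclasses can be illustrated with a stand-alone model. Everything here is hypothetical and only mimics the documented behaviour: `BaseHandlerSketch` stands in for the parent handler, `LIMIT_REACHED_KEY` for EMBEDDED_RESOURCE_LIMIT_REACHED, and a plain `Map` for `Metadata`.

```java
import java.util.HashMap;
import java.util.Map;

// Driver for the sketch: feeds two embedded documents to a handler, then ends the parse.
class SuperCallDemo {
    static Map<String, String> run(BaseHandlerSketch handler) {
        handler.startEmbeddedDocument();
        handler.startEmbeddedDocument();
        Map<String, String> metadata = new HashMap<>();
        handler.endDocument(metadata);
        return metadata;
    }

    public static void main(String[] args) {
        System.out.println("with super call:    " + run(new GoodSubclassSketch(2)));
        System.out.println("without super call: " + run(new ForgetfulSubclassSketch(2)));
    }
}

// Hypothetical parent handler; NOT Tika code.
class BaseHandlerSketch {
    // Assumed key name, standing in for EMBEDDED_RESOURCE_LIMIT_REACHED.
    static final String LIMIT_REACHED_KEY = "embedded-resource-limit-reached";

    private final int maxEmbeddedResources;
    private int embeddedResources = 0;

    BaseHandlerSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    void startEmbeddedDocument() {
        embeddedResources++;
    }

    boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources >= 0 && embeddedResources >= maxEmbeddedResources;
    }

    // Parent bookkeeping: record in the metadata whether the limit was hit.
    void endDocument(Map<String, String> metadata) {
        if (hasHitMaximumEmbeddedResources()) {
            metadata.put(LIMIT_REACHED_KEY, "true");
        }
    }
}

// Subclass that delegates first, so the parent's flag is preserved.
class GoodSubclassSketch extends BaseHandlerSketch {
    GoodSubclassSketch(int maxEmbeddedResources) {
        super(maxEmbeddedResources);
    }

    @Override
    void endDocument(Map<String, String> metadata) {
        super.endDocument(metadata);
        metadata.put("custom", "done");
    }
}

// Subclass that forgets the super call, so the flag is never written.
class ForgetfulSubclassSketch extends BaseHandlerSketch {
    ForgetfulSubclassSketch(int maxEmbeddedResources) {
        super(maxEmbeddedResources);
    }

    @Override
    void endDocument(Map<String, String> metadata) {
        metadata.put("custom", "done");
    }
}
```

In the model, only the subclass that delegates to the parent ends up with the limit flag in its metadata; the other silently loses it, which is the failure the documentation warns about.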
Copyright © 2010 - 2023 Adobe. All Rights Reserved