public class RecursiveParserWrapper extends ParserDecorator
After parsing a document, call getMetadata() to retrieve a list of Metadata objects, one for each embedded resource. The first item in the list will contain the Metadata for the outer container file.
 Content can also be extracted and stored in the TIKA_CONTENT field
 of a Metadata object.  Select the type of content to be stored
 at initialization.
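
The shape of the result list described above can be sketched without Tika on the classpath. This is only an illustration: `MetadataListSketch`, the plain `Map` entries, and the key names `path` and `content` are hypothetical stand-ins for `Metadata` objects and for the real `Property` constants. Only the ordering (container first, one entry per embedded resource after it) is taken from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the list of Metadata objects; not the Tika API.
class MetadataListSketch {
    // Assumed key names for this sketch only.
    static final String PATH_KEY = "path";
    static final String CONTENT_KEY = "content";

    // Builds a fake parse result: index 0 is the container file,
    // each later entry stands for one embedded resource.
    static List<Map<String, String>> fakeParseResult() {
        List<Map<String, String>> list = new ArrayList<>();
        list.add(entry("/", "text of the container"));
        list.add(entry("/image1.png", ""));
        list.add(entry("/attachment.docx", "text of the attachment"));
        return list;
    }

    private static Map<String, String> entry(String path, String content) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put(PATH_KEY, path);
        m.put(CONTENT_KEY, content);
        return m;
    }

    // The first item holds the data for the outer container file.
    static Map<String, String> container(List<Map<String, String>> all) {
        return all.get(0);
    }

    // Everything after the first item belongs to embedded resources.
    static List<Map<String, String>> embedded(List<Map<String, String>> all) {
        return all.subList(1, all.size());
    }

    public static void main(String[] args) {
        List<Map<String, String>> all = fakeParseResult();
        System.out.println("container: " + container(all));
        for (Map<String, String> m : embedded(all)) {
            System.out.println("embedded:  " + m);
        }
    }
}
```

In real code the entries would be `Metadata` objects and the extracted text would be read through the content property rather than a string key.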
 
If a WriteLimitReachedException is encountered, the wrapper will stop processing the current resource, and it will not process any of the child resources for the given resource. However, it will try to parse as much as it can. If a WriteLimitReachedException is thrown in the parent document, no child resources will be parsed.
The implementation is based on Jukka's RecursiveMetadataParser and Nick's additions. See: RecursiveMetadataParser.
Note that this wrapper holds all data in memory and is not appropriate for files with content too large to be held in memory.
 Note, too, that this wrapper is not thread safe because it stores state.  
 The client must initialize a new wrapper for each thread, and the client
 is responsible for calling reset() after each parse.
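
One way to follow the two rules above (one wrapper per thread, `reset()` after every parse) is to keep the wrapper in a `ThreadLocal` and reset it in a `finally` block. The sketch below is a minimal illustration under stated assumptions: `StatefulWrapperStandIn` is a hypothetical stand-in that only mimics "accumulates results until reset"; it is not the Tika class, and `parseOnce` is an invented helper name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stateful wrapper; it is NOT RecursiveParserWrapper.
// It only mimics the behavior "results accumulate until reset() is called".
class StatefulWrapperStandIn {
    private final List<String> results = new ArrayList<>();

    void parse(String doc) {
        results.add("container:" + doc);
        results.add("embedded:" + doc);
    }

    List<String> getMetadata() {
        return new ArrayList<>(results); // defensive copy, survives reset()
    }

    void reset() {
        results.clear();
    }
}

class PerThreadWrapperSketch {
    // Each thread lazily gets its own wrapper instance, so no state is shared.
    private static final ThreadLocal<StatefulWrapperStandIn> WRAPPER =
            ThreadLocal.withInitial(StatefulWrapperStandIn::new);

    // Parses one document and always resets the wrapper afterwards,
    // even if parse() throws.
    static List<String> parseOnce(String doc) {
        StatefulWrapperStandIn wrapper = WRAPPER.get();
        try {
            wrapper.parse(doc);
            return wrapper.getMetadata();
        } finally {
            wrapper.reset();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable job = () -> {
            String name = Thread.currentThread().getName();
            System.out.println(name + " -> " + parseOnce(name + ".zip"));
        };
        Thread t1 = new Thread(job, "worker-1");
        Thread t2 = new Thread(job, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Without the `reset()` in `finally`, a second call on the same thread would return the results of both parses.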
 
The unit tests for this class are in the tika-parsers module.
| Modifier and Type | Class and Description | 
|---|---|
| static class  | RecursiveParserWrapper.WriteLimitReached | 

| Modifier and Type | Field and Description |
|---|---|
| static Property | EMBEDDED_EXCEPTION: Deprecated. |
| static Property | EMBEDDED_RESOURCE_LIMIT_REACHED |
| static Property | EMBEDDED_RESOURCE_PATH: Deprecated. |
| static Property | PARSE_TIME_MILLIS: Deprecated. |
| static Property | TIKA_CONTENT: Deprecated. |
| static Property | WRITE_LIMIT_REACHED: Deprecated. |

| Constructor and Description |
|---|
| RecursiveParserWrapper(Parser wrappedParser): Initialize the wrapper with catchEmbeddedExceptions set to true as default. |
| RecursiveParserWrapper(Parser wrappedParser, boolean catchEmbeddedExceptions) |
| RecursiveParserWrapper(Parser wrappedParser, ContentHandlerFactory contentHandlerFactory): Deprecated. |
| RecursiveParserWrapper(Parser wrappedParser, ContentHandlerFactory contentHandlerFactory, boolean catchEmbeddedExceptions): Deprecated. |

| Modifier and Type | Method and Description |
|---|---|
| java.util.List<Metadata> | getMetadata(): Deprecated. Use a RecursiveParserWrapperHandler instead. |
| java.util.Set<MediaType> | getSupportedTypes(ParseContext context): Delegates the method call to the decorated parser. |
| void | parse(java.io.InputStream stream, org.xml.sax.ContentHandler recursiveParserWrapperHandler, Metadata metadata, ParseContext context): Acts like a regular parser except it ignores the ContentHandler and it automatically sets/overwrites the embedded Parser in the ParseContext object. |
| void | reset(): Deprecated. Use a RecursiveParserWrapperHandler instead. |
| void | setMaxEmbeddedResources(int max): Deprecated. Set this on a RecursiveParserWrapperHandler. |

Inherited methods: getDecorationName, getWrappedParser, withFallbacks, withoutTypes, withTypes, parse

@Deprecated public static final Property TIKA_CONTENT
Deprecated. Use AbstractRecursiveParserWrapperHandler.TIKA_CONTENT.

@Deprecated public static final Property PARSE_TIME_MILLIS
Deprecated. Use AbstractRecursiveParserWrapperHandler.PARSE_TIME_MILLIS.

@Deprecated public static final Property WRITE_LIMIT_REACHED
Deprecated. Use AbstractRecursiveParserWrapperHandler.EMBEDDED_EXCEPTION.

@Deprecated public static final Property EMBEDDED_RESOURCE_LIMIT_REACHED

@Deprecated public static final Property EMBEDDED_EXCEPTION
Deprecated. Use AbstractRecursiveParserWrapperHandler.EMBEDDED_EXCEPTION.

@Deprecated public static final Property EMBEDDED_RESOURCE_PATH
Deprecated. Use AbstractRecursiveParserWrapperHandler.EMBEDDED_RESOURCE_PATH.

public RecursiveParserWrapper(Parser wrappedParser)
Initialize the wrapper with catchEmbeddedExceptions set to true as default.
Parameters:
wrappedParser - parser to use for the container documents and the embedded documents

public RecursiveParserWrapper(Parser wrappedParser, boolean catchEmbeddedExceptions)
Parameters:
wrappedParser - parser to wrap
catchEmbeddedExceptions - whether or not to catch and record embedded exceptions. If set to false, embedded exceptions will be thrown and the rest of the file will not be parsed. The following will not be ignored: CorruptedFileException, RuntimeException

@Deprecated public RecursiveParserWrapper(Parser wrappedParser, ContentHandlerFactory contentHandlerFactory)
Deprecated. Use RecursiveParserWrapper(Parser).
Initialize the wrapper with catchEmbeddedExceptions set to true as default.
Parameters:
wrappedParser - parser to use for the container documents and the embedded documents
contentHandlerFactory - factory to use to generate a new content handler for the container document and each embedded document

@Deprecated public RecursiveParserWrapper(Parser wrappedParser, ContentHandlerFactory contentHandlerFactory, boolean catchEmbeddedExceptions)
Deprecated. Use RecursiveParserWrapper(Parser, boolean).
Parameters:
wrappedParser - parser to use for the container documents and the embedded documents
contentHandlerFactory - factory to use to generate a new content handler for the container document and each embedded document
catchEmbeddedExceptions - whether or not to catch the embedded exceptions. If set to true, the stack traces will be stored in the metadata object with key: EMBEDDED_EXCEPTION.

public java.util.Set<MediaType> getSupportedTypes(ParseContext context)
Description copied from class ParserDecorator: Delegates the method call to the decorated parser. Subclasses should override this method (and use super.getSupportedTypes() to invoke the decorated parser) to implement extra decoration.
Specified by: getSupportedTypes in interface Parser
Overrides: getSupportedTypes in class ParserDecorator
Parameters:
context - parse context

public void parse(java.io.InputStream stream,
                  org.xml.sax.ContentHandler recursiveParserWrapperHandler,
                  Metadata metadata,
                  ParseContext context)
           throws java.io.IOException,
                  org.xml.sax.SAXException,
                  TikaException
Acts like a regular parser except it ignores the ContentHandler and it automatically sets/overwrites the embedded Parser in the ParseContext object.

To retrieve the results of the parse, use getMetadata().

Make sure to call reset() after each parse.
Specified by: parse in interface Parser
Overrides: parse in class ParserDecorator
Parameters:
stream - the document stream (input)
recursiveParserWrapperHandler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
Throws:
java.io.IOException - if the document stream could not be read
org.xml.sax.SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed

@Deprecated public java.util.List<Metadata> getMetadata()
Deprecated. Use a RecursiveParserWrapperHandler instead.
Throws:
java.lang.IllegalStateException - if you've used a RecursiveParserWrapperHandler in your last call to parse(InputStream, ContentHandler, Metadata, ParseContext)

@Deprecated public void setMaxEmbeddedResources(int max)
Deprecated. Set this on a RecursiveParserWrapperHandler.
If the limit is reached, an EMBEDDED_RESOURCE_LIMIT_REACHED property will be added to the container document's Metadata.

If this value is < 0 (the default), the wrapper will store all Metadata.
Parameters:
max - maximum number of embedded resources to store

@Deprecated public void reset()
Deprecated. Use a RecursiveParserWrapperHandler instead.
Throws:
java.lang.IllegalStateException - if you used a RecursiveParserWrapper in your call to parse(InputStream, ContentHandler, Metadata, ParseContext)
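
The embedded-resource cap documented for setMaxEmbeddedResources(int) can be modeled in a few lines. This is a sketch under stated assumptions, not the wrapper's implementation: `EmbeddedLimitSketch`, `collect`, and the key name `embedded_resource_limit_reached` are hypothetical, entries are plain maps rather than `Metadata` objects, and the sketch assumes the container itself does not count toward the limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a "max embedded resources" cap; not the Tika API.
class EmbeddedLimitSketch {
    // Assumed key name for this sketch only.
    static final String LIMIT_REACHED_KEY = "embedded_resource_limit_reached";

    // Returns the container entry followed by at most `max` embedded entries.
    // A negative `max` means "no limit", mirroring the documented default.
    static List<Map<String, String>> collect(String containerName,
                                             List<String> embeddedNames,
                                             int max) {
        List<Map<String, String>> out = new ArrayList<>();
        Map<String, String> container = new LinkedHashMap<>();
        container.put("name", containerName);
        out.add(container);

        int stored = 0;
        for (String name : embeddedNames) {
            if (max >= 0 && stored >= max) {
                // Flag the container and stop storing further entries.
                container.put(LIMIT_REACHED_KEY, "true");
                break;
            }
            Map<String, String> entry = new LinkedHashMap<>();
            entry.put("name", name);
            out.add(entry);
            stored++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = List.of("a.txt", "b.txt", "c.txt");
        System.out.println(collect("outer.zip", names, 2));
        System.out.println(collect("outer.zip", names, -1));
    }
}
```

The flag is written to the container entry (index 0) so that a caller can detect truncation without scanning the whole list.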