Index

C D E G H I M N O P R S U X 
All Classes and Interfaces|All Packages|Serialized Form

C

contains(Charset) - Method in class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
 
contains(Charset) - Method in class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
 

D

DataURIScheme - Class in org.apache.tika.parser.html
 
DataURISchemeParseException - Exception in org.apache.tika.parser.html
 
DataURISchemeParseException(String) - Constructor for exception org.apache.tika.parser.html.DataURISchemeParseException
 
DataURISchemeUtil - Class in org.apache.tika.parser.html
Not thread safe.
DataURISchemeUtil() - Constructor for class org.apache.tika.parser.html.DataURISchemeUtil
 
DefaultHtmlMapper - Class in org.apache.tika.parser.html
The default HTML mapping rules in Tika.
DefaultHtmlMapper() - Constructor for class org.apache.tika.parser.html.DefaultHtmlMapper
 
detect(InputStream, Metadata) - Method in class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
 
detect(InputStream, Metadata) - Method in class org.apache.tika.parser.html.HtmlEncodingDetector
 

E

equals(Object) - Method in class org.apache.tika.parser.html.DataURIScheme
 
extract(String) - Method in class org.apache.tika.parser.html.DataURISchemeUtil
Extracts DataURISchemes from free text, as in JavaScript.
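The index does not say how extract(String) finds data URIs in free text. The following is a minimal sketch that assumes a regular-expression scan is good enough. The class DataUriExtractSketch and its pattern are hypothetical and are not Tika's implementation. The pattern only accepts payloads made of base64 or percent-encoding characters, so some valid RFC 2397 URIs (for example, ones with unescaped punctuation in the data) will be missed or truncated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataUriExtractSketch {
    // Hypothetical pattern: "data:" at a word boundary, a header without commas,
    // whitespace or quotes, a comma, then a payload of base64 or
    // percent-encoding characters. A quote or whitespace ends the match.
    private static final Pattern DATA_URI =
            Pattern.compile("\\bdata:[^,\\s\"']*,[A-Za-z0-9+/=%._~-]*");

    /** Returns every data URI found in the text, in order of appearance. */
    public static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATA_URI.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String js = "img.src = \"data:image/png;base64,iVBORw0KGgo=\"; alert('data:,Hi%21');";
        for (String uri : extract(js)) {
            System.out.println(uri);
        }
    }
}
```

The word boundary keeps text such as "metadata:x,y" from being reported as a data URI.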

G

getInputStream() - Method in class org.apache.tika.parser.html.DataURIScheme
 
getMarkLimit() - Method in class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
 
getMarkLimit() - Method in class org.apache.tika.parser.html.HtmlEncodingDetector
 
getMediaType() - Method in class org.apache.tika.parser.html.DataURIScheme
 
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.html.HtmlParser
 

H

hashCode() - Method in class org.apache.tika.parser.html.DataURIScheme
 
HtmlEncodingDetector - Class in org.apache.tika.parser.html
Character encoding detector for determining the character encoding of an HTML document based on the potential charset parameter found in a Content-Type http-equiv meta tag somewhere near the beginning.
HtmlEncodingDetector() - Constructor for class org.apache.tika.parser.html.HtmlEncodingDetector
 
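As a rough illustration of the approach HtmlEncodingDetector is described as taking, the sketch below looks for a charset declared in a <meta> tag within the first few kilobytes. The class name MetaCharsetSketch, the 8192-byte lookahead and the regular expression are assumptions for illustration only. A real detector also has to cope with comments, unusual attribute quoting and encodings that are not ASCII-compatible. Note that this sketch consumes the bytes it reads; the mark/reset pattern implied by a mark limit would avoid that.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetSketch {
    // Hypothetical lookahead, comparable in role to the limit set by setMarkLimit(int).
    static final int LOOKAHEAD = 8192;

    // Matches <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    // as well as the shorter <meta charset="utf-8">, case-insensitively.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset, or null if none is found or the JVM does not support it. */
    static Charset detect(InputStream in) throws IOException {
        // Consumes up to LOOKAHEAD bytes from the stream.
        byte[] head = in.readNBytes(LOOKAHEAD);
        // ISO-8859-1 maps each byte to exactly one char, so the ASCII bytes of the
        // markup survive unchanged for any ASCII-compatible document encoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        if (!m.find()) {
            return null;
        }
        String label = m.group(1);
        try {
            return Charset.isSupported(label) ? Charset.forName(label) : null;
        } catch (IllegalCharsetNameException e) {
            return null; // a syntactically invalid label counts as "no declaration"
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><meta http-equiv=\"Content-Type\""
                + " content=\"text/html; charset=ISO-8859-1\"></head></html>";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.US_ASCII));
        System.out.println(detect(in));
    }
}
```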
HtmlMapper - Interface in org.apache.tika.parser.html
HTML mapper used to make incoming HTML documents easier to handle by Tika clients.
HtmlParser - Class in org.apache.tika.parser.html
HTML parser.
HtmlParser() - Constructor for class org.apache.tika.parser.html.HtmlParser
 
HtmlParser(EncodingDetector) - Constructor for class org.apache.tika.parser.html.HtmlParser
 

I

IdentityHtmlMapper - Class in org.apache.tika.parser.html
Alternative HTML mapping rules that pass the input HTML as-is without any modifications.
IdentityHtmlMapper() - Constructor for class org.apache.tika.parser.html.IdentityHtmlMapper
 
INSTANCE - Static variable in class org.apache.tika.parser.html.DefaultHtmlMapper
 
INSTANCE - Static variable in class org.apache.tika.parser.html.IdentityHtmlMapper
 
isBase64() - Method in class org.apache.tika.parser.html.DataURIScheme
 
isDiscardElement(String) - Method in class org.apache.tika.parser.html.DefaultHtmlMapper
 
isDiscardElement(String) - Method in interface org.apache.tika.parser.html.HtmlMapper
Checks whether all content within the given HTML element should be discarded instead of including it in the parse output.
isDiscardElement(String) - Method in class org.apache.tika.parser.html.HtmlParser
Deprecated.
Use the HtmlMapper mechanism to customize the HTML mapping. This method will be removed in Tika 1.0.
isDiscardElement(String) - Method in class org.apache.tika.parser.html.IdentityHtmlMapper
 
isExtractScripts() - Method in class org.apache.tika.parser.html.HtmlParser
 

M

mapSafeAttribute(String, String) - Method in class org.apache.tika.parser.html.DefaultHtmlMapper
Normalizes an attribute name.
mapSafeAttribute(String, String) - Method in interface org.apache.tika.parser.html.HtmlMapper
Maps "safe" HTML attribute names to semantic XHTML equivalents.
mapSafeAttribute(String, String) - Method in class org.apache.tika.parser.html.HtmlParser
Deprecated.
Use the HtmlMapper mechanism to customize the HTML mapping. This method will be removed in Tika 1.0.
mapSafeAttribute(String, String) - Method in class org.apache.tika.parser.html.IdentityHtmlMapper
 
mapSafeElement(String) - Method in class org.apache.tika.parser.html.DefaultHtmlMapper
 
mapSafeElement(String) - Method in interface org.apache.tika.parser.html.HtmlMapper
Maps "safe" HTML element names to semantic XHTML equivalents.
mapSafeElement(String) - Method in class org.apache.tika.parser.html.HtmlParser
Deprecated.
Use the HtmlMapper mechanism to customize the HTML mapping. This method will be removed in Tika 1.0.
mapSafeElement(String) - Method in class org.apache.tika.parser.html.IdentityHtmlMapper
 
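To make the division of labor between mapSafeElement, mapSafeAttribute and isDiscardElement concrete, here is a self-contained sketch. SimpleMapper is a hypothetical interface that only mirrors the method names in this index; it does not implement Tika's HtmlMapper, and the whitelist rules and the decide helper are invented. The convention assumed here is that a null element mapping drops the tag but keeps its text, while a discarded element loses its content as well.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class HtmlMapperSketch {
    /** Hypothetical stand-in mirroring the three HtmlMapper methods listed in this index. */
    interface SimpleMapper {
        /** Returns the XHTML name to emit, or null to drop the tag but keep its text. */
        String mapSafeElement(String name);

        /** Returns true if the element and everything inside it should be dropped. */
        boolean isDiscardElement(String name);

        /** Returns the attribute name to emit, or null to drop the attribute. */
        String mapSafeAttribute(String elementName, String attributeName);
    }

    /** Whitelist mapper in the spirit of DefaultHtmlMapper; the rules are invented. */
    static final SimpleMapper WHITELIST = new SimpleMapper() {
        private final Map<String, String> safe =
                Map.of("P", "p", "H1", "h1", "A", "a", "B", "b", "CENTER", "div");
        private final Set<String> discard = Set.of("SCRIPT", "STYLE");

        @Override
        public String mapSafeElement(String name) {
            return safe.get(name.toUpperCase(Locale.ROOT));
        }

        @Override
        public boolean isDiscardElement(String name) {
            return discard.contains(name.toUpperCase(Locale.ROOT));
        }

        @Override
        public String mapSafeAttribute(String elementName, String attributeName) {
            // Only href on links survives in this toy rule set.
            boolean keep = elementName.equalsIgnoreCase("a") && attributeName.equalsIgnoreCase("href");
            return keep ? "href" : null;
        }
    };

    /** Pass-through mapper in the spirit of IdentityHtmlMapper: names are returned unchanged. */
    static final SimpleMapper IDENTITY = new SimpleMapper() {
        @Override
        public String mapSafeElement(String name) { return name; }

        @Override
        public boolean isDiscardElement(String name) { return false; }

        @Override
        public String mapSafeAttribute(String elementName, String attributeName) { return attributeName; }
    };

    /** One plausible order in which a parser could consult a mapper for a start tag. */
    static String decide(SimpleMapper mapper, String tag) {
        if (mapper.isDiscardElement(tag)) {
            return "discard element and content";
        }
        String mapped = mapper.mapSafeElement(tag);
        return mapped == null ? "drop tag, keep text" : "emit <" + mapped + ">";
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"style", "font", "CENTER"}) {
            System.out.println(tag + ": " + decide(WHITELIST, tag) + " / " + decide(IDENTITY, tag));
        }
    }
}
```

Checking isDiscardElement first matters in this sketch: a discarded element never reaches the renaming step, so its content cannot leak into the output through a mapped name.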

N

newDecoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
 
newDecoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
 
newEncoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
 
newEncoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
 
NotImplementedException(String) - Constructor for exception org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset.NotImplementedException
 

O

org.apache.tika.parser.html - package org.apache.tika.parser.html
 
org.apache.tika.parser.html.charsetdetector - package org.apache.tika.parser.html.charsetdetector
 
org.apache.tika.parser.html.charsetdetector.charsets - package org.apache.tika.parser.html.charsetdetector.charsets
 

P

parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.html.HtmlParser
 
parse(String) - Method in class org.apache.tika.parser.html.DataURISchemeUtil
 
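The sketch below shows one way to split a data URI into the pieces that DataURIScheme exposes (media type, base64 flag, decoded bytes), following RFC 2397, which says an omitted media type defaults to text/plain;charset=US-ASCII. Parsed, parse and percentDecode are hypothetical names, IllegalArgumentException stands in for DataURISchemeParseException, and the base64 branch assumes the payload contains no percent-escapes or whitespace.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

public class DataUriParseSketch {
    // RFC 2397: if the media type is omitted, it defaults to this value.
    static final String DEFAULT_MEDIA_TYPE = "text/plain;charset=US-ASCII";

    /** Hypothetical result holder standing in for DataURIScheme. */
    record Parsed(String mediaType, boolean base64, byte[] data) {}

    static Parsed parse(String uri) {
        if (!uri.regionMatches(true, 0, "data:", 0, 5)) {
            throw new IllegalArgumentException("not a data URI: " + uri);
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("no ',' separating header and data: " + uri);
        }
        String header = uri.substring(5, comma);
        String payload = uri.substring(comma + 1);

        boolean base64 = header.toLowerCase(Locale.ROOT).endsWith(";base64");
        if (base64) {
            header = header.substring(0, header.length() - ";base64".length());
        }
        String mediaType;
        if (header.isEmpty()) {
            mediaType = DEFAULT_MEDIA_TYPE;
        } else if (header.startsWith(";")) {
            mediaType = "text/plain" + header; // e.g. "data:;charset=utf-8,..."
        } else {
            mediaType = header;
        }
        byte[] data = base64 ? Base64.getDecoder().decode(payload) : percentDecode(payload);
        return new Parsed(mediaType, base64, data);
    }

    /** Decodes %XX escapes; every other character is taken as a single ASCII byte. */
    private static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write(hi * 16 + lo);
                    i += 2;
                    continue;
                }
            }
            out.write(c);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Parsed p = parse("data:,Hello%2C%20World");
        System.out.println(p.mediaType() + " -> " + new String(p.data(), StandardCharsets.US_ASCII));
    }
}
```

A custom percent-decoder is used instead of java.net.URLDecoder because URLDecoder turns '+' into a space, which is form-encoding behavior rather than URI behavior.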

R

ReplacementCharset - Class in org.apache.tika.parser.html.charsetdetector.charsets
An implementation of the standard "replacement" charset defined by the W3C.
ReplacementCharset() - Constructor for class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
 
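In the Encoding Standard, the "replacement" decoder turns any non-empty input into a single U+FFFD and produces nothing for empty input. A minimal java.nio.charset.Charset subclass with that behavior could look like this. The class name ReplacementCharsetSketch and its canonical name "x-replacement-sketch" are hypothetical, and offering no encoder is a simplification of this sketch, not a statement about Tika's ReplacementCharset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public class ReplacementCharsetSketch extends Charset {
    public ReplacementCharsetSketch() {
        // Hypothetical canonical name, chosen so it cannot clash with a real charset.
        super("x-replacement-sketch", null);
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof ReplacementCharsetSketch;
    }

    @Override
    public boolean canEncode() {
        return false; // this sketch offers no encoder
    }

    @Override
    public CharsetDecoder newDecoder() {
        return new Decoder(this);
    }

    @Override
    public CharsetEncoder newEncoder() {
        throw new UnsupportedOperationException("no encoder in this sketch");
    }

    /** Emits one U+FFFD for a non-empty input, then silently consumes the rest. */
    private static final class Decoder extends CharsetDecoder {
        private boolean emitted;

        Decoder(Charset cs) {
            super(cs, 1.0f, 1.0f);
        }

        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            if (in.hasRemaining() && !emitted) {
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW; // no room yet; try again with a bigger buffer
                }
                out.put('\uFFFD');
                emitted = true;
            }
            in.position(in.limit()); // every remaining byte is swallowed
            return CoderResult.UNDERFLOW;
        }

        @Override
        protected void implReset() {
            emitted = false;
        }
    }
}
```

With this sketch, new String(bytes, new ReplacementCharsetSketch()) yields "\uFFFD" for any non-empty byte array.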

S

setExtractScripts(boolean) - Method in class org.apache.tika.parser.html.HtmlParser
Whether or not to extract the contents of script elements.
setMarkLimit(int) - Method in class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
How far into the stream to read for charset detection.
setMarkLimit(int) - Method in class org.apache.tika.parser.html.HtmlEncodingDetector
How far into the stream to read for charset detection.
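A limit like this is typically used together with InputStream.mark and reset: the detector marks the stream, reads no more than the limit, and rewinds so the parser still sees the document from its first byte. The following sketch assumes that pattern; MarkLimitSketch and peek are hypothetical names, and this index does not state that Tika's detectors work exactly this way.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkLimitSketch {
    /**
     * Reads at most markLimit bytes for inspection, then rewinds the stream so that
     * whoever reads it next still starts at the first byte of the document.
     */
    static byte[] peek(InputStream in, int markLimit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset; wrap it in a BufferedInputStream");
        }
        in.mark(markLimit);
        try {
            // Reading no more than markLimit bytes keeps the mark valid.
            return in.readNBytes(markLimit);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(
                "<html><head><title>t</title></head></html>".getBytes(StandardCharsets.US_ASCII)));
        byte[] head = peek(in, 12);
        System.out.println(new String(head, StandardCharsets.US_ASCII)); // prints <html><head>
        System.out.println((char) in.read());                            // prints < (the stream was rewound)
    }
}
```

A larger limit lets the detector see declarations further into the document, at the cost of buffering more bytes.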
StandardHtmlEncodingDetector - Class in org.apache.tika.parser.html.charsetdetector
An encoding detector that tries to respect the spirit of the HTML spec part 12.2.3 "The input byte stream", or at least the part that is compatible with the implementation of Tika.
StandardHtmlEncodingDetector() - Constructor for class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
 
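The HTML encoding sniffing algorithm checks for a byte order mark before it looks at a charset supplied by the transport layer (the HTTP Content-Type header). The sketch below covers just those two steps. BomSniffSketch, sniffBom and choose are hypothetical names; the remaining steps (user override, <meta> prescan, fallback) are left out; and labels are resolved with Charset.forName, whereas the spec uses the Encoding Standard's label table, which for example maps "iso-8859-1" to windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

public class BomSniffSketch {
    /** Returns the encoding signalled by a byte order mark at the start of head, or null. */
    static Charset sniffBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return null;
    }

    /**
     * Simplified ordering: a BOM is checked before the Content-Type charset.
     * Returns null when a later step (such as a meta prescan, not shown) would decide.
     */
    static Charset choose(byte[] head, String transportCharset) {
        Charset bom = sniffBom(head);
        if (bom != null) {
            return bom;
        }
        if (transportCharset != null) {
            try {
                if (Charset.isSupported(transportCharset)) {
                    return Charset.forName(transportCharset);
                }
            } catch (IllegalCharsetNameException e) {
                // An unparseable label is ignored, as if none had been given.
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```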

U

UNSPECIFIED_MEDIA_TYPE - Static variable in class org.apache.tika.parser.html.DataURISchemeUtil
 

X

XUserDefinedCharset - Class in org.apache.tika.parser.html.charsetdetector.charsets
 
XUserDefinedCharset() - Constructor for class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
 
XUserDefinedCharset.NotImplementedException - Exception in org.apache.tika.parser.html.charsetdetector.charsets
 
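The index gives no description for XUserDefinedCharset. In the Encoding Standard, x-user-defined maps bytes 0x00 to 0x7F to the same code points and bytes 0x80 to 0xFF to U+F780 through U+F7FF. The sketch below implements both directions as plain static methods; XUserDefinedSketch, decode and encode are hypothetical names, and this is not Tika's implementation.

```java
import java.util.Arrays;

public class XUserDefinedSketch {
    /** Bytes 0x00-0x7F map to themselves; bytes 0x80-0xFF map to U+F780-U+F7FF. */
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            int u = b & 0xFF;
            sb.append((char) (u < 0x80 ? u : 0xF780 + (u - 0x80)));
        }
        return sb.toString();
    }

    /** Inverse of decode; any other character is unmappable and rejected. */
    static byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out[i] = (byte) c;
            } else if (c >= 0xF780 && c <= 0xF7FF) {
                out[i] = (byte) (c - 0xF780 + 0x80);
            } else {
                throw new IllegalArgumentException(String.format("unmappable character U+%04X", (int) c));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {0x41, (byte) 0x80, (byte) 0xFF};
        String text = decode(raw);
        System.out.println(Arrays.equals(raw, encode(text))); // prints true
    }
}
```

Because every byte has exactly one code point, decoding never fails and round-trips losslessly; only encoding can reject input.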