All Classes and Interfaces
Class
Description
Not thread safe.
The default HTML mapping rules in Tika.
Character encoding detector that determines the character encoding of an
HTML document from the charset parameter, if any, found in a
Content-Type http-equiv meta tag near the beginning of the document.
HTML mapper used to make incoming HTML documents easier to handle by
Tika clients.
HTML parser.
Alternative HTML mapping rules that pass the input HTML as-is without any
modifications.
An implementation of the standard "replacement" charset defined by the W3C.
An encoding detector that tries to respect the spirit of the HTML spec,
section 12.2.3 "The input byte stream", or at least the part that is compatible with
the implementation of Tika.
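
The meta-tag based encoding detection described above can be illustrated with a minimal, standard-library-only sketch. This is not Tika's implementation: the class name `MetaCharsetSketch`, the method `detect`, the 8192-byte prescan limit, and the regular expressions are all assumptions made for illustration. It only shows the general idea of decoding the first bytes with a single-byte charset and looking for a charset parameter in a Content-Type http-equiv meta tag.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetSketch {
    // Hypothetical limit on how many leading bytes are inspected.
    private static final int PRESCAN_LIMIT = 8192;

    // Matches a <meta ...> tag that carries http-equiv="Content-Type".
    private static final Pattern META_CONTENT_TYPE = Pattern.compile(
            "<meta\\s+[^>]*http-equiv\\s*=\\s*[\"']?content-type[\"']?[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Extracts the value of a charset=... parameter inside that tag.
    private static final Pattern CHARSET_PARAM = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE);

    static Optional<Charset> detect(byte[] html) {
        int length = Math.min(html.length, PRESCAN_LIMIT);
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives
        // regardless of the document's real encoding.
        String head = new String(html, 0, length, StandardCharsets.ISO_8859_1);

        Matcher meta = META_CONTENT_TYPE.matcher(head);
        while (meta.find()) {
            Matcher charset = CHARSET_PARAM.matcher(meta.group());
            if (charset.find()) {
                try {
                    return Optional.of(Charset.forName(charset.group(1)));
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: try the next meta tag.
                }
            }
        }
        return Optional.empty();
    }
}
```

An empty result means no usable declaration was found in the inspected prefix, in which case a caller would fall back to another detection strategy or a default encoding.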