Texterity, Inc. announced the release of TextCafe, an automated document conversion service that accepts PDF, Quark, and Word files and converts them to structured XML, HTML, and Open eBook formats. TextCafe also supports automated creation of Microsoft Reader, Rocket eBook, and Adobe PDF formats. The service is initially aimed at publishers with large numbers of trade books that need to be moved into structured XML. According to Texterity, the service is faster, less expensive, and more accurate than alternative solutions, which involve manual labor and proofing.

TextCafe provides the following key features: detection of text blocks, including paragraphs, which can be reflowed automatically; dynamic creation of a Cascading Style Sheet (CSS) based on normalized styles used in the document; chapter and sub-chapter recognition and tagging based on document type; recognition and removal of running headers and footers in XML output; recombination of words broken between lines, columns, or pages based on language-specific dictionaries; extraction of embedded images with links created in XML output; and automated generation of XML, HTML, Open eBook (OeB), Microsoft Reader (.LIT), Rocket eBook (.RB), and reflowable PDF.

TextCafe uses artificial intelligence algorithms to deduce the hierarchical structure of documents, relying only on the visual cues available in the document and knowledge of the document type, such as novel, white paper, or manual. The result is richly tagged, pure, and valid XML that can be repurposed endlessly.

TextCafe is available immediately. Pricing is based on document type and conversion volume. Quotes can be obtained by web-based submission at the Texterity web site, www.texterity.com.
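
To make one of the listed features concrete, the dictionary-based recombination of words broken between lines can be sketched roughly as follows. This is only an illustrative sketch, not Texterity's implementation, which the announcement does not describe: the function name `rejoin_broken_words` and the tiny word set are hypothetical stand-ins for a real language-specific dictionary.

```python
def rejoin_broken_words(lines, dictionary):
    """Merge words split by an end-of-line hyphen.

    If the two halves form a word found in `dictionary`, the hyphen is
    dropped; otherwise the hyphen is kept, on the assumption that the
    word is a genuine hyphenated compound.
    """
    words = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        # Does the previous line end in a word fragment such as "con-"?
        if words and len(words[-1]) > 1 and words[-1].endswith("-"):
            head = words[-1][:-1]
            tail = tokens[0]
            joined = head + tail
            # Ignore case and surrounding punctuation for the lookup.
            core = joined.strip(".,;:!?\"'()").lower()
            if core in dictionary:
                words[-1] = joined          # "con-" + "version" -> "conversion"
            else:
                words[-1] = words[-1] + tail  # "well-" + "known" -> "well-known"
            tokens = tokens[1:]
        words.extend(tokens)
    return " ".join(words)


# Hypothetical stand-in for a language-specific dictionary.
SAMPLE_DICTIONARY = {"conversion"}

sample_lines = [
    "The service performs document con-",
    "version for well-",
    "known publishers.",
]
print(rejoin_broken_words(sample_lines, SAMPLE_DICTIONARY))
```

A production system would need a far larger dictionary and would also have to handle breaks across columns and pages, which this sketch does not attempt.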