0.12.7 September 5, 2022
- fixed an ancient bug in EPUB2 pageno handling. Having two children ids in a pageno-class element no longer generates validation errors for EPUB2 files. Yes, it seems odd that a book would have two page anchors in one page number floating element, but it makes sense if you look at the rendered HTML (#501), and it's no reason to mock people.

0.12.6 September 3, 2022
- bad things happen if there's text at the top level. often this is a result of bad html. pre_parse now takes care of this

0.12.5 September 1, 2022
There remain problems converting HTML 4.0 files.
- fix failed txt build when boilerplate is not found
- fix failed txt build when boilerplate marker appears twice

0.12.4 August 31, 2022
- fixed bug in colgroup wrapping
- Ebookmaker will now NEVER break a page in the middle of a table, a list, or a figure.
- PG boilerplate is inserted in EPUB2 files as well
- Fixed issue with special characters in the boilerplate dividers causing txt builds to fail
- recognize previously marked pg-header, etc., such as from rst

0.12.3 August 24, 2022
- fixed a mismatch between the classname given to the cover and the corresponding css
- added CSS page breaks before footer and after header
- BeautifulSoup doesn't convert entities in script or style elements because HTML5 specifies these as CDATA. So our code has to handle cases where there are unexpected entities there.
- remove ALL non-default attributes in replaced img, not just alt

0.12.2 August 23, 2022
- xml escape headers extracted from text files

0.12.1 August 21, 2022
- drop alt attribute of img elements replaced with span

0.12.0 August 18, 2022
- start EPUB2 playorders at 1 not 0
- wrap bare col elements in colgroup
- instead of dropping img elements in noimages epubs, replace them with span tags to preserve link targets.

0.12.0b2 August 7, 2022, very possibly final candidate
- Changes to the cli were needed for ebookconverter integration
    - `--notify` and `--validate` are now flags that turn on validation and notification
    - to prevent any issues with pickled jobs being sent via stdin to subprocesses, the newer subprocess api is now used to run validators and mobi generators.
- TxtWriter also creates a target directory if it doesn't exist.
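
The stdin concern noted for this release can be illustrated with Python's `subprocess.run`. This is a minimal sketch, not Ebookmaker's actual code: the `run_tool` helper is hypothetical, and a trivial Python child process stands in for a validator or MOBI generator.

```python
import subprocess
import sys


def run_tool(args):
    """Run an external tool without letting it inherit the parent's stdin."""
    result = subprocess.run(
        args,
        stdin=subprocess.DEVNULL,  # the child sees an empty stdin
        capture_output=True,       # collect stdout and stderr for logging
        text=True,
        check=False,               # the caller inspects the return code
    )
    return result.returncode, result.stdout


# Stand-in child process that reports whatever arrives on its stdin.
returncode, output = run_tool(
    [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"]
)
print(returncode, output.strip())
```

With `stdin=subprocess.DEVNULL`, anything waiting on the parent's stdin (such as a job queue) cannot leak into the child process.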

0.12.0b1 August 4, 2022. possibly final candidate
- fix windows exception for unpadded date format
- add target directory creation to epubwriter
- remove gaps in playOrder for EPUB2
- don't count the size of the chunk template for chunking
- remove xml:space attributes - not allowed in EPUB or HTML5
- EpubWriter now creates a target directory if it doesn't exist, as HTMLWriter does

0.12.0b0 July 12, 2022. beta, almost for production.

- update to libgutenberg 0.10.0 - much improved logging when run from ebookconverter
- always set the lang attribute on html element
- added `--validate=(true/false)` to CommonCode so that EbookConverter can set/unset it via CLI. option can turn off validation even when a validator is installed -  needed for rebuild script
- added `--notify=(true/false)` to CommonCode so that EbookConverter can set/unset it via CLI.

0.12.0a1 June 17, 2022. alpha, not for production.

- update to libgutenberg 0.9.3 - much improved logging
- fix boilerplate insertion; only replace boilerplate in the first document
- catch errors for each job in a job queue so that the rest of the queue can execute
- fixed disappearing wrapped images
- add a pyproject.toml file. Seems to get rid of the SetuptoolsDeprecationWarning
- moved code to a src directory so as to keep test code out of distributions and play more nicely with new packaging standards.


0.12.0a0 June 14, 2022. alpha, not for production.

With 0.12, Ebookmaker adds EPUB3 and MOBI(KF8) as output formats.

- This version is being tested and deployed on Python 3.8. We will continue to address any issues with  Python 3.7. We no longer support Python 3.5. We have not yet tested on Python 3.9 but we expect it works without change.
- replaces Tidy with Beautiful Soup. Ebookmaker has used HTML Tidy to make sure that source files produced over the course of ~ 25 years can be parsed into a reasonably modern HTML DOM. With the advent of HTML5, Tidy has begun to show its age, and maintenance of Tidy has not kept up with the times. Bugs in Tidy are not being fixed, and we find we can no longer rely on Tidy. To replace Tidy, we are using  Beautiful Soup, a very popular python package widely used for web scraping.

Tidy did some other things that made Ebookmaker's HTML5 output poorly suited for PG:
    - it reorganized style attributes into css style elements. While this made the CSS easier to manage, it resulted in less readable source code.
    - it normalized whitespace in block elements. In almost all cases, this had no effect on the HTML display, but many PG contributors have used this whitespace to reproduce the printed pages in the source code, making it easy to maintain.

Beautiful Soup, by contrast, only changes the source when absolutely needed to make parsable unicode  HTML. We expect the resulting HTML5 files will be more pleasing for PG contributors. Some code was added to the Ebookmaker HTML parser to reproduce some of the functionality that Tidy provided.
    - Beautiful soup required some minor modification in error catching for missing files
    - Incoming DOCTYPE is ignored
    - Tidy provided some conversion of obsolete elements/attributes into xhtml4 elements with added CSS Rules.
        - `font` elements are replaced with `span`s.
        - `center` elements are replaced by `div`s. See note below about the CSS3 elements needed to reproduce the behavior of the `center` elements.
        - when elements not permitted as `body` content are present as a child of the `body` element, they are wrapped in `div` elements
    - A special formatter for Beautiful soup enforces Unicode Normal Form Composed.

- Ebookmaker has been somewhat heavy-handed when removing deprecated elements and attributes. With this version of Ebookmaker, we make more of an effort to preserve the formatting of the source document. This will impact EPUB2, EPUB3 and HTML5 produced files.
    - size attributes in `font` tags are translated to css rather than ignored.
    - list styles are translated to css rather than ignored.
    - size and width attributes on `hr` are translated to css rather than ignored.
    - deprecated align attributes on most elements are translated to css rather than ignored.
    - bgcolor attributes on elements other than body are translated to css rather than ignored.
    - values for the attributes align, frame, and rules are changed to lower case

- a customization has been added for the cssutils module to permit us to add selected CSS properties we want to use (the built-in tables are getting old.) We needed to do this because certain conversions for obsolete elements could not be duplicated without using newer CSS properties. In particular:
    - to reproduce the legacy `center` element, we added `display: flex` and `justify-content: center`.
    - `speak` and `speak-as` css properties have been updated.

- for HTML5, a validation hook has been added. As with EPUB validation, add the path of your command-line HTML5 validator to the .ebookmaker config file and set the --validate flag. Tested with the W3C "Nu" validator - https://validator.github.io/validator/
- for HTML5, move col@valign to css
- for HTML5, change 3 letter language codes to 2 letter codes where available
- for HTML5, fill empty title elements
- for HTML5, improve handling of HTML4 table@frame and table@rules
- for HTML5, `article`, `section`, `header`, and `footer` are now allowed as top-level elements in `body`
- fixed crash in text file analysis when number of lines in paragraph exceeds log(max float). 700-ish
- include opentype fonts in EPUB file (.ttf, .otf, .woff), requires libgutenberg >= 0.8.14. fixes #106
- added an EPUB3 writer. In addition to producing valid EPUB3 files, some changes have been made to the produced EPUB.
    - There is only an "-images" flavor. We continue to produce EPUB2 in images and no-images flavors
    - Many changes in the HTML and CSS that were done for compatibility with e-readers are not done for EPUB3. The changes remain in place for EPUB2. For example: 
        - Floats are not removed.
        - CSS absolute units are not changed.
        - Uncommon characters and ligatures are not simplified.
        - <q> elements are not rewritten.
        - Preformatted sections are not reflowed.
        - data elements are not stripped
        - img class="dropcap" are not changed to spans
        - any of the above that prove to be needed can be added back as needed
        - all html4 -> html5 changes are made, no matter the source.
    - it turns out that producers have long used workarounds to adjust for all the changes in support of limited-capability ereaders. For example, drop-caps in the HTML versions used @media(handheld) and `x-ebookmaker` css rules to remove drop-caps that didn't work in EPUB. Now that we are no longer removing floats and the like, we had hoped to undo most of these accommodations for EPUB3. This proved to be too complex. `@media(handheld)` rules are now replaced by `@media (max-width: 480px)` for EPUB3, and `x-ebookmaker` is supplemented by `x-ebookmaker-2` for EPUB2 files and `x-ebookmaker-3` for EPUB3 files. Going forward, producers should try to avoid, as much as possible, using the `x-ebookmaker-3` class and instead use media queries so that customizations will also benefit small-screen users of the html files.
    - For EPUB3, we still need to remove CSS rules that use the position property. Apple iBooks only allows the position property for fixed-layout EPUBs; for reflowable EPUBS, it appears to remove any elements that use `position: absolute`. It looks like absolute positioning is used mostly for page number anchors in the PG corpus, so we are retaining the behavior of hiding page number anchors when they use absolute positioning. Producers who want visible page number anchors should use floating elements.
    - For EPUB3, in our initial testing, we found that setting a default body margin hurt more books than it helped, and we are now using different default CSS sheets for EPUB3 and EPUB2.
    - CSS for the EPUB cover has been updated to better handle small or oddly sized cover images.
- Ebookmaker breaks HTML source into chunks to improve performance on EPUB readers. For EPUB3 files, the chunker treats `section` elements the same way it treated `div.section` elements for EPUB2. Similarly section elements in HTML5 source are converted to div.section elements for EPUB2. In addition, the maximum chunk size for EPUB3 is 300KB compared to 100KB for EPUB2.
- Ebookmaker now supports attributes in the epub namespace {http://www.idpf.org/2007/ops} These can be entered in source file in two ways:
    - any `data-epub-*` attribute in an html or xhtml source file is moved to the epub namespace for EPUB3, stripped for EPUB2, and preserved as-is for HTML5. This option will permit validation with the W3C 'nu' validator.
    - any 'epub:*' attribute in a properly namespaced XHTML file will be preserved for EPUB3, stripped for EPUB2, and converted to a `data-epub-*` attribute in HTML5.
- This version expands support for accessibility attributes.
    - the epub:role attribute (see above for using the epub namespace)
    - HTML5 attributes `role`, `aria-label` and `aria-labelledby` help screen readers interpret HTML. see https://idpf.github.io/epub-guides/epub-aria-authoring/ for guidance about how to use these. Ebookmaker will strip these attributes for EPUB2 files.
    - obsolete values of the `speak` CSS property are now updated to current CSS2/3 equivalents.
    - as discussed above, `speak` and `speak-as` css properties are now included in EPUB, EPUB3 and HTML5 files.
- tibetan (bo) added to list of languages for mobi conversion by calibre
- fixed issue where backlinks required an id set on the original element
- HTML5 `wbr` tags (line break opportunity) are removed for EPUB2
- HTML5 and EPUB3 files no longer duplicate the lang attribute in xml:lang
- Ebookmaker is phasing out the use of Kindlegen, which has been unsupported for a while by Amazon. While kindlegen can still be specified as the converter app in the config file, Calibre is now the default conversion app. the generated EPUB2 file is used as the source for MOBI (version 6) files, while EPUB3 files are used as the source for MOBI (KF8 format) files.
- fixed bug where dangling references were created by `x-ebookmaker-drop`
- for EPUB2, added the required summary attribute on table elements.
- for EPUB2, when an x-ebookmaker-page element is added, a `div` is made instead of an `a` when the element is a direct child of `body`
- for EPUB2 and EPUB3, when an x-ebookmaker-drop element containing an `id` is removed, a `div` is added instead of a `span` when the element was a direct child of `body`.
- for EPUB, fixed bug for irregular heading hierarchies
- work around bug in lxml >= 4.7 causing parse failures for rst conversions
- restored newlines in validation logging to make validation issues readable
- for conversions from RST: removed invalid 'classes' attribute
- for conversions from RST: added pg_boilerplate to generated headers
- for conversions from RST: stop printing the encoding as metadata
- for EPUB2 and EPUB3: Ebookmaker no longer makes an invalid reference when 'mailto:' links are present
- for EPUB2 and EPUB3: Adds a MIN_CHUNK_SIZE to avoid empty chunks when `body` begins with a section.
- when HTML or TXT source files are parsed, we attempt to identify Project Gutenberg "Boilerplate". When detected, these sections are wrapped in `section` tags for HTML and `pre` for TXT, with appropriate ids. three types of boilerplate identified are:
    pg_header
        usually a title and license declaration
        sometimes, title, book number, release date, authors, language, encoding, credits
        when detected, metadata will be parsed and enclosed in a pg_metadata_raw sub-section
    pg_footer
        usually the trademark license
    pg_smallprint
        on older books, this will contain license-ish language and other material. it's usually 
        found at the top of the text, and is often comically dated.
- for HTML5 and EPUB3. replace old boilerplate with up-to-date, generated Boilerplate!!!
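
The `data-epub-*` handling described in this release can be sketched with the standard library's ElementTree. This is an illustration under assumptions, not Ebookmaker's implementation: the function name and the `target` values are hypothetical, and the first case is read as applying to EPUB3 (moved to the epub namespace), with EPUB2 stripping the attribute and HTML5 leaving it alone.

```python
import xml.etree.ElementTree as ET

EPUB_NS = "http://www.idpf.org/2007/ops"
DATA_PREFIX = "data-epub-"


def convert_data_epub_attributes(root, target):
    """Handle data-epub-* attributes for 'epub3', 'epub2' or 'html5' output."""
    if target == "html5":
        return root  # preserved as-is
    for elem in root.iter():
        for name in list(elem.attrib):
            if not name.startswith(DATA_PREFIX):
                continue
            value = elem.attrib.pop(name)  # stripped for EPUB2
            if target == "epub3":
                # re-added in the epub namespace for EPUB3
                elem.set("{%s}%s" % (EPUB_NS, name[len(DATA_PREFIX):]), value)
    return root


source = '<section data-epub-type="chapter"><p>Text</p></section>'
epub3 = convert_data_epub_attributes(ET.fromstring(source), "epub3")
epub2 = convert_data_epub_attributes(ET.fromstring(source), "epub2")
print(epub3.attrib)
print(epub2.attrib)
```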


0.11.30 December 10, 2021
- for EPUB, down-convert HTML5 tags to divs so the files validate as EPUB2. The new div elements will add a class named the same as the html5 tag, so `<section>` becomes `<div class="section">`. Other attributes are preserved. In addition CSS selectors involving these elements will be transformed accordingly: for example `section` becomes `div.section`
    - `section`
    - `figure`  (initial style set to "margin: 1em 40px;",  copying from Firefox internal stylesheet.)
    - `figcaption`
    - `header`
    - `footer`
Users of these HTML5-only tags need to check that their CSS does not conflict with the added classes or changed CSS. In almost all cases, avoiding HTML5 element names for CSS classes will prevent any conflict. Users of HTML5 input may still encounter unresolved issues with other parts of the DP/PG tool chain; please examine output files carefully for unexpected behavior.
- for EPUB, move 'tfoot' elements to before 'tbody' (the order used in HTML4)
- for EPUB, remove any 'meta' elements using the 'property' attribute.
- add 'CRITICAL' notification for 'too-deep' errors
- reset parsers after txt jobs. fixes a bug when the plain text source file is linked from the html.
- EPUBCheck validation was broken. To use EPUBCheck validation, first download and install EPUBCheck from https://www.w3.org/publishing/epubcheck/. If the command to invoke it is  `java -jar /Applications/epubcheck-4.2.6/epubcheck.jar`, then add this line to ~/.ebookmaker or /etc/ebookmaker.conf: `epub_validator: java -jar /Applications/epubcheck-4.2.6/epubcheck.jar` then turn on validation by adding `--validate` to Ebookmaker's command line invocation or by setting validate to true in ~/.ebookmaker
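
The HTML5 down-conversion described in this release (`<section>` becoming `<div class="section">`) can be sketched as follows. This is a simplified illustration using the standard library, not Ebookmaker's code: the `downconvert` helper is hypothetical, appending the new class after any existing classes is an assumption, and the CSS selector rewriting is not shown.

```python
import xml.etree.ElementTree as ET

HTML5_TAGS = {"section", "figure", "figcaption", "header", "footer"}


def downconvert(root):
    """Rename HTML5-only elements to div, adding the old tag name as a class."""
    for elem in root.iter():
        if elem.tag in HTML5_TAGS:
            classes = elem.get("class", "").split()
            classes.append(elem.tag)  # e.g. "section"
            elem.set("class", " ".join(classes))
            elem.tag = "div"          # other attributes are left untouched
    return root


doc = ET.fromstring(
    '<body><section id="ch1" class="chapter"><p>Text</p></section></body>'
)
downconvert(doc)
converted = doc[0]
print(ET.tostring(doc, encoding="unicode"))
```

The example also shows why a source file that already uses `section` as a class name could conflict with the added class.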


0.11.29 November 30, 2021
- for HTML5, remove Content-Language metas
- when converting a presentational attribute to css in a style attribute, put the added css *before* existing content of the style, so as not to override it. this mimics browser behavior for cases when the two styles conflict. This won't do much good right away because tidy strips the styles into named classes.
- stop adding a viewport meta tag. it turns out this interferes with good HTML5 designs for mobile.
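
The ordering rule for converted presentational attributes can be shown with a small sketch. The `prepend_style` helper is hypothetical and only demonstrates the idea: CSS derived from an attribute goes first, so a conflicting declaration already in the `style` attribute comes later and wins.

```python
def prepend_style(existing, added):
    """Put CSS derived from a presentational attribute before existing style."""
    added = added.strip().rstrip(";")
    existing = existing.strip()
    if not existing:
        return added + ";"
    return added + "; " + existing


# <p align="center" style="text-align: left"> keeps its left alignment,
# because the later declaration takes precedence.
merged = prepend_style("text-align: left", "text-align: center")
print(merged)
```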


0.11.28 November 24, 2021
- fix #100. the behavior of --output-file has changed. a string passed using this argument is used to name the file where the Gutenberg ID would be. Previously it would be just the name of the output file, no matter the file type, except for Kindle, PDF and TeX. File naming for kindle was broken completely. In the past (version <0.11) --title would override the parsed or looked-up title. Title would be used in the file name if there was no Gutenberg id, or --outputfile.  
- docutils rst conversion introduced a typo in 0.18 resulting in some css problems
- added exception handling in ImageParser for broken images
- don't select cover until it's needed. Ebookmaker has been generating unneeded covers in the txt step because it hasn't parsed an html file.
- for HTML5, fixed a css syntax error in the css added for the table@cols attribute
- for HTML5, make sure lang and xml:lang attributes are in sync; put invalid langs in data-invalid-lang attribute.
- for HTML5, remove height or width attributes that are 0 or empty


0.11.27 November 18, 2021
- one more fix for docutils 0.18+

0.11.26 November 11, 2021
- fixed a problem with covers selected from linked images based on the file name. (The image file would be added twice).
- fixed a problem with linked images being omitted if they also were used as the cover image
- cover images are stripped from the flow because they are re-added to the flow in the coverpages. This behavior can now be over-ridden with the x-ebookmaker-important class (as has been advertised).
- added `--config-dir` command line argument to help guiguts integrate the included tidy config file.
- update docutils to 0.18+
- fixed a problem with noimages files caused by the parsing for .images jobs. Build order reversed!
- fixed a problem with noimages files due to broken link removal
- changed file naming methods so that Calibre file checking no longer complains.
- 0.11.25 was not deployed due to test failures.


0.11.24 November 8, 2021
- compatibility with docutils 0.18+. docutils' node traversal was changed from a list to an iterator.
- fixed duplicated generated cover bug. This caused errors in epubs generated by Online Ebookmaker or when the source directory is the same as the target directory.
- fix entities in generated CSS. When we generate HTML from txt or rst we must not entify '>'
- fix problem when an html source file links to a text file. Ebookmaker was trying to convert these files to html, and including them in the ebook reading order. Now, linked plain text files are only converted to utf8, nothing more.
- make sure that every img element has an alt attribute
- replace obsolete attributes for HTML5: td@background, td@bordercolor, tr@bordercolor, table@bordercolor, table@height, table@background background (the last one was never a thing!)
- remove blink elements
- add missing dd elements
- make sure that lang attribute == xml:lang attributes, everywhere.
- fix issue with <CR> in metadata when setting title for conversion from text files
- update to libgutenberg 0.8.12 to fix issue with control characters in dc.title meta tags


0.11.23 October 28, 2021
- moves tfoot to end of tables for HTML5
- removes superfluous span attributes in tables for HTML5
- replaces frame and rules attributes in tables with equivalent css for HTML5
- checks all values of the lang and xml:lang attributes for validity, fixes common invalid values for HTML5
- fix thead@align, tfoot@align, thead@valign, tfoot@valign for HTML5

0.11.21 October 23, 2021
- fixed file scanner to not scan parent directory when asked to scan a directory

0.11.20 October 22, 2021
- fixed file scanner used to find covers

0.11.19 October 21, 2021
- tt elements replaced by span with monospace font in HTML5
- newlines properly escaped in meta attributes
- removes meta elements with scheme attribute
- fixed missing coverpages in epub


0.11.18 October 19, 2021
- fixed reversion in TeX conversion

0.11.17 October 18, 2021
- fixed missing subtype in link rel setter


0.11.16 October 15, 2021
- fixed cover setting issues in 0.11.15
- parser now converts unicode to Normal Form C. So "A"+"combining-`" -> "À"
- CSS serializer now omits invalid CSS properties in the derived HTML5. The CSS profile used is CSS2, and it includes a small number of properties that are marked as errors by the HTML5 validator because it considers them deprecated by CSS3 (for example, the "speak" property, which is replaced by "speak-as"). We'll need to move to CSS3 eventually, but for now we need to also target EPUB2 and ereaders that don't do CSS3 yet. The supported CSS Properties are defined by profiles in the cssutils module in python; CSS3 properties are somewhat modular, and there needs to be discussion around which properties we should be using in PG files.
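
The Normal Form C conversion mentioned in this release can be illustrated with Python's standard `unicodedata` module. This is a minimal sketch of the normalization itself, not the parser code; the `to_nfc` helper name is hypothetical.

```python
import unicodedata


def to_nfc(text):
    """Compose base characters and combining marks into single code points."""
    return unicodedata.normalize("NFC", text)


decomposed = "A\u0300"  # "A" followed by COMBINING GRAVE ACCENT
composed = to_nfc(decomposed)
print(len(decomposed), len(composed))
```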


0.11.15 October 14, 2021
Not deployed due to test failures
- fix syntax and position of style element replacing HTML5 deprecated elements
- remove xml:space from HTML5 pre and style elements
- make removal of http-equiv meta elements case-insensitive
- try to remove problematic carriage returns in meta tags
- tr@align, tr@valign, tbody@align, tbody@valign changed to css equivalents
- remove img@longdesc; it never did anything
- add title to files produced from txt
- to allow for better HTML5 validation the preferred mechanism for denoting a cover image has been changed from `<link rel="coverpage" type href="a_relative_url.jpg" />` to `<link rel="icon" href="a_relative_url.jpg" type="image/x-cover" />`. type is optional unless there is more than one link@rel=icon. The issue is that "coverpage" is not an HTML5-registered link relation. The registered "icon" relation is described as "Imports an icon to represent the document." The "coverpage" mechanism will continue to be supported and does not need to be changed, especially for XHTML source files.
- Ebookmaker now looks for unlinked cover files if there are none linked. Cover file names must contain the string "cover" and must have an extension in '.jpg', '.jpeg', '.png', or '.gif'. Cover files must be in the same directory as the source file or one of its subdirectories. At some time in the past, cover files for display on the website were similarly identified by name. Some covers in the backfile were replaced by generated files when Ebookmaker added the capability of generating cover files. With Ebookmaker now identifying cover files, many of the unlinked covers should be restored. When a cover file is supplied, it is still a best practice to use a link element in the html file.
- adds utility functions CommonCode.dir_from_url and CommonCode.find_candidates to refactor directory walking and url to path conversions
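
The naming rule for unlinked cover files described in this release can be sketched as below. This is not the `CommonCode.find_candidates` implementation; `find_cover_candidates` is a hypothetical stand-in, and the case-insensitive matching is an assumption.

```python
import tempfile
from pathlib import Path

COVER_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def find_cover_candidates(source_dir):
    """Return image files under source_dir whose names contain 'cover'."""
    candidates = []
    for path in sorted(Path(source_dir).rglob("*")):
        if (path.is_file()
                and "cover" in path.name.lower()
                and path.suffix.lower() in COVER_EXTENSIONS):
            candidates.append(path)
    return candidates


# Build a throwaway source directory to demonstrate the rule.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "images" / "cover.jpg").write_bytes(b"")
    (root / "discovery.txt").write_text("")  # contains "cover", wrong extension
    (root / "book.html").write_text("")
    found_names = [p.name for p in find_cover_candidates(root)]
print(found_names)
```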


0.11.14 October 11, 2021
- fix error when a style element is empty
- fix error when style contains non-ascii text
- fix issue with aux files not being copied to html destination
- use secure version of Pillow


0.11.13 September 30, 2021
- fix reversion in 0.11.10 leaving out added meta elements
- remove encoding meta element originating in HTML5 source from EPUB2 files, as it was causing validation failures.
- remove @media handheld rules for HTML
- move content of table summary attribute to data-summary attribute
- width attribute on table and col is converted to css in a style attribute
- non-integer width or height on img is converted to css in a style attribute
- fixed epub builds for books with images in external css sheets
- the 'big' element is obsolete; changed to <span class="xhtml_big">. css is altered or added as appropriate
- a number of table attributes are obsolete in html5 and changed to the corresponding css styles: table@width, col@width, table@cellpadding, table@cellspacing, table@border, td@align, th@align, td@valign, th@valign
- html5 doesn't allow elision of dd in dl. Where they are missing, we add empty dd
- html5 doesn't permit carriage returns. these are replaced by newlines when represented as numeric entities.
- libgutenberg dependency updated to 0.8.11
    - fix issue of polymorphism in dc.languages. Without a db, it's a list of structs; with a db, it's a related collection.
    - ebookmaker will no longer ignore xml:lang or DC meta attributes
    - fix windows path comparison - ebookmaker will behave properly when input file is in outputdir
- fixed style element bug in unreleased 0.11.12


0.11.11 September 22, 2021
- remove encoding meta element originating in HTML5 source from EPUB2 files, as it was causing validation failures.
- fix bug when an XHTML source file set an xml:lang attribute


0.11.10 September 20, 2021
- use tidy for _all_ html source files
- addressed long-standing issue where images referenced in css were not included in epubs. This issue surfaced because the images were also missing from generated files.
- addressed some simple issues preventing derived HTML5 files from validating. More complex issues involving incompatibilities between XHTML and HTML5 have been enumerated and will be addressed in subsequent updates.
    - removed http-equiv meta elements for Content-Type and Content-Style-Type
    - set lang attribute when xml:lang attribute is present
    - removed duplicate encoding meta elements introduced by HTML5 source.
    - removed type attribute from style elements
- update requirements so that stand-alone installs will work better


0.11.9 September 3, 2021
- more aggressive session closing

0.11.8 September 2, 2021
- fix crash when source document contains html comments

0.11.7 September 2, 2021
- Using libgutenberg 0.8.7, which includes the type of meta tags used in HTML5.
- Ebookmaker was not saving the derived HTML files if the main source file was in the output directory. This prevented online Ebookmaker from displaying the files. Now, Ebookmaker will put derived files in an "out" directory. This turned out to require some code restructuring.
- The pseudo-xhtml files produced by 0.11.5 were causing problems with browser compatibility, most noticeably by doubling break elements. It turns out that the quirky output from lxml was caused by xml namespacing of elements. When xml namespaces were removed, the html output method worked as desired, resulting in files that in many cases validated as html5. This solved a number of problems for us, and puts us in a position to start remediating problem files in the backfile in preparation for EPUB3.
- The encoding for all python source files was changed to UTF8. A mis-encoded python file caused a problem with mdashes in titles.
- Sessions are now closed after every set of jobs. Ebookconverter was running out of database connections.
- Some superfluous logging was removed.
- There is documentation for the changes introduced in version 0.11

0.11.5 August 28, 2021
- fix bug in stand-alone kindle generation

0.11.4 August 26, 2021
- one more change. can't use xml write mode for html, because Chrome and Safari no longer support self-closing tags.

0.11.3 August 18, 2021
Adds notification support
- Add queue_notifications method in CommonCode, usable by both EbookConverter and EbookMaker
- configure notifications for missing file problems
- remove parsers for missing files - bug exposed by html generation
- fixed regression in WrapperParser
- add coverage by test file 
- enhanced log formatting
- refactored log setup for use with 
- started using CRITICAL logs to trigger notifications

0.11.2 August 4, 2021
Bugfixes for stand-alone use. We should not have released 0.11.1  on pypi.

- Ebookmaker 0.11.1 did not work without psycopg2 and the PG database. Neither did libgutenberg 0.7.2. This version, with libgutenberg 0.8.1, works without the PG database or psycopg2.
- uses libcountry for language name lists.
- uses old dc object if database not present


0.11.1 July 19, 2021
Fixes for Ebookconverter compatibility. 0.11.0 was never deployed.

- stop using old dc object, now use only ORM for db access
- since the ORM dc object contains a session, it can't be pickled. But EbookConverter sends a pickled job queue to EbookMaker to process, presumably to enable processing on multiple servers. So job queues no longer can contain dc objects. EbookMaker now gets a new (ORM) dc object for every job.
- when making txt output, EBM was relying on not also generating html except for rst. so now we check directly for txt source when creating.
- assorted delinting



0.11.0 June 30, 2021
Ebookmaker version 0.11 makes enhanced HTML files for all types of input, including HTML source files. Here are the improvements and other changes made to HTML source:

- all HTML files are cleaned by HTML Tidy. Tidy does the following:
    - converts all HTML to well-formed UTF8-encoded XHTML files. This will allow the PG server to add encoding to MIME headers, improving browser compatibility and accessibility.
    - LF is used as the newline character for all files (unix standard)
    - html entities such as "`&rsquo;`" `&Aacute;` etc. are converted to unicode characters
    - correct badly formed HTML, improving browser compatibility and standards conformance.
    - Because the files are now guaranteed to be well-formed, DOM manipulation can be done reliably by browser plugins, mobile apps, proxy servers, accessibility tools and PG's own file processors.
    - inline style attributes are moved to a generated inline stylesheet for better rendering performance.
    - a doctype declaration for XHTML+RDFa 1.1 is used for all files to allow validation with included RDFa metadata.
    - tags are now uniformly lower case
    - some legacy presentational tags (`<i>`, `<b>`, `<center>` when enclosed within appropriate inline tags, and `<font>`) are replaced with CSS `<style>` tags and structural markup as appropriate.
    - empty paragraphs are discarded.
    - any text in the body element is wrapped in a `<p>` element.
- added RDFa data, Dublin Core, and schema.org metadata to head element of HTML for better SEO and facebook unfurls. Changes in the metadata are now reflected in the HTML presentation

Some incidental changes were necessary to make this possible:
- Because the generated html is moved to a new directory, linked files also needed to be moved.
- Because the generated file has a different name, back-links needed to be changed

It is possible that rendering of the HTML is changed by this additional processing; however, the changed rendering would be aligned with what has long existed in PG EPUB files.

Note that the unprocessed source files will continue to be available without URL change on the PG web site.

- Don't stop generating html with first html file.
- Don't generate wrapper files when spidering to generate html
- Move @media handling to EpubWriter, not in parser.
- Also copy css and images to target directory
- Don't rewrite urls on output; they're already relative
- Let Spider follow "nofollow" links; instead have EpubWriter remove the nofollow links and corresponding files
- added USAGE.md to provide better documentation for html authors preparing files for Ebookmaker
- removed data-* attributes for epub because these attributes are not allowed in EPUB 2.0.1 and files were thus failing EpubCheck
- add RDFa data and schema.org metadata to head element of generated HTML for better SEO and facebook unfurls
- now using the doctype declaration for XHTML+RDFa 1.1 for generated HTML from libgutenberg >= 0.7.1
- added a tidy config to eliminate dependence on system configured tidy and to turn off drop-empty-elements, an option not available at the command line. Dropping empty spans/divs was having unexpected effects on css rendering; easily worked around, but confusing for producers.

Boilerplate generation will follow in v0.12


0.10.4 April 6, 2021
- add a minimal css stylesheet to the html generated for txt files
- delint

0.10.3 February 25, 2021
- added rendering for <q>: ebookmaker will now change all <q> tags to <span> for epub builds, keeping any attributes on the tags. Curly quotes will be added inside the spans: double for a top-level q and single for any q nested in another q.
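
The `<q>` handling described above can be sketched with the standard library alone. This is only an illustration: the function name `convert_q` and the use of `xml.etree.ElementTree` are assumptions, not ebookmaker's actual code.

```python
import xml.etree.ElementTree as ET

def convert_q(elem, depth=0):
    """Rename <q> to <span> in place and add curly quotes inside it.

    `depth` counts the enclosing <q> elements: 0 gets double quotes,
    anything deeper gets single quotes. (Hypothetical helper.)
    """
    is_q = elem.tag == "q"
    for child in elem:
        convert_q(child, depth + 1 if is_q else depth)
    if is_q:
        if depth == 0:
            open_q, close_q = "\u201c", "\u201d"
        else:
            open_q, close_q = "\u2018", "\u2019"
        elem.tag = "span"          # attributes on the element are kept
        elem.text = open_q + (elem.text or "")
        if len(elem):
            # The closing quote goes after the last child's trailing text.
            last = elem[-1]
            last.tail = (last.tail or "") + close_q
        else:
            elem.text += close_q

root = ET.fromstring('<p>He said <q class="x">hello <q>inner</q> there</q>.</p>')
convert_q(root)
result = ET.tostring(root, encoding="unicode")
```

Passing the nesting depth down the recursion avoids having to look at ancestor tags after they have already been renamed.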

0.10.2 January 18, 2021
- corrected text in PG footer for RST - thanks Roger Frank. Note that boilerplate generation is being revised for v0.11
- when reflowing pre, don't make it one long line. This was causing problems in the Kobo reader.
- don't drop a heading that starts with "by " if class 'x-ebookmaker-important' is on it.
- also log headings that are dropped because they start with "by "
- fix bug in anchor fixing where if an <a> tag had both id and name attributes, both were deleted.
- delinting

0.10.1 November 25, 2020
- fixed minor issue where "too deep" errors were emitted for self-links. Thanks to rfrank for the error report.
- fixed deprecation warning from Docutils; should be ok for Docutils > 0.1 

0.10.0 November 2, 2020
- SVG files are now considered images and included in EPUBs. They were being discarded. SVG files are not scaled or compressed by ebookmaker - the renderer should be able to auto-scale. This appears to fix kindlegen failures associated with svg images.
- fixed the rst test


0.9.7 September 14, 2020
- changed font for rst conversion. Linux Libertine was unmaintained since 2012 and no package was available for CENTOS8. We switched to the closest replacement, Libertinus https://github.com/alerque/libertinus
- added documentation about configuration for rst conversion
- the deprecated 'handheld' @media query was being used to prevent ebookmaker from stripping floats. to preserve this feature, ebookmaker no longer strips floats when the css selector contains the x-ebookmaker class. Most likely, float stripping was originally needed because html pages were designed before the advent of EPUB. Today, we can assume that if the html designer uses the x-ebookmaker class, they've considered the impact of the float on the generated EPUB.
- assorted delinting

0.9.6 September 8, 2020
- added 'x-ebookmaker' class to epub body elements. There are now 4 "x-ebookmaker" classes
    - css can now apply styles that are triggered by being a descendant of .x-ebookmaker. This addition is meant to replace the 'handheld' @media query that is deprecated in HTML 5
    - the 'x-ebookmaker-important' class on an image element tells ebookmaker not to remove the image, even in no-images builds.
    - the 'x-ebookmaker-drop' class tells ebookmaker to remove an element and its descendants from ebook builds.
    - the 'x-ebookmaker-pageno' class is applied to some span elements whose content has been stripped because they use a class that indicates they represent page numbers: pagenum pageno page pb folionum foliono
- added mayan as a language not supported by kindlegen
- typos fixed in README - thanks Joseph Koshy
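
As a rough illustration of the 'x-ebookmaker-drop' and 'x-ebookmaker-important' classes listed above, the sketch below filters a parsed tree. The helper names (`has_class`, `remove_keep_tail`, `strip_for_ebook`) and the `images` parameter are hypothetical; ebookmaker's own logic lives in its parsers and writers and may differ.

```python
import xml.etree.ElementTree as ET

def has_class(elem, name):
    """True if `name` is one of the space-separated classes on `elem`."""
    return name in elem.get("class", "").split()

def remove_keep_tail(parent, child):
    """Remove `child` but keep the text that followed it."""
    siblings = list(parent)
    idx = siblings.index(child)
    tail = child.tail or ""
    if idx > 0:
        prev = siblings[idx - 1]
        prev.tail = (prev.tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(child)

def strip_for_ebook(root, images=True):
    """Drop x-ebookmaker-drop subtrees; in no-images builds also drop
    every <img> that is not marked x-ebookmaker-important."""
    for parent in list(root.iter()):
        for child in list(parent):
            drop = has_class(child, "x-ebookmaker-drop")
            if (not images and child.tag == "img"
                    and not has_class(child, "x-ebookmaker-important")):
                drop = True
            if drop:
                remove_keep_tail(parent, child)
    return root

doc = ET.fromstring(
    '<body><p>a<img src="x.png"/>b'
    '<img class="x-ebookmaker-important" src="c.png"/></p>'
    '<div class="x-ebookmaker-drop"><p>web only</p></div></body>'
)
strip_for_ebook(doc, images=False)
```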

0.9.5 July 6, 2020
- fixed minor issue where the spider was getting confused when iterating on WrapperParsers.

0.9.4 June 30, 2020
- handle invalid quantization table in jpeg files when using quality 'keep'
- respect rel="nofollow" attribute: this allows authors to link to an alternate version file in html without duplicating content in the EPUB file.
- set wrappers to nonlinear in spine.
- fixed bugs and ugliness in toc generation
  - when the same header level is consecutive, only one toc item is generated. (We see this used to make multiline heads or titles)
  - toc normalization made a hash of the toc. Now the toc is normalized in the epubwriter, not in the parser.
  - add display:block to standard css sheet to prevent hidden headings from breaking kindlegen
- added a configuration option to use calibre (or whatever!) for languages not supported by kindlegen.
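
The consecutive-heading rule in the toc notes above might look roughly like this. The input shape (tuples of level, text, and a flag saying the heading directly follows the previous one) and the choice to join the texts with a space are assumptions made for this sketch, not ebookmaker's data model.

```python
def merge_consecutive_headings(headings):
    """Collapse directly adjacent headings of the same level into one toc item.

    `headings` is a list of (level, text, adjacent) tuples, where `adjacent`
    is True when no other content separates the heading from the previous one.
    (Hypothetical structure for illustration.)
    """
    toc = []
    prev_level = None
    for level, text, adjacent in headings:
        if toc and adjacent and level == prev_level:
            last_level, last_text = toc[-1]
            toc[-1] = (last_level, last_text + " " + text)
        else:
            toc.append((level, text))
        prev_level = level
    return toc

toc = merge_consecutive_headings([
    (1, "THE TITLE", False),
    (1, "A Subtitle", True),    # multiline title: merged with the line above
    (2, "Chapter I", False),
    (2, "Chapter II", False),   # same level, but separated by chapter text
])
```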

0.9.3 June 23, 2020
- Fix reversion in 0.9.2 which caused CoverWriter in EbookConverter to fail. (It uses ImageParser to convert images from PNG to JPEG.)
- add a test to check this

0.9.2 June 19, 2020
- note that EbookMaker is no longer installable in Python 2.7 (thanks cpeel)
- clean up pipfile and gitignore pipfile.lock (thanks cpeel)
- fixed bug where filepaths need escape in HTML
- fixed issue where compression was expanding compressed jpegs (thanks choward)
- added EBOK flag in MOBI when using ebook-convert (thanks rfrank)

0.9.1 June 11, 2020
Minor bug fix and optimization.
- picsdir builds weren't recognizing when it was copying files to themselves
- "broken" images are now inserted when an image is missing
- title attribute in wrappers needed escaping
- build times are now reported to logs
- small optimization with preparse on image parsers

0.9.0 June 2, 2020
Image handling has changed starting with 0.9
- linked images (image files as targets of links) are now wrapped in html, fixing display in ADE.
- linked images are compressed to 1MB if possible (changed from 128K)
- inline images are compressed to 256K if possible (changed from 128K)
- all images are limited to 5000x5000 pixels (was 800 x 1280)
- PNG images are scaled to meet the image filesize targets (previously no scaling of PNG images)
- L format JPEG images (greyscale) are no longer converted to RGB (thanks rfrank)
- generated covers are now 1200 x 1800
- covers and "important" images in noimages builds are scaled to 64K (previously no scaling)
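
To make the new limits concrete, here is a small sketch of the dimension cap. The constant and function names are hypothetical, and reading "1MB", "256K" and "64K" as powers of two is an assumption; the real code also recompresses images to reach the file-size targets, which is not shown here.

```python
# File-size targets from the list above, in bytes (interpretation assumed).
SIZE_TARGETS = {
    "linked": 1024 * 1024,
    "inline": 256 * 1024,
    "noimages_cover_or_important": 64 * 1024,
}
MAX_DIMEN = (5000, 5000)

def fit_within(width, height, max_dimen=MAX_DIMEN):
    """Return (width, height) scaled down, never up, to fit inside max_dimen
    while keeping the aspect ratio."""
    scale = min(1.0, max_dimen[0] / width, max_dimen[1] / height)
    return (max(1, int(width * scale)), max(1, int(height * scale)))

big = fit_within(10000, 5000)
small = fit_within(800, 600)
```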

General bug fixes
- eliminated double parsing due to first pass using raw paths
- when tidy fails, the (huge) error trace is only logged once.
- generated html is no longer overwritten by empty results


0.8.12 May 5, 2020
- corrected the exception to catch for missing files

0.8.11 May 4, 2020
- It turns out that ebookmaker gets called both with bare paths and file: urls depending on ebookconverter config files. So 0.8.10 broke on the production machine, though it seemed just fine on mac and windows. So we went back to the drawing board to figure out how to support posix and windows, with or without windows mount points (not sure if that's the right term), file:/// urls and bare paths. We also figured out some issues involving spaces in paths.
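
One way to accept both bare paths and file: urls is sketched below with the standard library. The function name `to_local_path` is hypothetical and this is not the code that shipped in 0.8.11; it only illustrates the kind of normalization involved.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

def to_local_path(path_or_url):
    """Accept a bare path or a file: url and return a Path.

    Only the 'file' scheme is treated as a url, so a Windows path such as
    C:\\books\\a.htm (which urlparse reads as scheme 'c') stays a bare path.
    """
    parsed = urlparse(path_or_url)
    if parsed.scheme == "file":
        # url2pathname also decodes %20 and similar escapes.
        return Path(url2pathname(parsed.path))
    return Path(path_or_url)

from_url = to_local_path("file:///tmp/my%20book/a.htm")
from_bare = to_local_path("book dir/a.htm")
```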

0.8.10 April 30, 2020
- fixed numerous file path nits on Windows
    - kindlegen
    - figsdir
    - cover
    - pdf
- PEP8 delinted:
    - EbookMaker.py
    - ParserFactory.py
    - Spider.py
    - parsers/__init__.py
    - writers/EpubWriter.py

0.8.9 April 13, 2020
- fixed issues preventing successful deployments on Windows
- added logging for books deeper than max_depth
- improved documentation for tidy and cairo prerequisites
- try to catch and report exceptions when tidy and cairo aren't installed
- added install-on-windows notes

0.8.8 March 6, 2020
- fixed issue preventing chunking when text file is latin-1
- fixed failure when source links to a directory
- improved parse error message

0.8.7 February 10, 2020
- fixed issue causing failed build when file encoding doesn't match platform default
- fixed issue where cover set on command line is excluded in build.

0.8.6 February 6, 2020
- fixed issue where covers aren't set when source is txt

0.8.5 January 27, 2020
- set "huge_tree" attribute for the xml parser to keep big files from blocking a build

0.8.4 January 23, 2020
- fixed bugs in rst->tex conversion
    - class = 'medium' extra '}'
    - removed '%' in noindent

0.8.3 January 21, 2020
- fixed bug exposed by setting cover with a command line argument

0.8.2 January 16, 2020
- refined embedded media error message to include referrer
- added external link warning

0.8.1 January 15, 2020
- Ebookmaker has been downloading embedded media (for example, images) from arbitrary websites and including it in epubs. This has caused build errors when a remote site goes away. Ebookmaker has command-line parameters for including and excluding urls; these have governed document files, and the same rules now apply to media files.

  In general, PG books should not embed media from other sites.

0.8.0 January 10, 2020
- support ebook-convert tool from Calibre as alternative to kindlegen

    To use calibre instead of kindlegen

    1. install calibre
    2. change MOBIGEN setting to 'ebook-convert' or the path to ebook-convert

0.7.10 January 9, 2020

- build failures on our production system suggest that the parser's output for empty style elements may be os-dependent. Parser now handles style elements with null text.
- allow the build to succeed even if a css file is missing
- allow px only in border properties. see discussion below
- fix false encoding error (complaining about "Klingon") when content file is missing

Ebookmaker has been removing any css rules that use 'px' measurements. 'px' measurements are discouraged in css styles for epub because of poor scaling when user changes font size. However fixed lengths are still useful in properties such as table borders. see discussion on DP forum:
https://www.pgdp.net/phpBB3/viewtopic.php?f=3&t=41237&p=1188773&hilit=windymilla#p1188746
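
The px rule can be illustrated with a small filter over (property, value) pairs. The names and the regular expression are assumptions for this sketch; ebookmaker works on parsed stylesheets rather than plain tuples.

```python
import re

# A number immediately followed by the px unit, e.g. "300px" or "1 px".
PX_RE = re.compile(r"\d\s*px\b", re.IGNORECASE)

def keep_declaration(prop, value):
    """Keep a declaration unless it uses px outside a border property."""
    if not PX_RE.search(value):
        return True
    return prop.strip().lower().startswith("border")

def filter_declarations(decls):
    return [(p, v) for p, v in decls if keep_declaration(p, v)]

kept = filter_declarations([
    ("width", "300px"),
    ("border", "1px solid black"),
    ("margin", "1em"),
    ("border-left-width", "2px"),
])
```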

0.7.9 January 6, 2020

- Ebookmaker was failing to make epubs if a linked file was missing. Now it emits an Error message to the log with url of missing file, but goes ahead and makes an epub without it.

0.7.8 January 4, 2020

- updated libgutenberg to 0.5.1
- added warning when mediatype of a linked file cannot be determined

0.7.7 December 19, 2019

- fixed bugs in ParserFactory that raised exceptions for parsers built from package-supplied
  resources
- fixed bug that raised exceptions when a cover was rejected for being too small

0.7.6 December 2, 2019
- disabled html to txt conversion because no conversion was being done
- changed a truth eval of an etree element to silence deprecation warnings
- corrected documentation of config behavior
- better install documentation

0.7.5 October 22, 2019

- deconflicted an overloaded "parsers" variable
- fix parse failure when an htm file links to valid xml (not xhtml)
- updated libgutenberg to correctly handle music files.

0.7.4 October 21, 2019

- fixed bug where a cover is always generated during conversion to kindle
- epub generated files are now only saved to log for -v -v -v

0.7.3 October 9, 2019

- fixed bug where if generated cover can't be written, ebookfile is not made
- when outputdir is specified, cover is written to outputdir
- updated libgutenberg to support 'aut' in marcrel covers 
- unloaded outputdir from job queue, as it had been passing in options

0.7.1 September 30, 2019

- fixed bug for pickled jobs; dc was previously passed silently via builtins in options 
- fixed bug in cover generator when url is a file: url 

0.7.0 September 20, 2019

- config files can now be used to provide defaults for most command line arguments.
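
The general pattern of config-file defaults that the command line can override can be sketched as follows. The option names, the `[DEFAULT]` section and the fallback values are hypothetical, chosen for illustration only; they are not a description of ebookmaker's actual config format.

```python
import argparse
import configparser

def parse_args(argv, config_text=""):
    """Build defaults from config text, then let argv override them."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    section = config["DEFAULT"]

    parser = argparse.ArgumentParser()
    # Hypothetical options; each default comes from the config if present.
    parser.add_argument("--output-dir",
                        default=section.get("output_dir", fallback="./out"))
    parser.add_argument("--max-depth", type=int,
                        default=section.getint("max_depth", fallback=1))
    return parser.parse_args(argv)

conf = "[DEFAULT]\nmax_depth = 3\n"
from_config = parse_args([], conf)
overridden = parse_args(["--max-depth", "5"], conf)
builtin = parse_args([])
```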

0.6.4 September 16, 2019

- added updates so that gutenberg canonical url is always https
- updated libgutenberg requirement

0.6.3 September 12, 2019

- fixed parsing of rst files in MyDocUtils
- fixed undefined options in TxtWriter and PdfWriter

0.6.2 May 10, 2019

- updated libgutenberg requirement. its DublinCore class depended on the installation of '_' that was removed in ebookmaker 0.4.1a1

0.6.1 May 9, 2019

- added strip_links command-line option. turn it off to stop EbookMaker from stripping links in EPUB and MOBI output
- the test_htm no longer tries conversion to pdf or rst
- fixed conversion failure when html contains '<pre><code /></pre>'
- corrected install directions
- readme is now markdown
- fix python 3.6.7 compatibility issue when dbm_gnu not present
- added tests for loading parsers, loaders and packagers

0.6.0 February 14, 2019

- command line additions
1. set cover url
2. generate cover flag

- cover should be first in reading order
- covers can be png
- added tests
- added travis
- fixed some import issues exposed by testing
- expected conversion skipping is a warning, not an error
- fixed issue when there's no stylesheet -- thanks to ray
- stop putting silly tag attribute into html


0.5.0 January 10, 2019

Moved Borg options class to libgutenberg
distutils --> setuptools


0.4.1a1

don't be using builtins as a backdoor global
updates for distutils 0.14 compatibility

0.4.0a2

Fix legalese in PG boilerplate.

0.4.0a1

Port to Python3.
Lots of refactoring.
Code cleanup as suggested by 2to3 and pylint.
Package renamed to ebookmaker.

0.3.20

Do not make special kindlegen epub anymore. Requires kindlegen 2.7+.
Better coverpage handling.
Works with docutils 0.11+.

0.3.19

0.3.19b6

Floats now support 'here'.

0.3.19b5

Fix typo in license text.
Fix "strip_links" debug message crash.
Extend styles directive.
- Add display option to hide the element.
- Allow for negative matches.
Don't use \marginpar for page numbers in TeX.

0.3.19b4

Style directive extended.
Now preserves all trailing whitespace except U+0020.
Added "table de matières" to auto toc detection.
Convert U+2015 to single hyphen in plain text.

0.3.19b3

Fix keyerror hrules and vrules.
Fix unescaped characters in html meta attribute values.
Fix default block image alignment.
Fix use numeric entities in xhtml writer.

0.3.19b2

Fixed text-indent in page nos (made pagenos disappear in line blocks).
Fixed whitespace collapsing in <pre> nodes.
Fixed: honors newlines in metadata fields.
Internal fix: correct format name is: "txt.utf-8".
Can use docinfo in addition to meta directive.

0.3.19b1

New formats: html.noimages and pdf.noimages.
No-image builds use a placeholder 'broken' image instead of nothing.
Figure directives without a filename create a placeholder 'broken' image.
New option :selector: in lof and lot directives for filtering.
Turn off italics with class no-italics (and bold with no-bold).
nbsp now works in ascii txt, soft hyphens now removed from ascii txt.
Insert line numbers with [ln 42] and [ln!42].
Works with kindlegen 2.0.

0.3.18

Allow unicode line separator U+2028 as line feed.
Fix XetexWriter bug with tables without explicit width.
Add language support in XetexWriter.
Works with docutils 0.8
Support docutils-0.8-style :class: language-<code>.

0.3.17

Fix line height of large text.
Fix images with spaces in src attribute.

0.3.16

Add image_dir to Xetex writer.
Use quotation environment instead of quote.
Don't automatically insert \frontmatter.
Page nos. for kindlegen 1.2.
Call kindlegen.
Integrate changes into PG environment.

0.3.15

Reduce vertical margin of images to 1 in TXT.
Fixed link targets in NROFF, PDF.
Report error on xetex errors.
Escape characters in PDF info.

0.3.14

Fixed crash on HTML comments in Kindle writer.

0.3.13

Start on Kindle writer.
Fix spurious space in PDF literal blocks with classes.
Fix 'flat' TOC.
Thin spaces between quotes made optional.

0.3.12

Add more front- and backmatter classes.
Insert thin space between quotes.
Generated List of Tables.
Generated List of Figures.
Emit warning instead of error on groff warnings.
Fix crash when last cell in row spans rows.
Add option vertical-aligns for tables.
Default width of image calculated assuming 980px window.
Fix docutils indentation bug in poetry.

0.3.11

Add option widths to tables.
Add option aligns to tables.
Add class norules for tables.
Generate typographically correct tables.
Don't overwrite images if src dir == working dir.

0.3.10

Bug fixes.

0.3.9

A different fix for figure and image centering on ADE.
  (Calculate explicit left margin).
More work on PDF (Xetex) writer.
Added directives for pagination control.

0.3.8

Fix empty poetry lines on ADE.
Fix figure and image centering on ADE.
Fix thoughtbreak centering on ADE.
For push, zip RST into subdir with images.
Start implementing PDF (Xetex) writer.

0.3.7

Integrate changes into PG environment.
Fix more CR/LF issues on windows.
Fix cover image format conversion.
Zips a pushable file for the WWers.

0.3.6

Code cleanup.
Different CSS templates for RST -> HTML and RST -> EPUB.

0.3.5

Zips files up for PG.

0.3.4

Tell Tidy not to merge divs and spans.
More fixes to plain text encoding.

0.3.3

Implemented coverpages for Adobe ADE.
CSS changes because Adobe ADE chokes on !important.
RST dropcap directive: don't use image in EPUB.

0.3.2

Packaging changes.
