Epubzilla Documentation

"Reading maketh a full man; conference a ready man; and writing an exact man."

Epubzilla is a Python library for extracting data from EPUB documents.

Currently, the only version supported is EPUB 2.0.1. There are grand plans to support EPUB 3.0 in the near future.

This project is released under the GPLv3.


The source is available on Bitbucket.

Clone the repository: https://bitbucket.org/odeegan/epubzilla.git

Install using pip: pip install epubzilla

Upgrade to the newest version: pip install -U epubzilla

Getting Help

If you have questions about epubzilla, send an email to odeegan @ gmail . com


Requirements

  • Python 2.6+
  • lxml 2.3.5 or later


>>> from epubzilla.epubzilla import Epub
>>> epub = Epub.from_file('Manly-DeathValley-images.epub')
>>> epub.author
'Manly, William Lewis'
>>> epub.title
"Death Valley in '49"

Additional Shortcuts

Get the epub cover image:

# {u'href': 'cover.jpg', u'id': 'item26', u'media-type': 'image/jpeg'}

# <type 'str'>

There is no explicit cover element. There is a meta element in the metadata that points to an image in the manifest:

<meta content="item26" name="cover"/>

This indicates that the image with id “item26” is the cover image. That id corresponds to an item element in the manifest:

<item href="cover.jpg" id="item26" media-type="image/jpeg"/>

That item, in turn, points to an actual file in the EPUB zip file. See above for an overview of the File Layout.
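
The lookup described above (cover meta element, then manifest item, then file) can be sketched with the standard library alone. This is only an illustration of the two-step resolution, not Epubzilla's own code; the function name find_cover_item and the SAMPLE_OPF string are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by OPF 2.0 package documents.
OPF_NS = "http://www.idpf.org/2007/opf"


def find_cover_item(opf_xml):
    """Return the attributes of the manifest item that the
    <meta name="cover"> element points to, or None if there is none.

    Hypothetical helper for illustration; not part of Epubzilla.
    """
    root = ET.fromstring(opf_xml)

    # Step 1: find <meta name="cover" content="..."/> in the metadata.
    cover_id = None
    for meta in root.iter("{%s}meta" % OPF_NS):
        if meta.get("name") == "cover":
            cover_id = meta.get("content")
            break
    if cover_id is None:
        return None

    # Step 2: find the manifest <item> whose id matches that content value.
    for item in root.iter("{%s}item" % OPF_NS):
        if item.get("id") == cover_id:
            return dict(item.attrib)
    return None


# A stripped-down package document, just enough to show the lookup.
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata>
    <meta content="item26" name="cover"/>
  </metadata>
  <manifest>
    <item href="cover.jpg" id="item26" media-type="image/jpeg"/>
  </manifest>
</package>
"""

cover = find_cover_item(SAMPLE_OPF)
print(cover["href"])
```

The href in the returned attributes is relative to the location of the package document inside the zip file, so reading the image bytes still requires joining it with that directory.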

You can skip the package element when navigating the data structure:

# class <Epub.Manifest>

# class <Epub.Manifest>

It’s a bit of a cheat, but each of the major elements is a subelement of package, so it saves you a few characters.

The following methods for accessing tag attributes are equivalent:

print epub.cover.tag.attributes['id']
# item26

print epub.cover.tag['id']
# item26
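
One way such an equivalence can come about is for the tag object to forward item access to its attributes dictionary. The class below is a hypothetical stand-in written for this example, not Epubzilla's actual implementation, and is only meant to show why both spellings return the same value.

```python
class Tag(object):
    """Hypothetical stand-in for a tag object; not Epubzilla's class."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = dict(attributes)

    def __getitem__(self, key):
        # tag['id'] is forwarded to tag.attributes['id'].
        return self.attributes[key]


tag = Tag("item", {"href": "cover.jpg", "id": "item26", "media-type": "image/jpeg"})

# Both forms read the same underlying dictionary entry.
print(tag["id"] == tag.attributes["id"])
```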

EPUB Validation

Epubzilla does not perform validation. It brazenly assumes it’s being fed a valid EPUB 2.0.1 file.

In the meantime, here are two free tools.
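
For a quick structural sanity check in code before handing a file to the library, something like the sketch below may be enough to catch obviously wrong input. It is not a validator: it only checks the two container requirements that every EPUB must meet (a leading mimetype entry with the value application/epub+zip, and a META-INF/container.xml file). The function names are made up for the example, and the code assumes Python 2.7 or later.

```python
import io
import zipfile


def looks_like_epub(source):
    """Cheap structural pre-check on a path or file-like object.

    Hypothetical helper; it does not replace a real EPUB validator.
    """
    try:
        with zipfile.ZipFile(source) as zf:
            names = zf.namelist()
            # The mimetype file must be the first entry in the archive.
            if not names or names[0] != "mimetype":
                return False
            if zf.read("mimetype") != b"application/epub+zip":
                return False
            # The container file tells a reader where the package document is.
            return "META-INF/container.xml" in names
    except zipfile.BadZipfile:
        # Not a zip archive at all.
        return False


def build_minimal_container():
    """Build an in-memory zip with just enough entries to pass the check."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", "<container/>")
    buf.seek(0)
    return buf


print(looks_like_epub(build_minimal_container()))
print(looks_like_epub(io.BytesIO(b"not a zip file")))
```

A file that passes this check can still be invalid in many other ways, so a full validator remains the right tool for anything beyond a first filter.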

Creating EPUBs from scratch

coming soon...