Inside the ePub format: The Basics

Writing a book is certainly not easy, but if you are a passionate writer, you can enjoy the process.

Sadly, there is little joy in formatting it for electronic distribution as an eBook. While many applications offer a native save-to-ePub option, it is rare that you can just take what they generate and ship it out to a reseller.

If you do not have a big publishing house behind you and you are self-publishing, you will likely find yourself having to dive into the ePub format to do some cleanup work.

I am writing this blog series to help you better understand the ePub format. In this post, I will share some of the ePub basics, so let’s get started!

What is the ePub format?

The International Digital Publishing Forum (IDPF) maintains the specifications for the ePub format. On their ePub site, they describe the ePub format as follows:

 

“EPUB is the distribution and interchange format standard for digital publications and documents based on Web Standards. EPUB defines a means of representing, packaging and encoding structured and semantically enhanced Web content — including XHTML, CSS, SVG, images, and other resources — for distribution in a single-file format.  EPUB allows publishers to produce and send a single digital publication file through distribution and offers consumers interoperability between software/hardware for unencrypted reflowable digital books and other publications.”

The jargon starts pretty quickly here, so let me explain it as follows:

An ePub is a group of web page files, images, and other resources that an eBook reader displays. The eBook reader is essentially a modified browser that displays those web pages so they look like a book.

That’s right; under the hood, the eBook reader is a web browser tailored to present web pages, images, and other files as a book.

What about Amazon’s Kindle format?

As you may or may not know, Amazon has its own file format, which it calls Kindle Format (also referred to as Mobi or PRC). Despite what you may read, this format follows most of the standards the IDPF maintains. In fact, most ePub files can be converted to a functioning Kindle Format file in just minutes.

As of this writing, the major difference is that Kindle does not support all the scripting features in the IDPF specifications. There may be other minor differences, but unless you are writing a really advanced interactive book, you are unlikely to run into them.

Let’s peek under the covers

In this section, I walk you through the basics of the ePub file.  In future posts, I will walk you through how each of these files works.

Here is an ePub file I created for my eBook. This is just a screen capture from my desktop:

[Image: the .epub file icon on my desktop]

Note how the file extension is .epub. If I double-click on this icon, it will open in my default eBook reader (in my case, iBooks).

The ePub file is nothing more than a container. Specifically, it is a zip file. A zip file allows you to package up a whole bunch of files into one single file. To see what is inside this book, I rename the .epub file, so it has a .zip extension, like this:

[Image: the same file renamed with a .zip extension]
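If you would rather not rename the file, most zip libraries will open an .epub directly. Here is a minimal Python sketch that uses the standard zipfile module to list what is inside. The file name sample.epub and both helper functions are made up for illustration, and the script builds a tiny stand-in file first so it can run without a real book.

```python
import os
import tempfile
import zipfile


def list_epub_contents(epub_path):
    """Return the names of all files packaged inside the ePub (zip) container."""
    with zipfile.ZipFile(epub_path) as book:
        return book.namelist()


def make_sample_epub(epub_path):
    """Write a tiny stand-in .epub so the example runs without a real book.

    This is only a placeholder for listing purposes, not a valid eBook.
    """
    with zipfile.ZipFile(epub_path, "w") as book:
        book.writestr("mimetype", "application/epub+zip")
        book.writestr("META-INF/container.xml", "<container/>")
        book.writestr("FUND01.xhtml", "<html/>")


if __name__ == "__main__":
    # Hypothetical file name; point this at your own .epub instead.
    sample = os.path.join(tempfile.mkdtemp(), "sample.epub")
    make_sample_epub(sample)
    for name in list_epub_contents(sample):
        print(name)
```

To try it on your own book, skip make_sample_epub and pass the path of your .epub file to list_epub_contents.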

After I extract the zip file into a folder, you can see what the ePub file looks like:

[Image: the extracted ePub files and folders]

Here is a breakdown of the files you are looking at:

  • book_cover.xhtml: A web page file that contains nothing more than the photo for the cover image of the book.
  • css_toc.css: A css file describes how text and images look when they display in the eBook reader. This particular css file tells the eBook reader how to lay out the navigation document (more on that in a minute). For example, it defines whether the title is bold and centered or bright blue and left aligned.
  • css.css: Just as css_toc.css defines the look of the navigation document, this file defines the layout and look-and-feel of your entire book.
  • All the FUNDxx.xhtml files: In this case, I wrote my book in Microsoft Word. It is a large book, so I broke each chapter (module) into its own file and named each one in Microsoft Word. For example, my book is titled Microsoft Project 2013 & 2016 Fundamentals, so the file for chapter 1 is FUND01.docx in Microsoft Word. After I convert the Word document to a format ePub understands, it has the extension .xhtml.
  • All the folders with a _files suffix: Many books contain images. My book is a technical book with many screen captures. Every file that references images has a related folder. In this example, you will see a file with the name FUND02_UI.xhtml. That file does not contain the images itself; it just references the images in the FUND02_UI_files folder.
  • All the png files: A png file is a bitmap image that displays in the book. For example, my book contains many screen captures, so each one of those items is a png file.
  • FundCover.jpg: A jpg file is another bitmap image format. This particular file is the cover image for my book.
  • META-INF folder: This required folder contains a required file called container.xml. When an eBook reader opens the book, it needs to know where to start, so container.xml points it to the opf file that describes the rest of the book. It sounds more complicated than it is; it is a very small file.
  • mimetype: This is a tiny text file containing a single line: application/epub+zip. That line identifies the zip file as an ePub. Obviously 🙂
  • ncx.ncx: The purpose of the ncx file was to provide a navigation structure for your eBook. The newer ePub 3.x specifications no longer require an ncx file. I recommend you include one anyway so your book works well on older eReaders that may not support ePub 3.x.
  • opf.opf: The opf file is what pulls your entire book together. Every file you create for your book, including every image, is listed in this file.
  • tocnav.xhtml: The navigation document for your eBook. It gives the reader a table of contents so they can jump to a particular chapter or section. (The order in which the eBook reader displays the files comes from the spine section of the opf file.) Older eBook readers (pre ePub 3.x) use the ncx file for navigation, but newer eBook readers use the navigation document.
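To make the relationship between container.xml and the opf file more concrete, here is a rough Python sketch of how a program might follow that chain: read container.xml to find the opf file, then read the opf file's manifest (the list of files) and spine (the reading order). The two XML samples are made up for illustration and are much shorter than what a real book contains, and the function names are my own, not part of any ePub library.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by container.xml and by the opf package file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Made-up sample data modeled on the file names in the list above.
SAMPLE_CONTAINER = """<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="opf.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="cover" href="book_cover.xhtml" media-type="application/xhtml+xml"/>
    <item id="fund01" href="FUND01.xhtml" media-type="application/xhtml+xml"/>
    <item id="nav" href="tocnav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="style" href="css.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="cover"/>
    <itemref idref="fund01"/>
  </spine>
</package>"""


def find_package_path(container_xml):
    """Return the path of the opf file that container.xml points to."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")


def manifest_files(opf_xml):
    """Map each manifest item id to the file it refers to."""
    root = ET.fromstring(opf_xml)
    items = root.findall("opf:manifest/opf:item", OPF_NS)
    return {item.get("id"): item.get("href") for item in items}


def reading_order(opf_xml):
    """Return the files in the order the spine lists them."""
    root = ET.fromstring(opf_xml)
    files = manifest_files(opf_xml)
    refs = root.findall("opf:spine/opf:itemref", OPF_NS)
    return [files[ref.get("idref")] for ref in refs]


if __name__ == "__main__":
    print("Package file:", find_package_path(SAMPLE_CONTAINER))
    print("Reading order:", reading_order(SAMPLE_OPF))
```

Notice that the css file and the navigation document appear in the manifest but not in the spine: the manifest lists everything in the book, while the spine only lists the pages the reader pages through.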

What I just shared with you are the core components you will see in most books. Your ePub may also contain audio, video, font, and other files to embed into your eBook.

Pro tip: Nearly all the files I share can have any name you want. The only items that must keep these exact names are mimetype, the META-INF folder, and the container.xml file contained within the META-INF folder.
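Here is a small Python sketch of that naming rule: it writes the skeleton of an ePub container with the three fixed names and a package file whose name you choose. The function name, the default my_book.opf, and the stub package content are assumptions for illustration, so the result is not a complete, valid eBook. As I understand the container specification, the mimetype file should also be the first entry in the zip and stored without compression, so the sketch does that too.

```python
import io
import zipfile

# container.xml template; {package} is filled in with the opf file name.
CONTAINER_XML = """<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{package}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub_skeleton(target, package_name="my_book.opf", package_xml="<package/>"):
    """Write the fixed-name parts of an ePub plus a freely named package file.

    target can be a file path or a file-like object. The package content is a
    stub, so this produces a skeleton rather than a finished book.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as book:
        # Fixed name: written first and stored uncompressed.
        book.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        # Fixed folder and file name: points the reader at the package file.
        book.writestr("META-INF/container.xml",
                      CONTAINER_XML.format(package=package_name))
        # Free name: anything works as long as container.xml matches it.
        book.writestr(package_name, package_xml)


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_epub_skeleton(buffer, package_name="opf.opf")
    with zipfile.ZipFile(buffer) as book:
        print(book.namelist())
```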

Conclusion

In this article, I set out to show you what an ePub file is and how it works at a very basic level. In future articles, I will dig a little deeper into each of the files I mentioned in the previous section.

Attribution

2016-10-21T13:42:39+00:00
