Why eBook Formatting Matters – A Case Study

This is an image-heavy post. I used my iPad 3 to illustrate the problems.

Typesetting is a visual art. And, ironically, the best typeset books are ones where the art gets out of the way.

Berkley Books published my 2009 award-winning historical, Indiscreet. They did a fantastic job on the cover (Paul Marron!!!) and I love my editor. The entire Berkley team did a great job on editing, copy editing and proofreading. Berkley also eventually did an eBook. Since Berkley only has North American rights, I just recently issued an eBook outside the US, where I have rights. And that means I have a case study about why eBook formatting matters and where the Berkley eBook falls very, very short. I heard from readers months ago that the eBook was practically unreadable. A while back I purchased the Kindle eBook and discovered it was true.

Some of you may know that I used to be a web developer in a previous job. That means I am very good with html and css and, compared to the average author, quite knowledgeable about the guts of eBook technology. Am I bang-up good at it? Not anymore, though I probably will be again shortly as there are lots of good reasons for me to brush up on my skills with respect to eBook formatting.

Here’s a screen cap of page 1 of Chapter 26 of the Berkley eBook for Indiscreet:


This looks OK, at first. But there are two HUGE typographic issues, one of which is better illustrated in the next image. The typographic errors are caused by the underlying html and css by the way, so to be accurate it’s not so much typography errors as coding errors that result in a poor reading experience. But typography gives us language that describes the problems.

In Indiscreet, I made the choice of opening many chapters with text that is descriptive of the chapter contents. It’s meta-text, in a way. Above, you can tell that first paragraph is indented text. You can tell that because there is text below that is not indented.

But this is an iPad screen that’s bigger than a lot of others. You can see more of the text per screen page. But what if you were reading this on an iPhone or other smaller screen? What visual clues exist to tell you that this is meta-text and not the actual writing? The indents are kind of a clue, but the white space issues make those indents less effective as a signifier. Notice the large spaces between paragraphs.

On the iPhone, the problem is even worse. You can only see a portion of any given paragraph.

So, the bad html/css creates too much white space and that lessens the visual clue of indented text. Pages need white space. Too little white space is as serious a problem as too much.

Look at this image and you’ll get a better idea of the problem:

Landscape mode shows what happens with smaller screen real estate. On the left column, the indenting more or less vanishes. There’s no other text to show that this is an indented passage. What if your reading screen consisted ONLY of what you see in the left column?

The meta-text becomes indistinguishable from the actual text. The fact that the actual chapter text begins with CAPS isn’t a sufficient signifier, because, again, it signifies only when you can compare it to something else. A good typographical solution would embed the signification in the text itself. A different font entirely, for example.

Since this is the digital world where color doesn’t cost the publisher money, we should all be thinking about whether color might ever be of assistance. But that’s a digression. I’m just saying that in eBooks, black and white thinking might be limiting us in arriving at elegant solutions. Not that I don’t also recognize the peril of color in the hands of people who don’t understand color theory.

Anyway, if you are trying to read Indiscreet on a smaller screen, you will spend time dropping in and out of the story as the poor typography makes you struggle to figure out what is meta-text and what is regular text. As another aside, my meta-text authorial decision itself pulls the reader slightly out of the story flow and, boy, the bad presentation only exacerbates that problem.

Here’s what I did in my version:

Typography solved this issue centuries ago.

Notice the italics. It’s obvious that the italicized text is meta-text. In fact, this is a standard reason to use italics. With italics, it doesn’t matter if the reader sees only the meta-text. Italics alone tells the reader that they are not in the actual text.
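
A minimal sketch of the italics approach in css, assuming the meta-text paragraph is marked with a class. The class name chapter-summary is hypothetical, chosen for illustration; it is not taken from either version of the eBook, and the spacing values are assumptions too.

```css
/* Hypothetical class for the descriptive text that opens a chapter. */
p.chapter-summary {
  font-style: italic;  /* the signal travels with the text itself */
  text-indent: 0;      /* assumption: no first-line indent on the meta-text */
  margin: 0 0 1em 0;   /* assumption: one line of space before the chapter text */
}
```

You would apply it as <p class="chapter-summary">Your meta-text.</p>. Some reading apps reportedly ignore paragraph-level styles (see the comments), so wrapping the same text in <em> or <i> as well is a reasonable belt-and-suspenders option.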

Here’s an interior page from the Berkley eBook that shows why the formatting errors make their version of Indiscreet a chore to read:

Even on the iPad, with its bigger screen real estate, all that white space between paragraphs destroys the reading flow of the story. As you can see, it’s particularly bad when there are short paragraphs.

Here’s my version:

It makes a difference to the reader. A big difference. Page after page of paragraphs sitting in an ocean of white makes reading a chore.

You’ll notice that I don’t fully justify the text. I’m open to persuasion on this issue, but my current position is that when text flows to fit a changing container (landscape vs. portrait, iPad vs. iPhone vs. Kindle Fire, etc.), full justification will inevitably and unpredictably insert white space that stretches lines of text, which slows down the eye and therefore the reading. So, I left-justify.

I also don’t include those pretty dingbatty things that start out chapters, even though it would be dead easy to do. Why? Mostly because the iPad background is actually not completely white and the dingbats show up on a whiter rectangular background and that bugs me. You can see the issue in the Berkley Chapter 26 image. That pretty sideways spade shows up on a whiter background. And it STILL bugs me. I’ve been mulling over various solutions to the problem, but I haven’t reached the point where no-dingbat vs. time to create perfectly invisible dingbat background has driven me to set aside time for a solution. I have a few ideas.

So. There you go.

I worked very hard to write a book readers would enjoy. And in the case of this eBook, it’s too much of a chore for many readers. They can’t read the story because of the presentation and that makes me sad.

Carolyn Talks Techie

Both P and DIV are block-level elements, and the white space around them comes from whatever default styles the converter or the reading system applies. You must create and apply a style to control that spacing if you don’t want that white space to appear.

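p {
  font-size: 100%;
  margin: 0em 0em 0em 0em; /* no space above or below the paragraph */
  text-indent: 1.5em; /* indent the first line */
  text-align: left; /* ragged right, not full justification */
}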

Will create paragraphs that don’t include space before or after and that indent the first line.

<p>Your fantastic text.</p>

Will look more like a book page than

<DIV>Your fantastic text.</DIV>

Which will actually render really badly.

When every DIV carries its own padding and you stack a bunch of DIVs, you get white space below the first div, then white space above the following div. Double the white space. Padding, unlike the default margins on P, does not collapse between neighbors, so stacked P tags end up with less space between them than padded DIVs.

Therefore, if the program that converts your Word doc or PDF to html is creating DIV tags instead of P tags to contain your paragraphs of text, it is ignoring what the html specification says those tags are for. If it also doesn’t even create a style sheet to fix the problem, your eBooks will suck.
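
If you are stuck with converter output that already uses DIV tags, a style sheet can at least rein in the white space. This is a sketch under assumptions, not a substitute for proper P tags: the class name para is hypothetical, and the values simply mirror the p rule above.

```css
/* Hypothetical class on each DIV the converter used as a paragraph. */
div.para {
  margin: 0;          /* no space before or after */
  padding: 0;         /* padding does not collapse, so zero it too */
  text-indent: 1.5em; /* same first-line indent as the p rule */
  text-align: left;
}
```

Even with this in place, check the result on as many devices as you can, since reading apps differ in which styles they honor.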

And Now I Call Publishers Out

Why are publishers using tools that create CRAP eBooks?
Why haven’t they hired anyone to fix it?
If they are out-sourcing this work and paying for it from the gross income, they’re getting ripped off. So is the author.

C-Level employees of publishers can be found all over the place talking about the ART of publishing and how much they care about the physical beauty of their product and how much they care about the reading experience.




26 Responses to “Why eBook Formatting Matters – A Case Study”

  1. willaful says:

    I think the paragraphs that bleed together are the absolute worst. Really messes with the rhythm of the book. Thanks for fighting the good fight!

    • I have seen books with problems far worse than the ones I see in Indiscreet. And yes, paragraphs that aren’t properly spaced are a nightmare.

      Although html and css are not hard, what is hard work (currently) is getting a source document (Word or PDF) into text that’s clean enough to appropriately style via html and css. And even so, it’s just not that hard.

      I was just having an interesting twitter discussion about FULL vs. LEFT justification and one conclusion was that it would be awesome if our discussions about eBook formatting were about stylistic choices and not readability issues. At least the justification question has pros and cons either way, as opposed to the issues that are the result of a crappy implementation.

      Thanks for stopping by!

  2. Reina says:

    Wow! Thanks for the visuals. I still don’t really understand the techie stuff (nor did I at the last SFA meeting), but I still appreciate you sharing your knowledge. My Kindle books look fine to me, but I’ve only looked at them on 2 devices…and when I outsourced for PubIt, it looked worse (more white space) than my conversions for KDP. But this makes me want to go back and recheck how everything looks.

    • Reina, there’s no getting around the need to look at the output on as many devices as possible. The problem with the white space issue in your ePub should be addressed with whoever you outsourced it to. They’ve got some html or css setting that’s not quite right. Good luck!

  3. T.K. Marnell says:

    You can easily adjust the default padding and margins on divs in the CSS, but it would be a semantic slap in the face to “techies” everywhere to use them instead of paragraphs. Like using and instead of and or using spans with styles instead of nested headers, the visual result will be the same but it’s meaningless babble. Content should be well structured, and CSS should handle anything to do with styling. And I don’t know how e-readers work for people with disabilities, but if they’re anything like screen readers you’d be slapping blind users in the face, too.

    I suspect publishers are just using the Filtered HTML spit out by Word. If you aren’t using defined styles, the code comes out a horrible mess, and even if you are it does stupid things like put spans around emphases with the specific font family of your “Normal” class, and it leaves in all kinds of characters I twitch to see (em dashes, curly quotes, ellipses…). I got so tired of adjusting it all by hand that I wrote a PHP program with regular expressions to do it for me. Now I just pop it in, clean it up to UTF8 with proper HTML entities, and save it. You’d think the publishing companies with deep pockets could afford to get a CS college student to do the same in half an hour, but they probably think they have higher priorities.

    As for your dingbatty thing, I wonder if a flattened PNG (32 bit) or GIF with a transparent background would work on iPads and other devices. I’ve never tried it myself. I avoid putting images in my ebooks to keep the delivery cost low, but little things like that probably wouldn’t push it up much.

    • T.K. Marnell says:

      Oh, bugger. I didn’t think HTML tags would be accepted. Before the formatting went haywire in that comment, I meant to say it would be “like using <i> and <b> instead of <em> and <strong>.”

      You see? This is why HTML entities are important!

  4. Hold on…what? Isn’t it elementary that you choose only one paragraph style, the indent or the block, and the block (with the spaces) is typically for non-fiction?

    I like your ragged right, and your ragged right rationale.

    Hey, thanks so much for this post and the code examples. I love those dingbats, too, but not in a block of white. I was thinking, what if you put a little frame around them? But that would still look dumb, and not right for a historical (though it has given me an idea to craft a modern dingbat for my own modern pnr that uses the white.)

    • If only publishers were thinking that far ahead! Courtney Milan is right, they’re converting books and considering themselves done, the resulting mess be damned. I think a frame around a dingbat, done correctly, might look quite nice.

  5. Shelley says:

    I’m actually doing research for a new online e-publisher and I thought that the formatting was already taken care of by most services? What about Smashwords and CreateSpace? Is it only the traditional publishers that struggle with proper formatting?

    • I think EVERYONE struggles with formatting. Vendors want us to believe that a Word doc can be uploaded and cleanly converted but the truth is, it can’t be. The price for that ease of upload is some very bad formatting and bloated file sizes.

      It’s like the bad old days of the web when every browser had proprietary renderings of html and web developers were hacking up fixes for all that. Every eReader device or app has its own set of things that don’t work as they should. Smashwords gets the job done as long as you don’t want pretty or elegant. CreateSpace is Print on Demand, so although there are issues there, too, they’re not applicable to the html rendering discussion.

      • Marc Cabot says:

        Late to the party, I know, but: This.

        I just published my first work on Smashwords three days ago. Despite having an absolutely clean .doc there were major formatting issues. I had to do what they call the Nuclear Option, reformat, republish, download the RTF which still had formatting issues, clean that up by basically re-entering the misformatted material in continuity with what was working, save as a new .doc, and reupload. NOW it works.

        Similarly, when I copied the epub generated by Pages to my Kindle and my iPad, it worked perfectly. When I uploaded it to KDP, more formatting issues. To be perfectly honest I’m not sure what I did to fix them, but after fiddling with the epub it appears to be working properly on iPad Kindle and my Kindle 2.

        I am dreading it going live on PubIt and doing it all. Over. Again.

        • Marc:

          I feel your pain. I have someone prepare a SW ready file for me since I don’t write in Word. The SW instructions only work for the PC anyway. I live for the day when SW accepts ePub.

          Pages, unfortunately, does not generate a clean ePub. It will work at Apple, but will break elsewhere. I’m a little worried that you aren’t sure what you did to clean up the ePub. Typically, an error free ePub requires that you know a lot of pesky details. You should put your ePub through a 3rd party validator — if it comes back with no errors you can be fairly sure your file will upload cleanly everywhere. Once you have a validated ePub, you can save with another file name and make any link customizations. Or, use generic (to your website) links and upload the same file everywhere — except SW of course.

          I have discovered that Apple’s Book Proofer app is AMAZING. Oh my gosh, I wish that tool worked for non-i devices. Basically, you hook your iThings to your computer via USB, open the iBooks app on your device and then tell the app to open up your ePub. From there, you can make changes in your ePub and the devices will display your book with the changes. You can instantly see the effect of any css or html changes. If you have a Mac, get this app.

          Otherwise, your goal should be to first create a very clean ePub that will pass validation at a 3rd party site. That’s your base file.

          Good luck!

  6. I do use pretty dingbats, and the non-white backgrounds drive me nuts… but the reason I use dingbats is that I’ve been convinced that dingbats are necessary to use for scene breaks–simply because some readers *cough*Kobo*cough*stanza*cough* do not render CSS-embedded scene breaks.

    Which means that the reader literally can’t tell where scene breaks are. I’ve tried a bunch of different things, but you’re either stuck with inserting an image for scene breaks or using some kind of character to signify them.

    Character scene breaks look messy and unprofessional to my eye, and so that leaves images. And for consistency’s sake, dingbats on scene breaks without dingbats at chapter breaks look wrong.

    I’ve been told that transparent image rendering support is coming to eReaders and that will hopefully solve the background whiteness problem on the Kindle/iPad end of things.

  7. Which reminds me: all meta-text in ebooks MUST be implemented using italics at this juncture, precisely because you still have readers like Kobo that strip paragraph-level formatting. There’s no excuse for that on a software level–their software is crap–but the user doesn’t know the software is crap.

    (Try it–install the Kobo reader on your iPad and pull up your ebook.)

    At this point, I try to have things like scenebreaks signified in multiply redundant ways, some of which (initial four-word capitals) are not dependent on proper CSS rendering.

  8. Sorry about the multiple posts.

    All of this leads me to wonder, why aren’t publishers examining their books on multiple platforms?

    Because it’s clear that they aren’t even looking at their books ONCE on ONE platform, but it’s also clear that they aren’t checking to see how the same book looks on Kindle, Kindle for iPad, B&N, nook for iPad, iBooks, Kobo, Kobo for iPad, stanza… and I do all of these for EVERY book, and every time I’ve skipped a format something has gone wrong with it. But it’s clear that they aren’t even looking at their basic template.

    Re dingbats: I’m wondering if Carolyn Crane doesn’t have the right of it–that the dingbat should be something that effectively has no background to be rendered, thus giving the best possible outcome. Something like this: http://www.courtneymilan.com/themes/general-images/text2985.png

    • Carolyn C’s suggestion was one of the thoughts I’ve been mulling over. I’m VERY fortunate (in a really bizarre way) that I don’t do scene breaks in my writing anymore. I did in the first two books I wrote, but never since, so I’ve only had to decide not to do dingbats at chapter headings. I was wondering if a light, graduated gray background would work, so that the background would be part of the dingbat itself.

      But as I said, no time to play. In the short run, I think I’d just live with the iPad background issue if I ever had the need for a scene break. Although, come to think of it, I would almost surely do the CAPS thing.

      I remember hearing some outcry a while back that Kobo was INSERTING html and css into eBooks that was not needed, invalid, and effectively ruined the readability of the books. At the moment, I’m using SW to get to Kobo, and I’ve given up hope with SW and their formatting.

  9. Shelley says:

    I totally understand what you guys are saying, but then isn’t it also an issue for the technology, more so than the publishers themselves? The publishers don’t have control over how the devices handle the work. Although, when I think about it, that’s also like saying that a printing press technology is out of the hands of the publisher as well. But shouldn’t some of the onus fall on the makers of the technology? And isn’t that why Kindle has become the most prevalent?

    • It’s absolutely also an issue of technology. If a publisher’s website looked like crap in, say, Opera, you can bet they’d have a web team to fix it.

      The point, really, is that publishers, Big 6 or Indie, are operating in the same technology environment, and of the two, one has a far bigger budget for getting it right in whatever the environment is. And the party with more money and resources is not getting it right and, what’s more, they are giving every appearance of not caring about getting it right.

      The issue of vendors, standards and proprietary forks off standards is too big for this blog. However, I can say that Amazon’s market dominance is directly related to the fact that I can FIND a book I want to buy, buy it and be reading it 30 seconds later. And if you have a problem, their customer service is amazing. Readers feel safer buying from Amazon.

      B&N isn’t providing that level of service. Neither was Borders.

      Amazon is all about the customer/reader. If you want to understand their market dominance, ask yourself what publishers are all about. It’s not the readers. Their view of the market got skewed by them privileging big vendors over readers. That’s a market position that works until the vendors start going away (Borders) or ordering even fewer books (Walmart, Costco…)

      • Shelley says:

        I get what you’re saying about the technology, it’s like we need a publisher/technology summit so everyone can figure out how to provide the best reading experience as intended by the author.

        You are so right about Amazon. Our home has three Kindles (and one iPad, which we don’t read on at all) and additionally we buy so many gifts and especially video games that we can’t find when going to every store in the city.

        Thanks for the thought-provoking post!

  10. Oh, I love this new dingbat thinking – both the fully inked box with gray leaves of Courtney’s and Carolyn J’s graduated gray. I do love a little flourish in my books.

    For my dingbat, my new series features magic that comes from 1980s computers and I’m going to use the white for the computer screen, so that it pops and looks intentional, and ink the rest.

    This is such a good discussion. I am also now going to download kobo for my mac. I’ve never looked at my stuff on Kobo – I have had a good Kobo report, but I never thought to press my Kobo informant on the details. Trembles.

  11. handy finden says:

    Hi There Carolynjewel,
    I take your point. There’s a question that would never have been raised just a few years ago. A few years ago, eBooks were sold via your own website. And the market just wasn’t that big.
    Great Job!

  12. […] if there’s much room for argument about the seeming ubiquity of production lapses (Carolyn Jewel has an interesting post about the digital formatting of her own books). And still, despite all the complaints, these books […]

  13. Janusz Buda says:

    I wish I could say something smart about the formatting issue. I’ve been hand-coding web pages since the days when Mosaic was the only game in town and can remember the trauma of the browser and standard wars that followed. One day, running from computer lab to computer lab in an attempt to view my pages on as many platforms as possible, I decided enough was enough and stopped caring. I follow W3C standards and refuse to litter my CSS files with browser-specific patches. And yah-boo-sucks to anyone whose browser or PC can’t display my pages properly.

    As a reader, I get furious when I pay $6 or $7 (sometimes much more) and download an ebook set in Arial with straight quotes, double-hyphen dashes, and sometimes blocks of missing text. Many’s the time I’ve converted epub, lit, pdf or mobi to rtf then reformatted the entire text to my own liking. That works because I do almost all of my ebook reading on a 24-inch monitor at home.

    I’ve tried making direct changes using an epub editor such as Sigil, but it’s too much of a hassle and besides, seeing the underlying code makes me want to throw up.

    Forgive the negativity.


    • Lots of people share your frustration! There was a time when coding for all those quirks was a PITA. With the demise of IE 4 through 6, most, but not all, of that has gone away. Most people now do what you did and simply don’t code for the older browser versions.

      Readers are entitled to be angry about paying $6 or more for a book that is badly formatted, and trying to clean those up is a chore and rather disheartening. But I hope that will change as publishers bring on the right staff or demand more from the companies they’re hiring. The money publishers spend on formatting vendors comes from the book’s GROSS earnings, not the net on which they calculate royalties. It makes me angry when I see a crappy product on top of that.

  14. […] of an ebook is important. To see an example of what I mean, see Carolyn Jewel’s post, Why eBook Formatting Matters – A Case Study. She includes examples of how her ebook looks on different devices, as well as what a […]