Looks like I can answer my own question from the previous post, i.e., how would iBooks on the iPad handle an ePub file with stupidly small fonts and wide margins?

The answer being "just fine, thanks!".

Nabbed an iPad on the way home from work today. Just the 16GB WiFi model. Primarily for use as a liseuse, though we'll see how other use-cases work out. For my specific needs it works well so far. It's heavier than the Kindle (of course), but because it's also larger, and I need to hold it fairly close to my face to see without the fonts completely maxed out, holding it in both hands doesn't appear to be much of a problem for me.

Removed DRM from the book I tried on the Kobo last night, synced it over to iBooks, and blammo it just works as I'd want. This will likely encourage me to pick up some books I'd been wanting but had only been able to get in ePub due to wacky regional restrictions.

There's this book. I own a copy, but I want to read it as an ebook because that's more practical given my vision problems. I'm willing to pay to buy it again as an ebook.

It is of course not available.

Well, not legally. I can find a PDF of it via random torrent sites. While the PDF looks like it might be a scan -- it has all the formatting "just so" -- it doesn't seem to be, as it contains text rather than images. One suspects that this was leaked by someone at one of the publishers, or that the book was once available as a PDF ebook so now this cracked copy is floating around the Internet.

Two strikes against the publisher, then: one for releasing in such a flawed format as PDF, and one for not keeping it available. It was originally published in 2003 so it's not that old, and keeping ebook back-catalogues available is nothing like as difficult or expensive to do as with physical books. Hell, this is a well-regarded non-fiction book, considered by many to be pretty important in its field. We're not even talking random crappy alt-history, like my usual reading material!

(My really wild guess? Probably released during Amazon's previous ill-fated attempt at ebooks, which they did with PDF and canned unceremoniously.)

Why's PDF a flawed format for ebooks? Well... It probably doesn't have to be. Yes, it's an electronic representation of print, and that's less than ideal for display on a medium where pages are irrelevant. But it does contain text and formatting information, and it can also contain hyperlinks. Those can be used by ebook publishers for things like table-of-contents links, links to footnotes, and all the other things they're used for in an ePub book.

No, the problem is that just about every implementation of a PDF reader sucks, particularly those that are on small-screen devices like, oh, ebook readers or mobile phones.

My Kindle can read PDF, so I threw this on there. Some small parts are displayed as text -- which means that the Kindle can reflow those bits, change the text size, that sort of thing. But the bulk is rendered as images. And not because that's what's in the PDF! Calibre has the same problem trying to convert this to a Mobipocket file -- the result is a little bit of text and a lot of images.

My best guess -- given that I don't really have the tools to dig into the PDF format -- is that there are fonts embedded in it, and because the PDF conversion/rendering libraries these tools use don't have access to those fonts for use in a text-based format, they render the page as an image.

Here's where it gets funny:

Open the file in Apple's Preview app on the Mac. Hit meta-A, meta-C. Switch to a text editor, hit meta-V. Save the file. Push it through this tiny bit of Perl:

while (<>) { print $_ . "\n" }  # $_ keeps its own newline, so this appends an extra one to every line

(which I am sure can be done in sed or awk or whatever else you like, but Perl is what I know best)

Now you have a completely readable text file. It's not perfect: there are no links, obviously. Page-ends are still apparent in the text, with the page headers/footers as in the PDF. Flow isn't flawless, as any place where the PDF had a hyphenated line-break now has "wo- rd" instead of "word". These things could be cleaned up, and a lot of it automatically, probably. But it is, essentially, readable at whatever font size one wants to use.
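
For the curious, here's a rough sketch of what that automatic cleanup might look like. It's in Python rather than Perl, and it's a guess at the approach, not something tested against the actual book. The function names (dehyphenate, drop_repeated_lines) and the min_repeats threshold are made up for illustration. It assumes the broken words look like "wo- rd" (letters, hyphen, whitespace, lowercase letters) and that a running header is a line repeated verbatim many times.

```python
import re
from collections import Counter

# Letters, a hyphen, some whitespace (spaces or newlines), then lowercase letters.
# Assumption: this is what a line-end hyphenation looks like after the copy/paste.
_BROKEN_WORD = re.compile(r"([A-Za-z]+)-\s+([a-z]+)")


def dehyphenate(text: str) -> str:
    """Join words that were split by a line-end hyphen: "wo- rd" becomes "word"."""
    return _BROKEN_WORD.sub(r"\1\2", text)


def drop_repeated_lines(text: str, min_repeats: int = 5) -> str:
    """Drop non-blank lines that occur at least min_repeats times.

    Hypothetical heuristic for running headers/footers; blank lines are kept.
    """
    lines = text.splitlines()
    counts = Counter(line.strip() for line in lines if line.strip())
    kept = [line for line in lines if counts[line.strip()] < min_repeats]
    return "\n".join(kept)


if __name__ == "__main__":
    import sys

    # Same filter style as the Perl: read stdin, write cleaned text to stdout.
    sys.stdout.write(dehyphenate(drop_repeated_lines(sys.stdin.read())) + "\n")
```

This will get some things wrong. A genuine compound that happened to break at a line end ("well- known") gets glued into "wellknown", and footers that carry a page number differ on every page, so the repeated-line check won't catch them.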

There's probably no good technical reason why an ereading device like the Kindle couldn't do most of this itself with no help from the user. My best guess is that they're using a library from Adobe to do the rendering, though, and Adobe tend not to be keen on such things. If they are instead using the open source tools that Calibre uses, then that's a different story and there's scope for improvement, if only they gave a damn.

None of this is an issue with a real ebook. Those have other problems, and when my Kobo is delivered I'm sure I'll rant a bit about the shortcomings of ePub for the vision-impaired reader.

