Copy text from Kindle e-books (read.amazon.com)

I recently bought an e-book on Amazon. It had been a while since I had last done so, and I was surprised to find an online reader for the book at https://read.amazon.com/. I had wanted exactly this years ago. Most e-book formats are more or less just zipped HTML files, so not being able to simply open a book in a web browser and read it on any device had always been a major drawback.

While I was using the online reader, I came across a passage I wanted to share. I tried to copy it to my clipboard as I would any other text on my computer, but Amazon had specifically disabled that. It seemed absurd, considering how open web technologies are.

First, I tried to capture the text I needed by intercepting the app's requests to Amazon's servers. It turns out the servers deliver each section as compressed, encrypted binary data via JSONP. I found where in the source this data was decrypted and decompressed, but I couldn't associate the resulting text with a position in the e-book.
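For context, JSONP works by having the page load a script whose body is a call to a global callback, e.g. callbackName({...}). A minimal sketch of how such payloads could be captured from the console, assuming the callback is a plain global function that already exists before the request fires: wrap it so each call's arguments are recorded before the original runs. The function hookJsonpCallback and the callback name used below are hypothetical; the name the Kindle reader actually uses is not known here, and JSONP callback names are often generated per request, which makes this harder. Anything captured this way would still be the compressed, encrypted payload described above.

```javascript
// Hypothetical helper: wrap a global JSONP callback so that every payload
// passed to it is also pushed onto `sink`, then forwarded unchanged.
function hookJsonpCallback(scope, name, sink) {
  const original = scope[name];
  if (typeof original !== 'function') {
    throw new TypeError(`${name} is not a function on the given scope`);
  }
  scope[name] = function (...args) {
    sink.push(args);                    // keep a copy of the raw arguments
    return original.apply(this, args);  // let the app continue as normal
  };
  // Calling the returned function restores the original callback.
  return () => {
    scope[name] = original;
  };
}
```

For example, hookJsonpCallback(window, 'loadSection', captured) would record every call to a global named loadSection (a made-up name) in the captured array until the returned function is called.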

It dawned on me that, rather than digging through minified JavaScript, there was an easier way: the text was already rendered in the DOM, right in front of me. Scraping the rendered text is much easier than reconstructing the code that put it there.

The Kindle web app uses multiple frames to store and display the text. There are many repeated sections, and it is hard to tell exactly what is being viewed at any given time. With the right selectors, though, it is easy to narrow things down to something manageable.

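// The reader UI lives inside the iframe with id KindleReaderIFrame
var appFrame = document.querySelector('#KindleReaderIFrame').contentDocument;
// The book content is rendered in iframes nested inside the app frame
var contentFrames = Array.from(appFrame.querySelectorAll('iframe')).map(f => f.contentDocument);
// querySelectorAll returns a NodeList, which has no innerText, so join each div's text
var text = Array.from(contentFrames[1].querySelectorAll('body > div')).map(div => div.innerText).join('\n');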

Running this in the browser's developer console leaves all of the text visible on the screen (and some extra) in the variable text.
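The snippet above hardcodes contentFrames[1] and can pick up repeated sections. A slightly more general sketch, assuming the frames are same-origin (so contentDocument is readable) and that readable blocks are direct div children of each frame's body, as in the snippet above: walk every nested iframe, collect each block's innerText, and skip blocks whose text has already been seen. The function names are hypothetical, the root document's own divs may still contribute some interface text, and the de-duplication will also drop legitimately repeated paragraphs such as identical scene-break lines.

```javascript
// Recursively collect the Document of every accessible iframe under `doc`.
// Cross-origin frames report contentDocument as null, so they are skipped.
function collectDocuments(doc, out = []) {
  out.push(doc);
  for (const frame of doc.querySelectorAll('iframe')) {
    const child = frame.contentDocument;
    if (child) collectDocuments(child, out);
  }
  return out;
}

// Gather the trimmed innerText of matching blocks across all frames,
// keeping only the first occurrence of each distinct block of text.
function collectUniqueText(rootDoc, selector = 'body > div') {
  const seen = new Set();
  const blocks = [];
  for (const doc of collectDocuments(rootDoc)) {
    for (const el of doc.querySelectorAll(selector)) {
      const t = (el.innerText || '').trim();
      if (t && !seen.has(t)) {
        seen.add(t);
        blocks.push(t);
      }
    }
  }
  return blocks.join('\n\n');
}
```

In the console, something like copy(collectUniqueText(document.querySelector('#KindleReaderIFrame').contentDocument)) should put the result on the clipboard. copy() is a console-only helper in Chrome DevTools (Firefox's console has one too), so it should work even though the page blocks ordinary copying.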