Algorithm which can predict with 84 per cent accuracy whether a book will be a commercial success

In other news, a study of battles shows that most winning sides used swords. 

My mate Harry Connolly posted a link to “an article on an algorithm which can predict with 84 per cent accuracy whether a book will be a commercial success” and wondered jokingly whether it was a troll. Go read the article and come back. You can also read the original paper (good luck with that!).

Thanks. (If you can already see the flaws in the results, then skip to the Big However at the end of this post.)

The first thing to note is that the researchers drew on out-of-print books from the Project Gutenberg archive.

They set up their study to include 100 books per genre, split into 50 “failures” and 50 “successes”, and they measured success by Gutenberg downloads.

They’re measuring which old books modern readers like to download!

That’s like going to a Renaissance Faire to learn about modern – or historical – fashion. It only tells you what people like when they are slumming it in the past. They may not even like this stuff. A lot of these old tomes are on the reading list for school and college literature courses.

The researchers went on to apply their metrics to “extremely successful” books:

on a few extremely successful novels (Pulitzer prize, National Award recipients, etc).


However, the Gutenberg statistics applied to neither Hemingway nor Truman Capote. The other successful modern books listed are mostly literary, which is a different game entirely from mainstream fiction.

So the results of the Gutenberg statistics can only reliably predict which old-fashioned and literary books modern readers will read.

No wonder they have…

findings that are somewhat contrary to the conventional wisdom with respect to the connection between successful writing styles and readability

Writing styles have changed dramatically over the last century! Our current prose style doesn’t try to compete with the screen, big or small, and favors terse, transparent prose and optimized description.

This study simply does not tell us what readers like in a modern book.


The Big However: these folks have reliably predicted which old-fashioned and literary books modern readers will read. What might they manage with a different dataset?

The results of this particular study aren’t useful, but the tool they’ve produced isn’t going away. I bet they are in talks with Amazon right now…


