Plagiarisms, Algorithms, and Ostracisms

The latest outrage jolting the fiction-writing world is the Cristiane Serruya Plagiarism Scandal, or #CopyPasteCris in the Twitter world. I’ll leave it for others to gnash teeth and rend garments over the specifics of this case. As a former engineer and natural problem-solver, I prefer to look at what we might do to prevent a recurrence.

First, let me summarize. Alert and avid readers of romance books noticed matching phrases and paragraphs in two books: The Duchess War (2012) by Courtney Milan and Royal Love (2018) by Cristiane Serruya. These readers notified Ms. Milan, who reacted strongly. Further investigation by readers into Ms. Serruya’s 30-odd books turned up uncredited passages and excerpts from 51 other books by 34 authors, as well as 3 articles, 3 websites, and 2 recipes. In addition to Ms. Milan, the original authors included Nora Roberts, who likewise doesn’t tolerate imitation, no matter how sincere the flattery.

While lawyers gather for the coming feast, let us back away from the immediate affair, make some assumptions about the problem, and consider possible solutions. First, we’ll assume Ms. Serruya actually did what readers allege, that she (or her hired ghostwriters) copied other works and passed them off as her own. Second, let’s assume she is at least somewhat sane and had semi-logical reasons for doing so. Third, we’ll assume Ms. Serruya is not alone, that there are others out there doing the same thing.

What might her motivations have been? Why would anyone do this? In her post, Ms. Roberts asserts the existence of “black hat teams” working to thwart Amazon’s software algorithms to maximize profit. For more on this practice, read Sarah Jeong’s post from last summer, based on the Cockygate scandal.

It’s possible we’ve reached a point where (1) the ease of copying books, (2) the money to be made by turning out large numbers of romance books, and (3) the lack of anti-plagiarism gatekeepers at Amazon have combined to produce all the incentive needed for unscrupulous “authors” (even a cottage industry of them) to copy the work of others.

Setting aside the current scandals, which must be resolved in light of existing laws and publishing practices, what can we do to prevent this in the future? How would we arrange things to dissuade future imitators of Ms. Serruya? What follows are four potential solutions, listed in order from least desirable to my favorite.

  1. Make Copyrights like Patents. Consider how copyrights differ from patents: they’re free; they’re automatic; they require no effort by the government. For a patent, though, you must pay the government to determine if your invention is distinctly different from previous patented devices. If the government grants your patent, you then have full government support and sanction for your device, and a solid legal foundation to go after those who dare to infringe. We could do the same with copyrighting books. Boy, would that ever slow down the publishing world!
  2. Make Amazon a Better Gatekeeper. Amazon and other distributors could set up anti-plagiarism software that detects whether a proposed new book contains too many phrases copied from other books (see the rough sketch after this list). Then they’d simply refuse to publish books that don’t pass that check. Although pressure from customers might force Amazon to do this, it’s not likely to happen, as Jonathan Bailey explains in this post.
  3. Make Use of Private Plagiarism-Checking Services. Imagine if a private company offered (for a fee) to check your manuscript for plagiarism. Assuming your manuscript passed, you would cite that certification when you published it, similar to the Underwriters Laboratories model for electrical equipment. Readers might tend to select plagiarism-checked books over those not certified. This would put a financial burden on authors.
  4. Trust Readers. We could rely on astute readers to detect plagiarism, to notify the affected author, and to use social media to shame the plagiarizer publicly. This is, of course, where we are today. It requires no new laws, no fees, and no algorithms. It’s not perfect, but so far, it is proving workable.
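
To make option 2 (or 3) a bit more concrete, here is a minimal sketch of how such a check might work. It simply counts overlapping word sequences (“shingles”) shared between a submitted manuscript and a corpus of already-published books, and flags anything above a threshold. The function names, corpus format, and threshold are my own illustrative choices, not anything Amazon or an actual checking service uses, and a real system would also need to handle paraphrasing, common stock phrases, and an enormous index of books.

```python
from typing import Iterable, Set, Tuple

def shingles(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of n-word sequences appearing in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(manuscript: str, published: str, n: int = 8) -> float:
    """Fraction of the manuscript's n-word sequences that also appear in a published book."""
    m = shingles(manuscript, n)
    if not m:
        return 0.0
    return len(m & shingles(published, n)) / len(m)

def flag_matches(manuscript: str, corpus: Iterable[Tuple[str, str]], threshold: float = 0.01):
    """Yield (title, score) for every corpus book sharing suspiciously many word sequences."""
    for title, text in corpus:
        score = overlap_score(manuscript, text)
        if score >= threshold:
            yield title, score

# Hypothetical usage, where corpus is a list of (title, full_text) pairs:
# for title, score in flag_matches(new_book_text, corpus):
#     print(f"{title}: {score:.1%} of the manuscript's 8-word phrases also appear here")
```

The underlying idea, comparing overlapping word sequences against an index of existing texts, is the same basic approach commercial plagiarism checkers build on, just at far greater scale and with more sophistication.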

If you think of other, better solutions, please leave a comment. Oh, and in case you were wondering, I wrote every word of my stories. Just ask my alter ego—

Poseidon’s Scribe

Best-Seller Foreteller?

What if a soothsayer could tell you if your manuscript would become a best-seller? If you were a publisher, you’d hire that soothsayer, right?

Throughout the history of the publishing industry, editors and publishers had to make buy-or-reject decisions based on experience and gut feel.

Welcome to the Age of Big Data.


According to an article in The Telegraph, researchers at Stony Brook University used computers to analyze writing styles and could predict whether a book would be successful with up to 84% accuracy.

Following up on that, Jodie Archer and Matthew L. Jockers wrote The Bestseller Code, a book about their algorithm (the “bestseller-o-meter”), which analyzes character, plot, setting, style, and theme to make its predictions. According to an article in BBC Culture, this strangely named algorithm is also highly accurate.

More recently, I read an article in BuiltinAustin about a company in Austin, Texas, called AUTHORS.me that has developed its own algorithm, StoryFit, which it markets to publishers.

These algorithms chew on massive amounts of data—thousands of novels—and perform statistical analyses. After being fed training data about past novels whose success or failure is already known, the algorithm “learns,” or at least develops rules, to distinguish best-sellers from flops. You then apply the algorithm to an unpublished manuscript and get a reasonable prediction. A crystal ball for novels.
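
To make the process a bit more concrete, here is a toy sketch of the underlying technique (supervised text classification) in Python, using scikit-learn. It is emphatically not the bestseller-o-meter or StoryFit: the tiny “corpus” and labels below are invented placeholders, and a real system would train on thousands of full novels with far richer features covering character, plot, setting, style, and theme.

```python
# Toy sketch: learn from labeled past novels, then score a new manuscript.
# The sample texts and labels are hypothetical placeholders, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training set: texts of past novels (here, mere snippets)
# paired with whether each one sold well (1 = best-seller, 0 = flop).
past_novels = [
    "She solved the murder and fell in love along the way.",
    "Four hundred pages describing a single uneventful afternoon.",
    "A plucky heroine, a ticking clock, and one last secret.",
    "An exhaustive inventory of the castle's silverware.",
]
sold_well = [1, 0, 1, 0]

# Turn each text into word-frequency features, then fit a simple classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(past_novels, sold_well)

# The "crystal ball": estimated probability that a new manuscript sells.
manuscript = "A plucky detective races the clock to stop a royal wedding."
print(f"Predicted chance of success: {model.predict_proba([manuscript])[0][1]:.0%}")
```

The real systems presumably differ mainly in scale and in what they measure: rather than raw word frequencies, they extract features for character, plot, setting, style, and theme before letting the statistics do their work.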

Could this lead to a world where publishers reject your manuscript because their algorithm said it wouldn’t sell? Or a world where authors edit their manuscripts to add the elements such algorithms judge indicative of success? Could the writing and publishing of novels be reduced to a numbers game?

Not quite yet, apparently. The Stony Brook University algorithm struggled to predict the success of books in one genre—historical fiction. It also “predicted” that Hemingway’s The Old Man and the Sea would flop. Archer and Jockers’ bestseller-o-meter rated The Help by Kathryn Stockett as meh. Further, the novel that earned their algorithm’s highest score (The Circle by Dave Eggers) was a commercial failure.

Certainly, these artificially intelligent systems will improve and become more accurate in the coming years. They’ll identify trends in how the reading public’s tastes are changing. Still, the algorithms may never be 100% right: some books they reject will go on to succeed, and vice versa. Every now and then, an author tries something new and it sells well despite being unlike the norm. They do call them novels, after all.

As publishers make increasing use of tools that predict a novel’s success, and as authors begin to use similar tools to tune their manuscripts for market success, could it be that overall novel writing will improve? Will that lead to an increase in readership, a renewed clamor for books by the buying public?

I hope so. In the meantime, my new big-data algorithm has just finished analyzing all my previous blog posts, and states there is a 99% probability I’ll conclude this one by signing it—

Poseidon’s Scribe