Science Success Story
Freedom, They Printed
AI on XSEDE-allocated system solves mystery of who printed seminal works on liberty
By Ken Chiacchia, Pittsburgh Supercomputing Center
Movable metal type for a printing press. By Willi Heidelbach, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=154912
Prior to the 18th century, expressing your ideas on politics, religion, even divorce—anything the country's leaders found threatening—could get you arrested in England. Could get you killed.
The 17th century English Civil War proved a boon to free speech. Censorship was disrupted. A tidal wave of forbidden publications flooded England. It was still dangerous to write and print such books. Some authors were anonymous. Others were willing to take the risk of putting their names to their work. But the vital printers—without whom circulation of the books would have been far more limited—were mostly anonymous. Historians don't know the printers of an astonishing 25 percent of English-language books in the 17th century.
"Take a second and think about a world without the First Amendment. There's no such thing as freedom of the press; publishing is tightly regulated by the guild, the Stationers' Company, [with additional control] by the crown and Parliament. They all had their own interests, but frequently combined to tamp down anything they deemed threatening."—Christopher Warren, Carnegie Mellon University
One of the most important of these books was John Milton's Areopagitica of late 1644. Its plea for freedom of speech helped transform England. It led indirectly to the U.S. First Amendment. But its printers were unknown.
Researchers at Carnegie Mellon University—Christopher Warren, Taylor Berg-Kirkpatrick, Max G'Sell, and Shruti Rijhwani—wondered if they could solve that mystery. They turned to the Bridges platform, an XSEDE-allocated system at the Pittsburgh Supercomputing Center.
How XSEDE Helped
Woodblock of an early printing press of the type used to create the Areopagitica. By Jost Amman - Meggs, Philip B. A History of Graphic Design. John Wiley & Sons, Inc. 1998. (p 64), Public Domain, https://commons.wikimedia.org/w/index.php?curid=2777036
To solve the mystery, the team wanted to massively scale up an approach that had previously been carried out by eye. Their artificial-intelligence (AI) approach required the ability to store data and move it fluidly between many compute nodes. Having worked with XSEDE previously under a number of allocations, they realized that the XSEDE-allocated Bridges system would be particularly suited to the task. The system's user-friendly architecture also made it possible to run an image-recognition program called Ocular, which was ideal for their approach but had not been written for supercomputers. The researchers supplemented their machine analysis with human expertise.
Their automated approach took advantage of the decidedly non-automated nature of 17th century printing technology. To produce a book, printers would set lead type—one piece for each letter—in a wooden rack, backwards. They smeared the type with ink, and pressed a piece of paper onto it prior to assembling the pages into a book. The ink transferred right-way letters onto the paper. But type pieces were imperfectly cast and suffered damage from use. Some developed tiny irregularities. These would show up every time that type piece was used.
David Como of Stanford University proved that such irregularities were as good as fingerprints. But the laborious process of matching irregular letters between unknown and known printers made the work very slow. The CMU team wondered whether they could use Ocular, a computer program for analyzing type, to compare the type in the first edition of Areopagitica and several other forbidden books of the period with the type in about 100 books whose printers were known.
They used a machine learning (ML) approach to recognize text in old printed documents and to match irregular characters across books from the same printers. Their specific method was a custom generative probabilistic model of the printing press, in which the AI simultaneously made inferences about the parameters of the printing process and the actual text that was printed. The researchers applied this model to type from both known and unknown volumes.
Samples from three 17th Century forbidden books, showing how an irregular "C" helped identify them as produced by the same printer.
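The fingerprint idea behind that kind of match can be illustrated with a toy sketch. This is not Ocular and not the team's generative model. It is a deliberately simplified stand-in that assumes each printed letter has already been cut out as a tiny black-and-white bitmap. The glyph data, the printer labels, and the function names (`damage_mask`, `similarity`, `best_printer`) are all hypothetical and invented for illustration. The sketch marks where a printed "C" departs from a clean reference shape, then asks which known printer's "C" shares the most damage.

```python
from typing import Dict, FrozenSet, List, Tuple

Glyph = Tuple[str, ...]            # rows of '#' (ink) and '.' (blank paper)
Mask = FrozenSet[Tuple[int, int]]  # set of (row, col) pixels that look damaged


def damage_mask(glyph: Glyph, template: Glyph) -> Mask:
    """Return the pixels where a printed glyph differs from the clean template."""
    return frozenset(
        (r, c)
        for r, (glyph_row, template_row) in enumerate(zip(glyph, template))
        for c, (g, t) in enumerate(zip(glyph_row, template_row))
        if g != t
    )


def similarity(a: Mask, b: Mask) -> float:
    """Jaccard overlap of two damage masks (0.0 = nothing shared, 1.0 = identical)."""
    if not a or not b:
        return 0.0  # an undamaged glyph carries no fingerprint to match
    return len(a & b) / len(a | b)


def best_printer(
    unknown: Glyph, template: Glyph, known: Dict[str, List[Glyph]]
) -> Tuple[str, float]:
    """Return the known printer whose glyphs best share the unknown glyph's damage."""
    target = damage_mask(unknown, template)
    scores = {
        printer: max(
            (similarity(target, damage_mask(g, template)) for g in glyphs),
            default=0.0,
        )
        for printer, glyphs in known.items()
    }
    name = max(scores, key=lambda p: scores[p])
    return name, scores[name]


# Hypothetical 5x5 bitmaps of the letter "C" (made-up data, not from the study).
TEMPLATE: Glyph = (".###.", "#....", "#....", "#....", ".###.")
KNOWN: Dict[str, List[Glyph]] = {
    "printer_a": [(".##..", "#....", "#....", "#....", ".###.")],  # nick at top right
    "printer_b": [(".###.", "#....", "#....", "##...", ".###.")],  # blob at lower left
}
UNKNOWN_C: Glyph = (".##..", "#....", "#....", "#....", "..##.")  # same nick plus wear

print(best_printer(UNKNOWN_C, TEMPLATE, KNOWN))  # ('printer_a', 0.5)
```

Real page images are far noisier than this: inking, paper and scanning all vary from one impression to the next. That variation is the reason a probabilistic model of the printing process is a better fit than exact pixel comparison.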
"Having worked with PSC and XSEDE before on a prior digital humanities project, I knew that the infrastructure and support were well-suited for a project like this one. Working with XSEDE on [our prior project] Six Degrees of Francis Bacon, I experienced XSEDE's strong commitment to supporting projects coming from researchers like historians and literary scholars who are relatively new to supercomputing. When we applied for XSEDE support, we requested allocations on Bridges because of geography, existing relationships with PSC staff, and to ensure smooth transitions from existing local Bridges allocations. XSEDE GPU resources, storage, and Extended Collaborative Support give us the capacity to scale up our project to try to identify printers for roughly 9,000 anonymously printed books from the late 17th-century era of John Locke and Isaac Newton."—Christopher Warren, Carnegie Mellon University
The solution to the mystery was worthy of an Agatha Christie novel—specifically, Murder on the Orient Express, in which (spoiler alert) they all did it. Historians had long suspected that printer Matthew Simmons had been involved in publishing Areopagitica. He was known to print forbidden books and had printed Milton's non-forbidden publications. But they lacked hard evidence. The ML analysis showed that type in Areopagitica matched books known to have been Simmons'. More surprising, the type also matched works by Simmons' ex-partner, Thomas Paine. The researchers don't know what this means; historians had thought the two had broken their partnership by late 1644 and had never worked together after.
First page of Areopagitica, by John Milton
The type also matched works printed previously by Gregory Dexter. But he had been shut down in a government raid early in 1644. Soon after, Dexter left England for the colony that was to become the U.S. state of Rhode Island. The researchers don't know how Dexter's type pieces came to appear in Areopagitica. Possibly, Simmons or Paine bought them when Dexter's business was liquidated. As Warren and his coauthors write in the Spring 2020 edition of the journal Milton Studies, their analysis "raises nearly as many questions as it answers."
The next step will be to expand the analysis. The Milton Studies paper identified Simmons and Paine as the printers of eight other books on civil liberties, including one by Roger Williams, the founder of Rhode Island. But hundreds still lack an identified printer. The team is investigating using deep learning, in which multiple layers of inference are used to create a more sophisticated artificial intelligence, to tackle this much larger problem. They'd also like the analysis to work without enhancement by human expertise and to be sensitive to more subtle measures of type irregularity. This will involve 10,000 books, covering all the anonymous books and every known printer across more than a decade in the 1600s. Such work will involve the deep-learning-specialized Bridges-AI and the future Bridges-2 system, each of which contains many coupled AI-optimized graphics processing units (GPUs) for large-scale deep learning.
Research for this publication was supported by an A. W. Mellon Digital Humanities seed grant from Carnegie Mellon University, a resource allocation from the Pittsburgh Supercomputing Center (HUM170002P), and a grant from the National Science Foundation ("Print and Probability," 1816311).
At a Glance
- In the 17th century, you could get jailed or even executed for criticizing the government of England.
- A flood of books on civil liberties, produced at great risk by anonymous printers, helped change that.
- An artificial intelligence (AI) analysis of irregular letters using the XSEDE-allocated Bridges platform has helped a Carnegie Mellon team solve the mystery of who printed nine of these seminal works.