Researchers look to visual psychophysics and machine learning in an attempt to read ancient texts
In an attempt to facilitate the reading of ancient handwriting, researchers at the University of Notre Dame are working on an AI grounded in human perception.
The hope is that the endeavor will help to preserve millions of manuscripts, some of which have never even been read. While some of these texts are already available to the public through digital captures, they are a minority.
The project is set to make handwritten manuscripts readily available through automated transcription. The text will also be easily searchable for quick reference.
“We’re dealing with historical documents written in styles that have long fallen out of fashion, going back many centuries, and in languages like Latin, which are rarely ever used anymore,” says Walter Scheirer, an associate professor in the Department of Computer Science and Engineering at Notre Dame.
Scheirer says that the new system will combine machine learning with visual psychophysics, which studies the connections between the physical world and mental phenomena. One area of research is the time it takes a reader to recognize a letter or an abbreviation, or to judge the quality of the handwriting.
The research team has looked at readers’ transcriptions of digitized Latin manuscripts written by scribes in the ninth century. They then measured the time it took to understand the various words and passages to ascertain which were easier and which were more difficult to process.
“It’s a strategy not typically used in machine learning. We’re labeling the data through these psychophysical measurements, which come directly from psychological studies of perception by taking behavioral measurements. We then inform the network of common difficulties in the perception of these characters and can make corrections based on those measurements,” Scheirer says.
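As a rough illustration of what labeling data with behavioral measurements could look like in practice, the Python sketch below turns readers' reaction times into per-sample weights for a training loss. It is not the Notre Dame team's implementation: the function names, the millisecond units, the weight range, and the choice to penalize mistakes on quickly recognized (easier) samples more heavily are all assumptions made for this example.

```python
import math


def difficulty_weights(reaction_times_ms, floor=0.5, ceil=2.0):
    """Map reader reaction times to loss weights in [floor, ceil].

    Hypothetical scheme: the fastest-read sample gets the largest weight
    (ceil) and the slowest-read sample gets the smallest weight (floor).
    """
    lo, hi = min(reaction_times_ms), max(reaction_times_ms)
    if hi == lo:
        # No spread in the measurements, so treat every sample equally.
        return [1.0] * len(reaction_times_ms)
    span = ceil - floor
    return [ceil - span * (rt - lo) / (hi - lo) for rt in reaction_times_ms]


def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy where each sample's term is scaled by its weight.

    probs[i] is the predicted probability distribution for sample i,
    labels[i] is the index of the correct character class.
    """
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        # Clamp to avoid log(0) when the model assigns zero probability.
        total += -w * math.log(max(p[y], 1e-12))
    return total / len(labels)


if __name__ == "__main__":
    # Made-up reaction times (ms) for three character images.
    times = [400, 800, 1200]
    weights = difficulty_weights(times)
    # Made-up model outputs over two character classes.
    probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
    labels = [0, 0, 1]
    print(weights, weighted_cross_entropy(probs, labels, weights))
```

The appeal of a scheme like this is that the behavioral data only changes how strongly each error counts during training, so the network architecture itself does not have to change.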
While the project looks very promising, Scheirer’s team is still working on various improvements to the system. The main issue is the accuracy of the transcriptions, which can be particularly problematic for documents that are damaged or incomplete. The network is also not yet ready to handle illustrations.
Connon Wood, the historian from the education portal EDUCALINKAPP, commented that the network could be particularly beneficial to scholars in the humanities. “Those wishing to have a deeper understanding of specific historical events and ancient cultures need to look to written material,” he says.
“So it’s imperative that these manuscripts are preserved, particularly when it comes to languages and cultures that are disappearing,” he continues.
“There are countless such texts. One that comes to mind is the Acts of the Town Council of Santiago de los Caballeros (Antigua), Guatemala, which dates back to the 16th century and documents the beginning of self-government in Guatemala.”