Deciphering with AI? – UnlockingTheMystery

The question many people keep asking is whether the Ramey Memo will ever be fully deciphered, or at least whether the parts that aren’t obscured can be made readable. Using the latest methods, including AI, we will try to make as much of it recognizable as possible. We are already working on various approaches.

But what about pure AI recognition? Can’t we just use text recognition software to scan the photo with the memo?

First, it’s important to understand that OCR (Optical Character Recognition) is usually applied to content that is already clearly readable. For example, it allows entire pages of books to be scanned in seconds and immediately converted into text for digital storage or further processing. This makes it possible to digitize entire libraries of books in various languages within just a few years, a task that would take centuries if the text had to be typed manually.

But what about old documents that aren’t in great condition? Even ancient Egyptian scrolls are much less of a problem today, because enough training material is available for digital processing.

To simplify: to teach software to read a specific kind of text, training material is needed. For ordinary text, which we see every day, there’s an abundance of it. To teach a computer how to read hieroglyphs, you first show it what an “A” looks like as a hieroglyph. The glyph in the photographed or scanned image is marked, and the computer is simply told that this marked spot represents an “A”. However, no two “A’s” look identical, even as glyphs: they vary in size, color, background, and thickness, or lie in shadow and are poorly legible. You therefore need many scrolls with many different-looking “A’s”. Eventually, the computer will be able to identify the “A’s” in a newly scanned photo on its own. Initially, the system might make mistakes, such as perceiving a poorly formed “M” as an “A”. At this point, the system must be corrected by telling it that this is not an “A”. The system then adjusts itself based on this information and learns from it. In neural networks, the algorithm that turns such errors into adjustments of the model’s internal weights is called backpropagation. Humans do something similar when they use a process of elimination to arrive at a plausible answer.
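The correct-on-mistake loop described above can be sketched with a perceptron, a classic learner that is far simpler than anything a real OCR system would use. The 5×5 glyphs, the names, and the two-class “A” versus “M” setup are all made up for illustration; this is a sketch of the idea, not the method any actual project uses.

```python
def to_vector(rows):
    """Flatten a glyph drawn with '#' (ink) and '.' (blank) into 0/1 pixels."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

def score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights, bias, x):
    """Call the glyph an 'A' if the score is positive, otherwise an 'M'."""
    return "A" if score(weights, bias, x) > 0 else "M"

def train(examples, epochs=10):
    """Perceptron rule: change the weights only when a prediction is wrong."""
    weights, bias = [0] * len(examples[0][0]), 0
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for x, label in examples:
            if predict(weights, bias, x) != label:
                # The "manual correction": we know the true label, so we
                # push the weights toward it.
                sign = 1 if label == "A" else -1
                weights = [w + sign * xi for w, xi in zip(weights, x)]
                bias += sign
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:  # a full pass without errors: stop
            break
    return weights, bias, mistakes_per_epoch

# Hypothetical training glyphs: two "A" variants and two "M" variants.
A1 = to_vector(["..#..", ".#.#.", "#####", "#...#", "#...#"])
A2 = to_vector(["..#..", ".#.#.", ".###.", "#...#", "#...#"])
M1 = to_vector(["#...#", "##.##", "#.#.#", "#...#", "#...#"])
M2 = to_vector(["#...#", "##.##", "#####", "#...#", "#...#"])  # poorly formed M

examples = [(A1, "A"), (M1, "M"), (A2, "A"), (M2, "M")]
weights, bias, history = train(examples)

# An "A" the system has never seen (its feet are missing).
A_NEW = to_vector(["..#..", ".#.#.", "#####", "#...#", "....."])
print(history, predict(weights, bias, A_NEW))
```

In this toy setup the system is wrong twice during the first pass over the examples and makes no mistakes on the second pass, after which it also classifies the unseen “A” correctly. Real glyph photographs are far noisier, so no such quick convergence should be expected there.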

How can the recognition of an “A” be visualized? Imagine drawing a large “A” in the center of a transparent sheet. You do this on a second sheet and then on thousands of others. The “A” will always look slightly different. Now, if you stack all the sheets on top of each other and look through them, this is roughly how the computer stores the information on what an “A” might look like. You can assume that any additional “A” you write on a new sheet and add to the stack will already fit into the overall pattern without deviating much from it. This is admittedly a very simplified explanation; the reality is more complex. Anyone who wants to delve into the details can find thousands of articles and scientific papers on the internet.
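The stack of transparent sheets can be imitated in a few lines of code by averaging several glyph images pixel by pixel. Like the analogy itself, this is a deliberate simplification: modern recognition models do not literally store an averaged picture. The 5×5 glyphs and the `fit` score are assumptions made for this sketch.

```python
def to_vector(rows):
    """Flatten a glyph drawn with '#' (ink) and '.' (blank) into 0/1 pixels."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

def stack(sheets):
    """Average the sheets pixel by pixel: 1.0 means every sheet has ink there."""
    return [sum(pixels) / len(sheets) for pixels in zip(*sheets)]

def fit(template, glyph):
    """Mean template value under the glyph's inked pixels (1.0 = perfect fit)."""
    inked = [t for t, g in zip(template, glyph) if g]
    return sum(inked) / len(inked) if inked else 0.0

# Three hypothetical sheets, each with a slightly different "A".
sheets = [
    to_vector(["..#..", ".#.#.", "#####", "#...#", "#...#"]),
    to_vector(["..#..", ".#.#.", ".###.", "#...#", "#...#"]),
    to_vector(["..#..", ".#.#.", "#####", "#...#", "....."]),
]
template = stack(sheets)

# A new "A" and an "M" that were not part of the stack.
new_a = to_vector(["..#..", ".#.#.", ".###.", "#...#", "#...."])
m_glyph = to_vector(["#...#", "##.##", "#.#.#", "#...#", "#...#"])

print(round(fit(template, new_a), 2), round(fit(template, m_glyph), 2))
```

The new “A” lands almost entirely on pixels that were already inked in the stack, while the “M” puts ink in places where no “A” ever had any, so its fit score is lower.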

But where lies the problem with the Ramey Memo and automatic recognition?

In short: There is no AI model for it because there isn’t enough training material available in the world. To train the AI, you would need tens of thousands of photographs of a similar memo, of the same size, taken from the same distance, together with exact knowledge of what is written on each of these memos. Only then could an AI analyze the Ramey Memo, compare it with the other memos of the same quality, and assign the individual letters based on their characteristics.

Does this mean that it’s impossible to create an AI for this purpose?

No. So far, the idea exists only on paper, but it would be possible to develop an AI for it. The more immediate question is whether the benefit would justify the cost. The estimated cost for developing such an AI model, which would only be capable of reading the Ramey Memo, is up to $5 million, with a development time of about 18 months.

What would be the theoretical process for developing this AI?

The first step would be obtaining training material. Since none exists, it would have to be created, which would be an extremely costly and time-consuming effort requiring experienced personnel. Additionally, suitable hardware infrastructure would be necessary. The process would be as follows:

Brig. Gen. Roger Ramey with the Memo and Colonel Thomas DuBose. Attribution to: “Courtesy, Fort Worth Star-Telegram Photograph Collection, Special Collections, The University of Texas at Arlington Library, Arlington, Texas.”

A set similar to that of the press conference would need to be built, replicating the conditions as closely as possible. The lighting conditions are of particular importance, including the light reflected by the aluminum pieces that lay on the floor during the press conference. Even Ramey and DuBose, who appear in the photo with the memo, would need to be represented by dummies positioned in exactly the same way. This is crucial for reproducing the same lighting conditions. Further processing can only succeed if these conditions match those of the press conference as closely as possible. The lighting conditions are partly responsible for some of the letters being unreadable, yet they also left a pattern in the shadows.
A camera like the one that took the original photo must be positioned in exactly the same spot as during the press conference. This calibration is also of particular importance: time-consuming, but possible. The same camera type used by James Bond Johnson, the photographer, must also be employed. This was a 4×5 Speed Graphic. Moreover, the same photographic material and development methods from 1947 must be used. (Picture of a “Four by Five Speed Graphic” camera)
Since motionless dummies would replace the real people, a rig would need to be built to hold the memo at the same size, angle, and position as during the press conference. If all of these parameters match, the lighting on the memo would match as well.
Several memos would need to be created on the same type of paper, with a separate memo for each character (A-Z, 0-9, etc.). The font must be the same as in the original memo, which shouldn’t be too difficult to match since documents and telegrams from that era still exist. If several fonts are candidates, a full set of memos, one per character, would be needed for each of them. What would these fabricated memos look like? Simple: one memo would have rows of “A’s”, from left to right, top to bottom. Another would have only “B’s”, and so on.

Each of these fabricated memos would then need to be folded, aligned, or mounted in the prepared rig to ensure the same lighting conditions and angles for all photographs. From the exact same position as on July 8, 1947, a photo would be taken of each memo with the various letters and characters. The result would be photos of memos in the same poor quality as back then, but with one difference: we know which letters we photographed on each memo, which we can use to our advantage.
After all the photos are developed and scanned, the AI would be taught how “A’s” appear in poor quality at each position, regardless of whether they’re dark, light, or seen from different angles. Even though we may not recognize every letter on the new photos, we know what they are because we created the training memos. Each memo only contains one repeated letter.

In short: The memo with all the “A’s” is scanned, and we tell the AI that these are all “A’s”, so it learns the pattern for “A’s” in any position, regardless of poor quality. We do this with all the letter memos. This way, the AI will also be able to recognize all the poorly legible letters, as we’ve taught it what, for example, a poorly legible “B” in the shadow looks like. Even if we couldn’t have identified it from the new photograph, we know it’s a “B” because we made the training memo.
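The labeling trick, where every letter on a memo is known in advance because the memo contains only one character, might be sketched as follows. Everything specific here is a placeholder: the grid of 10 rows by 40 columns, the cell size in pixels, and the assumption that the scanned memo forms a flat, regular grid. In the real photograph the memo is folded and seen at an angle, so locating each cell would be considerably harder.

```python
import string
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledCrop:
    label: str   # the character we know is printed in this cell
    row: int     # position on the memo, so the model can learn
    col: int     # how the same letter looks at each spot
    box: tuple   # (left, top, right, bottom) in scan pixels

def label_memo(letter, rows, cols, cell_w, cell_h, origin=(0, 0)):
    """Return one labeled crop per grid cell of a single-letter memo."""
    ox, oy = origin
    crops = []
    for r in range(rows):
        for c in range(cols):
            left, top = ox + c * cell_w, oy + r * cell_h
            box = (left, top, left + cell_w, top + cell_h)
            crops.append(LabeledCrop(letter, r, c, box))
    return crops

# Hypothetical character set and memo layout.
CHARSET = string.ascii_uppercase + string.digits
dataset = [
    crop
    for ch in CHARSET
    for crop in label_memo(ch, rows=10, cols=40, cell_w=12, cell_h=20)
]

print(len(dataset), dataset[0], dataset[-1])
```

With these placeholder numbers, 36 memos yield 36 × 400 = 14,400 labeled examples, and no human ever has to read a single blurry letter to produce the labels.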

Creating the set, taking all the photos at the correct angles, developing, and processing them would take about a year. Simultaneously, the AI would be developed for this scenario. Despite all these efforts, there’s still a risk that this won’t make the entire memo readable, but it would be a significant step forward. Whether such an extensive effort is worth it is another question. But perhaps a less elaborate method will be developed in the future. Currently, we are working on a partial method.

Several critical factors:

1. Recreating the Original Conditions:

Precision: Recreating the exact lighting conditions, angles, and camera settings is vital. The slightest deviation could introduce variations that may confuse the AI, leading to less accurate results.
Complexity: Given the historical context and the technology used in 1947, there could be subtle, unpredictable factors (e.g., aging of materials, lens imperfections) that are difficult to replicate today.

2. Training Data:

Volume: Training an AI requires a vast amount of data. Generating thousands of training examples by fabricating memos with every possible character is a sound approach, but the amount of data needed to recognize letters in such poor-quality images would be immense.
Accuracy: The AI would need to learn not just the letters themselves but how they appear under varying conditions. This means the training data would need to cover every conceivable variation in lighting, angle, and quality.

3. AI Model Development:

Model Complexity: Developing an AI model sophisticated enough to differentiate between poorly legible letters under such difficult conditions is a significant technical challenge. The model would need to generalize from the training data to the real-world scenario of the Ramey Memo.
Backpropagation and Learning: As with any machine learning model, there will be a learning curve. Initially, the AI will make mistakes, and feeding these errors back into the model during training (via backpropagation, which adjusts the model’s internal weights) will be essential. Given the uniqueness of this task, iterative development and fine-tuning would be crucial.
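For readers who want to see what backpropagation actually computes, here is a minimal sketch on a tiny network with three inputs, two hidden units, and one output. It is not a model that could read the memo. All weights and inputs are arbitrary illustration values, and biases are left out to keep it short. The sketch computes the gradients with the chain rule, checks one of them against a numerical estimate, and then takes a single correction step.

```python
import math

def forward(W1, w2, x):
    """Hidden layer with tanh, then a sigmoid output: P(glyph is an 'A')."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    z = sum(w * hj for w, hj in zip(w2, h))
    return h, 1.0 / (1.0 + math.exp(-z))

def loss(W1, w2, x, y):
    """Cross-entropy: small when the prediction p is close to the label y."""
    _, p = forward(W1, w2, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def backprop(W1, w2, x, y):
    """Chain rule, applied from the output error back to the first layer."""
    h, p = forward(W1, w2, x)
    dz = p - y                                          # error at the output
    grad_w2 = [dz * hj for hj in h]
    dh = [dz * w for w in w2]                           # error passed backwards
    dz1 = [d * (1 - hj * hj) for d, hj in zip(dh, h)]   # through tanh
    grad_W1 = [[d * xi for xi in x] for d in dz1]
    return grad_W1, grad_w2

def numeric_grad(W1, w2, x, y, j, i, eps=1e-5):
    """Finite-difference estimate of dLoss/dW1[j][i], used as a cross-check."""
    plus = [row[:] for row in W1]
    minus = [row[:] for row in W1]
    plus[j][i] += eps
    minus[j][i] -= eps
    return (loss(plus, w2, x, y) - loss(minus, w2, x, y)) / (2 * eps)

# Arbitrary illustration values; y = 1.0 means "this really is an A".
x, y = [1.0, 0.0, 1.0], 1.0
W1 = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
w2 = [0.7, -0.6]

grad_W1, grad_w2 = backprop(W1, w2, x, y)
gap = abs(grad_W1[0][0] - numeric_grad(W1, w2, x, y, 0, 0))

# One correction step: move every weight a little against its gradient.
rate = 0.5
W1_new = [[w - rate * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, grad_W1)]
w2_new = [w - rate * g for w, g in zip(w2, grad_w2)]
loss_before, loss_after = loss(W1, w2, x, y), loss(W1_new, w2_new, x, y)
print(gap, loss_before, loss_after)
```

The error shrinks after the single step, which is the whole point: the correction itself happens automatically, and the human contribution is supplying the right label.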

4. Potential Challenges:

Resource Intensity: The process would be resource-intensive in terms of time, expertise, and cost. The estimated costs could be up to $5 million, and the process could take about 18 months.
Risk of Incomplete Results: Even with perfect execution, there’s no guarantee that the AI would fully decipher the memo. Some parts might remain unreadable due to factors beyond control (e.g., damage to the original photograph).

Conclusion:

The method is innovative and could theoretically work if executed with precision and sufficient resources. However, the success of such a project would depend on the ability to control variables tightly, develop a robust AI model, and gather enough high-quality training data.