Post by Paul on Jun 2, 2005 4:52:35 GMT
Hi,
As you know, I have been collating books and papers. I recently OCR'd a book that was just a picture for each page (Arno, it was the 1919 one you posted a link about). The extracted text is probably about 95% accurate (the OCR took all night, but it got there), and I have been slowly correcting it.
The point is that we can then make it available in a better form. It is 300+ pages long and we need to be able to search it, so making it properly electronic seems a worthwhile thing to do. Once it is, it can be sliced and diced, sections can be copied out for reference on other pages, it is searchable, and it can easily be cross-referenced with other similar papers.
Using this method of OCR and correction, we could do the same for a number of manuscripts by splitting each one up and having people do 10-20 pages each.
Also, Google are supposed to be doing this for thousands of old books, and there is talk of a European equivalent. Perhaps sending lots of requests to these projects or their donors will encourage them to pick some of the ones we want. That way, they do all the hard work; it is just a question of when they will get around to it. But harvard.edu is on the list, and Harvard owns the Bequaert book.
paul