Document Engineering for Digital Libraries

If you have a question about this talk, please contact Leandro Minku.

Several innovative document transformations and tools developed in the process of building the Digital Mathematical Library DML-CZ (http://dml.cz) are described. The main result is our new PDF re-compression tool, developed using an enhanced jbig2enc library. Together with pdfsizeopt.py by Péter Szabó, it has allowed us to cut PDF storage size and transmission needs by 62%: applying both programs reduced the original, already compressed PDFs to 38% of their size. We briefly describe the workflow and tools developed for creating the digital library. Among the main [by]products are a batch digital signature stamper, document similarity metrics that use four different methods, a [meta]data validation process, and math OCR tools. Such document engineering, together with Google Scholar indexing optimization, has led to the success of serving digitized and born-digital scientific math documents to the public in DML-CZ, and is also being employed in the European Digital Mathematics Library, EuDML.
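
The two percentages quoted above describe the same measurement from two sides: the recompressed files occupy 38% of the original size, so 62% is saved. As a minimal sketch of that arithmetic, the helper below (the function name and the byte counts are hypothetical, not taken from the DML-CZ tooling) computes both figures from a pair of sizes:

```python
def size_reduction(original_bytes: int, recompressed_bytes: int) -> tuple[float, float]:
    """Return (remaining_fraction, saved_fraction) for a file or a whole collection."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    remaining = recompressed_bytes / original_bytes
    return remaining, 1.0 - remaining


# Hypothetical byte counts, chosen only to reproduce the ratio quoted in the abstract.
remaining, saved = size_reduction(1_000_000, 380_000)
print(f"remaining: {remaining:.0%}, saved: {saved:.0%}")
```

The same function works on totals summed over a whole collection, which is presumably how a library-wide figure would be obtained.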

This talk is part of the Artificial Intelligence and Natural Computation seminars series.

Talks@bham, University of Birmingham.