Document Engineering for Digital Libraries
If you have a question about this talk, please contact Leandro Minku.

Several innovative document transformations and tools developed in the process of building the Digital Mathematical Library DML-CZ (http://dml.cz) are described. The main result is our new PDF re-compression tool, developed using an enhanced jbig2enc library. Together with pdfsizeopt.py by Péter Szabó, we have managed to decrease PDF storage size and transmission needs by 62%: using both programs, we reduced the original, already compressed PDFs to 38% of their size.

We briefly describe the workflow and tools developed for creating the digital library. The batch digital signature stamper, the document similarity metrics, which use four different methods, a [meta]data validation process, and math OCR tools represent some of the main [by]products. Such document engineering, together with Google Scholar indexing optimization, has led to the success of serving digitized and born-digital scientific math documents to the public in DML-CZ, and is also being employed in the European Digital Mathematics Library, EuDML.

This talk is part of the Artificial Intelligence and Natural Computation seminars series.
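The two figures quoted in the abstract (a 62% decrease, and files reduced to 38% of their size) are the same measurement seen from two sides. As a minimal sketch of that arithmetic, the snippet below computes both numbers from a pair of file sizes. The function name `size_reduction` and the byte counts are hypothetical, chosen only to mirror the quoted ratio; they are not measurements from DML-CZ.

```python
def size_reduction(original_bytes: int, recompressed_bytes: int) -> tuple[float, float]:
    """Return (remaining_pct, saved_pct) of a recompressed file relative to the original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    remaining = 100.0 * recompressed_bytes / original_bytes
    return remaining, 100.0 - remaining


# Hypothetical sizes, picked only to reproduce the ratio quoted in the abstract.
remaining, saved = size_reduction(1_000_000, 380_000)
print(f"remaining: {remaining:.0f}%, saved: {saved:.0f}%")
```

In practice the two sizes would come from the PDF before and after re-compression, for example via `os.path.getsize` on each file.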