block by cavedave 41576ac0813efc5140fc5b30fcbc67ff

Code to parse a PDF of Croidhe Cainnte Chiarraighe (1942), the only Irish-Irish dictionary (Foclóir Gaeilge-Gaeilge) of the 20th century. The PDF is at https://www.forasnagaeilge.ie/wp-content/uploads/2016/06/8fddae92ae307b022d964ebe73d45df6.pdf . I extracted a few pages with https://smallpdf.com/split-pdf to speed up experiments, but the same split could be done in Python alone.
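
For anyone who would rather do the page split locally, here is a minimal sketch of one way it might look. It assumes the third-party `pypdf` package is installed; that is my choice for illustration, not necessarily what the script in this block uses. The function names, file names, and page range are all hypothetical.

```python
def page_indices(first, last, total):
    """Return zero-based indices for the 1-based, inclusive page range
    first..last, clamped to a document of `total` pages."""
    start = max(first, 1)
    end = min(last, total)
    return list(range(start - 1, end))


def extract_pages(src_path, dst_path, first, last):
    """Copy pages first..last (1-based, inclusive) of src_path into dst_path."""
    # Assumption: pypdf is installed. It is imported here so that
    # page_indices() stays usable without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for i in page_indices(first, last, len(reader.pages)):
        writer.add_page(reader.pages[i])
    with open(dst_path, "wb") as out:
        writer.write(out)


# Hypothetical usage: keep the first five pages as a small test file.
# extract_pages("croidhe_full.pdf", "croidhe_sample.pdf", 1, 5)
```

Clamping the range means asking for more pages than the file has simply yields a shorter sample instead of raising an error, which is convenient when experimenting with different slices.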

parseCroidhePdf.py