trochee wrote 2006-02-11 12:48 pm

Search inside for [OCR t]error

thetensor points me to Amazon's Search Inside feature, which now seems to work with some comics.

The character-recognition software is really terrible, though. Consider the following Star Wars OCR:

AND-- INSTRUMENTS OUR BOUNTY-HUNGRY INDICATE A FRIEND MUST I4AVE PACESYi/P HAD HIS PR/YATE uFTING OFF CRAFT HIDDEN FROM TEN-MILE THERE, LUKE. PLATEAU, HAN!
Boy, they really need a decent (or better-trained) language model!

merle_ (http://users.livejournal.com/merle_/) 2006-02-11 09:16 pm (UTC)
OCR software is really horrible at word recognition, partly because a lot of it tries desperately to maintain the position of each letter -- so if it sees a "/" above the baseline instead of "I", it gets confused, and cannot reintegrate the letterish things into a word.

It does seem as if a dictionary could be applied, if less importance were given to absolute appearance.
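For the curious, here is a rough sketch of what "applying a dictionary" might look like. Everything in it is made up for illustration: the CONFUSIONS table, the toy DICTIONARY, and the function names are assumptions, not how Amazon's OCR or any real engine works. The idea is just to let a look-alike fragment such as "/" stand in for "I" and accept the result only when exactly one dictionary word matches.

```python
# Hypothetical table of OCR look-alikes: fragment seen -> glyphs it may really be.
# These pairs are guessed from the sample above, not taken from any real OCR engine.
CONFUSIONS = {
    "I4": ["H"],
    "/": ["I"],
    "Y": ["V"],
    "u": ["LI"],
}

# Toy word list; a real system would load a full lexicon.
DICTIONARY = {"HAVE", "PRIVATE", "LIFTING", "CRAFT", "HIDDEN"}


def candidates(token):
    """Yield every string reachable by optionally swapping confusable fragments.

    Worst case is exponential in the token length, which is tolerable
    for single words but not for whole lines.
    """
    if not token:
        yield ""
        return
    # Option 1: trust the first character as printed.
    for rest in candidates(token[1:]):
        yield token[0] + rest
    # Option 2: treat a leading fragment as a misread glyph.
    for fragment, replacements in CONFUSIONS.items():
        if token.startswith(fragment):
            for replacement in replacements:
                for rest in candidates(token[len(fragment):]):
                    yield replacement + rest


def correct(token):
    """Return the unique dictionary word the token could be, else the token itself."""
    if token in DICTIONARY:
        return token
    matches = {c for c in candidates(token) if c in DICTIONARY}
    return matches.pop() if len(matches) == 1 else token


if __name__ == "__main__":
    for word in ["I4AVE", "PR/YATE", "uFTING", "PACESYi/P"]:
        print(word, "->", correct(word))
```

With this toy table, "I4AVE", "PR/YATE", and "uFTING" resolve to dictionary words, while "PACESYi/P" is left alone because nothing in the dictionary matches it. A proper language model would go further and rank candidates by context rather than demanding a single exact match.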

xaosenkosmos (xaosenkosmos.livejournal.com) 2006-02-11 10:26 pm (UTC)
I'm totally going around shouting "Plateau, $NAME!" at my friends now.

(and, the requisite typesetting pun: "They really need a descent language model!" hah-hah!)