[personal profile] trochee
I've spent the last two weeks (!) trying to figure out how to relate no fewer than five different kinds of truth.

Before anybody thinks I've gone mystic, I should clarify: in speech recognition research, and other machine-learning contexts, truth refers to the right answer. We have hours and hours of conversations, transcribed by listeners at the Linguistic Data Consortium.

Unfortunately, there has been more than one pass at coming up with the right words -- the right truth. Derivative data like treebanks are based on one version, and not always the latest and best one. So to do the kind of work I'm doing -- relating treebank annotation to prosody annotation -- I have to relate the latest, best truth words (for which we have prosody annotations) to the substantially older truth words that the treebanks were based on.

This word-alignment was supposed to be about a day's work in coding. But it's turned into two weeks of tedious examination of the various versions of truth words, trying to discover the differences and reproduce the various changes and script-based normalizations that got us from the old bad truth to a new and better truth.
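
As a rough illustration of the kind of word-alignment described above, here is a minimal Python sketch. It is not the actual scripts used for this work: the `normalize` rule, the function name `align_words`, and the example sentences are all made up for illustration, and Python's standard `difflib` stands in for whatever alignment method the real job needs.

```python
import difflib
import re


def normalize(word):
    # Hypothetical normalization (an assumption, not the real rules):
    # lowercase, then keep only letters, digits, and apostrophes.
    return re.sub(r"[^a-z0-9']", "", word.lower())


def align_words(old_words, new_words):
    """Align two versions of a transcript, word by word.

    Returns a list of (tag, old_span, new_span) tuples, where tag is one of
    'equal', 'replace', 'delete', or 'insert' (difflib's opcode names).
    """
    old_norm = [normalize(w) for w in old_words]
    new_norm = [normalize(w) for w in new_words]
    # autojunk=False keeps difflib from ignoring very frequent words
    # in long transcripts.
    matcher = difflib.SequenceMatcher(a=old_norm, b=new_norm, autojunk=False)
    alignment = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        alignment.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return alignment


# Made-up example: an older version with a filler word and a split
# contraction, against a newer version without them.
old_version = "uh I do n't know".split()
new_version = "I don't know".split()
for tag, old_span, new_span in align_words(old_version, new_version):
    print(tag, old_span, new_span)
```

The 'replace' and 'delete' spans are where the tedious part starts: each one is either a genuine transcription change or a normalization difference that still has to be reverse-engineered.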

It feels, in an ironic way, like I am doing historical linguistics: each version of the truth words is a different attested language, and I'm trying to work out how they all relate to each other by looking for mechanisms of change (digging around in the misleading, wrong, lost, or never-written documentation) and grouping together those corpora that seem similar. I'm effectively using the Historical Method, except I'm doing it the way that the historical linguists never could until recently -- with Perl and emacs in hand, hammer-and-nail style.

It's actually been an interesting project (and it's almost done, which is what I've been saying about it for about 13 days of the last two weeks). The frustrating thing is that for all the cleverness in data-munging, and all the careful code- and data-archaeology it took to get here, none of it is publishable. I'm just hoping that the other researchers I'm doing this for are grateful enough to put me in as a secondary author.

reading list:
The latest issue of The Nation, headlined The Coronation of George W. Bush: the GOP Convention Issue

Date: 2004-09-03 12:01 am (UTC)
From: [identity profile] lapsedmodernist.livejournal.com
I am sure I am projecting, but it seems to dovetail with one of my comprehensive exam questions (I just took them last week) about History and Mythology as categories and narratives. You've got reified heuristics with Capital Letters. Like Truth.

Date: 2004-09-16 03:12 pm (UTC)
From: [identity profile] trochee.livejournal.com
I think I finally just understood this -- it only took me two weeks!

Yes, programmer geeks like to put capital letters on hackish, poorly defined subjects [heuristics] that we pretend are clear, obvious concepts [reified]. Is that how you meant it?

Even if it's not, it's provoked some additional thought. Examples: Bad and Wrong, Good Thing, Laziness, Impatience and Hubris.

Like some of the uses in other fields, the capitalized Reified Heuristics in geekery often get used in a contrarian way (Truth is opposed to truth in modern studies; Laziness and Impatience are opposed to "plain" laziness and impatience in geekdom). Except geeks like to turn things into TLAs (Three-Letter Acronyms) as a followup to capitalizing the reified heuristic (Blue Screen of Death becomes BSOD, which gets pronounced "bee-sod").

Thanks for commenting -- I'm glad I came back to this. Fun thought. Oh yeah, and congratulations on getting through the exams.
