trochee

I've spent the last two weeks (!) trying to figure out how to relate no fewer than five different kinds of truth.

Before anybody thinks I've gone mystic, I should clarify: in speech recognition research, and other machine-learning contexts, truth refers to the right answer. We have hours and hours of conversations, transcribed by listeners at the Linguistic Data Consortium.

Unfortunately, there has been more than one pass at coming up with the right words -- the right truth. And derivative data like treebanks are based on one particular version, and not always the latest and best one. So to do the kind of work I'm doing -- relating treebank annotation to prosody annotation -- I have to relate the latest, best truth words (for which we have prosody annotations) to the substantially older truth words that the treebanks were based on.

This word-alignment was supposed to be about a day's coding. But it's turned into two weeks of tedious examination of the various versions of truth words, trying to discover the differences and reproduce the various changes and script-based normalizations that got us from the old bad truth to a new and better truth.
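
For the curious, here is a minimal sketch of the basic idea of aligning two versions of the truth words. It is not the code I actually used: it assumes Python's standard `difflib`, and the names `normalize` and `align_words` and the sample words are made up for illustration. A real normalization step would need to reproduce whatever the original scripts did.

```python
from difflib import SequenceMatcher


def normalize(word):
    # Hypothetical normalization: lowercase and strip surrounding punctuation.
    # The real scripts did something different (and mostly undocumented).
    return word.lower().strip('.,?!"')


def align_words(old_words, new_words):
    """Align two word lists; return (tag, old_span, new_span) tuples.

    tag is one of 'equal', 'replace', 'delete', 'insert'.
    """
    matcher = SequenceMatcher(
        a=[normalize(w) for w in old_words],
        b=[normalize(w) for w in new_words],
        autojunk=False,  # keep frequent words from being treated as junk
    )
    return [
        (tag, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


# Made-up example: treebank-style tokens versus a later re-transcription.
old = "uh i do n't know".split()
new = "Uh, I don't know".split()

for tag, old_span, new_span in align_words(old, new):
    print(tag, old_span, new_span)
```

The 'equal' spans are the easy part. The 'replace' spans are where the two weeks went, since each one has to be explained by some tokenization rule, normalization script, or transcriber correction.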

It feels, in an ironic way, like I am doing historical linguistics: each version of the truth words is a different attested language, and I am trying to work out how they all relate to each other by looking for mechanisms of change (digging around in the misleading, wrong, lost, or never-written documentation) and grouping together those corpora that seem similar. I'm effectively using the Historical Method, except I'm doing it the way that the historical linguists never could until recently -- with Perl and emacs in hand, hammer-and-a-nail.

It's actually been an interesting project (and it's almost done, which is what I've been saying about it for about 13 days of the last two weeks). The frustrating thing is that for all the clever data-munging and all the careful code- and data-archaeology it took to get here, none of it is publishable. I'm just hoping that the other researchers I'm doing this for are grateful enough to put me in as a secondary author.

reading list:
The latest issue of The Nation, headlined The Coronation of George W. Bush: the GOP Convention Issue

From:

lapsedmodernist.livejournal.com

I am sure I am projecting, but it seems to dovetail with one of my comprehensive exam questions (I just took them last week) about History and Mythology as categories and narratives. You've got reified heuristics with Capital Letters. Like Truth.

trochee.livejournal.com

I think I finally just understood this -- it only took me two weeks!

Yes, programmer geeks like to put capital letters on hackish, poorly defined subjects [heuristics] that we pretend are clear, obvious concepts [reified]. Is that how you meant it?

Even if it's not, it's provoked some additional thought. Examples: Bad and Wrong, Good Thing, Laziness, Impatience and Hubris.

Like some of the uses in other fields, the capitalized Reified Heuristics in geekery often get used in a contrarian way (Truth is opposed to truth in modern studies; Laziness and Impatience are opposed to "plain" laziness and impatience in geekdom). Except geeks like to turn things into TLAs (Three Letter Acronyms) as a follow-up to capitalizing the reified heuristic (Blue Screen of Death becomes BSOD becomes pronounced "bee-sod").

Thanks for commenting -- I'm glad I came back to this. Fun thought. Oh yeah, and congratulations on getting through the exams.

historical data-management
