I finally had a little time to tackle his bug (after a visit to the Ling department to turn in my fellowship applications) and was able to figure out what was wrong, at last!
Here's what happened:
The code S. was using was code I developed to synchronize ten or so different annotations of the same speech corpus: one describes words, one describes syntactic structure, another annotates disfluencies and speech acts, another annotates prosody, etcetera etcetera ad nauseam (and I do mean ad nauseam).
Here's one of the gotchas:
Syntactic annotation wants to treat a sequence like "I'll do it" as four words:
    (S
      (NP I)
      (VP
        (Aux 'll)
        (VP (V do) (NP it))))
However, phonological annotation (speech recognition transcripts etcetera) would really like to treat it as three: I'll + do + it.
Both sides have their logic. To align these files, I had to make a hand-built list of syntax "words" that should be mooshed together with the previous word. Here's my list: n't 's 'S -s 'd 're 'll 'm 've '
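A minimal sketch of what that mooshing step could look like, assuming the syntactic tokens arrive as a flat list in sentence order. This is not the original code; `MERGE_WITH_PREVIOUS` and `merge_clitics` are hypothetical names, and the set simply copies the list above. Each listed token is glued onto whatever word came before it.

```python
# Syntax "words" that get glued onto the previous word (copied from the list above).
# MERGE_WITH_PREVIOUS and merge_clitics are hypothetical names for this sketch.
MERGE_WITH_PREVIOUS = {"n't", "'s", "'S", "-s", "'d", "'re", "'ll", "'m", "'ve", "'"}


def merge_clitics(syntax_tokens, clitics=MERGE_WITH_PREVIOUS):
    """Collapse syntactic tokens into phonological-style words."""
    words = []
    for tok in syntax_tokens:
        if tok in clitics and words:
            words[-1] += tok  # glue onto the previous word
        else:
            words.append(tok)  # start a new word
    return words


print(merge_clitics(["I", "'ll", "do", "it"]))                # ["I'll", 'do', 'it']
print(merge_clitics(["I", "'m", "gon", "na", "do", "it"]))    # "gon" and "na" stay separate
print(merge_clitics(["I", "'m", "gon", "na", "do", "it"],
                    MERGE_WITH_PREVIOUS | {"na"}))            # ["I'm", 'gonna', 'do', 'it']
```

The last call shows the kind of one-entry change the missing "na" calls for: with it in the set, "gon na" collapses into "gonna" like the phonological side expects.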
But I'd missed one: na, as in "gonna". That meant that when I was trying to align the phonological words, I had an extra syntactic word (either "gon" or "na" depending on the mood of the dynamic-programming alignment when it reached that point) that had no corresponding phonological word. And there was much lamenting in the code.
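To see why one extra syntactic word causes trouble, here is a small edit-distance style alignment, assuming unit costs for mismatches and unpaired tokens. It is only a sketch of a generic dynamic-programming aligner, not the alignment code the post describes; `align` is a hypothetical name. With "gon" and "na" left separate, one syntactic token ends up paired with nothing, and which one it is depends only on the tie-breaking order in the traceback.

```python
def align(syn, phon):
    """Align two token lists by minimum edit distance.

    Returns (syn_token, phon_token) pairs; None marks a token that has no
    counterpart on the other side. Unit costs are an assumption.
    """
    def sub_cost(a, b):
        return 0 if a == b else 1

    n, m = len(syn), len(phon)
    # cost[i][j] = cheapest alignment of syn[:i] with phon[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(syn[i - 1], phon[j - 1]),  # pair them
                cost[i - 1][j] + 1,  # syntactic token left unpaired
                cost[i][j - 1] + 1,  # phonological token left unpaired
            )

    # Trace back from the bottom-right corner to recover the pairing.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub_cost(syn[i - 1], phon[j - 1]):
            pairs.append((syn[i - 1], phon[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((syn[i - 1], None))
            i -= 1
        else:
            pairs.append((None, phon[j - 1]))
            j -= 1
    return pairs[::-1]


phon = ["I'm", "gonna", "do", "it"]
before_fix = ["I'm", "gon", "na", "do", "it"]  # "na" not on the merge list
after_fix = ["I'm", "gonna", "do", "it"]       # "na" glued onto "gon"

print([s for s, p in align(before_fix, phon) if p is None])  # one orphaned syntactic token
print([s for s, p in align(after_fix, phon) if p is None])   # []
```

Unit costs keep the sketch simple; a real aligner over many annotation layers would likely want smarter scoring, but the orphaned-token symptom shows up either way.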
Hooray that it's fixed now! Hoping that S. doesn't find any more of these.