MI in the Matrix
Apr. 10th, 2004 06:52 pm

Hey,
chr0me_kitten:
moretea has just posted the ten highest-mutual-information word-pairs in the Matrix.
I think the top three are (unsurprisingly): agent jones, mr anderson, and agent smith. I asked him to post the code for more Matrix script-analysis. I wonder what Bible Code methods would uncover in there...
no subject
Date: 2004-04-10 07:47 pm (UTC)

no subject
Date: 2004-04-12 02:59 pm (UTC)

Swooning kittens!
I am so all over that. I got the code here:
http://crl.nmsu.edu/~raz/Ling5801/papers/PerlIntro/associative.html#wordcounts
It's from an old course by Chris Manning (now at Stanford); an introduction to Perl for NLP.
However, there were just a few Mac-isms in there that I edited out, since I use Linux, but unless I'm on crack the thing should run on Windows now as well. (And if I'm not mistaken the Mac-isms are OS 9 stuff anyway -- OS X people needn't worry.)
And now that this comment has more lines than the code itself:
http://fieldmethods.com/code/mutual
pat@fieldmethods.net for the inevitable bugs... it's totally hacked together.
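For anyone who'd rather not fetch the script, here's a minimal sketch of the pointwise mutual information calculation it's doing -- scoring each adjacent word pair by how much more often it occurs than chance would predict. The tokenizer and toy corpus are stand-ins of mine, not moretea's actual code:

```python
import math
from collections import Counter

def pmi_pairs(tokens):
    """Score adjacent word pairs by log2( P(x,y) / (P(x) * P(y)) ).

    Rare words that always co-occur (e.g. proper-name pairs like
    'mr anderson') get high scores, which is why they top the list.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi              # joint probability of the pair
        p_x = unigrams[x] / n_uni        # marginal probability of x
        p_y = unigrams[y] / n_uni        # marginal probability of y
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # highest-MI pairs first
    return sorted(scores.items(), key=lambda kv: -kv[1])

# toy stand-in corpus, not the actual Matrix script
tokens = "mr anderson meets agent smith and agent jones".split()
for pair, score in pmi_pairs(tokens)[:3]:
    print(pair, round(score, 2))
```

Note that 'agent' scores lower with each of its partners than 'mr' does with 'anderson', because it's split between 'smith' and 'jones' -- the usual small-corpus PMI quirk of favoring the rarest words.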
a much more credible package for doing Mutual Info, and a zillion other collocational measures besides, is the Ngram Statistics Package:
http://search.cpan.org/~tpederse/Text-NSP-0.67/Docs/FAQ.pod
;)
& chrome_kitten might be interested in this:
http://del.icio.us/patfm/information_theory
the first link is a pretty good layman's intro to MI and info theory -- the book (the second link) is great but definitely requires some math background (you might like it, trochee!)
cheers...