trochee
In brief:
  • 08:36 Waiting for the start of the Web-as-corpus workshop www.sigwac.org.uk/wiki/WAC6 #naacl2010
  • 08:42 California, WTH is wrong with you? bit.ly/bV0lUj
  • 08:48 Guevara: WaCorpus for Norwegian. Norwegian has TWO written standards, Bokmål & Nynorsk (& many dialects) #Ididnotknowthat #naacl2010 #WAC6
  • 08:56 Guevara ran into Norwegian copyright law working on the web; NoWaC will be free & legal (but research-only) #WAC6 #naacl2010
  • 09:04 Duplicate removal was a crucial issue for the Norwegian data (must go read Broder et al. 1997, 1998) #naacl2010 #WAC6
  • 09:15 Now the Korean WaCorpus (presented by Ross Israel): a corpus aimed at learner particle-error detection in Korean #naacl2010 #WAC6
  • 09:34 Invited talk by Patrick Pantel (Yahoo! to Bing) on finding Web knowledge and transferring it to search #naacl2010 #WAC6
  • 09:37 Yahoo!'s "web-of-objects" sounds a lot like @freebase as Pantel describes it (it's dbpedia-branded) #naacl2010 #WAC6
  • 09:45 Pantel's main punchline: feature engineering makes a huge difference in entity extraction #thatoneIknew #naacl2010 #WAC6
  • 10:21 Pantel's experiments on seed-set prototype removal are fascinating: "prototypicality" can actually be a problem #naacl2010 #WAC6
  • 11:13 Goyal et al. talk: using clever tricks to sketch counts over very large data (hashing, conservative updates) #naacl2010 #WAC6
  • 11:19 Goyal et al. evaluate approximate vs. exact PMI: a good eval for this sort of sketch counting #naacl2010 #WAC6
  • 11:24 Goyal et al. get almost no loss on Turney's SO-PMI by using 8Gb of counters over a 60-gigaword stream #naacl2010 #WAC6
  • 11:33 Dillon: academic-prose web corpus built with BootCaT, with a paper handout. #oldschool #naacl2010 #WAC6
  • 12:02 Stefan Evert on Google teraword 5-grams made easy, "but not for computer" #WAC6 #naacl2010
  • 12:08 Evert: Web1T5-easy shoves the W1T5 db into SQLite (did it myself in MySQL last month!) and also adds normalization #naacl2010 #WAC6
  • 12:11 Evert: "it uses only 211Gb, and we don't worry about that too much." Everyone over 30 chuckles uncomfortably #naacl2010 #WAC6
  • 12:15 Psyched to throw out my own crappy MySQL code and get Evert's; it actually DOES seem to make it easy on the computer #naacl2010 #WAC6
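For my own notes, here's roughly what the "hashing, conservative updates" trick from the Goyal et al. talk looks like. This is a toy sketch, not their code: I'm assuming a Count-Min-style sketch with conservative update, and the class name, the per-row salted blake2b hashing, the table sizes, and the toy sentence are all my own hypothetical choices. The point is that the sketch never undercounts, so you can plug its estimates into PMI and compare against exact counts, which is the kind of approximate-vs-exact eval they ran.

```python
import hashlib
import math
from collections import Counter


class ConservativeCountMin:
    """Hypothetical Count-Min sketch with conservative update (toy version)."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One hash per row: salt the key with the row index so rows disagree.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}\x00{key}".encode("utf-8"), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        cells = list(self._cells(key))
        # Conservative update: only raise counters that would fall below min + count.
        target = min(self.table[r][c] for r, c in cells) + count
        for r, c in cells:
            if self.table[r][c] < target:
                self.table[r][c] = target

    def query(self, key):
        # The minimum over rows is an upper bound on the true count.
        return min(self.table[r][c] for r, c in self._cells(key))


def pmi(pair_count, x_count, y_count, n_pairs, n_words):
    # PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
    p_xy = pair_count / n_pairs
    return math.log2(p_xy / ((x_count / n_words) * (y_count / n_words)))


tokens = "the cat sat on the mat the cat ate the rat".split()
pairs = list(zip(tokens, tokens[1:]))

exact = Counter()
sketch = ConservativeCountMin()
for key in tokens + [f"{a} {b}" for a, b in pairs]:
    exact[key] += 1
    sketch.add(key)


def exact_and_approx_pmi(x, y):
    n_words, n_pairs = len(tokens), len(pairs)
    ex = pmi(exact[f"{x} {y}"], exact[x], exact[y], n_pairs, n_words)
    ap = pmi(sketch.query(f"{x} {y}"), sketch.query(x), sketch.query(y), n_pairs, n_words)
    return ex, ap


print(exact_and_approx_pmi("the", "cat"))
```

On a stream this small the sketch is basically exact; the interesting part (and their result) is how little PMI degrades when the table is far smaller than the vocabulary.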

I often use Twitter to mention what's happening or to dump links. I LoudTwitter them here for posterity.
