Suppose [hypothetically] I just received an email with the subject line "This is not an interesting email".

Would I read it?
Arrived at jointly with a clever coworker:

The Library of Babel demonstrates the flaws in targeting recall in information retrieval systems.
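A minimal sketch of the arithmetic behind that quip, assuming the usual set-based definitions of precision and recall. The library size, the ten "relevant" volumes, and the helper name `precision_recall` are all made up for illustration:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy library: 1,000,000 volumes, of which 10 are what we want.
library = range(1_000_000)
relevant = set(range(10))

# The Babel strategy: hand back every volume in the library.
babel_precision, babel_recall = precision_recall(library, relevant)

# A pickier strategy: hand back 8 wanted volumes and 2 unwanted ones.
picky_precision, picky_recall = precision_recall(
    list(range(8)) + [500, 501], relevant
)

print(babel_precision, babel_recall)
print(picky_precision, picky_recall)
```

Returning everything gets perfect recall and a precision of ten in a million, which is the Library of Babel in miniature: every book you want is in there, and good luck finding it.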
I am reminded by a friend (in a locked entry) of how great is the Peggy McIntosh essay White Privilege: Unpacking the Invisible Knapsack. Read the whole essay here [or PDF]. Some of the most thought-provoking parts:
I did not see myself as a racist because I was taught to recognize racism only in individual acts of meanness by members of my group, never in invisible systems conferring unsought racial dominance on my group from birth.
Sad but true that this is just as relevant as it was in 1988. No progress yet, as far as I can see; if anything, we've made even more headway into this particular delusion. And of course the same things can be said for being male, masculine-presenting, straight, anglophone, and born into an educated upper/middle-class family. [and right-handed too, as _dkg_ might point out.]

And it seems particularly relevant in discussions like this one or this one when we consider what it means to be car-less, power-less, and hungry and thirsty in New Orleans this week.

[Update: This Alternet article was recently posted on the same subject (found via debunkingwhite). The comments from "liberals" reading an unashamedly left website and still resisting the thought that white people have responsibility for racism make me nauseous.]

According to the New York Times, Gonzales is Seeking to Stem Light Sentences.

I owe this one to a transitive-closure labmate (he's not my labmate, but he's a former labmate of some of my labmates) who's now on the East Coast.
I can't remember if I've mentioned this before on Livejournal, but solri's recent complaints about "Theory" rang a bell for me, and I thought I'd spread a useful metalinguistic concept around a little.

A-bleaching: the process by which an acronym or abbreviation moves from full compositionality to -- at the extreme end -- complete lexicalization.

Examples are fairly easy to find in technical work -- acronyms and abbreviations tend to be most obviously A-bleached when they are used in ways that are "redundant"; these are usually considered "anacronyms", a cutesy word describing accidental A-bleaching:

  • laser radiation
  • scuba apparatus
  • the NATO organization
However, the new term "A-bleaching" is particularly interesting in the case of deliberate A-bleaching:

Deliberate A-bleaching: In particular, this term should be used to describe cases where the acronym is still obviously an acronym, but the community-of-use deliberately refuses (for "theoretical" reasons) to allow spell-out. Canonical examples of deliberate bleaching could be "move-alpha", "D-structure", and "PF", where "alpha" is (intra-discipline historically) derived from "A-structure" < "Argument structure", and "D-structure" < "Deep structure" and "PF" < "Phonetic form" --- and yet in all three cases, much contemporary theory denies any relationship of these terms to "deep"ness, "argument"s or "phonetics". The original, compositional meaning has been deliberately bleached.

Possible causes of deliberate A-bleaching:

  1. Physics envy: by using things with obscure and technical-sounding names (like "a-bar movement" or "little v"), the field gains an aura of mathematical-seeming precision.
  2. Attempts to retain results while changing theory: by retaining the conceptual slots of the older theories, the new theory may be trying to maintain the relevance of the older work, while proposing a new interpretation of that work.
  3. Abstraction of two similar concepts: It's possible that (under some circumstances) two different phenomena can be unified together into a single concept. Assigning this concept an abstract name has worked for physics and math. But see #1 above.
  4. Exclusion of outsiders: as in any field, jargon serves two purposes. It can be used as a shorthand for useful packages of information, and it can be used as a shibboleth to exclude those who have not been inducted into the secret wisdoms. What better shibboleth than a collection of explicitly opaque symbols?

PS: yes, the term "A-bleaching" is my own invention.
PPS: yes, I am aware that the term "deliberate A-bleaching" is autological [as is my username], because it attempts to unify "acronym bleaching" and "abbreviation bleaching" (type #3 above).

Whoa, the gauntlet is thrown. Richard Sproat to P&P/Minimalism: "put up or shut up": create a working P&P parser by 2008 or concede defeat.

And much very heated foofaraw then ensued.

I don't know how I missed all the fun before. Oh wait, yes I do. I've been kinda busy.

Quoth Sproat:

...[the final zinger, emphasis mine:]

In fact, we would be delighted if someone succeeds in meeting our challenge. Such success would convince us that the P&P enterprise is, after all, a testable theory with genuine scientific content.

My abstract for the UNLP workshop:
High-speed, high-entropy parse forest pruning with TUNGUSKA

Transylvania Polygnostic University
High Energy Magic Building

Genetic Algorithms are a popular theory, especially because we hear that genetics research is well-funded these days, and we suspect that government agencies often use bag-of-words models to make grant decisions. [Please take note that this research has no bearing on terrorism, biostatistics of terrorism, biostatistical terror, or the missing chemical weaponry in Iraq.] Growing sophistication in these algorithms has incorporated more and more analogies from evolutionary and molecular biology, including "crossover", "mutation", "island effects", "Dr. Moreau", and "wolf-boy".

[Alvarez and Alvarez] propose that the superorder /Dinosauria/ was erased by a long-distance movement phenomenon involving a kiloton ice comet, bringing in the advent of angiosperms known as "trees". Our TUNGUSKA system implements an analogous method for construction of syntactic trees designed to follow these trends. Our parser, implemented in SNOBOL, uses catastrophic destruction of a treebank or parse forest to provide an ecological niche for new trees, using a BLAST and PSI-BLAST pruning technique only recently approved by the Department of Energy.

We present current results in the first stages of this experiment, which has a large effective radius and has resulted in great support from nearby surviving faculty, who are happy to move their offices to accommodate our research. Many have issued supportive comments like "if you run that thing again you'll kill us all." They laughed at us at the academy, but who's laughing now?

Ah, I crack myself up.

A lab mate sent me the following question, passed on from a friend:
So, he's got a sentence "I like the book Gone with the Wind" and he's assuming he can figure out that Gone with the Wind is a title, so he can treat that as kind of a single blob. We're not sure how to parse the sentence, though. If it were something like "I like the red book", it would be easy - "the red book" is a noun phrase. We're not sure if Gone with the Wind would be considered some variant of a noun, though, or how the phrase "the book Gone with the Wind" works. The best idea we have is that Gone with the Wind is an appositive, or maybe even that "the book" is an appositive.
Well, dear readers, never hesitate to ask your friendly neighborhood Language Computeer. Neither rain nor snow nor thesis deadlines looming shall stop the mail. I wrote back as fast as I could, responding to the oddball wireloom projected on the cloudbank overhead:
yeah, the word "appositive" jumped into my head too.

Wikipedia seems to confirm this. The third example seems like a very close match to this case, which is described there as a "restrictive" appositive.

An interesting note here: there are in fact two classes of appositive phrases:

I like Vivien Leigh, the actress.   [non-restrictive]
I like the actress Vivien Leigh.    [restrictive]
note that non-restrictives always have commas and restrictives seem to disprefer them:
*? I like the actress, Vivien Leigh.
*? I like Vivien Leigh the actress.

I think that this comma distinction actually mirrors a prosodic difference between the two: the non-restrictive appositives seem to allow phrasal closure (an L- prosody break, to use ToBI annotations). But my opinions may be biased by years of literacy. Has anybody studied the prosody of appositives? oh, yes.
Jason Eisner announces Unnatural Language Processing Workshop.

Eisner is just the kind of guy who would actually publish these papers, so if you have something in mind, send it!
In a conversation that included chr0me_kitten earlier today, she brought up apophenia*. [Edit: Also, earlier this week, imtboo and blackwingedboy have both been talking about Mercury in retrograde, which, initially, frustrated the heck out of me. But imtboo and I talked about it. Now bear with me, here, 'cause I'm coming back to that thought.]

An old professor of mine [at least, I think it was him! it was about the same time I was reading Daniel Dennett for the first time, and it mighta been him] used to rant about how the apparent inner voice of consciousness, and indeed the useful mental processing that goes on as a cognitive tool, is "merely" a short-circuit to the mouth-ear loop. He didn't go into much detail, but I've adopted the idea fairly firmly as I continue to study language, computation and communication.
My point, if it wasn't already clear [I'm getting to it, I promise!]

analysis of the world into any arbitrary system is itself a creative act. Sometimes, the truth is in the data, and sometimes the truth is in the learner. When we have mental "ruts", we often need to reorganize what we already know and look at it all from a new perspective. Have you ever packed a suitcase only to find that not everything fit, and then found that if you unpack it all and start over, it all fits without trouble? That sort of "serialize, then restore" seems to be useful.

what's that John Updike story? you know, the one with the middle aged guy, who has an affair?

Sometimes, I just wish that the dragon flying the spaceship would crash through the roof: it might be tacky, but it'd liven things up a bit.

That bit above is from one of my lab mates, and came from a discussion I had with him. The other day redredshoes pointed me to a rant about genre that (despite its raging misogyny) provoked some interesting questions about whether "science fiction" should be even trying to maintain itself as a separate genre. One of the main points (I can't summarize them all) is that there's plenty of good material that calls itself SF, and plenty of bad material that calls itself SF, and that the criteria for distinguishing them aren't so different from the criteria we might use to determine good vs. bad mainstream ("realist"?) fiction.

William Blake is plenty fantastical but considered mainstream; Catch-22 doesn't get shelved with war fiction. That's because it's not "war fiction", goes the core of the argument. The really good stuff transcends the genre, and genre fans shouldn't even be trying to defend the genre. Recognize that good writing -- good art -- crosses boundaries anyway, and that circling the wagons to pretend that Death of Superman is somehow worthy of the praise that Love and Rockets garners, because they're both comics, for goodness' sake, drags L&R down into the muck with underpants on the outside. Never mind that L&R uses superheroes, robots and rocketships occasionally, or fantasy, crime, sex and violence occasionally -- it's still not the same as cheapshot crime fiction, factory-grade pornography, or "I could never marry someone so stupid".

One example of good, genre-crossing fiction I came across recently might be "Space-Time for Springers", which seems to me to be fundamentally a short story by any measure, and essentially free from genre, despite being written by Fritz Leiber.

I've started reading Gibson's latest, Pattern Recognition, which is set in the present (roughly) in a similar genre-ignoring way, and avoids the pitfalls and traps of trying to predict the future, which (as the labmate above commented) always seems to turn out as a period piece of the time of writing.

My favorite example of this is actually from Gibson's classic, Neuromancer, which I read back in the 80s: Case escapes black ice hack-protection software by hitting the Escape key.

Oh, it can't be an accident that the main character of Pattern is named "Cayce"; Gibson even goes out of his way to have her explain to an obsolete-hardware otaku that it would ordinarily be pronounced "kay see" but here it's definitely "case". Hm.

I've just bought a new dining room table and matching chairs. I feel so gentrified. It was a bit of an ordeal, because the total was really astonishingly high.

anyway, I'm only writing about it because of the interesting locution that the gentleman who took my credit card had:
Hello, Macy's furniture, this is Rogelio! How may I provide you outstanding service today?
I can't tell what pragmatics rule this violates, but it seems to be startlingly off somehow.

Perhaps it's some kind of double-ironic Gricean toe-pick, intended to encode the phrase fuck you, I don't even know you, strictly by the mechanism of superfluity.
seems to me that the goal of computational linguistics should not really be trying to encode what we know about language into computers.

the goal should be encoding how we learn about language. We [linguists] have a terribly unclear picture of what it means to have a good theory -- we talk and talk about minimality, elegance and Ockham's Razor, but have lousy metrics for quantifying the quality of a theory.

My department is choosing among several candidates for a new computational linguistics position.

porter spent his entire talk explaining how he mapped multiple databases onto the same format. There were no linguistics results, and the mapping wasn't automatic. I wasn't even convinced that he had read his slides before presenting them, either.

haze spent his entire talk using interesting methods on an interesting (if simple) problem. But he showed no interest in exploring why his methods worked -- a few directed questions from the engineers in the audience revealed that he hadn't engaged with the methods, not even well enough to understand them. Any member of my lab would be better qualified, even those of us who are pre-Master's.

glass spent her talk exploring a technique that tries to learn how linguists analyze data, using some mocked-up linguistics results. I wasn't convinced by the utility of the problem she was trying to solve, but she followed the approach I believe in:
linguists seem to know a good solution when they see it. But they can't pin down how it's measured. Therefore let us use a number of exemplars of good (and bad) solutions and try to infer the metric for "good solution".
This solution matches what I want.
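
For the curious, here is a toy sketch of that general idea: infer a scoring metric from labelled exemplars. This is not glass's actual technique (I don't have the details); I've swapped in a plain perceptron, and the two features (number of rules, number of listed exceptions) and all of the exemplar values are invented for illustration.

```python
def score(w, b, x):
    """Learned 'goodness' score: positive means 'looks like a good analysis'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(examples, epochs=100):
    """Learn weights w and bias b so that score > 0 for good exemplars."""
    w, b = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, label in examples:  # label is +1 (good) or -1 (bad)
            if label * score(w, b, x) <= 0:  # wrong side: nudge toward label
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
    return w, b

# Hypothetical exemplars: (number of rules, number of exceptions), judgment.
exemplars = [
    ((2, 1), +1), ((3, 0), +1), ((1, 2), +1),   # analyses linguists liked
    ((8, 5), -1), ((6, 7), -1), ((9, 3), -1),   # analyses linguists disliked
]

w, b = train_perceptron(exemplars)

# Score two unseen (equally hypothetical) analyses with the inferred metric.
compact_analysis = score(w, b, (2, 2))
baroque_analysis = score(w, b, (7, 6))
```

Nobody told the learner that fewer rules and fewer exceptions are better; it only saw which analyses were judged good, and the negative weights fall out of that. Whether real linguistic judgments are anywhere near this linearly separable is another question entirely.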

Unfortunately, I think that the faculty will hire porter. This is not helping my mood today.
Here's [pdf, may need subscription] an interesting article.

"The language of genes." Nature, 420:211--217.
Linguistic metaphors have been woven into the fabric of molecular biology since its inception. The determination of the human genome sequence has brought these metaphors to the forefront of the popular imagination, with the natural extension of the notion of DNA as language to that of the genome as the 'book of life'. But do these analogies go deeper and, if so, can the methods developed for analysing languages be applied to molecular biology? In fact, many techniques used in bioinformatics, even if developed independently, may be seen to be grounded in linguistics. Further interweaving of these fields will be instrumental in extending our understanding of the language of life.
I think this article -- as interesting and useful as it is -- is a little oversimplified in every area it touches.

Perhaps that's what makes interdisciplinary success -- as long as you're convincing each reviewer that the material in the other area is good, you're golden.

EDIT: here are some other links that seem to point to the same article.

nerd joke

Mar. 11th, 2005 03:34 pm
what do machine-learning experts have in common with obstetric fertility experts?

nerd humor

Feb. 22nd, 2005 11:10 pm
in a discussion of the nuances of the comparative lexical semantics of pet and pat, colleague writes:
The construction of jokes about non-paradigmatic dog touching is left as an exercise for the reader.
and, I might add, the construction of the appropriate context for that joke is left as an exercise for the reader.

okay, I'll shut up and let someone else use your friends-list now. Thanks for tolerating me.

