trochee wrote 2006-02-09 01:33 pm

google desktop users beware

EFF points out the risks of the new "search across computers" option for Google Desktop.
"Unless you configure Google Desktop very carefully, and few people will, Google will have copies of your tax returns, love letters, business records, financial and medical files, and whatever other text-based documents the Desktop software can index. The government could then demand these personal files with only a subpoena rather than the search warrant it would need to seize the same things from your home or business, and in many cases you wouldn't even be notified in time to challenge it. Other litigants—your spouse, your business partners or rivals, whomever—could also try to cut out the middleman (you) and subpoena Google for your files."
Heads up. [em-dashes corrected here; see comments for discussion of the encoding errors in EFF original]

xaosenkosmos.livejournal.com 2006-02-09 10:20 pm (UTC)
The text contains wonderfully non-UTF characters, 0x97 em-dashes. I only mention it because a blarg i read has been using 0x14 em-dashes, so i'm in minor encoding-nazi mode. (0x14?! Makes me want to die. Who decided em-dash belonged in the control characters?!).

Out of curiosity, what did you use to make this post? Normally, LJ is absurdly well-behaved about character encodings, so i'd have to guess a browser that really wants to believe it's in CP1252 even though the LJ update page is pretty vocally UTF-8.

trochee.livejournal.com 2006-02-09 10:25 pm (UTC)
I updated with an oldish version of Logjam. But I'm running Gnome on a work computer and copy-pasted to get that quotation in here.

Incidentally, it looks fine from here. (Ah -- I see where they are. EFF wrongly claims their text is charset=iso-8859-1, which should not use 0x97 (END OF GUARDED AREA) as em-dash.)

Wanna complain to them?

xaosenkosmos.livejournal.com 2006-02-09 10:45 pm (UTC)
Oh, funness. They are, in fact, using CP1252 instead of the ISO-8859-1 they advertise.
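To make the mismatch concrete, here is a minimal Python sketch of how the single byte 0x97 reads under each encoding discussed in this thread. The sample bytes are an illustrative fragment of the quoted passage, not data fetched from the EFF page.

```python
# Illustrative fragment: 0x97 stands where the page meant an em-dash.
raw = b"litigants\x97your spouse"

# CP1252 (Windows-1252) maps 0x97 to U+2014, the em-dash.
as_cp1252 = raw.decode("cp1252")

# ISO-8859-1 maps every byte to the code point of the same number,
# so 0x97 becomes U+0097, a C1 control character.
as_latin1 = raw.decode("iso-8859-1")

# A lone 0x97 is a continuation byte with no lead byte, so it is not valid UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(as_cp1252), repr(as_latin1), valid_utf8)
```

The same bytes are only a readable dash if the reader ignores the declared charset and assumes CP1252, which is what most browsers do in practice.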

Oddly enough, their LJ syndicated feed eff_news still parses (and renders correctly) with the bogus codepoints. Once upon a time, it would simply choke and die on invalid input. Progress, it's a beautiful thing.

Once i figure out the best way to approach them, i'll drop a note to someone at the EFF.

evan.livejournal.com 2006-02-09 11:21 pm (UTC)
I wrote the code that fixes that. If the feed reports 8859, we scan for the 1252 chars and switch it on 'em. It's hacky, done with regexes, before the XML parser gets to it.
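A rough sketch of that kind of pre-parse fix, in Python rather than the Perl that LiveJournal uses. This is not the actual LiveJournal code: the function name `relabel_feed` and both regexes are hypothetical, and it assumes the fix works by rewriting the declared encoding to windows-1252 whenever bytes in the 0x80-0x9F range show up in a feed that claims ISO-8859-1.

```python
import re

# Encoding attribute inside an XML declaration, e.g. encoding="iso-8859-1".
ENCODING_DECL = re.compile(
    rb"""(<\?xml[^>]*?encoding=)(["'])([A-Za-z0-9._-]+)\2"""
)
# Bytes 0x80-0x9F: C1 controls in ISO-8859-1, printable punctuation in CP1252.
C1_RANGE = re.compile(rb"[\x80-\x9f]")


def relabel_feed(raw: bytes) -> bytes:
    """Hypothetical helper: relabel a mislabeled feed before XML parsing."""
    match = ENCODING_DECL.search(raw)
    if match is None:
        return raw  # no declared encoding, nothing to switch
    declared = match.group(3).decode("ascii").lower()
    if declared != "iso-8859-1":
        return raw  # only touch feeds that report 8859-1
    if not C1_RANGE.search(raw):
        return raw  # no CP1252-only bytes, the declaration may be honest
    start, end = match.span(3)
    return raw[:start] + b"windows-1252" + raw[end:]


feed = b'<?xml version="1.0" encoding="iso-8859-1"?><t>a\x97b</t>'
fixed = relabel_feed(feed)
print(fixed)
```

Working on raw bytes before parsing matters here: a strict XML parser handed the original declaration would decode 0x97 as a control character, which XML 1.0 discourages and many consumers reject.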

trochee.livejournal.com 2006-02-10 12:15 am (UTC)
just curious:
do you switch the reported encoding or do you correct the CP-1252 characters?

evan.livejournal.com 2006-02-10 05:14 am (UTC)
Switch the reported characters. The code's pretty short: go to cvs.livejournal.org, then livejournal, then navigate dirs to something like bin/maint/synsuck.pl ... [can't look right now, typing in a text console]

boobirdsfly.livejournal.com 2006-02-10 01:24 am (UTC)
That is so lame.

I am totally going to disable my google desktop.
Sucks !!!!
because i love it so much and it indexes all my plays and it's just so great.

why is the government doing this.... why why why ????!!!

trochee.livejournal.com 2006-02-10 01:42 am (UTC)
I think you can keep it, as long as you read the options carefully and configure it not to index "across computers".

Read the EFF article for more, if you want.