Sentimental Education

Regarding Blogger Civility, I’d like to add that where there are real threats, of course the person threatened should complain, and if the threatener can be tracked down, there are already laws that apply. Also, some people think that technical subjects aren’t contentious enough to provoke threats; those people apparently haven’t yet gotten crazy rants from people who incorporate technology into their conspiracy theories, or who fear technology because it might help people oppose their favorite policies, or who don’t like technology because they’ve always been afraid of people who understand it, or who don’t like women/gays/blacks/whites/southerners/foreigners/whatever participating in it. And there are people who think the blogosphere is unusual in harboring threats; those people apparently don’t get out much. I wonder what sort of mail somebody like Condoleezza Rice or Hillary Clinton or Barack Obama or John McCain gets?

Anyway, the idea of a blogger code of conduct reminds me of something else:

A technique to detect favorable and unfavorable opinions toward specific subjects (such as organizations and their products) within large numbers of documents offers enormous opportunities for various applications. It would provide powerful functionality for competitive analysis, marketing analysis, and detection of unfavorable rumors for risk management.

Overview, Sentiment Analysis, IBM Tokyo Research Lab, accessed 13 April 2007

Yet another artificial intelligence scheme; ho hum. Or is it?

There’s another application: government analysis of dissident opinions. Much sentiment analysis and information analysis research is being funded by the U.S. Department of Homeland Security in order to

advance efforts to identify common patterns from numerous sources of information, which may be indicative of potential threats to the nation.

Four Universities to Collaborate on New Homeland Security Research, Department of Homeland Security, Office of the Press Secretary, Contact: 202-282-8010, July 25, 2006

And for such an application, the artificial intelligence scheme probably doesn’t have to actually work well before it gets applied.

Some readers may recall DARPA’s Total Information Awareness program of 2002, which gradually faded from public awareness. Well, its mission to data mine all databases and communications apparently didn’t go away; it just spread across several departments and got more sophisticated.

Back to IBM Japan again:

Our sentiment analysis approach is to extract sentiments associated with polarities of positive or negative for specific subjects from a document, instead of classifying the whole document into positive or negative. The essential issues in sentiment analysis are to identify how sentiments are expressed in texts and whether the expressions indicate positive (favorable) or negative (unfavorable) opinions toward the subject. In order to improve the accuracy of the sentiment analysis, it is important to properly identify the semantic relationships between the sentiment expressions and the subject. By applying semantic analysis with a syntactic parser and sentiment lexicon, our prototype system achieved high precision (75-95%, depending on the data) in finding sentiments within Web pages and news articles.
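To make the distinction between subject-level and whole-document sentiment concrete, here is a toy sketch. It is not IBM's system: the four-word lexicon, the sample review, and the function names are all made up for illustration, and plain token proximity stands in for the syntactic parser the quote describes.

```python
import re

# Hypothetical four-word lexicon; a real sentiment lexicon would be far larger.
LEXICON = {"excellent": 1, "great": 1, "terrible": -1, "awful": -1}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_polarity(text):
    """Whole-document approach: sum the lexicon scores of every token."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


def subject_polarity(text, subjects, window=2):
    """Subject-level approach: credit each sentiment word to any subject
    found within `window` tokens of it. Proximity is an assumed, crude
    substitute for the semantic relationships a parser would identify."""
    tokens = tokenize(text)
    scores = {subject: 0 for subject in subjects}
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            nearby = tokens[max(0, i - window): i + window + 1]
            for subject in subjects:
                if subject in nearby:
                    scores[subject] += LEXICON[tok]
    return scores


review = "The camera is excellent but the battery is terrible."
print(document_polarity(review))                        # 0
print(subject_polarity(review, ["camera", "battery"]))  # {'camera': 1, 'battery': -1}
```

The whole-document score cancels out to zero, while the subject-level pass keeps the two opinions apart. Widen the window to 3 and "excellent" gets credited to the battery as well, which is a small taste of why the quote stresses getting the relationship between expression and subject right.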

One would hope other techniques would get better than 75-95% precision, given the possible penalties for those who are caught in false positives as "potential threats to the nation" (no-fly lists, deportation, torture, etc.).
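A little arithmetic shows what those precision figures would mean at scale. Precision is the fraction of flagged items that are correctly flagged, so the remainder are false positives. The count of one million flagged items below is purely hypothetical, and so is the assumption that the quoted precision range would carry over from finding sentiments in Web pages to flagging "threats".

```python
def expected_false_positives(flagged, precision):
    """Precision = true positives / everything flagged,
    so the false positives are the remaining (1 - precision) share."""
    return round(flagged * (1 - precision))


FLAGGED = 1_000_000  # hypothetical number of flagged items, for illustration only

for precision in (0.75, 0.95):
    wrong = expected_false_positives(FLAGGED, precision)
    print(f"precision {precision:.0%}: about {wrong:,} of {FLAGGED:,} flagged wrongly")
```

Even at the high end of the range, that is 50,000 wrongly flagged items per million; at the low end it is 250,000.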

But how can they be more precise? Human language is notoriously subjective, ambiguous, and hard to parse, even for humans. Dave Weinberger posts a simple example:

"When we moved from the area, my wife’s most frequent complaint was that no one made orange peel beef like Su Hong."

It’s a pithy restaurant review phrased in negatives that is actually positive.
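Here is roughly how a naive word-counting scorer would trip over that sentence. This is a deliberately simplistic sketch, not any real system's method; the tiny lexicon and the function names are invented for the example.

```python
import re

# Hypothetical lexicon: negative words score -1, positive words +1.
NAIVE_LEXICON = {"complaint": -1, "no": -1, "good": 1, "delicious": 1}

REVIEW = ("When we moved from the area, my wife's most frequent complaint "
          "was that no one made orange peel beef like Su Hong.")


def naive_polarity(text):
    """Sum lexicon scores word by word, ignoring what the words refer to."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(NAIVE_LEXICON.get(tok, 0) for tok in tokens)


def label(score):
    """Turn a numeric score into a coarse verdict."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


score = naive_polarity(REVIEW)
print(score, label(score))  # -2 negative
```

The scorer sees "complaint" and "no", finds nothing it recognizes as praise, and files a glowing review of Su Hong under negative.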

And what about this:

Am I the only guy in this country who’s fed up with what’s happening? Where the hell is our outrage? We should be screaming bloody murder. We’ve got a gang of clueless bozos steering our ship of state right over a cliff….

Is that some crazy threat to the nation? Actually, it’s former captain of industry Lee Iacocca in his new book, Where Have All the Leaders Gone, ranting for the nation by opposing what he sees as faults in the people currently running it. Should we trust the people running it (at any time, whoever they are) to make that distinction? Iacocca will probably get a free pass in any case; he’s rich. But would you, if you wrote the same thing?

Both of the non-anonymous principals (i.e., not those who actually made the anonymous threats) in the current tempest in a teapot go out of their way to say that:

…it would be tragic if this incident were used as a weapon by those who would limit free and open exchange.

Count on it: it will be so used. And a blog civility code such as Tim O’Reilly proposes would be used by those who would limit free and open exchange; probably used as the basis for laws and surveillance including semantic analysis.

Meanwhile, I think Kathy Sierra and Chris Locke have already done one of the best possible things: talk to each other directly, and determine that they have a lot more in common than not. Dialog and discussion are the best ways to promote civility.

Oh, and better risk management for companies than spying on your customers’ conversations would be actually talking to them.