It’s funny how the simplest things can be the most inspiring. Last week we parsed through some text files and calculated how many times each word appeared in the file. Simple enough. But start doing it with text that you’ve written, say, in email, to other people, and you start getting an interesting sense of your communication that goes beyond how you think you present yourself to others, and forces you to look at what you actually say to people.
I think it would be interesting to take this same data and group it by contact. I’d start with specific people, each representing a different type of contact in my address book, do a word analysis for each of them, and compare the results to each other. How much overlap is there between emails to my mother and emails to my sister? Are there words that appear in emails to my boss that don’t appear in emails to my friends? And finally, can I create a sort of “spectrum” of these contacts based on the words that I type to them, or that we type to each other? Can I interpret what my relationship is with an anonymous correspondent, just by looking at a collection of key words?
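To make the comparison concrete, here is a rough sketch of what a per-contact word analysis might look like. Everything in it is an assumption for illustration: the sample emails are made up, the function names (`word_counts`, `overlap`, `only_in`) are my own, and measuring overlap as shared words divided by all words (Jaccard similarity) is just one reasonable choice among many.

```python
import re
from collections import Counter

def word_counts(text):
    """Count how many times each lowercase word appears in a block of text."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))

def overlap(counts_a, counts_b):
    """Jaccard overlap of two vocabularies: shared words / all words used."""
    a, b = set(counts_a), set(counts_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def only_in(counts_a, counts_b):
    """Words that appear with contact A but never with contact B."""
    return set(counts_a) - set(counts_b)

# Hypothetical sample data: one string of email text per contact.
emails = {
    "mother": "see you sunday for dinner love you",
    "boss": "see attached report for the meeting",
}
counts = {who: word_counts(text) for who, text in emails.items()}

print(overlap(counts["mother"], counts["boss"]))
print(sorted(only_in(counts["boss"], counts["mother"])))
```

With real mail, the text for each contact would come from an exported mailbox rather than a hard-coded string, and very common words ("the", "for") would probably need to be filtered out before the overlap means much.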
Over time, using some lessons from Bayesian filtering, I’d like to compare other outgoing and incoming emails to these control groups and see how they stack up. Do all of my casual acquaintances fit nicely into the casual acquaintance group, or do some of them fall closer to the close friends group? Does this say something about my relationships with those people?
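One way the Bayesian comparison could work is a small naive Bayes classifier: learn word counts for each control group, then score a new email by how likely its words are under each group. The sketch below is only that, a sketch. The group labels, sample emails, and function names are hypothetical, and it assumes add-one smoothing and equal prior weight for every group.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def train(groups):
    """groups maps a label to a list of email texts.
    Returns per-label word counts and the combined vocabulary."""
    counts = {
        label: Counter(w for text in texts for w in tokenize(text))
        for label, texts in groups.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab

def score(text, counts, vocab):
    """Log-likelihood of the text under each group.
    Add-one smoothing keeps unseen words from zeroing out a group;
    every group gets the same prior (an assumption)."""
    words = tokenize(text)
    scores = {}
    for label, c in counts.items():
        total = sum(c.values()) + len(vocab)
        scores[label] = sum(math.log((c[w] + 1) / total) for w in words)
    return scores

def classify(text, counts, vocab):
    """Return the label of the group the text most resembles."""
    scores = score(text, counts, vocab)
    return max(scores, key=scores.get)

# Hypothetical control groups built from made-up emails.
groups = {
    "close friends": ["want to grab dinner tonight", "haha that movie was great"],
    "work": ["please review the attached report", "meeting moved to tuesday"],
}
counts, vocab = train(groups)
print(classify("can you review the report before the meeting", counts, vocab))
```

The raw scores may turn out to be more interesting than the winning label: an acquaintance whose emails score almost as high for "close friends" as for their own group is exactly the borderline case I’m curious about.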
Last, I’d like to visualize this data in some way. While I’m usually wary of data visualizations in general, I think that in this case it’s appropriate. I especially like this Processing visualization, We Feel Fine, as it deals with verbal expressions without being too wordy or too abstract. I’d like to be able to accomplish something similar.
I was also thinking about doing something similar with IM, mostly because I downloaded this iChatAnalyzer app a few weeks ago and really wanted to use it, but for some reason can’t get it to work on my computer. The more I think about it, though, the more it seems to me that email is a much better medium for analyzing this sort of thing. For one thing, there is a much broader range of people I email than people I IM (everyone from perfect strangers to my mother). For another, IM has a way of bastardizing any conversation with any person into the same generic IM language, whereas different emails are better at expressing different ideas. Even IMs with my bosses or professors seem determined to revert to the same “brb,” “no probs,” and “l8r.”
I’m trying not to expect too much from my results, and I’m hoping that this will be an interesting analysis tool and learning experience. Probably the best plan of action will be to try it out first, see what happens, and, if it is interesting or telling enough, continue on with the data set to create a visualization, perhaps for my final in this class. I’m pretty sure I can lump this in with thesis research and experiments as well, naturally.