A new article in the December issue of Science details a software program that Harvard scientists have created to analyze data.  Apparently, the strength of the software is that it can find patterns in enormous datasets, and further, it can detect these patterns even though the scientist doesn’t specify what he or she is looking for.  In other words, the software simply finds things (anomalous patterns) that MIGHT be interesting, rather than finding things that the researcher is trying to find, or things the researcher is already interested in.

This seems a key distinction.  As one of the authors of the paper put it:

“This ability to search for patterns in an equitable way offers tremendous exploratory potential in terms of searching for patterns without having to know ahead of time what to search for,” said David Reshef.

To me there is a big intellectual problem lurking here.  Are we turning too much of our inquiry over to the computer?  The computer should exist to help us answer the questions that we as humans have identified.  But what happens when the computer itself suggests the question?  Assuming that the computer cannot identify all the possible questions that we might want to answer, are we at risk of letting the machine limit the kinds of questions that we ask?

Further, are the computer’s questions the same as ours?  For sure, this dilemma exists in digital humanities.  Sometimes it seems that digital humanities projects (on GIS, for instance, or especially text analysis using text-searching strategies) have the computer cart in front of the intellectual horse.  Do we really CARE about how many times a given word appears in a run of 18th-century newspapers?  Or are we discussing this only because the computer allows us to do so?  In a similar way, will the computer at Harvard identify patterns that are important to it, but not intrinsically important to US?  If the machine suggests the answer before we even had the question, is that really intellectual work?  Or are we passive consumers of research then?


2 thoughts on "Harvard scientists train computers to do their research for them?"

  1. Bob, I think you’re overplaying your hand a bit here. Computational techniques offer different things at different points in the research cycle. Humans are very good at pattern recognition up to a certain point of scale. And computers can be very good at pattern recognition beyond that scale, “discovering” patterns with a rapidity and scale beyond individual capacity. And what the linked article is referencing is big data. You and I, working with 17th- and 18th-century sources, will never come up against the scale of data that future historians of the current day will. Their tools will have to be different from our tools.

    More to the point, though, what we’re talking about is the difference between deduction and induction, and the difference between seeing our texts as sources of data and seeing them as data. Especially at early stages of the research cycle, screwing around with pattern recognition techniques can provide compelling directions for a grounded theory, as it were, of [insert topic here]. Moreover, historians (whether of the eventual type or not) have a long history of viewing the archive as a mine to be exploited for the rare nuggets of narrative or data gold. After extraction, those nuggets are refined further into the polished analysis of the lecture, article, or book, and the leftovers are discarded. The digital turn, however, offers the opportunity to see “the documents” as the data themselves, rather than as sources of data.

  2. So… I just thought about this thread earlier today when I came across this quote and was reminded of my “computer cart in front of the intellectual horse” dilemma:

    ‘… minds unduly fascinated by computers carefully confine themselves to asking only the kind of question that computers can answer and are completely negligent of the human contents or the human results.’

    Lewis Mumford, “The Sky Line: ‘Mother Jacobs Home Remedies,’” The New Yorker, December 1, 1962, p. 148

    I am not in the habit of quoting Mumford like this, but I think he identified way back in ’62 the kind of problem that worries me… it’s easy to see how the analysis of data, whether big or little, could become an end in itself, and meanwhile we could neglect the kinds of questions and intellectual work that have little to do with these kinds of operations. And this neglect could grow worse as the intellectual trends of DH continue, and as funding continues to be directed towards these kinds of projects, as many people have already observed (see http://t.co/5XgonvmE). Surely historians of the future will need tools to analyze data if they’re interested in trying to aggregate the credit card records, blog posts, and online activities of people today (and they surely will be interested in those things). But there will also certainly be software applications that make it possible to answer questions that nobody ever asked, and that’s a little bit weird. Data analysis, discovering patterns, finding anomalies: these are good and interesting things to do, and computers are much better at them than people. But I still… worry… about the computer telling us what we’re looking for.

