There are heaps and heaps of text mining going on in this Economist article about researchers tracking how words are used to find the origin of “original ideas.” They started with some concept association:
For example, “Big Bang” and “black hole” often will co-occur, but not as often as each does with “galaxy”. Neither, however, would be expected to pop up next to “genome”. This captures the intuition that the first three terms, but not the fourth, are part of a single topic.
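That co-occurrence intuition is easy to try out on a toy scale. Here is a minimal Python sketch, not the researchers' method: it just counts how often pairs of terms show up in the same document. The function name `cooccurrence_counts` and the four sample sentences are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs, terms):
    """Count, for each pair of terms, how many documents mention both."""
    counts = Counter()
    for doc in docs:
        text = doc.lower()
        # Simple substring matching; a real pipeline would tokenise properly.
        present = sorted(t for t in terms if t in text)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy corpus, invented for this example.
docs = [
    "The Big Bang seeded every galaxy we observe.",
    "A black hole sits at the centre of our galaxy.",
    "The Big Bang and black hole physics both shape a galaxy.",
    "Sequencing the human genome took over a decade.",
]
terms = ["big bang", "black hole", "galaxy", "genome"]

counts = cooccurrence_counts(docs, terms)
for pair, n in counts.most_common():
    print(pair, n)
```

In this toy corpus the two astronomy terms each pair with “galaxy” more often than with each other, and “genome” never pairs with any of them, which is the pattern the quote describes.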
And then they started working on the clever stuff:
But Dr Blei found himself wondering if his method could yield any truly novel insights into the scientific method. And he thinks it can. In tandem with Sean Gerrish, a doctoral student at Princeton, he has now produced a version that not only peruses text for topics, but also tracks how these topics evolve, by looking at how the patterns in each topic bin change from year to year.
The new version is able to trace a topic over time. For example, a 1903 paper with the evocative title “The Brain of Professor Laborde” was correctly assigned to the same topic bin as “Reshaping the Cortical Motor Map by Unmasking Latent Intracortical Connections”, published in 1991. This allows important shifts in terminology to be tracked down to their origins, which offers a way to identify truly ground-breaking work—the sort of stuff that introduces new concepts, or mixes old ones in novel and useful ways that are picked up and replicated in subsequent texts. So a paper’s impact can be determined by looking at how big a shift it creates in the structure of the relevant topic.
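To get a feel for what “how big a shift” might mean, here is a rough Python sketch under my own assumptions. It is not the model Blei and Gerrish built. It treats a topic bin as a bag of words per year and measures the year-to-year change with total variation distance, a stand-in measure I picked for simplicity. The function names and the tiny sample data are hypothetical.

```python
from collections import Counter


def word_distribution(texts):
    """Relative word frequencies across all texts in one year of a topic bin."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocab)


def yearly_shifts(texts_by_year):
    """Map each year (after the first) to its distance from the previous year."""
    years = sorted(texts_by_year)
    dists = {y: word_distribution(texts_by_year[y]) for y in years}
    return {
        curr: total_variation(dists[prev], dists[curr])
        for prev, curr in zip(years, years[1:])
    }


# Hypothetical data: one topic bin, a few words of text per year.
texts_by_year = {
    1990: ["brain cortex map"],
    1991: ["brain cortex map"],
    1992: ["latent connections unmasked"],
}

shifts = yearly_shifts(texts_by_year)
for year, shift in shifts.items():
    print(year, round(shift, 3))
```

A year with a large value is one where the topic's vocabulary moved a lot. Pinning that movement on a particular paper, which is what the article describes, takes more than this sketch does.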