Wednesday, July 18, 2012

Robot Research Reports Reconsidered!

A note about a retracted nature paper (via Massimo Pigliucci) got me thinking:
Bentham Science accepts computer generated hoax paper for publication without realizing it.

A hoax paper is bad of course, and should be retracted. But computer-generated? Computer-generated papers would be good in my view. Any idea just how much of people's time is spent drafting, editing and revising papers? Too much time. Time that you could use for doing your research, or presenting it, or reading up on other people's work.

Imagine a lab set-up with experimental apparatus and analysis system, all run pretty much automatically once set up for a specific experiment. The system could be completely autonomous, such as a chemical reactor or a robot-assisted wet lab. Or it could be a semi-automatic set-up where a human does much of the actual work, but the system helps them record everything they do and collects all data from cameras, measuring devices, probes and so on. We already have this, pretty much, in many fields ranging from chemistry to physics to plant and animal biology to behavioural research. We also all have more or less improvised analysis systems for crunching the data once we've gathered it.

That system effectively knows what goes into the experiment. It knows every detail of the experimental procedure, both as it was supposed to happen and as it actually did happen, and it has direct access both to the raw data and to your analysis of it.

It seems to me there is only a short step to using natural language processing to generate a fairly complete Methods and Results section for a paper without you having to enter a single word of your own. You'd simply select the data and the results you want to present (having to explicitly choose would reduce the chance of inadvertent cherry-picking), and the system would churn out a complete text, formatted and adapted for the journal of your choice.

It won't rival the very best-written papers of course, but it doesn't have to; nobody expects deathless prose in a research paper. All it has to do is not be _bad_ and it will already be ahead of half the papers out there. Sure, it will be staid and formulaic, but in the context of a research paper that is a good thing. A benefit, not a drawback. You want to be creative and original with your research, not with your paper prose.

And the generated text will be more likely to be correct; the system knows what actually happened, what parameters were used when, and in what order you did the analysis, and it will not misremember, mix things up or forget. You would have to actively deselect particular experimental runs or data sets, and that decision would of course be on record for later in case someone asks whether your analysis is all aboveboard.
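To make the idea a bit more concrete, here is a toy sketch in Python of what the core of such a system might look like. Everything in it is made up for illustration: the `Run` and `ExperimentLog` classes, the `methods_text` function and the `temp_C` parameter are hypothetical names, and fixed sentence templates stand in for real natural language processing. It only shows the two properties discussed above: the text comes from what the apparatus actually recorded, and a deselected run stays on record together with its reason.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: str
    planned: dict  # parameters as the protocol specified them
    actual: dict   # parameters as the apparatus recorded them


@dataclass
class ExperimentLog:
    runs: list
    exclusions: dict = field(default_factory=dict)  # run_id -> stated reason

    def exclude(self, run_id: str, reason: str) -> None:
        """Deselect a run; the reason is kept on record."""
        if not reason.strip():
            raise ValueError("an exclusion needs a stated reason")
        if run_id not in {r.run_id for r in self.runs}:
            raise KeyError(run_id)
        self.exclusions[run_id] = reason

    def included(self) -> list:
        """Runs that have not been explicitly deselected."""
        return [r for r in self.runs if r.run_id not in self.exclusions]


def methods_text(log: ExperimentLog) -> str:
    """Build a formulaic Methods paragraph from the recorded runs."""
    sentences = []
    for run in log.included():
        params = ", ".join(f"{k} = {v}" for k, v in sorted(run.actual.items()))
        sentences.append(f"Run {run.run_id} used {params}.")
        # Report any parameter that differs from what the protocol planned.
        for key, value in sorted(run.actual.items()):
            if key in run.planned and run.planned[key] != value:
                sentences.append(
                    f"In run {run.run_id}, {key} deviated from the planned "
                    f"value of {run.planned[key]}."
                )
    # Exclusions are reported too, so the decision is visible to readers.
    for run_id, reason in sorted(log.exclusions.items()):
        sentences.append(f"Run {run_id} was excluded ({reason}).")
    return " ".join(sentences)


log = ExperimentLog(runs=[
    Run("A1", {"temp_C": 37}, {"temp_C": 37}),
    Run("A2", {"temp_C": 37}, {"temp_C": 39}),
    Run("A3", {"temp_C": 37}, {"temp_C": 37}),
])
log.exclude("A3", "probe disconnected")
print(methods_text(log))
```

The output is exactly as staid and formulaic as you would expect, which is the point. A real system would need far richer records and proper language generation, but the bookkeeping part is not the hard bit.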

The introduction and the discussion would not be amenable to automation in the same way, at least not in the short to medium term (though the abstract is, I think, once the rest of the paper is done). What you could do is have the researcher draft the sections, perhaps in point form, then have the system interpret it and recast it in the language and formalisms best suited for the chosen journal. A secondary use would be automated revision and reformatting of a paper for a different journal if you're rejected from the first one.

Won't this take our jobs? No. It will let us spend more time working as researchers, and less as amateur editors. In the same way, lab automation simply frees up grad students and post-docs to work on original research rather than as lab technicians. I do not fear a time when computers conduct experiments and write research reports. I await it, I embrace it and I cannot see it happen fast enough.
