Thursday, June 16, 2011

Spellchecker to Weed Out Botanical Typos [Updated]

[Update 17/6/11]
Following araygoza's comments on this post (see below), I ran my list of misspelled plant names through the TNRS spellchecker again and, voilà, it works!

That's really cool! The new URL is here.

Scientific nomenclature, the business of giving names to organisms, is a huge bookkeeping exercise that is notoriously error-prone. Not only can one species end up with multiple binomials as scientists argue over how it should be classified, but misspellings and typos are also easy to make.

Typos directly affect scientific research, which often relies on analyzing data from species databases. For example, a simple count of how many species are present in a certain place may be inflated if some records contain misspelled names, because each spelling variant gets counted as a separate species. For botanists, there is now a possible solution available online. A team from several institutions has launched the Taxonomic Name Resolution Service, which uses technology similar to spell-checking software to detect erroneous names and suggest corrections.

I decided to give the system a try (click on "Try it now!" on the main page), using the names of four common plants, each with a single-letter misspelling:
  • Cocos nuciferra
  • Samana saman 
  • Pterocarpus indicum
  • Ficus grossulariodes
The checker correctly reported that these names were not in the database, but disappointingly it couldn't suggest any possible matches. Just in case the plants weren't in the database at all (it's based on TROPICOS, which focuses on plants from the Americas), I ran the correctly spelled names through the checker and got a 100% match on each one.

From this small trial, which admittedly is not very thorough, I would say that the fuzzy-matching system for suggesting correct names is not really effective right now. However, the batch-check function is useful if one has a large database of records to check for validity. The team has posted its source code and is also developing an API (application programming interface) for the service, so developers (or botanists with some computing savvy) can plug it into their own systems.
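For anyone curious how a spellchecker can suggest a correct name at all, here's a minimal sketch in Python. To be clear, this is not TNRS's actual algorithm, which I haven't examined; it simply ranks the names in a small, hypothetical reference list by Levenshtein edit distance (the number of single-letter insertions, deletions or substitutions needed to turn one string into another). The names `REFERENCE_NAMES`, `suggest`, `batch_check` and the `max_distance` cutoff are my own illustrations. Each misspelling in my trial is exactly one edit away from the correct name.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


# Hypothetical reference list standing in for a real name database.
REFERENCE_NAMES = [
    "Cocos nucifera",
    "Samanea saman",
    "Pterocarpus indicus",
    "Ficus grossularioides",
]


def suggest(name, reference=REFERENCE_NAMES, max_distance=2):
    """Return (closest_name, distance), or (None, None) if no reference
    name is within max_distance edits. An exact match has distance 0."""
    query = name.strip().lower()
    best, best_dist = None, None
    for candidate in reference:
        d = levenshtein(query, candidate.lower())
        if best_dist is None or d < best_dist:
            best, best_dist = candidate, d
    if best_dist is None or best_dist > max_distance:
        return None, None
    return best, best_dist


def batch_check(names):
    """Check a whole list of names at once (a toy 'batch' mode)."""
    return {name: suggest(name) for name in names}


if __name__ == "__main__":
    trial = ["Cocos nuciferra", "Samana saman",
             "Pterocarpus indicum", "Ficus grossulariodes"]
    for name, result in batch_check(trial).items():
        print(name, "->", result)
```

Running it prints, for example, `Cocos nuciferra -> ('Cocos nucifera', 1)`. A real service would need a much larger list, smarter indexing than comparing against every name, and rules specific to genus and species epithets, but the basic idea of "closest known name within a few edits" is the same.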

(Via Nature News.)


araygoza said...

Hi, I'm sorry to hear you didn't find those names. We will be updating the name list to cover all the names in Tropicos, so stay tuned!

Brandon said...

Cool! I should make it clear that it couldn't find the correct matches for the misspelled names.

araygoza said...

We just updated the application to use the full Tropicos API, and the spellchecking is working fine now; there was an issue with it before. Please try it again. We appreciate your comments.

The application is available at the URL.