Using the Reconciliation Service

The recommended way to use the Reconciliation Service is with OpenRefine. This tool, previously called Google Refine, can query a Web Service — a website that returns information in a form the computer can interpret — and record the results, whether that’s an exact match, a close match, a list of possible matches, or no match at all.

Software overview and installation

Watch the three introductory videos first, as these instructions assume some familiarity with OpenRefine. There is also written documentation.

Users at Kew: OpenRefine has been installed on the network. Go to X:\apps\OpenRefine\ and double-click refine.bat. When you have finished, press Ctrl+C in the black console window that pops up to close the program properly.

Users elsewhere: OpenRefine needs Java to run. If your computer supports it, choose 64-bit Java, which allows working on larger datasets that consume more memory. A version of OpenRefine including the Kew extension is available via GitHub (recommended). Alternatively, download OpenRefine from the download page and choose the development version, currently 2.6-beta1. That version does not include the Kew extension, so the functionality to extend data using The Plant List will not be available.

Data preparation

The services are easiest to use if the whole name (or value to be reconciled) is in a single column, like Quercus alba L. or Quercus alba f. latiloba Sarg. Better results can sometimes be obtained with a column for each necessary part (e.g. generic epithet, species epithet, publication title). You can use OpenRefine to do this (see the videos) or any other program.
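If you prefer to split names outside OpenRefine, the idea of "a column for each necessary part" can be sketched in a few lines of Python. This is a deliberately naive illustration, not the parser used by the Kew services: the function name, the output keys and the list of rank markers are all assumptions made for the example, and hybrids, autonyms and other unusual names will not be handled correctly.

```python
# Rank markers that introduce an infraspecific epithet.
# Illustrative only; this list is an assumption and is not exhaustive.
RANK_MARKERS = {"subsp.", "var.", "subvar.", "f.", "subf."}


def split_name(full_name):
    """Naively split a scientific name into parts, one per future column."""
    tokens = full_name.split()
    parts = {"genus": "", "species": "", "rank": "", "infraspecies": "", "author": ""}
    if tokens:
        parts["genus"] = tokens[0]
    if len(tokens) > 1:
        parts["species"] = tokens[1]
    rest = tokens[2:]
    # A rank marker directly after the species epithet, followed by another
    # token, is treated as introducing an infraspecific epithet.
    if len(rest) >= 2 and rest[0] in RANK_MARKERS:
        parts["rank"], parts["infraspecies"] = rest[0], rest[1]
        rest = rest[2:]
    # Whatever remains is assumed to be the author string.
    parts["author"] = " ".join(rest)
    return parts


if __name__ == "__main__":
    for name in ("Quercus alba L.", "Quercus alba f. latiloba Sarg."):
        print(split_name(name))
```

Check the output by eye before reconciling: a wrongly split name will usually match worse than the whole name left in a single column.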

Optionally, use facets to limit which names you wish to match — for example, to select particular ranks to match. If you have a lot of names (over 1000) you could star 10 or so names and facet on them, for a trial run.

Find the configuration you want to use in the list here. Note the two endpoints: the OpenRefine reconciliation service and the JSON web service. These instructions assume you have a list of plant names and wish to reconcile them against the IPNI Name reconciliation service.

Querying the Reconciliation Service

  1. If you have whole entities (e.g. full scientific names) in a single column, choose that column.
  2. Otherwise, choose a column unique to each record, like an identifier.
  3. Click the column heading, and choose Reconcile → Start reconciling….
  4. If this is the first time you’ve reconciled against a particular service, you will need to click Add Standard Service. Enter the URL from the Reconciliation Service website, for example http://data1.kew.org/reconciliation/reconcile/IpniName, and click OK.
  5. Select the service from the list on the left. After a moment, the dialog is filled in with options.
  6. If you have columns for genus, species etc., fill in the text boxes for Also use relevant details from other columns. The values to fill in come from those listed on the website describing the service (in this case, epithet_1, epithet_2 etc.).
  7. Click Start Reconciling.
  8. Results appear after a while. Where there’s a single possibility it will have been automatically selected. Otherwise, you can select the match using the tick boxes.

    It’s likely you will receive multiple results where IPNI has duplicate names. We hope to hide the duplicates from IPNI in the near future.
  9. If matching hasn’t worked you can also click Search for match and adjust the query.
  10. To get the identifiers: click the column, choose Add column based on this column… and use the expression cell.recon.match.id. To get the name use cell.recon.match.name instead. This is a GREL expression — see the GREL Functions Documentation for more information.
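For readers curious about what OpenRefine does behind the scenes in the steps above, the following Python sketch builds a batch of queries in the shape used by the standard OpenRefine reconciliation protocol and picks out the candidates the service flags as matches. It is an illustration under assumptions, not Kew's own client: the function names are invented for the example, the service URL is the one from step 4 without any session suffix, and the service may not be reachable from your network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Example URL from step 4; whether it is reachable is an assumption.
SERVICE_URL = "http://data1.kew.org/reconciliation/reconcile/IpniName"


def build_query(text, properties=None, limit=3):
    """One reconciliation query; properties use names such as epithet_1."""
    query = {"query": text, "limit": limit}
    if properties:
        query["properties"] = [{"pid": pid, "v": value} for pid, value in properties.items()]
    return query


def build_batch(queries):
    """Key the queries q0, q1, ... as the reconciliation protocol expects."""
    return {f"q{i}": query for i, query in enumerate(queries)}


def encode_request(batch):
    """Form-encode the batch under the 'queries' parameter."""
    return urlencode({"queries": json.dumps(batch)}).encode("utf-8")


def pick_matches(response):
    """Map each query key to (id, name) of the flagged match, or None."""
    matches = {}
    for key, body in response.items():
        chosen = next((c for c in body.get("result", []) if c.get("match")), None)
        matches[key] = (chosen["id"], chosen["name"]) if chosen else None
    return matches


def reconcile(queries, url=SERVICE_URL):
    """POST the batch to the service; needs network access."""
    with urlopen(url, data=encode_request(build_batch(queries)), timeout=30) as reply:
        return pick_matches(json.load(reply))
```

Calling reconcile([build_query("Quercus alba L.")]) would return a dictionary keyed by q0. The id and name in each matched entry correspond to what cell.recon.match.id and cell.recon.match.name expose inside OpenRefine.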

Extending data using the Metaweb Query Language service

Data that has already been (partially!) reconciled against IPNI and presented through a MQL service can be added to your data. At present, only some data from The Plant List is available in this way.

  1. Click a reconciled column heading and choose Edit column → Add column by fetching URLs.
  2. This shows a list of available properties — choose one or more properties from this list and click OK.

Using the results

You can then export the results into CSV (or other standard formats) using the Export menu.
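Once exported, a quick check of how many rows received an identifier can be useful before passing the file on. The sketch below is one possible way to do that in Python; the column name ipni_id and the file name in the usage note are hypothetical and depend on what you called the column created from cell.recon.match.id.

```python
import csv


def summarise(lines, id_column="ipni_id"):
    """Count rows with and without a value in the identifier column.

    `lines` is any iterable of CSV lines, such as an open file.
    `id_column` is a hypothetical column name; use your own.
    """
    rows = list(csv.DictReader(lines))
    matched = sum(1 for row in rows if (row.get(id_column) or "").strip())
    return {"total": len(rows), "matched": matched, "unmatched": len(rows) - matched}
```

For a file on disk, this could be used as: with open("export.csv", newline="", encoding="utf-8") as handle: print(summarise(handle)).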

Troubleshooting

This section will be completed as we discover problems, so please let us know! Allocating more memory may help; refer to the OpenRefine documentation on this.

Advanced data preparation / manipulation

It's possible to use some of the transformers that are behind these reconciliation services to prepare your data. For example, you may wish to extract a year out of a field containing a whole reference. See the String Transformers project for how to do this.
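To give a feel for the year example, here is a minimal Python sketch using a plain regular expression. It is a stand-in for illustration and not the String Transformers implementation: the function name, the accepted range of years (1500 to 2099) and the choice to take the last year-like number in the field are all assumptions.

```python
import re

# A four-digit number from 1500 to 2099 that is not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def extract_year(reference):
    """Return the last year-like number in a reference, or None."""
    years = YEAR_PATTERN.findall(reference)
    # The year usually comes last in a citation, so prefer the final hit.
    return int(years[-1]) if years else None


if __name__ == "__main__":
    print(extract_year("Sp. Pl. 2: 996. 1753"))
```

Page or volume numbers that happen to look like years can still be mistaken for one, so spot-check the results.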

Source code

The source code is available on Kew’s GitHub page.

