Using the Reconciliation Service
The recommended way to use the Reconciliation Service is with OpenRefine. This tool, previously called Google Refine, can query a Web Service — a website that returns information in a form the computer can interpret — and record the results, whether that’s an exact match, a close match, a list of possible matches, or no match at all.
Software overview and installation
Watch the three introductory videos first; these instructions assume some familiarity with OpenRefine. There’s also written documentation.
- General introduction, editing messy data
- Transforming semi-structured data into properly structured data
- Calling a web service to supplement the dataset, reconciliation
Users at Kew: OpenRefine has been installed on the network. Go to X:\apps\OpenRefine\ and double-click refine.bat. When you have finished, press Control+C in the black window that pops up to close the program properly.
Users elsewhere: OpenRefine needs Java to run. If your computer supports it, choose 64-bit Java; this allows working on larger datasets that consume more memory.
A version of OpenRefine including the Kew extension is available via GitHub (recommended).
Alternatively, download OpenRefine from the download page. Choose the development version, currently 2.6-beta1. This does not include the Kew extension, so the functionality to extend data using The Plant List will not be available.
Data preparation
The services are easiest to use if the whole name (or value to be reconciled) is in a single column, like Quercus alba L. or Quercus alba f. latiloba Sarg. Better results can sometimes be obtained with a column for each necessary part (e.g. generic epithet, species epithet, publication title etc.).
You can use OpenRefine to do this (see the videos) or any other program.
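If you prepare the data outside OpenRefine, splitting a whole name into one column per part could look roughly like the sketch below. It is only an illustration: the function name split_name and the output keys are made up for this example, and it naively assumes the first word is the genus and the second the species epithet, which will not hold for hybrids or other unusual names.

```python
def split_name(full_name):
    """Split a scientific name string into rough parts.

    Assumption: the first token is the genus and the second the species
    epithet; everything after that (rank, infraspecific epithet, author)
    is kept together as 'rest'.
    """
    tokens = full_name.split()
    return {
        "genus": tokens[0] if len(tokens) > 0 else "",
        "species": tokens[1] if len(tokens) > 1 else "",
        "rest": " ".join(tokens[2:]),
    }


for name in ["Quercus alba L.", "Quercus alba f. latiloba Sarg."]:
    print(split_name(name))
```

Each key could then become a column in the file you load into OpenRefine.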
Optionally, use facets to limit which names you wish to match — for example, to select particular ranks to match. If you have a lot of names (over 1000) you could star 10 or so names and facet on them, for a trial run.
Find the configuration you want to use from the list here. Note the two endpoints: the OpenRefine reconciliation service, and the JSON web service. These instructions assume you have a list of plant names and wish to reconcile them against the IPNI Name reconciliation service.
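For readers curious about what OpenRefine sends to a reconciliation endpoint, the sketch below builds a request body in the shape used by the standard OpenRefine reconciliation protocol: a form field called queries holding a JSON object with one keyed query per name. It assumes the Kew endpoint follows that standard shape, the helper names are invented for this example, and it only builds the body without contacting the service.

```python
import json
from urllib.parse import urlencode

# Endpoint for the IPNI Name configuration, as listed on the
# Reconciliation Service website (session suffix omitted).
ENDPOINT = "http://data1.kew.org/reconciliation/reconcile/IpniName"


def build_queries(names):
    """Return the 'queries' object: one keyed query (q0, q1, ...) per name."""
    return {f"q{i}": {"query": name} for i, name in enumerate(names)}


def build_request_body(names):
    """URL-encode the queries as the single form field 'queries'."""
    return urlencode({"queries": json.dumps(build_queries(names))})


body = build_request_body(["Quercus alba L."])
print(ENDPOINT)
print(body)
```

In normal use OpenRefine builds and sends these requests for you, so this is only useful if you want to call the service from your own scripts.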
Querying the Reconciliation Service
- If you have whole entities (e.g. full scientific names) in a single column, choose that column.
- Otherwise, choose a column unique to each record, like an identifier.
- Click the column heading, and choose Reconcile → Start reconciling….
- If this is the first time you’ve reconciled against a particular service, you will need to click Add Standard Service. Enter the URL from the Reconciliation Service website, for example http://data1.kew.org/reconciliation/reconcile/IpniName, and click OK.
- Select the service from the list on the left. After a moment, the dialog is filled in with options.
- If you have columns for genus, species etc., fill in the text boxes for Also use relevant details from other columns. The values to fill in come from those listed on the website describing the service (in this case, epithet_1, epithet_2 etc.).
- Click Start Reconciling.
- Results appear after a while. Where there’s a single possibility it will have been automatically selected. Otherwise, you can select the match using the tick boxes. It’s likely you will receive multiple results where IPNI has duplicate names. We hope to hide the duplicates from IPNI in the near future.
- If matching hasn’t worked you can also click Search for match and adjust the query.
- To get the identifiers: click the column, choose Add column based on this column… and use the expression cell.recon.match.id. To get the name use cell.recon.match.name instead. This is a GREL expression; see the GREL Functions Documentation for more information.
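The matching behaviour described in the steps above can be pictured with a small sketch. It assumes the response follows the standard OpenRefine reconciliation format, where each query key maps to a result list of candidates carrying id, name, score and match fields. The sample data, including the identifiers, is made up for illustration and is not real IPNI output.

```python
# Made-up sample response in the standard reconciliation result shape.
sample_response = {
    "q0": {
        "result": [
            {"id": "example-id-1", "name": "Quercus alba L.",
             "score": 100, "match": True},
        ]
    },
    # Two candidates for the same name, as happens with duplicate records.
    "q1": {
        "result": [
            {"id": "example-id-2", "name": "Quercus alba f. latiloba Sarg.",
             "score": 80, "match": False},
            {"id": "example-id-3", "name": "Quercus alba f. latiloba Sarg.",
             "score": 80, "match": False},
        ]
    },
}


def select_match(candidates):
    """Return the single candidate flagged as a match, or None if ambiguous."""
    matched = [c for c in candidates if c.get("match")]
    return matched[0] if len(matched) == 1 else None


for key, value in sample_response.items():
    match = select_match(value["result"])
    if match is None:
        print(key, "needs a manual choice among", len(value["result"]), "candidates")
    else:
        # Roughly what cell.recon.match.id and cell.recon.match.name expose.
        print(key, match["id"], match["name"])
```

Queries left without a match correspond to the cells where you would pick a candidate by hand or use Search for match.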
Extending data using the Metaweb Query Language service
Data that has already been (partially!) reconciled against IPNI and presented through an MQL service can be added to your data. At present, only some data from The Plant List is available in this way.
- Click a reconciled column heading and choose Edit column → Add column by fetching URLs.
- This shows a list of available properties — choose one or more properties from this list and click OK.
Using the results
You can then export the results into CSV (or other standard formats) using the Export menu.
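Once exported, the CSV can be checked with a few lines of code, for example to see how many names were left unmatched. The sketch below uses an in-memory sample in place of a real export; the column names name and ipni_id are hypothetical and will depend on what you called the columns in your own project.

```python
import csv
import io

# Stand-in for an exported file; column names are hypothetical.
exported = io.StringIO(
    "name,ipni_id\n"
    "Quercus alba L.,example-id-1\n"
    "Quercus sp.,\n"
)

rows = list(csv.DictReader(exported))

# A row counts as matched when its identifier column is not empty.
matched = [r for r in rows if r["ipni_id"]]
unmatched = [r for r in rows if not r["ipni_id"]]

print(len(matched), "matched;", len(unmatched), "unmatched")
for row in unmatched:
    print("still to check:", row["name"])
```

To read a real export, replace the io.StringIO object with open("your-export.csv", newline="", encoding="utf-8").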
Troubleshooting
This section will be completed as we discover problems, so please let us know about any you find. Allocating more memory may help; refer to the OpenRefine documentation on this.
Advanced data preparation / manipulation
It's possible to use some of the transformers that are behind these reconciliation services to prepare your data. For example, you may wish to extract a year out of a field containing a whole reference. See the String Transformers project for how to do this.
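As a rough idea of what extracting a year involves, the sketch below pulls the first four-digit year out of a reference string with a regular expression. It is not the String Transformers code, only a stand-in written for this example; the function name and the accepted year range (1500 to 2099) are assumptions.

```python
import re

# Four-digit years from 1500 to 2099, bounded so longer numbers are ignored.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def extract_year(reference):
    """Return the first plausible publication year in the text, or None."""
    found = YEAR.search(reference)
    return found.group(1) if found else None


print(extract_year("Sp. Pl. 2: 996. 1753"))
print(extract_year("no year here"))
```

A reference containing several years, such as a reprint, would need a more careful rule than taking the first match.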
Source code
The source code is available on Kew’s GitHub page.