Connecting MediaWiki to Wikidata

in #mediawiki · 2 years ago (edited)



This post is also available in Dutch.

This post is about using structured data from Wikimedia's Wikidata project on your own, private MediaWiki website. It is not implemented through the Wikibase client, but in an alternative setup using the general-purpose "External Data" extension.

About the Wikimedia Wikidata project

Wikidata is technically based on the standard MediaWiki software, extended with the Wikibase software (server + client) to store and access the triples. Wikidata is growing rapidly and has become the world's largest free and open semantic triple store on the internet. That's the good news. The setback is that using Wikidata from wikis other than Wikimedia's own projects is not supported by the Wikibase software. There are technical reasons for this, but it's disappointing nonetheless.

About Semantic MediaWiki

The "other" MediaWiki data store is Semantic MediaWiki (SMW). It's a stable, well-maintained and widespread extension for MediaWiki, mainly used on private wiki installations. It would be very desirable to have SMW interconnect with Wikibase, but this has not yet been done. So there is no way to add Wikidata information to your Semantic MediaWiki store. "Linked Open Data" is still a vision rather than a reality.

Imagine how cool it would be if you could use data from Wikidata in your own infobox templates! It could automatically add location data, street addresses, zip codes, email addresses, dates of birth, and so on to your wiki without you having to enter this information yourself.

Using the Wikidata API for data extraction

MediaWiki is open-source software developed with the principle of "information is for sharing" in mind. For that purpose, all MediaWiki projects have an API available. The same goes for the Wikidata project: apart from the standard, human-readable "Linked Data Interface", it also has a very functional and well-documented API.

Example using External Data for extracting information

I'll give you a fairly simple example of a project we did for the TheaterEncyclopedie, aimed at extracting location information about theaters in the Netherlands from Wikidata. The TheaterEncyclopedie is a Semantic MediaWiki project, which also has the External Data extension installed.

The example below is a live example of a page about Het Muziektheater, Amsterdam on TheaterEncyclopedie.nl.

It shows an infobox with information that was entered locally, as well as a dropdown table with information imported from Wikidata. Some information, such as the image, has been transferred to the local infobox.
Currently the imported information is not stored in the semantic database of the TheaterEncyclopedie, which could of course be a next step. For now it is only used by editors to add, compare and check information.

Example Infobox using local structured data and imported semantic information from Wikidata

Procedure and technical implementation

The procedure to import the data consists of three steps:

  1. Enter the Wikidata reference number in the theater infobox on the TheaterEncyclopedie
  2. Query the Wikidata API for the relevant data
  3. Display the retrieved data in a table

Let's elaborate on the querying, since this is where the specific magic happens! In fact, the Wikidata API is queried three times:

  1. The first time for the desired information stored in semantic triplets ('claims') about the entity
  2. Then for the desired information or 'properties' of the entity itself
  3. And last but not least, for related information retrieved in the first query: in this case, the (human-readable) name of the city.
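As a sketch, the three calls above are just three URLs against the same API endpoint. The following Python snippet (illustrative only; the live site builds these URLs inside a wiki template, not in Python) assembles them:

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def wbgetentities_url(qid, props, fmt):
    # Fetch one entity, restricted to the requested props and Dutch labels.
    params = {"action": "wbgetentities", "ids": qid,
              "props": props, "languages": "nl", "format": fmt}
    return API + "?" + urlencode(params)

def wbsearchentities_url(qid):
    # Follow-up lookup that resolves a Q-number to its human-readable label.
    params = {"action": "wbsearchentities", "search": qid,
              "language": "nl", "format": "xml"}
    return API + "?" + urlencode(params)

claims_url    = wbgetentities_url("Q1325514", "claims", "xml")     # query 1
sitelinks_url = wbgetentities_url("Q1325514", "sitelinks", "json") # query 2
label_url     = wbsearchentities_url("Q9899")                      # query 3
```

These match the three request URLs shown in the steps below.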

The MediaWiki extension "External Data" supports API calls in e.g. XML or JSON format, as well as filtering of the retrieved data using XPath (also see the admittedly limited documentation, or check out the full source code, which is freely available).


Step 1 - Query for data ('claims') of the specific reference number (Q-number Q1325514):

https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q1325514&props=claims&languages=nl&format=xml

Click the URL above to see the example's output.

The data retrieved from the API is then filtered and selected info is imported:

  • Image: //mainsnak[@property="P18"]/datavalue/@value
  • Zip code: //mainsnak[@property="P281"]/datavalue/@value
  • Date of opening: //mainsnak[@property="P1619"]/datavalue/value/@time
  • Geo-location, latitude: //mainsnak[@property="P625"]/datavalue/value/@latitude
  • Geo-location, longitude: //mainsnak[@property="P625"]/datavalue/value/@longitude
  • Wikimedia Commons category: //mainsnak[@property="P373"]/datavalue/@value
  • Located in: //mainsnak[@property="P131"]/datavalue/value/@id
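The XPath expressions above can be tried out offline against a simplified stand-in for the API's XML response (structure assumed from the expressions themselves, values illustrative). Note that Python's ElementTree reads attributes with .get() instead of the /@attr step:

```python
import xml.etree.ElementTree as ET

# Simplified, hand-made stand-in for a wbgetentities claims response;
# trimmed to the elements the XPath expressions above actually touch.
SAMPLE = """
<api>
  <claims>
    <claim>
      <mainsnak property="P18"><datavalue value="Example-theater.jpg"/></mainsnak>
    </claim>
    <claim>
      <mainsnak property="P131"><datavalue><value id="Q9899"/></datavalue></mainsnak>
    </claim>
  </claims>
</api>
"""

root = ET.fromstring(SAMPLE)
# //mainsnak[@property="P18"]/datavalue/@value
image = root.find(".//mainsnak[@property='P18']/datavalue").get("value")
# //mainsnak[@property="P131"]/datavalue/value/@id
located_in = root.find(".//mainsnak[@property='P131']/datavalue/value").get("id")
```

Here `image` resolves to the sample filename and `located_in` to the Q-number of the city, which the third query below turns into a readable name.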

Step 2 - Query for the sitelinks ('props=sitelinks') pointing to the Wikipedia page related to the reference number (Q-number):

https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q1325514&props=sitelinks&languages=nl&format=json

Click the URL above to see the example's output.

The data retrieved from the API is then filtered and selected info is imported:

  • Wiki site key: site
  • Wikipedia page title: title
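This second query returns JSON, which can be filtered the same way. Below is a simplified stand-in for the sitelinks response (structure assumed, values illustrative):

```python
import json

# Simplified, hand-made stand-in for a wbgetentities sitelinks response.
RAW = """
{"entities": {"Q1325514": {"sitelinks": {
  "nlwiki": {"site": "nlwiki", "title": "Het Muziektheater"}
}}}}
"""

nl = json.loads(RAW)["entities"]["Q1325514"]["sitelinks"]["nlwiki"]
site, title = nl["site"], nl["title"]
# The Wikipedia URL can be derived from the site key and the page title.
url = "https://nl.wikipedia.org/wiki/" + title.replace(" ", "_")
```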

Step 3 - Secondary query for data about the referenced location ('city', reference number Q9899) from the first query:

https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Q9899&language=nl&format=xml

Click the URL above to see the example's output.

Filtered and imported info:

  • Name of location: //entity/@label
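Applying that last XPath expression to a simplified stand-in for the wbsearchentities XML (structure assumed, value taken from the example page's city):

```python
import xml.etree.ElementTree as ET

# Simplified, hand-made stand-in for a wbsearchentities response.
RAW = """
<api success="1">
  <search>
    <entity id="Q9899" label="Amsterdam"/>
  </search>
</api>
"""

# //entity/@label: match the entity element, then read its label attribute.
city = ET.fromstring(RAW).find(".//entity").get("label")
```

This turns the bare Q-number from the first query into a human-readable city name.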

The information gathered from the above API calls can then be displayed in the table (see the example animation) and used to retrieve the image from the Wikimedia Commons project. All data processing is handled by the MediaWiki extension External Data and is contained in a single template. The only input required for this template is the Wikidata reference number (the Q-number). In practice, this procedure has proven to work stably and reliably.

Some additional thoughts

At first sight this method may seem a bit inefficient: the reference number (Q-number) is added manually, and the querying is done on a page-by-page basis instead of as a single mass import or synchronization.

Of course this could be automated, but on closer inspection this method has an important advantage related to information quality control. Linking open data into your wiki always raises the question of how to control quality; you don't want to mass-import errors!

I'll give you an example of how information can be added to a wiki's semantic database while staying in control of the information quality.

Strategy for storing the information in the Semantic database

  • If no local value is entered in the TheaterEncyclopedie, use the imported value
  • If a local value is entered, store and display the locally entered value
    -- Signal if the locally entered value differs from the value imported from Wikidata (e.g. with a red color)
    -- If there is no value available in Wikidata, display a link to update Wikidata
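Expressed as a sketch (in Python, with illustrative names; the real rules would live in a wiki template), the strategy above could look like:

```python
def resolve(local, imported):
    """Pick the value to display and flag mismatches, following the
    quality-control strategy above. Returns (value, flag)."""
    if not local:
        # No local value: fall back to the value imported from Wikidata.
        return imported, None
    if not imported:
        # Nothing on Wikidata: keep local and suggest updating Wikidata.
        return local, "missing-on-wikidata"
    if local != imported:
        # Conflict: keep the local value but signal it (e.g. in red).
        return local, "differs-from-wikidata"
    return local, None
```

For example, `resolve("1011 XY", "1012 AB")` keeps the local zip code but returns the "differs-from-wikidata" flag, so an editor can review it.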


Conclusion

As long as there is no dedicated MediaWiki extension to funnel Wikidata information into Semantic MediaWiki, the multi-purpose External Data extension offers a great way to do the trick for you.

It lets you use information from Wikidata within your wiki, while also keeping control of the information quality.

We can help you link your wiki to Wikidata

WikiWerkers are skilled professionals with experience linking both standard and semantic wikis to Wikidata. We're flexible and ready to help you build your own wiki project.
Send us a message and get to know us. WikiWerkers are also available on Discord.


Contact the author on LinkTree


~+~ WikiWerkers -~- Personal Website -~- Twitter ~+~