Open Access Media Importer: Page Templates, Automatic Import

Since the last post, most of the work has gone into page templates for uploaded media files. Page names are now based on article titles, and both the database and the pages created by the Open Access Media Importer now contain the articles’ DOIs. Page categories based on journal names and MeSH terms were also introduced; the latter soon turned out to be too broad (“Female” is not very specific). A good example is this page (screenshot).

On the interface side, the media found are now listed as counts grouped by media type:

Screenshot of the Open Access Media Importer, running the command “oa-cache find-media pmc_doi”

Daniel Mietchen proposed a shell script to automate importing media via the pmc_doi source. As pmc_doi reads DOIs from stdin, it can be used both interactively and programmatically (for example, echo 10.1371/journal.pone.0002365 | ./oami_pmc_doi_import). Since the script is rather short and illustrates the usual workflow of running the Open Access Media Importer, it is shown below:


# clear database to get rid of old data
./oa-cache clear-database pmc_doi

# normal workflow for OAMI
./oa-get download-metadata pmc_doi
./oa-cache find-media pmc_doi
./oa-get download-media pmc_doi
./oa-cache convert-media pmc_doi
./oa-put upload-media pmc_doi

I rewrote the code that tries to make sense of plain text licensing statements and maps them to proper licensing URLs, correcting several bugs in the process. I am now convinced that publishers providing only plain text licensing information are one of the biggest obstacles to identifying materials that are suitable for Wikimedia Commons.
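To illustrate why plain text statements are hard to work with, here is a minimal sketch of such a mapping written as a shell function. It is not the Open Access Media Importer's actual code: the function name license_url, the patterns and the choice to give up when no version number is present are all assumptions made for this example.

```shell
# Hypothetical helper, NOT the importer's real implementation: maps a plain
# text licensing statement to a Creative Commons license URL by pattern matching.
license_url() {
    # Lower-case the statement so that matching is case-insensitive.
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

    # Detect the license type; more specific variants must be tested first,
    # because they also contain the words "creative commons attribution".
    case "$text" in
        *"attribution-noncommercial"*|*"attribution non-commercial"*)
            kind="by-nc" ;;
        *"attribution-sharealike"*|*"attribution share alike"*)
            kind="by-sa" ;;
        *"creative commons attribution"*)
            kind="by" ;;
        *)
            return 1 ;;   # no known license wording found
    esac

    # Detect the version; refuse to guess if the statement does not name one.
    case "$text" in
        *"4.0"*) version="4.0" ;;
        *"3.0"*) version="3.0" ;;
        *"2.5"*) version="2.5" ;;
        *"2.0"*) version="2.0" ;;
        *)       return 1 ;;
    esac

    printf 'https://creativecommons.org/licenses/%s/%s/\n' "$kind" "$version"
}
```

Calling license_url with a statement that names a license but no version returns a non-zero exit status instead of a URL. That is deliberate in this sketch, and it shows the underlying problem: a statement written for human readers often leaves out exactly the detail a program needs.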

Daniel Mietchen has identified a number of articles that cause the import to fail at various stages. While I am working on correcting the issues that arise from my own code, I do not think that GStreamer being unable to decode a particular QuickTime file, or producing pixelated output, are issues I will be able to solve on my own.
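Until such failures are resolved, one way to keep a single problematic article from stopping a larger batch is to run the import script once per DOI and record which ones fail. The following is only a sketch: the function import_dois and the variable OAMI_IMPORT are hypothetical names introduced here, and it assumes that the import script exits with a non-zero status when an import fails.

```shell
# Hypothetical batch wrapper, not part of the Open Access Media Importer.
# OAMI_IMPORT is an assumed variable naming the import script from this post.
OAMI_IMPORT="${OAMI_IMPORT:-./oami_pmc_doi_import}"

# Reads one DOI per line from stdin and runs the import once per DOI.
import_dois() {
    failed=0
    while IFS= read -r doi; do
        # Skip empty lines.
        [ -n "$doi" ] || continue
        # The import script gets its own stdin (the pipe), so it cannot
        # consume the remaining DOIs that the loop is still reading.
        if printf '%s\n' "$doi" | "$OAMI_IMPORT"; then
            echo "ok $doi"
        else
            echo "failed $doi"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every DOI was imported.
    [ "$failed" -eq 0 ]
}
```

With a file containing one DOI per line (here called dois.txt, also an assumed name), this could be run as import_dois < dois.txt. The lines starting with "failed" then list the articles that need a closer look.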

