From the beginning, the Open Access Media Importer was intended to be modular. That feature has only landed with the latest patches, as a result of decoupling the interface from the application logic: data sources are now Python modules that expose a number of functions for data retrieval and refinement.
When issuing a command like oa-get metadata pubmed, the wrapper script imports the chosen module (in this case sources/pubmed.py) and calls the module’s function associated with the action (in this case, the metadata action is associated with the download_metadata function). Those functions then do their work and communicate their state back to the calling process.
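The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the wrapper script's actual code: the `ACTIONS` table, the `run_action` helper and the in-memory `sources.dummy` stand-in are assumptions made up for this example. Only the pairing of the metadata action with `download_metadata` comes from the text.

```python
import importlib
import sys
import types

# Hypothetical mapping from command-line action to function name.
# Only the "metadata" -> "download_metadata" pair is taken from the text.
ACTIONS = {"metadata": "download_metadata"}


def run_action(action, source_name, package="sources"):
    """Import the chosen source module and call the function tied to the action."""
    module = importlib.import_module(f"{package}.{source_name}")
    function = getattr(module, ACTIONS[action])
    return function()


# Stand-in for a source module, built in memory so the sketch runs
# without any files on disk. The progress dictionaries are invented.
def _fake_download_metadata():
    yield {"completed": 50}
    yield {"completed": 100}


package = types.ModuleType("sources")
package.__path__ = []  # marks the stand-in as a package
dummy = types.ModuleType("sources.dummy")
dummy.download_metadata = _fake_download_metadata
sys.modules["sources"] = package
sys.modules["sources.dummy"] = dummy

# The caller iterates over whatever the source function reports back.
updates = list(run_action("metadata", "dummy"))
print(updates)
```

Looking the function up by name keeps the wrapper ignorant of any particular source: adding a new source then only means adding a new module.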
However, the functions called by the wrapper scripts are unlike most “normal” functions: they are generators, functions that save their state when they hand a value back and resume execution on the next call. In practical terms, that means a function does not have to return a complete result to the caller all at once: it can provide information in chunks to be iterated over – be it download completion or refined metadata.
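To illustrate the generator behaviour, here is a small sketch. The function body, the `total_chunks` parameter and the shape of the progress dictionary are assumptions for this example and not taken from the project's code.

```python
def download_metadata(total_chunks=4):
    """Hypothetical generator: reports progress after each chunk."""
    for done in range(1, total_chunks + 1):
        # A real data source would fetch one chunk of data here.
        yield {"completed": done / total_chunks}


progress = download_metadata()
first = next(progress)   # runs until the first yield, then pauses
rest = list(progress)    # resumes right after that yield

print(first)
print(rest)
```

The caller decides when to ask for the next chunk, so a frontend could update a progress bar between iterations without the data source knowing anything about the interface.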
To demonstrate and test the implementation, I created a dummy module that returns fake data before rewriting the code that crawls PubMed for the new interface. Compared to the old spaghetti code, the simplicity is beautiful: only two functions are needed to provide a new data source – and much of the dummy module’s content is simply the fake data it yields.
New frontend functionality is provided by oa-cache’s list-articles action: inspired by a request from Daniel Mietchen to find out which papers about malaria are licensed under a specific Creative Commons license, it returns metadata for articles as CSV (commonly known as “the format that Excel can read”). With that, the task becomes as easy as oa-cache list-articles pubmed | grep Malaria | grep 'creativecommons.org/licenses/by/'.
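For anyone who would rather filter in Python than with grep, the same idea can be sketched with the standard csv module. The two-column layout (title, license URL) and the sample rows below are invented for illustration; the actual columns emitted by list-articles are not described here.

```python
import csv
import io

# Made-up sample rows. The real column layout of the list-articles
# output may differ; title and license URL are assumed here.
FAKE_CSV = """\
Malaria parasites in the field,http://creativecommons.org/licenses/by/2.0/
Malaria drug resistance,http://creativecommons.org/licenses/by-nc/3.0/
Influenza surveillance,http://creativecommons.org/licenses/by/2.0/
"""


def filter_articles(lines, keyword, license_fragment):
    """Yield rows whose title contains keyword and whose license URL
    contains license_fragment."""
    for row in csv.reader(lines):
        title, license_url = row
        if keyword in title and license_fragment in license_url:
            yield row


matches = list(
    filter_articles(
        io.StringIO(FAKE_CSV),
        "Malaria",
        "creativecommons.org/licenses/by/",
    )
)
print(matches)
```

Unlike grep, which matches anywhere in a line, this checks each column separately, so a license URL that happens to contain the keyword would not produce a false hit.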
Implementing the list-articles action helped me iron out many corner cases in parsing the data returned by PubMed Central and laid the groundwork for the reworked find-media action that extracts supplementary materials. The next article will cover downloading, converting and uploading media files from supplementary materials.