Goodbye Aaron Swartz – and Long Live Your Legacy

The following entry is reposted from the OKFN’s main blog. It was written by Jonathan Gray and is licensed CC BY 3.0. The photo is by Daniel J. Sieradski (on Flickr), licensed CC BY-SA 2.0.


January 14, 2013 in Access to Information, Bibliographic, Campaigning, Featured, News, Open Access, Open Data, Open Government Data, Policy

Aaron Swartz, coder, writer, archivist and activist, took his own life in New York on Friday. Aaron worked tirelessly to open up and maximise the societal impact of information in three areas which are central to our work at the Foundation: public domain cultural works, public sector information, and open access to publicly funded research.
He was one of the original architects behind the Internet Archive’s Open Library project, which aims to create ‘one web page for every book’. While he was there we compared notes about trying to automatically estimate which works are in the public domain in different countries around the world. This was part of a broader vision to enable public access to the public domain, and to ensure that digitisation initiatives result in open digital copies of public domain works that everyone is free to use and enjoy, not just copies owned and protected by large corporations who might sell or restrict access to the world’s heritage. Around this time Aaron and I met in San Francisco to co-draft a petition to the Library of Congress to encourage them to take a leading role in opening up data from the world’s libraries and memory institutions. This was several years before a wave of institutions started explicitly opening up data about their holdings.
We remained in contact regarding his work on open government data in the US. Aaron was involved in drafting the highly influential 8 principles for open government data. We wanted to try to better coordinate developments on either side of the Atlantic. Later he was in the papers for downloading around a fifth of the US government’s huge Public Access to Court Electronic Records (PACER) system, around 780 gigabytes, and releasing it for free to the public (access was usually charged by the page) – which earned him an FBI file.
In his 2008 Guerilla Open Access Manifesto Aaron argued that “the world’s entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations” and, “in the grand tradition of civil disobedience”, urged internet users to “fight back”:
We need to take information, wherever it is stored, make our copies and share them with the world. We need to take stuff that’s out of copyright and add it to the archive. We need to buy secret databases and put them on the Web. We need to download scientific journals and upload them to file sharing networks. We need to fight for Guerilla Open Access.
In 2010 he founded Demand Progress, which helped to mobilise over a million people in response to proposed legislation like the Combating Online Infringement and Counterfeits Act (COICA). In 2011 he again hit the headlines when he was arrested for downloading roughly 4 million subscription-only academic articles from JSTOR by placing a laptop in a computer cupboard at MIT and using this to gain unauthorised access to the JSTOR service. The prosecution alleged that he intended to make these articles freely available on the web. Last September the US Federal Government raised the felony count from four to thirteen, which meant that Aaron was potentially facing a total of 50+ years and a fine in the area of $4 million for his actions. His family suggested that the case was a factor in his death – and blamed the Massachusetts U.S. Attorney’s office for “intimidation and prosecutorial overreach” and MIT for “refus[ing] to stand up for Aaron and its own community’s most cherished principles”. The president of MIT has just announced that he has ordered an investigation into their role in Aaron’s prosecution. As Peter Eckersley from the Electronic Frontier Foundation commented on Saturday:
While his methods were provocative, the goal that Aaron died fighting for — freeing the publicly-funded scientific literature from a publishing system that makes it inaccessible to most of those who paid for it — is one that we should all support.
While Aaron was deeply involved in all kinds of technical, scholarly and organising activities to promote an open digital commons and an open internet – from helping to develop RSS 1.0 and Markdown, to early sketches of the semantic web with some of its pioneers and work on the first technical implementations of the Creative Commons licenses – he also never lost sight of the bigger picture, of what it was all for. He was a talented coder and knew how to take a principled stance, but he was never one to get lost in detail or dogma. From his writings about how data-driven transparency initiatives are not enough to effect change in themselves, to his guide to developing software that addresses real needs, he was always aware of the fact that using information, technology and the internet to change the world is not easy, and requires graft, skill, scrutiny, critical reflection and taking risks. Aaron’s passing is a tremendously sad and significant loss. Long live his legacy.
To find out more about Aaron’s life and works, you can look at his writings and the memorial site set up by his family. You can also read tributes from Tim Berners-Lee, Cory Doctorow, Brewster Kahle, Lawrence Lessig, and Erik Moeller, and read obituaries and news articles on the BBC, Forbes, Gigaom, the Guardian, the Huffington Post, the New York Times, Techdirt, the Telegraph and Wired. In tribute, hundreds of academics have started tweeting links to their research papers using the hashtag #pdftribute. The Internet Archive has started an Aaron Swartz Collection.

Open Access Report December 2012

 

Since January 2012, I have been posting a monthly summary of Open-Access-related activities pertaining to Wikimedia projects as part of the GLAM Newsletter on the Wikimedia Outreach wiki. I have also occasionally contributed Tool Testing reports to the same GLAM newsletter, usually taking OA-related materials as examples.

I am posting these reports also here on the blog in order to reach out to a wider audience. This breaks some of the formatting, so please go to the respective wiki page to see things in a proper setting, including proper attribution.

Links to all Open Access reports posted so far: January, February, March, April, May, June, July, August, September, October, November, December.

Links to all Tool Testing reports posted so far: January, February, December.

Text is available under the Creative Commons Attribution/Share-Alike License 3.0. Licenses of images and media files used in the reports may differ but are always compatible with reuse in a CC BY-SA 3.0 environment.


Open Access Report


1 year Open Access File of the Day

A bat in flight at night. Open Access File of the Day on December 11, 2011.


On December 1, the Open Access File of the Day initiative turned one year. Since December 2011, a file originating from an open access source has been posted on a daily basis, the only exception being January 18, 2012, when a copy of the Research Works Act article served as Open Access File of the Day for the duration of the SOPA blackout. During the first year, the main prerequisite for a file to be featured this way was that it be used at least twice across Wikimedia projects outside user space, and on the occasion of the anniversary, that criterion was raised to a usage of three times or more. About 400 files have been posted so far, originating from some 40 journals, half of which are published by BioMed Central. The most frequent sources of the files were PLOS Biology (136 files), PLOS ONE (76) and ZooKeys (40). According to BaGLAMa (see also the Tool testing report), these 400 files gathered a total of 10 million page views in December 2012, which is almost half of the 23 million for the entire category Open access (publishing), home to about 15,000 files (two thirds of which have been uploaded over the last few months by the Open Access Media Importer). On the occasion of the anniversary, this issue of the Open Access report ends with a listing of not only the files featured in December 2012 but also those early ones featured in December 2011, when no Open Access Report was compiled that they could have been included in.

Articles started this month

Growth of the number of open access repositories indexed in ROAR.
According to TreeViews, 30 articles on OA topics have been started this month across languages, including

Open Access Media Importer

Gallery

The following files represent a selection of what has been uploaded by the Open Access Media Importer this month. If you can think of wiki pages where these files could be useful, please put them in there or let us know.

WikiProject Open Access

The following news items from WikiProject Open Access have been posted this month:

Open Access File of the Day

The following files have been featured as Open Access File of the Day in December 2011 and December 2012:

2011

2012

 

Tool Testing Report 


BaGLAMa is back

This photograph is used on en:Isaac Newton, as it depicts a descendant of the tree that supposedly yielded the apple that helped Newton develop his understanding of gravity. Seen in the December 2012 BaGLAMa record for “Cambridge University Botanic Garden”.
One of the most important tools for tracking the usage of files uploaded to Wikimedia Commons is BaGLAMa, which provides a summary of pageviews of Wikimedia pages that contain images from specific categories on Commons (see January 2012 report). It also provides an easy way to compare the stats for any two months on record (see February 2012 report). The tool has had trouble finishing its job in recent months, due to the stressed Toolserver. Magnus Manske, the developer behind BaGLAMa, has taken this as an occasion to review and rework his code, with the result that the tool can now be used again. He also re-calculated the stats for those earlier months where a record was missing, which resulted in an overestimation of the respective pageviews, since category membership was taken as of the time of running, not as of the month in question (which is difficult to implement and would drain even more resources from the Toolserver). The stats for December 2012 were computed within a few days, and on the basis of the results for Category:Open access (publishing), it can be seen that the combined pageviews for images from that category rose from about 16.5 million last December (discounting an image featured on the Main Page of the Russian Wikipedia) to 23 million in December 2012.
In the following, I would like to highlight another use case of BaGLAMa – that of helping to identify articles that use a relatively large number of images from the target category. As an example, articles are listed that use at least five files from Category:Open access (publishing).
45 files
  • Aegista vulgivaga dart.jpg
  • Bradybaena similaris dart.jpg
  • Cantareus aperta dart.jpg
  • Arianta arbustorum dart.svg
  • Chilostoma glaciale dart.jpg
  • Chilostoma cingulatum dart.jpg
  • Chilostoma planospira dart.jpg
  • Cantareus aspersus dart.jpg
  • Cepaea hortensis dart.jpg
  • Cepaea nemoralis dart.jpg
  • Cernuella cisalpina dart.jpg
  • Cernuella hydruntina dart.jpg
  • Cernuella virgata dart.jpg
  • Eobania vermiculata dart.jpg
  • Euhadra amaliae dart.jpg
  • Euhadra quaesita dart.jpg
  • Euhadra sandai dart.jpg
  • Fruticicola fruticum dart.svg
  • Helicigona lapicida dart.jpg
  • Helicella itala dart.jpg
  • Helix lucorum dart.jpg
  • Helix pomatia dart.jpg
  • Helminthoglypta nickliniana dart.jpg
  • Helminthoglypta tudiculata dart.jpg
  • Monachoides vicinus dart lateral.jpg
  • Marmorana scabriuscula dart.jpg
  • Marmorana serpentina dart.jpg
  • Leptaxis erubescens dart.jpg
  • Hygromia cinctella dart.jpg
  • Monachoides vicinus dart.jpg
  • Love-darts.png
  • Monadenia fidelis dart.jpg
  • Humboldtiana nuevoleonis dart.svg
  • Polymita picta dart.jpg
  • Otala lactea dart.jpg
  • Perforatella bidentata dart.jpg
  • Perforatella incarnata dart.jpg
  • Pseudotrichia rubiginosa dart.jpg
  • Trichia hispida dart.jpg
  • Trichia striolata dart.jpg
  • Theba pisana dart.jpg
  • Xeromunda durieui dart.jpg
  • Xerosecta cespitum dart.jpg
  • Xerarionata kellettii dart.jpg
  • Xerotricha conspurcata dart.jpg
26 files
  • Morbus Fabry Cornea verticillata 01.jpg
  • Morbus Fabry EKG 01.jpg
  • Morbus Fabry kidney biopsy 02.jpg
  • Angiokeratoma 01.jpg
  • Morbus Fabry Angiokeratoma 01.jpg
  • Morbus Fabry DXA 01.jpg
  • Morbus Fabry EKG 02.jpg
  • Morbus Fabry Genotyping 01.jpg
  • Morbus Fabry Hypoacousia 01.jpg
  • Morbus Fabry kidney biopsy 01.jpg
  • Morbus Fabry kidney biopsy TEM 01.jpg
  • Morbus Fabry kidney biopsy TEM 02.jpg
  • Morbus Fabry kidney biopsy TEM 03.jpg
  • Morbus Fabry LVH echo 01.jpg
  • Morbus Fabry LVH echo 02.jpg
  • Morbus Fabry male with mother.jpg
  • Morbus Fabry MRA 01.jpg
  • Morbus Fabry MRI 01.jpg
  • Morbus Fabry MRT Osteoporosis 01.jpg
  • Morbus Fabry pulvinar sign 01.jpg
  • Morbus Fabry Skin 01.jpg
  • Morbus Fabry skin biopsy 01.jpg
  • Morbus Fabry Skin Rash 01.jpg
  • Morbus Fabry Stroke 01.jpg
  • Morbus Fabry Stroke MRT 01.jpg
  • Morbus Fabry Tissue Doppler 01.jpg
17 files
  • Catocala benjamini benjamini.JPG
  • Catocala beutenmuelleri.JPG
  • Catocala carissima.JPG
  • Catocala concumbens2.JPG
  • Catocala delilah mounted.JPG
  • Catocala diantha.JPG
  • Catocala dionyza2.JPG
  • Catocala elda2.JPG
  • Catocala euphemia.JPG
  • Catocala grotiana.JPG
  • Catocala irene2.JPG
  • Catocala jessica.JPG
  • Catocala luctuosa.JPG
  • Catocala nurus.JPG
  • Catocala rosalinda.JPG
  • Catocala sancta.JPG
  • Catocala unijuga.JPG
11 files
  • Mycosphaerella graminicola 10.png
  • Mycosphaerella graminicola 14.png
  • Mycosphaerella graminicola 2.png
  • Mycosphaerella graminicola 3.png
  • Mycosphaerella graminicola 4.png
  • Mycosphaerella graminicola 5.png
  • Mycosphaerella graminicola 6.png
  • Mycosphaerella graminicola 7.png
  • Mycosphaerella graminicola chromosomes.png
  • Mycosphaerella graminicola 8.png
  • Mycosphaerella graminicola 9.png
9 files
  • C amelogenesis imperfecta.jpg
  • A amelogenesis imperfecta.jpg
  • Amelogenesis.jpg
  • B amelogenesis imperfecta.jpg
  • D amelogenesis imperfecta.jpg
  • E amelogenesis imperfecta.jpg
  • F amelogenesis imperfecta.jpg
  • G amelogenesis imperfecta.jpg
  • H amelogenesis imperfecta.jpg
8 files each
  • Paratype of Paedophryne amauensis (LSUMZ 95004).png
  • Paedophryne dekot2.jpg
  • Map of Paedophryne localities 2.png
  • Paedophryne kathismaphlox, dorsal view.jpg
  • Paedophryne kathismaphlox.jpg
  • Paedophryne oyatabu.jpg
  • Paedophryne verrucosa2.jpg
  • Paratype of Paedophryne swiftorum.png
7 files
  • Open Access logo PLoS white.svg
  • Greenmand60.png
  • Roar1aug2011.png
  • Roarmap1aug2011.png
  • Bjorkspring.png
  • Development of Open Access.png
6 files
  • Schematic illustration of differences in neuronal specification and migration patterns between the mammalian and avian pallium.png
  • Pone.0001454.g007.jpg
  • Differential reelin levels in the cortex of adult high and low LG rats.gif
  • Journal.pone.0000252.g008.png
  • Journal.pone.0001454.g005 center cropped.jpg
  • Profile of intense and punctate reelin IR during hippocampal maturation journal pone 0005505 g001 cr.png
6 files
  • Bradypodion taeniabronchum.jpg
  • Bradypodion transvaalense dominant.jpg
  • Bradypodion atromontanum1.jpg
  • Bradypodion caffrum submissive.jpg
  • Bradypodion caffrum dominant.jpg
  • Bradypodion transvaalense submissive.jpg
6 files
  • Azhdarchfeedingwittonnaish2008.png
  • Azhdarchwingshapewittonnaish2008.png
  • Haenamichnuswittonnaish2008.png
  • Pneumatic Anhanguera santanae.jpg
  • Pterosaurs.jpg
  • Life restoration of a group of giant azhdarchids, Quetzalcoatlus northropi, foraging on a Cretaceous fern prairie.png
6 files
  • Acromegaly growth hormone levels.JPEG
  • Acromegaly hands.JPEG
  • Acromegaly pituitary macroadenoma.JPEG
  • Acromegaly prognathism.JPEG
  • Acromegaly facial features.JPEG
  • Acromegaly treatment diagram.JPEG
6 files
  • Psoriatic arthritis ankle ar1934-3.gif
  • Psoriatic arthritis dactylitis ar1934-4.gif
  • Psoriatic arthritis digit ar1934-2.gif
  • Psoriatic arthritis fingers ar1934-1.gif
  • Psoriatic arthritis spine ar1934-6.gif
  • Sacroiliitis MRI ar1934-5.gif
6 files
  • Clostridium perfringens gas gangrene.jpg
  • Gas gangrene pathology slide.jpg
  • Gas gangrene shoulder.jpg
  • Gas gangrene.jpg
  • Hemipelvectomy gas gangrene.jpg
  • Pneumatosis coli gas gangrene.jpg
5 files
  • Schematic illustration of differences in neuronal specification and migration patterns between the mammalian and avian pallium.png
  • Pone.0001454.g007.jpg
  • Differential reelin levels in the cortex of adult high and low LG rats.gif
  • Journal.pone.0000252.g008.png
  • Journal.pone.0001454.g005 center cropped.jpg
5 files each
  • Paleogeography of North America during the late Campanian Stage of the Late Cretaceous.png
  • Kosmoceratops.png
  • Kosmoceratops richardsoni.png
  • Phylogenetic relationships of Utahceratops gettyi and Kosmoceratops richardsoni within Ceratopsidae.jpg
  • Skull reconstruction of Kosmoceratops richardsoni.jpg
5 files each
  • Azhdarchwingshapewittonnaish2008.png
  • Haenamichnuswittonnaish2008.png
  • Quad launch.jpg
  • Pterosaurs.jpg
  • Life restoration of a group of giant azhdarchids, Quetzalcoatlus northropi, foraging on a Cretaceous fern prairie.png
5 files
  • Growth improvement conferred by Camponotus schmitzi to its host-plant Nepenthes bicalcarata.png
  • Isotopic signature (δ15N) of samples and assessment of myrmecotrophy.png
  • Nepenthes bicalcarata and Camponotus schmitzi.png
  • Positive effect of Camponotus schmitzi on pitcher production.png
  • Positive effect of Camponotus schmitzi on pitcher volume and prey biomass.png
5 files
  • Biston betularia.png
  • Biston betularia parva female.JPG
  • Biston betularia parva male.JPG
  • Biston nepalensis female.JPG
  • Biston nepalensis male.JPG
5 files
  • Schematic illustration of differences in neuronal specification and migration patterns between the mammalian and avian pallium.png
  • Pone.0001454.g007.jpg
  • Differential reelin levels in the cortex of adult high and low LG rats.gif
  • Journal.pone.0000252.g008.png
  • Journal.pone.0001454.g005 center cropped.jpg
5 files
  • Cabello.eugeni.lateral.svg
  • Chrosiothes.iviei.female.svg
  • Anatea.formicaria.svg
  • Craspedisia.spatulata.male.svg
  • Chrosiothes.niteroi.female.svg
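The selection behind the listing above (articles that use at least five files from the target category) can be sketched in a few lines. This is only an illustration: the input format, a list of (article, file) usage pairs, and the function name are assumptions, not BaGLAMa's actual data model or code.

```python
from collections import defaultdict


def articles_with_many_files(usage_pairs, min_files=5):
    """Group (article, filename) usage pairs by article and keep the
    articles that use at least `min_files` distinct files.

    Returns a list of (article, sorted filenames), most files first.
    """
    files_per_article = defaultdict(set)
    for article, filename in usage_pairs:
        # A set ignores repeated uses of the same file in one article.
        files_per_article[article].add(filename)

    selected = [
        (article, sorted(files))
        for article, files in files_per_article.items()
        if len(files) >= min_files
    ]
    # Order by number of files (descending), then by article title.
    selected.sort(key=lambda item: (-len(item[1]), item[0]))
    return selected
```

Raising `min_files` narrows the listing to the most image-heavy articles; the threshold of five simply mirrors the one used in this report.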
 

Open Access Report November 2012

Since January, I have been posting a monthly summary of Open-Access-related activities pertaining to Wikimedia projects as part of the GLAM Newsletter on the Wikimedia Outreach wiki.

I am posting these reports also here on the blog in order to reach out to a wider audience. This breaks some of the formatting, so please go to the respective wiki page to see things in a proper setting.

Links to all Open Access reports posted so far: January, February, March, April, May, June, July, August, September, October, November.

Text is available under the Creative Commons Attribution/Share-Alike License 3.0. Licenses of images and media files used in the reports may differ but are always compatible with reuse in a CC BY-SA 3.0 environment.



The Open Access Media Importer at full speed | Publishers deliver inconsistent XML to PubMed Central | Importing from other sources

Open Access Media Importer

While walking, an Eland antelope emits multiple signals related to its fighting ability, including click sounds caused by knee movements.
After having been approved as a bot on Commons late in October, the Open Access Media Importer ran almost continuously throughout November, scanning PubMed Central for suitably licensed scholarly articles with supplementary multimedia and importing these into Wikimedia Commons, to a current total of well over 9,000, raising the number of files under Category:Open access (publishing) and its subcategories to almost 14,000 by the end of the month. The bot attempts to provide the files with categories based on keywords, subject categories or MeSH terms supplied by the journal, by PubMed Central or by PubMed for the corresponding article. This sometimes leads to miscategorizations, often to overcategorization, and occasionally to no categories at all. At present, the files are spread over more than 20,000 categories, almost 10% of which had to be created on the occasion (e.g. for well over a hundred journals). Some of these categories (e.g. Caenorhabditis elegans or Green fluorescent proteins) are now filled with hundreds of files, which will eventually have to be distributed across more fine-grained categories. For many topics covered by PubMed Central (mostly biomedicine), there are thus now way more multimedia files available than the current Wikipedia entries (if they exist) can accommodate. For further examples, see actin cytoskeleton (on the English Wikipedia and on Commons), Gap junction (Wikipedia, Commons) or Woronin body (Wikipedia, Commons). The review of the categorization of the files and of these new categories themselves continues – a process that you can facilitate by checking out (thanks to the overburdened Toolserver) a few of them and adding or removing categories as appropriate. If you can think of wiki pages where these files could be useful, please put them in there. Bug fixes continued but shifted in focus from providing functionality to minimizing the effects of inconsistent and incorrect metadata available from PubMed Central.

Metadata at PubMed Central

The most prominent issue with the XML is that of incorrect or self-contradicting licensing statements. While this had been noticed already in spring (e.g. “licensed under a Creative Commons Attr0ibution 3.0 License (by-nc 3.0)” – yes, with a typo on top of it, in an article from Orthopedic Reviews, published by PagePress), actually deploying the bot to larger parts of the database made it clear that the phenomenon is rather common and not restricted to small and lesser known publishers. The Open Access Media Importer analyzes the XML of articles stored in PubMed Central’s Open Access subset. That XML is being delivered there by the individual journals or publishers, which provides the basis for a plethora of individual styles that may or may not be close to the actual specifications of the National Library of Medicine’s Document Type Definition (now named JATS).
License mismatch in Stem Cells (Dayton, Ohio), published by Wiley-Blackwell: while the machine-readable license is CC BY, the human-readable version has a non-commercial clause (PMC3468739).
Similar license mismatch – with an added “Share Alike” component – in mBio, published by Oxford University Press (PMC3000542).
Besides the two journals highlighted in the figures, other journals affected by contradictory license statements include Evolutionary Applications (Wiley-Blackwell), Traffic (Copenhagen, Denmark) (Wiley-Blackwell), Cellular Microbiology (Wiley-Blackwell), Cytotherapy (Informa), The American Journal of Tropical Medicine and Hygiene (American Society of Tropical Medicine and Hygiene), The FEBS Journal (Wiley-Blackwell), Hepatology (Baltimore, Md.) (Wiley-Blackwell), Journal of Cellular Physiology (Wiley-Blackwell) and Database: The Journal of Biological Databases and Curation (Oxford University Press). At the Journal of Neurochemistry (Wiley-Blackwell), the self-contradictory notice “Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.” is even displayed directly on the article’s page. The closest match to the term “Creative Commons Deed, Attribution 2.5” would be the Creative Commons Attribution 2.5 Generic License (CC BY 2.5), which is indeed linked from the XML and does permit commercial exploitation. While contradictions between machine-readable and human-readable license statements are one sort of problem, many journals – including those published by PLOS, which account for the majority of the bot’s uploads so far – do not provide a license link at all or mix up the license and copyright tags in other ways. On a related note, even articles clearly and unambiguously labeled CC BY may occasionally contain materials incompatible with such licensing, and some articles in journals otherwise using Creative Commons licenses occasionally publish something under Crown copyright or similar conditions, causing the bot to skip the articles. Such licensing mess raises a number of questions: if the licensing for a given article agrees in its human- and machine-readable version on PubMed Central, can we then be sure that this information is correct?
This is the case, for instance, with the journal Molecular Vision. What if the same article does not have any licensing statement at the journal’s site (or in the XML there), or if the journal’s copyright policy states CC BY-NC-ND as the only option? What if Google finds several articles from the same journal that are also labeled as CC BY? The mismatch between stated licenses and actual licensing conditions also makes it difficult to assess, in an automated fashion, what amount of audio, video or other materials is available from PubMed Central under Wikimedia-compatible licenses. For some plots on the matter, see this blog post, which also highlights another frequent issue: that of a mismatch between the actual MIME type and that stated in the XML, as in the following example:
The MIME type of this supplementary video file from a PLOS ONE article is indicated at PubMed Central to be audio/wav. For another video from the same article, the MIME type is given as application/msword. In both cases, the XML at the journal’s website states the MIME types correctly.
As a rough estimate, MIME type mismatches of this kind affect on the order of 10% of the supplementary files in the database. Since this translates to hundreds of multimedia files, the bot now attempts to determine the MIME type of all supplementary materials and chooses those that are, in fact, audio or video, irrespective of what the XML states about them, thereby even covering cases in which the XML makes no statement about the MIME types. The bot naturally fails, however, in cases when suitably licensed articles do have supplementary multimedia files but these are not mentioned in the XML available from PMC (another case: journal; PMC). Another reason preventing the import of some suitably licensed materials is that files are frequently hidden in zip archives, which the bot ignores for the time being. Once suitably licensed multimedia have been identified as such, they have to be converted to a format accepted at Wikimedia Commons, i.e. OGG. This does not always work, since some authors use rather unusual file formats, or the metadata about the files (e.g. the length of a video) is incorrect or not stated at all. Most journals have a disclaimer that proper functioning of supplementary materials is within the authors’ responsibility, but it would be nice to establish a standard for testing that supplementary files submitted to journals actually convert properly to common standard formats. Further issues arise when the files are converted and need to be associated with their metadata in Commons style: Sometimes, there is no description whatsoever of supplementary files, or the description of several files is lumped together in a way that the bot cannot parse. Some minor issues include line breaks in article titles or typos in categories or keywords provided by the journal, which the bot uses for the initial categorization of the files.
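To illustrate the idea of determining the MIME type from the file itself rather than trusting the XML, here is a minimal sketch that inspects the first bytes of a file. It is not the bot's actual detection method, which the report does not describe; the function names are hypothetical and the signature table is a small, incomplete sample of well-known container signatures.

```python
# Signature of the legacy OLE2 container used by old .doc files (among others).
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def sniff_mime(data):
    """Guess a MIME type from leading bytes; return None if unrecognised."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/x-wav"
    if data[:4] == b"RIFF" and data[8:12] == b"AVI ":
        return "video/x-msvideo"
    if data[:4] == b"OggS":
        return "application/ogg"  # container; may hold audio or video
    if data[4:8] == b"ftyp":
        # ISO base media file; the brand tells QuickTime from MP4-like files.
        return "video/quicktime" if data[8:10] == b"qt" else "video/mp4"
    if data[:4] == b"\x1a\x45\xdf\xa3":
        return "video/x-matroska"  # EBML header (Matroska/WebM)
    if data[:3] == b"ID3":
        return "audio/mpeg"
    if data[:8] == OLE2_MAGIC:
        return "application/msword"  # assumption: treat any OLE2 file as Word
    return None


def is_multimedia(declared_mime, data):
    """Prefer the sniffed type; fall back to the declared one if unknown."""
    effective = sniff_mime(data) or declared_mime or ""
    return effective.startswith(("audio/", "video/")) or effective == "application/ogg"
```

With this approach, a video declared as `application/msword` in the XML is still picked up, and a missing declaration (`None`) does no harm as long as the signature is recognised.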
A problem not really solved so far is that of duplicate detection – while this works well for images, this is not the case for multimedia files, since multiple copies of a file will normally have different hashes.
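A small sketch of why hash-based duplicate detection breaks down for converted media: a content hash only matches byte-identical files, so two copies that differ in even a few container bytes look unrelated. The byte strings below are made-up stand-ins for two encodings of the same recording, and the helper names are hypothetical.

```python
import hashlib


def sha1_hex(data):
    """Return the SHA-1 digest of the given bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def is_exact_duplicate(data, known_hashes):
    """True only if a byte-identical file has been seen before."""
    return sha1_hex(data) in known_hashes


# Two stand-ins for "the same" recording, differing only in header bytes.
first_copy = b"OggS" + b"\x00" * 8 + b"same audio payload"
second_copy = b"OggS" + b"\x01" * 8 + b"same audio payload"

known = {sha1_hex(first_copy)}
print(is_exact_duplicate(first_copy, known))   # byte-identical copy
print(is_exact_duplicate(second_copy, known))  # differs in a few bytes
```

Catching the second case would need a comparison of the decoded content rather than of the raw bytes, which is outside the scope of this sketch.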

Gallery

The following files represent a selection of what has been uploaded by the Open Access Media Importer this month. If you can think of wiki pages where these files could be useful, please put them in there or let us know.
Videos
Can you guess the research question addressed in the corresponding scholarly article?
Sound files
Can you guess what these sounds represent?

Beyond PubMed Central

While PubMed Central is the only database currently spidered by the Open Access Media Importer, it is designed in a modular fashion, such that other sources could easily be plugged in. To lay the ground for such future work, a number of (manual) test uploads from such potential sources have been made this month.  

WikiProject Open Access

The following news items from WikiProject Open Access have been posted this month:

Open Access File of the Day

The following files have been featured as Open Access File of the Day this month:  

Open Access Media Importer progress report: October & November

The OAMI development process was streamlined to better incorporate feedback: We now use GitHub issues. Tickets are prioritized according to tags similar to a scheme used by Kathrin Passig in 2010:

The plot helper script now can sort data by DOI prefix, displaying publisher names. This is useful to identify high-yield sources of free content: Regarding PMC, both the Public Library of Science and BioMed Central lead by a noticeable margin. Note that quite a few publishers frequently signal the licensing conditions incorrectly in the XML they deliver to PubMed Central (see below), which probably explains, for instance, Wiley being listed third here.

Plot of Video Supplementary Materials under Free Licenses by DOI Prefix
Plot of Audio Supplementary Materials under Free Licenses by DOI Prefix

Plotting data can also reveal the extent of problems in the PMC Open Access Subset:

Plot of Supplementary Materials with wrong MIME type by DOI prefix

Also, oa-cache can now selectively “forget” conversion, download and upload of materials.

Since I have not found a good way to prevent the OAMI from stalling on conversion of certain files, oa-cache convert-media now exits when conversion has not progressed for some amount of time. I already tried to debug the GStreamer pipeline – for the story so far, see this bug report. Media conversion stalling is one of the major obstacles to running the converter unattended.
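One way to picture "exit when conversion has not progressed for some amount of time" is a watchdog that polls the size of the output file. The following is a sketch under assumptions, not the oa-cache implementation: the conversion is assumed to run as a child process writing to a known path, and the function name and parameters are hypothetical.

```python
import os
import subprocess
import sys
import time


def run_with_stall_timeout(command, output_path, stall_timeout=60.0, poll_interval=1.0):
    """Run `command`; kill it if `output_path` stops growing for too long.

    Returns True if the process exited with status 0, False if it failed
    or was killed because it made no progress for `stall_timeout` seconds.
    """
    process = subprocess.Popen(command)
    last_size = -1
    last_progress = time.monotonic()
    while True:
        return_code = process.poll()
        if return_code is not None:
            return return_code == 0
        size = os.path.getsize(output_path) if os.path.exists(output_path) else 0
        if size != last_size:
            # Output changed (or first check): reset the stall clock.
            last_size = size
            last_progress = time.monotonic()
        elif time.monotonic() - last_progress > stall_timeout:
            process.kill()
            process.wait()
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Demo: a child process that never writes any output gets killed.
    stalled = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_stall_timeout(stalled, "never_written.ogv",
                                 stall_timeout=0.5, poll_interval=0.1))
```

Watching the output file keeps the watchdog independent of the conversion library, at the cost of treating a long but silent processing phase as a stall, so the timeout has to be chosen generously.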

Suggested by Daniel Mietchen, a new OAMI source called pmc_pmcid can now be used to import content via its PMCID. Since PMCIDs – unlike DOIs – are sequential, the seq command line tool can be used to generate a list of papers to be examined: seq 3461993 1 3491993 | ./oami_pmc_pmcid_import

The check for existing content was refactored and now happens also before downloading original files for conversion. For reasons unknown, the MediaWiki API sometimes errors out when checking for duplicates – if this happens, the OAMI now retries until it gets an answer.
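Retrying a flaky check can be sketched in a few lines of shell. The source describes retrying until an answer arrives; the sketch below adds an upper bound on attempts as a safety assumption of its own. The names retry and flaky_check are hypothetical, and flaky_check merely stands in for the real duplicate check against the MediaWiki API.

```sh
#!/bin/sh
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1            # still failing after the last attempt
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Demonstration with a stand-in command that fails twice, then succeeds.
state=$(mktemp)
echo 0 > "$state"
flaky_check() {
    count=$(( $(cat "$state") + 1 ))
    echo "$count" > "$state"
    [ "$count" -ge 3 ]
}
retry 5 0 flaky_check && echo "answered after $(cat "$state") attempts"
rm -f "$state"
```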

Licensing proves to be a perpetual source of problems: non-free materials labeled incorrectly may be recognized as free. An example of this is the paper Nestin- and Doublecortin-Positive Cells Reside in Adult Spinal Cord Meninges and Participate in Injury-Induced Parenchymal Reaction, which has self-contradictory licensing information in the metadata XML:

<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.5/"> <license-p>Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation. </license-p> </license>

Allowing commercial use in the machine-readable version while disallowing it in human-readable text is asking for trouble. Incorrect metadata creates additional work for humans – this is contrary to the automation approach of the OAMI.
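A contradiction of this kind can be flagged with a simple heuristic. The sketch below is an assumption-laden illustration, not the check the OAMI performs: it treats a license element as suspicious when the URL points to a plain CC BY license while the accompanying text contains wording that restricts commercial use. The function name and the list of phrases are made up for this example and would need extending for real data.

```sh
#!/bin/sh
# Hypothetical heuristic: succeeds (exit status 0) if the given license XML
# names a plain CC BY URL but its text restricts commercial use.
license_is_contradictory() {
    xml=$1
    # Machine-readable part: a plain CC BY URL (not BY-NC, BY-ND, ...).
    printf '%s\n' "$xml" |
        grep -Eq 'creativecommons\.org/licenses/by/[0-9]' || return 1
    # Human-readable part: wording that restricts commercial use.
    printf '%s\n' "$xml" |
        grep -Eiq 'non-?commercial|not permit commercial'
}

# The license element quoted above.
sample='<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.5/"> <license-p>Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation. </license-p> </license>'

if license_is_contradictory "$sample"; then
    echo "contradictory licensing metadata"
fi
```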

To reduce the impact of mis-labeling, I am working on a whitelisting functionality for known-good DOI prefixes. The list was created by Daniel Mietchen, who reasoned that publishers that have a blanket policy of free licensing will not mistakenly label content as non-free – and even if they do, it is preferable to miss some free content than to import non-free content. A plot of problematic licensing metadata by DOI prefix is available.
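A whitelist check by DOI prefix could look roughly like the sketch below. It assumes a plain text file with one known-good prefix per line; the file format, the function name and the two prefixes used in the demonstration are illustrative assumptions and do not reproduce the actual list.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if the prefix of DOI (the text before the
# first slash) appears as a whole line in WHITELIST_FILE.
doi_prefix_is_whitelisted() {
    doi=$1
    whitelist_file=$2
    prefix=${doi%%/*}
    grep -Fxq -- "$prefix" "$whitelist_file"
}

# Demonstration with an illustrative whitelist (not the real one).
whitelist=$(mktemp)
printf '%s\n' 10.1371 10.1186 > "$whitelist"

for doi in 10.1371/journal.pone.0002365 10.9999/example; do
    if doi_prefix_is_whitelisted "$doi" "$whitelist"; then
        echo "$doi: prefix is whitelisted"
    else
        echo "$doi: prefix is not whitelisted, skipping"
    fi
done
rm -f "$whitelist"
```

Matching whole lines (grep -x) rather than substrings avoids accepting a prefix such as 10.13710 just because 10.1371 is listed.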

Posted in Open Access Media Importer, Tools

Open Access Report October 2012

Since January, I have been posting a monthly summary of Open-Access-related activities pertaining to Wikimedia projects as part of the GLAM Newsletter on the Wikimedia Outreach wiki.

I am posting these reports also here on the blog in order to reach out to a wider audience. This breaks some of the formatting, so please go to the respective wiki page to see things in a proper setting.

Links to all Open Access reports posted so far: January, February, March, April, May, June, July, August, September, October.

Text is available under the Creative Commons Attribution/Share-Alike License 3.0. Licenses of images and media files used in the reports may differ but are always compatible with reuse in a CC BY-SA 3.0 environment.




 
OPEN ACCESS REPORT

Videos from Nobel laureates; Open Access Week; Open Access Wikipedia Challenge; Open Access Media Importer approved

Membrane proteins are notoriously difficult to crystallize, so Kobilka and colleagues fused an easily crystallizable protein (orange) to a membrane protein (cyan) in order to facilitate the crystallization of the latter.

Videos from Nobel laureates

The Nobel Prize winners were announced this month, and at least three of them have published in journals whose Creative Commons Attribution License allows for import into Wikimedia Commons: Brian Kobilka (Chemistry) as well as John Gurdon and Shinya Yamanaka (who shared the prize for Physiology or Medicine this year). Running the Open Access Media Importer over these articles brought the following videos onto Wikimedia Commons:
  • Co-authored by Gurdon:
  • Co-authored by Yamanaka:
Physics laureate Serge Haroche has 21 papers listed on arXiv, but none of them is under a Wikimedia-compatible license.

Open Access Week

A video by Jorge Cham (of PhD Comics) on the occasion of Open Access Week 2012
The last full week in October each year is Open Access Week, an occasion for librarians, researchers, publishers, journal editors, students and others to engage in discussions around the past, present and future of Open Access and to showcase their related activities. Some of the highlights from this year include:
  • a guideline, “How Open Is It?”, released by SPARC, PLOS and OASPA, which differentiates between levels of access to scholarly articles, with reusability being an important aspect. Such information could be used to indicate systematically the openness of references cited on Wikimedia pages, as per Template:Open Access.
  • an announcement by the publisher Institute of Physics that it will license its freely available journal articles under a Creative Commons Attribution License from next year on, which would make these materials reusable on Wikimedia projects.
  • new Wikipedia entries on Open Access Week in French and Japanese.

Open Access Wikipedia Challenge

The Open Access Wikipedia Challenge is an online challenge in the MOOC realm, in which participants are asked to place Open Access content from Wikimedia Commons into Wikipedia. It is built as a social lesson in Wikipedia editing and requires no previous experience. The challenge is divided into six phases, each with an accompanying YouTube screencast tutorial and mini-challenge, totaling two hours of instruction. The challenge, hosted on Peer to Peer University, gives users the guided tasks of rating journals for openness, calculating how long it took Gangnam Style to get onto Wikipedia, writing Wikitext, categorizing on Commons with Hot Cat, and embedding media into Wikipedia. P2PU’s online platform allows users to track progress, to discuss the challenges, and to offer peer support, which probably makes the OAWC the first MOOC-ified Wikipedia tutorial. A special edition barnstar has been created for those who complete the challenge. Originally, the course was a celebratory measure, part of Wikipedia Loves Libraries and Open Access Week. After 21 netizens joined the course, it was decided to extend the challenge indefinitely. New and expert users are invited to sign up.
The first video associated with a PLOS article, from December 2003.

Open Access Media Importer approved

The Open Access Media Importer continued to be tested and refined throughout the month, leading to its approval on October 29.

Gallery

The following files represent a selection of what has been uploaded by the Open Access Media Importer this month. If you can think of wiki pages where these files could be useful, please put them in there or let us know.
Videos
Before you watch a video, consider guessing at the research question addressed in the corresponding scholarly article.
Sound files
Can you guess what these sounds represent?

WikiProject Open Access

The following news from WikiProject Open Access have been posted this month:
Reconstruction of the heterodontosaur Pegomastax africana.

Open Access File of the Day

The following files have been featured as Open Access File of the Day this month:  
Posted in Open Access Report

Reusing, revising, remixing and redistributing research

A contribution to the PLOS blog on the occasion of Open Access Week.

Introduction

The initial purpose of Open Access is to enable researchers to make use of information already known to science as part of the published literature. One way to do that systematically is to publish scientific works under open licenses, in particular the Creative Commons Attribution License that is compatible with the stipulations of the Budapest Open Access Initiative and used by many Open Access journals. It allows for any form of sharing of the materials by anyone for any purpose, provided that the original source and the licensing terms are shared alongside. This opens the door for the incorporation of materials from Open Access sources into a multitude of contexts both within and outside traditional academic publishing, including blogs and wikis.

Amongst the most active reusers of Open Access content are Wikimedia projects like the over 280 Wikipedias, Wikispecies and their shared media repository, Wikimedia Commons. In the following, a few examples of reusing, revising, remixing and redistributing Open Access materials in the context of Wikimedia projects shall be highlighted.

Reuse

An example of intensive reuse is an article from BMC Evolutionary Biology that features a number of phylogenetic trees of gastropods, along with pictorial depictions of individual species contained therein. Over a dozen of these depictions have been cropped from the figures and uploaded to Wikimedia Commons, from where they are currently being served to over 7000 pages across Wikimedia projects.

While these numbers far exceed the reuse of the average figure in scientific manuscripts, the potential for reuse has not been fully exploited yet. For instance, these phylogenetic trees have been published in a scalable format but some of the shell drawings have been included in bitmapped formats, which limits the size range at which the images can be re-used in Wikipedia articles. Furthermore, the trees have not been provided in an editable format, nor with code that could be used to reconstruct and adapt them.

Phylogenetic relationships within Gastropoda. A Fusiturris similis shell.
A phylogenetic tree of some gastropods. // One of the tree’s species, Fusiturris similis.
Composite images like this illustrated phylogenetic tree take a lot of effort to assemble. Typical reuse scenarios (e.g. in articles on the individual species) then require decomposition and are limited by the resolution of the original figures. Source: Cunha, R. L.; Grande, C.; Zardoya, R. (2009). “Neogastropod phylogenetic relationships based on entire mitochondrial genomes”. BMC Evolutionary Biology 9: 210. doi:10.1186/1471-2148-9-210. PMC 2741453. PMID 19698157. License: CC BY 2.0.

Of course, some images are genuinely created in bitmapped formats, e.g. photographs. But why then do publishers not preserve the EXIF information that could provide valuable context in interpreting images or sound files?

Revise

In January, a species of frog, Paedophryne amauensis, made headlines as the smallest known vertebrate. It belongs to a genus whose six currently known species have all been described in Open Access articles. The two most recent of these articles, published one month apart, both state that the genus has four species, as did a map provided in one of them. A contributor to the Polish Wikipedia, Szczureq, took the initiative to update the map accordingly; the updated version is currently in use in about 20 language versions of Wikipedia, in articles related to the genus. The file has since been tagged for conversion into SVG, an editable vector graphics format, so as to facilitate future updates.

Map with the localities of four Paedophryne species. Map with the localities of six Paedophryne species.
A map indicating the localities at which Paedophryne species have been found. On the left is the original map published in PLOS ONE with four species. On the right is a revision that takes into account the two additional species that had been published in ZooKeys a month earlier. Sources:

Remix

Scholarly communication nowadays takes place primarily in English. Open licenses allow for materials to be translated into other languages. This is particularly relevant for topics that are being taught in schools, such as the basic anatomy of the human ear and auditory cortex, as illustrated in the following figure originally published in PLOS Biology.

(A) The human ear and frequency mapping in the cochlea. (B) Lateral view of the human brain, with the auditory cortex exposed. Source: Chittka, L.; Brockmann, A. (2005). “Perception Space—The Final Frontier”. PLoS Biology 3 (4): e137. doi:10.1371/journal.pbio.0030137. PMC 1074815. PMID 15819608. License: CC BY 2.5.

Part A of the above figure has been converted to SVG and from there adapted to Czech (with a variant), German, Spanish, Indonesian, Japanese, Polish, Portuguese, Romanian and Ukrainian, with further versions being devoid of any descriptions, numbered descriptions or detailed frequency mapping.

Part B has also been converted to SVG and from there adapted for use in the Japanese Wikipedia’s article on the insula.

Still in the auditory system and PLOS Biology, the next figure depicts some of the key processing steps involved in auditory perception:

Sound processing in the auditory system. Source: Gollisch, T.; Herz, A. M. V. (2005). “Disentangling Sub-Millisecond Processes within an Auditory Transduction Chain”. PLoS Biology 3 (1): e8. doi:10.1371/journal.pbio.0030008. PMC 539322. PMID 15660161. License: CC BY 2.5.

For this one, too, an SVG version has been created, but the file has also been remixed in another way: User:Was a bee noticed that the depicted processing chain includes neither a sound source nor a mental representation of the perceived sound, and created a new version that does, which is used on both the Japanese and Italian Wikipedias.

Redistribute

Distribution of the published literature traditionally takes place on the level of individual articles, journals or publishers, but open licenses allow it to be aggregated at a cross-publisher level. For instance, the Open Access Subset at PubMed Central is now being automatically spidered for articles that (1) are licensed compatibly with reuse on Wikimedia platforms and (2) contain audio or video files. If such files are detected, they are downloaded from PubMed Central, converted to the open format OGG and uploaded to Wikimedia Commons, along with the accompanying metadata and suggested categories based on the article’s XML and the corresponding MeSH terms. Of course, the naming of these article-derived categories does not map one to one to categories used at Wikimedia Commons, but for known correspondences, there is another bot to fix that in a way that makes it easy for writers of Wikipedia articles to find relevant materials for illustration. This way, supplementary files, which are otherwise often neglected and rarely accessed, can live a second life in a new context. One of them, for instance, is featured on the Main Page of Wikimedia Commons today. The most recently uploaded media files from Open Access supplements can be viewed in a dedicated gallery.
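The two selection criteria can be illustrated with a small filter. This is only a sketch under stated assumptions, not the importer's code: it assumes a tab-separated input with the columns DOI, license URL and MIME type (a format invented for this example), and it treats CC BY, CC BY-SA and CC0 URLs as compatible with Wikimedia Commons. The function name select_importable and the DOI suffixes are hypothetical.

```sh
#!/bin/sh
# Hypothetical filter: reads "DOI<TAB>license URL<TAB>MIME type" lines on
# stdin and prints the DOIs of records that are freely licensed AND are
# audio or video files.
select_importable() {
    awk -F '\t' '
        $2 ~ /creativecommons\.org\/(licenses\/by(-sa)?\/|publicdomain\/zero\/)/ &&
        $3 ~ /^(audio|video)\// { print $1 }
    '
}

# Sample records; only the first one satisfies both criteria.
printf '%s\t%s\t%s\n' \
    10.1371/example.1 http://creativecommons.org/licenses/by/2.5/ video/quicktime \
    10.1371/example.2 http://creativecommons.org/licenses/by-nc/3.0/ video/mp4 \
    10.1186/example.3 http://creativecommons.org/licenses/by/2.0/ application/pdf |
    select_importable
```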

In the process of setting up the bot, it became very clear that the XML supplied to PubMed Central varies widely in terms of compliance with PubMed Central guidelines and general machine readability. For instance, the XML indicates the MIME type of the supplementary files, but for about ten percent of the files, this type is indicated wrongly (e.g. for all videos in this paper). Even the licensing and copyright statements of the articles themselves are sometimes self-contradictory. Work thus remains to be done to address these issues and to further standardize the exchange of metadata.

It is interesting to note that the only permission that had to be sought in order to run the import from PubMed Central into Wikimedia Commons was actually on the Commons end, since running a bot there requires approval, which is normally granted after the bot has demonstrated compliance with relevant policies and standards. There is a caveat to such large-scale import, however: it relies on proper assertion of copyright and correct indication of licensing back at the journals and, ultimately, by the authors of the corresponding articles. This is not a given, since many scholarly authors are still far from being familiar with these legal aspects of publishing. Raising awareness of such issues amongst the scholarly, librarian and publishing communities is one of the purposes of Open Access Week, and trying to get Open Access materials used on Wikipedia (or simply checking the provenance of an image or media file used there) is a good start to familiarize oneself with the subject.

Wikifying publications

Reusing, revising, remixing and redistributing openly licensed content is easier if the materials are created in an editable fashion right from the start. The journal RNA Biology has for several years required that authors of manuscripts describing new families of RNA submit a draft for a Wikipedia entry along with their manuscript, which will go through the same peer-review process. Earlier this year, PLOS Computational Biology took this approach a step further by introducing Topic Pages: review articles drafted according to the guidelines of the journal and of the English Wikipedia, which are published as traditional non-editable documents in the journal and additionally posted to the English Wikipedia, where they can be expanded and updated as the need arises. It would be nice to see further experimentation in this area, so as to increasingly integrate scholarly workflows with the Web, for which Open Access provides the first step.

Posted in Open Access Media Importer, Open Access Week

Open Access Media Importer: Presentation at WMDE, Collaborative Coding

Last Friday, I visited Wikimedia Deutschland to give a presentation on the current state of the Open Access Media Importer (slides), which is funded by WMDE. Attending were project manager Nicole Ebber and senior software developer Daniel Kinzler; Daniel Mietchen participated remotely via Skype.

The modular software architecture and resulting workflow, adapted from the original proposal, were quickly grasped, possibly because they were inspired by the Debian package manager apt. When discussing details, however, I realized that the documentation should explicitly mention that, despite the size of the PMC Open Access Subset, the OAMI can also be run with modest computing resources by focusing on particular documents.

Where the software will be run after the initial bulk import is done is unclear. According to Daniel Kinzler, it is not particularly suited to run on the Wikimedia Toolserver: extracting metadata from archives and converting media are both processing-intensive tasks, while downloading and uploading are I/O-bound; additionally, a fair amount of storage is required. Since a demonstration was not part of the visit, Daniel Kinzler now has an account on our development server to test and possibly review parts of the OAMI as soon as he finds the time to do so.

On the feature side, many incremental improvements were made: besides bugfixes, there is now support for audio (which is converted to Vorbis); the new oa-cache subcommand update-mimetypes downloads the first 4 kB of each supplementary material file and detects its MIME type via magic. While slow, this step is useful because some of the files in the PMC Open Access Subset are listed with a wrong type. For example, in the machine-readable version of this paper all videos are reported as being plain text.
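The principle behind content-based type detection can be shown without libmagic. The sketch below is a deliberately tiny stand-in, not what update-mimetypes does: it looks only at the leading bytes of a local file and knows just four signatures (Ogg, WAV, AVI and ISO base media files such as MP4, which QuickTime files with an ftyp box also match). The function name sniff_mime_type is hypothetical, and a real implementation would rely on a full magic database.

```sh
#!/bin/sh
# Hypothetical stand-in for libmagic: guesses a MIME type from the first
# bytes of a file. Only the first 4 kB are read, and only the first 12
# bytes of those are compared against a handful of known signatures.
sniff_mime_type() {
    # Hex dump of the first 12 bytes, e.g. "4f676753..." for "OggS".
    signature=$(head -c 4096 "$1" | head -c 12 | od -An -tx1 | tr -d ' \t\n')
    case $signature in
        4f676753*)                 echo application/ogg ;;  # "OggS"
        52494646????????57415645*) echo audio/x-wav ;;      # "RIFF....WAVE"
        52494646????????41564920*) echo video/x-msvideo ;;  # "RIFF....AVI "
        ????????66747970*)         echo video/mp4 ;;        # "....ftyp"
        *)                         echo application/octet-stream ;;
    esac
}

# Demonstration: a file declared as plain text that is actually Ogg data.
sample=$(mktemp)
printf 'OggS\000\002\000\000\000\000\000\000' > "$sample"
declared=text/plain
detected=$(sniff_mime_type "$sample")
if [ "$declared" != "$detected" ]; then
    echo "declared $declared, detected $detected"
fi
rm -f "$sample"
```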

When Daniel Mietchen came to Berlin on September 23rd, we corrected some subtle bugs regarding data extraction from XML and implemented a cheap heuristic for proper (not too broad) categories – to be used in the page template, a category has to contain a space. For getting the upload done in fewer API calls, I imported a patched version of python-wikitools into our repository. Since python-wikitools was the only dependency that is not in Debian, installation is now possible with a simple git clone https://github.com/erlehmann/open-access-media-importer.git.
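The category heuristic mentioned above (keep a category only if its name contains a space) is simple enough to express as a one-line filter. The sketch below is an illustration rather than the code used in the OAMI; the function name and the sample category names are chosen for this example.

```sh
#!/bin/sh
# Hypothetical filter: reads one category name per line on stdin and keeps
# only names that contain a space, dropping single-word (broad) categories.
filter_specific_categories() {
    grep ' '
}

# Sample input: single-word terms are dropped, multi-word terms are kept.
printf '%s\n' 'Female' 'Spinal Cord' 'Humans' 'Stem Cells' |
    filter_specific_categories
```

The heuristic is cheap but imperfect: it would also discard a specific single-word term and keep a broad multi-word one.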

Posted in Open Access Media Importer, Tools

Budapest Open Access Initiative – looking ten years into both past and future

A good ten years since the Budapest Open Access Initiative went public, the participants of the 10th anniversary meeting in February have released a set of recommendations concerning the next ten years of Open Access.

Jonathan Gray over on the main OKFN blog has already covered some of the most interesting details (I have listed a few other ones here), and I am reposting his entry (licensed CC BY) in full below the fold.


 

The notion of open access – or making research freely usable by all, without cost or legal barriers – has been in the news quite a bit this year.

It received significant media coverage on the back of the so-called Academic Spring, and subsequent high profile activities and announcements in the UK, the US and the EU.

One of the most significant milestones for open access advocates in the recent past is the Budapest Open Access Initiative, an international conference which convened experts from around the world to build consensus around a shared definition of ‘open access’. It is widely referred to as one of the defining events in the history of open access advocacy.

Ten years after this event, a diverse group of academics, advocates, librarians, and legal and policy experts met in Budapest. Today the group has issued a series of new recommendations for the next ten years of open access.

Some of the prefatory remarks to the recommendations are worth quoting in full:

Today we’re no longer at the beginning of this worldwide campaign, and not yet at the end. We’re solidly in the middle, and draw upon a decade of experience in order to make new recommendations for the next ten years. We reaffirm the BOAI “statement of principle,…statement of strategy, and…statement of commitment.” We reaffirm the aspiration to achieve this “unprecedented public good” and to “accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge.” We reaffirm our confidence that “the goal is attainable and not merely preferable or utopian.” Nothing from the last ten years has made the goal less attainable. On the contrary, OA is well-established and growing in every field. We have more than a decade’s worth of practical wisdom on how to implement OA. The technical, economic, and legal feasibility of OA are well-tested and well-documented. Nothing in the last ten years makes OA less necessary or less opportune. On the contrary, it remains the case that “scientists and scholars…publish the fruits of their research in scholarly journals without payment” and “without expectation of payment.” In addition, scholars typically participate in peer review as referees and editors without expectation of payment. Yet more often than not, access barriers to peer-reviewed research literature remain firmly in place, for the benefit of intermediaries rather than authors, referees, or editors, and at the expense of research, researchers, and research institutions. Finally, nothing from the last ten years suggests that the goal is less valuable or worth attaining. On the contrary, the imperative to make knowledge available to everyone who can make use of it, apply it, or build on it is more pressing than ever.

If you believe in open access, the following four sections are worth reading in detail – and contain lots of ideas on policy, licensing, infrastructure, sustainability, and advocacy.

Following are a couple of excerpts that might be of particular interest to readers of the OKFN’s blog.

Firstly, while there has been no shortage of debates about the legal and practical meaning of ‘open access’ and associated questions of licensing and strategy (resulting in various inflections: strong/weak, libre/gratis, green/gold, etc), the recommendations contain a clear endorsement of a strong conception of open access which only requires attribution with the CC-BY license (which is compliant with the Open Knowledge Foundation’s Open Definition):

2.1. We recommend CC-BY or an equivalent license as the optimal license for the publication, distribution, use, and reuse of scholarly work.
  • OA repositories typically depend on permissions from others, such as authors or publishers, and are rarely in a position to require open licenses. However, policy makers in a position to direct deposits into repositories should require open licenses, preferably CC-BY, when they can.
  • OA journals are always in a position to require open licenses, yet most of them do not yet take advantage of the opportunity. We recommend CC-BY for all OA journals.
  • In developing strategy and setting priorities, we recognize that gratis access is better than priced access, libre access is better than gratis access, and libre under CC-BY or the equivalent is better than libre under more restrictive open licenses. We should achieve what we can when we can. We should not delay achieving gratis in order to achieve libre, and we should not stop with gratis when we can achieve libre.

Secondly, they explicitly suggest that open access advocates should more closely coordinate with advocacy for other forms of openness:

The worldwide campaign for OA to research articles should work more closely with the worldwide campaigns for OA to books, theses and dissertations, research data, government data, educational resources, and source code.

If you’re interested in finding out more about the Open Knowledge Foundation’s open access activities you can join our open-access mailing list.

Posted in In the news, Policy

Open Access Media Importer: Page Templates, Automatic Import

Since the last post, most work was done on page templates for uploaded media files. Page names are now based on article titles, and both the database and pages created by the Open Access Media Importer now contain articles’ DOIs. Page categories based on journal names and MeSH terms were also introduced; the latter soon turned out to be too broad (“Female” is not very specific). A good example is this page (screenshot).

On the interface side, found media are now shown grouped by media type count:

Screenshot of the Open Access Media Importer, running the command “oa-cache find-media pmc_doi”

Daniel Mietchen proposed a shell script to automate importing media via the pmc_doi source. As pmc_doi reads DOIs from stdin, it can be used both interactively and programmatically (for example, echo 10.1371/journal.pone.0002365 | ./oami_pmc_doi_import). Since the script is rather short and illustrates the usual workflow of running the Open Access Media Importer, it is shown below:

#!/bin/sh

# clear database to get rid of old data
./oa-cache clear-database pmc_doi

# normal workflow for OAMI
./oa-get download-metadata pmc_doi
./oa-cache find-media pmc_doi
./oa-get download-media pmc_doi
./oa-cache convert-media pmc_doi
./oa-put upload-media pmc_doi

I rewrote the code that tries to make sense of plain text licensing statements, mapping them to proper licensing URLs like http://creativecommons.org/publicdomain/zero/1.0/, correcting several bugs in the process. I am now convinced that publishers providing only plain text licensing information are one of the biggest obstacles to identifying materials that are suitable for Wikimedia Commons.
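To give an idea of what mapping a plain text statement to a license URL involves, here is a deliberately conservative sketch. It is not the OAMI code: the function name, the phrase lists and the rule that a CC BY statement must mention a version number are all assumptions made for this example, and anything the sketch does not recognize is rejected rather than guessed.

```sh
#!/bin/sh
# Hypothetical mapper: prints a license URL for a plain text licensing
# statement and succeeds, or prints nothing and fails if the statement is
# restricted or not recognized.
license_url_from_text() {
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $text in
        *noncommercial*|*non-commercial*|*"not permit commercial"*|*"no derivative"*)
            return 1 ;;     # restricted: not suitable for import
        *"share alike"*|*sharealike*|*share-alike*)
            return 1 ;;     # not handled in this sketch
        *cc0*|*"public domain dedication"*)
            echo http://creativecommons.org/publicdomain/zero/1.0/
            return 0 ;;
    esac
    # CC BY: only accepted when a version such as 2.5 is stated.
    version=$(printf '%s\n' "$text" |
        sed -n 's/.*creative commons attribution[^0-9]*\([0-9]\.[0-9]\).*/\1/p')
    [ -n "$version" ] || return 1
    echo "http://creativecommons.org/licenses/by/$version/"
}

license_url_from_text 'Distributed under the Creative Commons Attribution License 2.5'
license_url_from_text 'Released under CC0'
```

Rejecting unrecognized statements errs on the side of missing free content rather than importing non-free content.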

Daniel Mietchen has identified a number of articles that cause the import to fail at various stages. While I am working on correcting issues arising from my own code, I do not think that GStreamer being unable to decode a particular QuickTime file or producing pixelated output are issues I will be able to solve on my own.

Posted in Open Access Media Importer, Tools