This month, after I had created the upload functionality (oa-cache upload-media), my laptop’s SSD died. Since, by habit, I only push feature-complete commits, I lost the entire upload routine. Additionally, my backup from one month earlier was corrupted, and the backup from January did not contain any emails regarding the project.
Besides working on uploads and restoring backups, I spent some of my time collecting plain-text licensing statements and assigning proper license URLs to them. It is frustrating to see publishers giving useless or inconsistent licensing information like the following:
This work is licensed under a Creative Commons Attribution 3.0 License (by-nc 3.0)
This work is licensed under a Creative Commons Attr0ibution 3.0 License (by-nc 3.0)
This document may be redistributed and reused, subject to certain conditions .
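Assigning URLs to such statements can be sketched as a lookup in a hand-curated table. This is only an illustration, not the importer's actual code: the names `KNOWN_LICENSES`, `normalize` and `license_url` are hypothetical, and the table holds just a few sample entries. Statements that are contradictory or too vague map to `None`, so they can be skipped rather than guessed at.

```python
import re

# Hypothetical hand-curated table: normalized statement -> license URL.
# None marks statements that cannot be resolved to a single license.
KNOWN_LICENSES = {
    "this work is licensed under a creative commons attribution 3.0 license":
        "https://creativecommons.org/licenses/by/3.0/",
    # The name says "Attribution", the code says "by-nc": contradictory.
    "this work is licensed under a creative commons attribution 3.0 license (by-nc 3.0)":
        None,
    # Names no license at all.
    "this document may be redistributed and reused, subject to certain conditions":
        None,
}


def normalize(statement):
    """Collapse whitespace, lowercase, and drop trailing spaces and periods."""
    collapsed = re.sub(r"\s+", " ", statement).strip().lower()
    return collapsed.rstrip(" .")


def license_url(statement):
    """Return the license URL for a statement, or None if unknown or unusable."""
    return KNOWN_LICENSES.get(normalize(statement))
```

An exact-match table is deliberately conservative: a misspelled or unseen statement yields `None` and has to be reviewed by hand, which is safer than attaching the wrong license to a file.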
Recently, PubMed Central seems to dislike the user agent string of Python’s urllib2 and answers its requests with a 403 error. Blocking default user agents to hinder bots seems to be common, as Wikipedia does it too, even though changing the user agent defeats the measure.
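For illustration, here is a minimal sketch of sending a custom user agent. It uses `urllib.request`, the Python 3 successor to urllib2, and is not taken from the importer itself. The `USER_AGENT` value and the function names are made up for this example; a real crawler should identify itself honestly and give a contact address.

```python
import urllib.request

# Hypothetical identifier, invented for this example.
USER_AGENT = "example-media-crawler/0.1 (contact: admin@example.org)"


def build_request(url, user_agent=USER_AGENT):
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return response.read()
```

Calling `fetch` with a URL then sends the custom header in place of the library default (`Python-urllib/` followed by the version number).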
As the software is almost finished, Raphael has ordered a server; we will start the batch import next week. I am certain this will uncover quite a few bugs in the crawler. I will not be at the Hackathon this coming weekend, but I’ll write a short post that may help people extend the Open Access Media Importer.