Extended grant support, ORCID and language field.

by Krzysztof Nowak on October 10, 2017

Today, we are introducing three additions to Zenodo:

  1. Extended grant support
  2. Language field
  3. ORCID for authors

Extended grant support (powered by OpenAIRE)

We are expanding our grants database with over 620,000 grants from 8 new funders, such as the National Science Foundation (US) and Wellcome Trust (UK), all thanks to OpenAIRE's ever-growing grants database.

Zenodo OpenAIRE grants database

Until now, we have only supported grants from the European Commission (FP7 and Horizon 2020). Today our dataset has grown and contains grants from the following funders:

  • National Science Foundation (USA) - 497646 grants
  • European Commission (EU) - 39409 grants
  • Foundation for Science and Technology (Portugal) - 37277 grants
  • National Health and Medical Research Council (Australia) - 24354 grants
  • Netherlands Organisation for Scientific Research (The Netherlands) - 24180 grants
  • Australian Research Council (Australia) - 23011 grants
  • Wellcome Trust (United Kingdom) - 12196 grants
  • Ministry of Science and Education (Croatia) - 2120 grants
  • Ministry of Education, Science and Technological Development (Serbia) - 777 grants

which means that our grants database grew from nearly 40,000 to over 660,000 grants! We wouldn't have been able to do that were it not for the OpenAIRE team and their hard work in collecting, maintaining and distributing the grants database for the benefit of Open Science!
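As a quick sanity check of these totals, the per-funder counts listed above can simply be added up. The snippet below is only an illustration using the figures quoted in this post; the dictionary is hand-copied and is not an export from the grants database.

```python
# Grant counts per funder, copied by hand from the list in this post.
grants = {
    "National Science Foundation (USA)": 497646,
    "European Commission (EU)": 39409,
    "Foundation for Science and Technology (Portugal)": 37277,
    "National Health and Medical Research Council (Australia)": 24354,
    "Netherlands Organisation for Scientific Research (The Netherlands)": 24180,
    "Australian Research Council (Australia)": 23011,
    "Wellcome Trust (United Kingdom)": 12196,
    "Ministry of Science and Education (Croatia)": 2120,
    "Ministry of Education, Science and Technological Development (Serbia)": 777,
}

total = sum(grants.values())
# Everything except the European Commission grants is new in this release.
new_grants = total - grants["European Commission (EU)"]
new_funders = len(grants) - 1

print(total)        # 660970
print(new_grants)   # 621561
print(new_funders)  # 8
```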

Language field

Today we are also adding a new language field to our metadata, which allows you to record the primary language of an upload. You can select the language in the upload form simply by starting to type the English name, a 2-letter or a 3-letter ISO 639 code:

Language for Zenodo

The new language field supports all languages defined in ISO 639-3, which in total defines 436 individual and macro-languages. For the full reference of language codes see the Library of Congress ISO 639 Language List.

What if my record contains more than one language?

The field is used to specify the primary language of the resource, hence if, e.g., a thesis is written in Danish and has an English abstract, then the primary language is Danish. Similarly, the primary language of a paper written in English, which is on the topic of Greek linguistics, thus containing a lot of text in Greek, is English.

There are always cases where it is not possible to clearly determine the primary language, for example a dataset containing a mapping between common Polish and French phrases. In those special cases you can always use the ISO 639-3 code mul (Multiple Languages).
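To make the lookup and the mul fallback concrete, here is a minimal sketch. It is not Zenodo's implementation: the `LANGUAGES` table is a tiny hand-picked excerpt, and the function names `suggest` and `primary_language` are hypothetical helpers introduced only for this illustration.

```python
# Tiny hand-picked excerpt of language codes (illustrative, not the full list).
# Each entry: (English name, 2-letter code or None, 3-letter code).
LANGUAGES = [
    ("Danish", "da", "dan"),
    ("English", "en", "eng"),
    ("French", "fr", "fra"),
    ("Greek", "el", "ell"),
    ("Polish", "pl", "pol"),
    ("Multiple languages", None, "mul"),
]


def suggest(query):
    """Return 3-letter codes whose English name starts with the query,
    or whose 2-letter or 3-letter code equals it (case-insensitive)."""
    q = query.strip().lower()
    if not q:
        return []
    return [
        code3
        for name, code2, code3 in LANGUAGES
        if name.lower().startswith(q) or q == code2 or q == code3
    ]


def primary_language(candidates):
    """Return the single candidate code, or 'mul' when no one language
    can be singled out as primary."""
    unique = set(candidates)
    if not unique:
        raise ValueError("at least one candidate language is required")
    return unique.pop() if len(unique) == 1 else "mul"


print(suggest("da"))                     # ['dan']
print(primary_language(["dan"]))         # 'dan' (Danish thesis, English abstract)
print(primary_language(["pol", "fra"]))  # 'mul' (Polish-French phrase mapping)
```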

ORCID for authors

Last but not least, you can now include an author's ORCID under the Authors section of the metadata on the deposit web interface.

Zenodo author ORCID
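If you want to check an ORCID iD for typos before entering it, the last character of the identifier is a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below shows that calculation; the function names are our own for this example, and it does not describe how Zenodo itself validates the field.

```python
def orcid_check_char(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length, the digits and the check character of an ORCID iD
    written as 0000-0000-0000-0000 (hyphens optional)."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_char(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check character)
```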

Google Summer of Code and Zenodo summer update

by Krzysztof Nowak on July 31, 2017

Google Summer of Code 2017

Zenodo has been taking part in Google Summer of Code 2017 and since June two students, Aman Jain and Xiao Meng, have been working with our team on introducing two exciting features to Zenodo by the end of the summer.

Aman Jain's project will introduce public user profiles on Zenodo that will allow our users to share and showcase their uploads on Zenodo.

Xiao Meng's project introduces a new backend file-processing module, which will enable us to, e.g., extract metadata from PDF documents and use that information to improve search or to pre-fill upload forms.

New feature: support contact form

Thanks to Aman Jain, one of our GSoC students, you can now contact us more efficiently through a contact form available at zenodo.org/support. This will allow us to organize and resolve our support requests better and faster.

New feature: logged in devices

We recently updated Zenodo to the latest Invenio version, which brought along a new feature that allows users to view all devices currently logged into their account. This security feature allows you to remotely log out of devices in case you, e.g., forgot to log out of your Zenodo account on a public computer.

You can view all currently logged-in devices by navigating to the Security tab in your account settings.

Active sessions

New feature: status page

In the footer of all Zenodo pages you will now find a "Status" hyperlink (status.zenodo.org), which shows the current status of Zenodo and uptime statistics for Zenodo pages and services.

Upload storage incident

by Lars Holm Nielsen on July 19, 2017

What happened?

As a result of a regular automatic file integrity check, as well as some user reports, we have discovered that 18 files uploaded to Zenodo after June 21st this year were not stored successfully. Despite serious efforts we have not been able to recover any of these 18 files from the CERN storage servers.

How did it happen?

We are taking this incident very seriously and have thoroughly investigated what happened. The root cause was the coincidence of two software bugs: one in the underlying disk storage system and the other in the client software that our web servers use to connect to the disk storage system. The two bugs were activated on June 21st when our underlying CERN disk storage system was upgraded to a new major software release. Only files uploaded on or after June 21st could have been affected, and of the 15,000 files uploaded to Zenodo since then, only 18 were actually affected.

An in-depth explanation of the incident is provided below.

Is it fixed?

Yes. We have already deployed fixes for the two software bugs. We have also taken further measures to ensure similar issues cannot happen. Even though it was good that our file integrity checks caught the errors, we have taken steps to improve this monitoring and ensure that we are alerted immediately in the future.

Is my file affected?

We have personally contacted all affected users by email, and since only a tiny fraction of recently uploaded files were affected, we are hoping to recover all files from their respective uploaders.

Why could you not recover the files?

We could not recover any of the files because they were never stored on our storage system, and thus our backups did not contain them either (see the in-depth explanation below). The information we do have is metadata such as the file size and file fingerprint (MD5 checksum), as these are calculated on the web server side. This information allows us to check whether files recovered from the respective uploaders are indeed the exact same files.
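The check described above can be sketched as follows: compute the size and MD5 fingerprint of a re-submitted file and compare them with the recorded values. This is a minimal illustration, not our production code; the function names `fingerprint` and `matches_record` are made up for this example, and the in-memory streams stand in for real files.

```python
import hashlib
import io


def fingerprint(stream, chunk_size=1024 * 1024):
    """Read a binary stream in chunks and return (size in bytes, MD5 hex digest)."""
    digest = hashlib.md5()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return size, digest.hexdigest()


def matches_record(stream, recorded_size, recorded_md5):
    """True only if both the size and the MD5 checksum match the recorded metadata."""
    size, md5 = fingerprint(stream)
    return size == recorded_size and md5 == recorded_md5.lower()


# Usage: the metadata recorded at upload time versus a file sent back later.
original = b"example file contents"
recorded_size, recorded_md5 = fingerprint(io.BytesIO(original))

print(matches_record(io.BytesIO(original), recorded_size, recorded_md5))         # True
print(matches_record(io.BytesIO(original + b"!"), recorded_size, recorded_md5))  # False
```

For a file on disk, the same functions work with a handle opened in binary mode, for example `open(path, "rb")`.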

What measures are you taking to prevent this in the future?

We are operating complex systems with tens of terabytes of data and millions of files, and we anticipate that failures will inevitably happen. That is also why we go to great lengths to safeguard files that users upload to Zenodo. In this case, one of our many checks did catch the problem, but with a delay of three weeks instead of immediately. We now have measures in place that ensure we catch a similar problem right away, and we will continue to proactively anticipate other types of failures and build countermeasures against them as part of our preservation strategy.

In-depth explanation of the incident

When a user uploads a file to Zenodo, the file is streamed through one of our web servers down to a storage server in our disk storage system. The disk storage system then immediately replicates the file to another storage server in the cluster before sending back a response to the web server that the file was successfully written to disk. On a successful write, the web server will then record metadata about the file in our database and let the user know the file was successfully uploaded.

One of the software bugs affected the underlying client library that Zenodo uses to connect to the storage system. After a complete file was sent from the web server to the storage system, the client library did not properly check the final reply from the storage system for errors. This meant that some particular errors reported by the storage system were not caught by the client library, leading the web server to think that the file was written successfully to disk when in fact there was an error.
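The effect of ignoring the final reply can be illustrated with a small simulation. Everything here is hypothetical: `FakeStorage`, `upload_unchecked` and `upload_checked` are stand-ins invented for this sketch and do not correspond to the real client library or storage system API.

```python
class StorageError(Exception):
    """Raised when the storage system reports a failed write."""


class FakeStorage:
    """Stand-in for the storage system. With fail_on_final_reply=True it
    accepts the data but reports an error in its final reply and keeps nothing."""

    def __init__(self, fail_on_final_reply=False):
        self.fail_on_final_reply = fail_on_final_reply
        self.files = {}

    def write(self, name, data):
        if self.fail_on_final_reply:
            return {"ok": False, "error": "write failed"}
        self.files[name] = data
        return {"ok": True, "error": None}


def upload_unchecked(storage, name, data):
    """Mimics the bug: the final reply is never inspected."""
    storage.write(name, data)
    return True  # the web server believes the file is safely stored


def upload_checked(storage, name, data):
    """Mimics the fix: an error in the final reply is surfaced to the caller."""
    reply = storage.write(name, data)
    if not reply["ok"]:
        raise StorageError(reply["error"])
    return True


storage = FakeStorage(fail_on_final_reply=True)
print(upload_unchecked(storage, "data.csv", b"1,2,3"))  # True
print("data.csv" in storage.files)                      # False: nothing was stored
```

With `upload_checked`, the same failing write raises `StorageError`, so the web server can report the failure to the user instead of recording metadata for a file that does not exist.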

The other software bug was found in the new version of the disk storage system software. Once the storage server had received the entire file, it would try to replicate the file to another storage server in the cluster. If this other storage server was unresponsive (e.g. due to high workload or network congestion), the replication operation would time out. The storage server would then proceed to clean up the file (i.e. delete it) and send back an error reply.

Thus, when a file replication operation failed in the storage system, the client library did not catch that there had been an error, leading the web server to think the file was successfully written to disk when in fact the storage system had never stored the file. This error did not expose itself prior to June 21st, because the previous software version on the disk storage system would automatically recover from the replication failure and not send an error reply back. As a result of this incident, the disk storage system software will reinstate the previous behaviour and try to immediately recover from the replication failure.
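The difference between the two storage software versions can be sketched with a toy model. All class names are hypothetical, and the recovery strategy shown (retrying on another replica) is an assumption made purely for illustration; the text above does not say how the storage system actually recovers.

```python
class ReplicaTimeout(Exception):
    """Raised when a replica does not answer in time."""


class Replica:
    def __init__(self, responsive=True):
        self.responsive = responsive
        self.files = {}

    def store(self, name, data):
        if not self.responsive:
            raise ReplicaTimeout(name)
        self.files[name] = data


class StorageServer:
    """recover=False models the new release (delete the file and reply with an
    error when replication fails); recover=True models the previous behaviour
    (assumed here to mean: try another replica)."""

    def __init__(self, replicas, recover):
        self.replicas = replicas
        self.recover = recover
        self.files = {}

    def write(self, name, data):
        self.files[name] = data
        targets = self.replicas if self.recover else self.replicas[:1]
        for replica in targets:
            try:
                replica.store(name, data)
                return {"ok": True, "error": None}
            except ReplicaTimeout:
                continue  # next target, if this version has any left
        # Replication failed everywhere we tried: clean up and report the error.
        del self.files[name]
        return {"ok": False, "error": "replication timed out"}


new_release = StorageServer([Replica(responsive=False), Replica()], recover=False)
print(new_release.write("thesis.pdf", b"%PDF")["ok"])  # False, file deleted

previous = StorageServer([Replica(responsive=False), Replica()], recover=True)
print(previous.write("thesis.pdf", b"%PDF")["ok"])     # True, stored on second replica
```

In this model, only the combination of `recover=False` with a client that ignores the error reply loses the file without anyone noticing, which matches the coincidence of the two bugs described above.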