As many of you noticed, the Zenodo upgrade last Friday didn’t go as smoothly as hoped. Afterwards Zenodo was sluggish and file uploads were painfully slow, some features didn’t quite work as expected. We’ve been working flat-out behind the scenes to steer Zenodo out of these stormy waters. We’d like to keep you updated on the issues we had, what we solved so far, and what we are still working on.
The upgrade involved orchestration of three components; Zenodo’s refactored software platform, the data, and the technical infrastructure:
The main issue after the upgrade was immediately evident that file upload/download was painfully slow (1GB file taking 1 hour instead of 20 seconds). We worked the weekend to finally discover that circumventing the front-facing load balancer alleviated the problem, and on removing it we restored Zenodo’s expected data performance (the root cause is still being investigated by the infrastructure experts).
We also had indexing issues with records which meant they didn't show up despite the data having been migrated, and despite having had successful test migration runs. We have almost finished to resolve this issue and expect the main issues to have been solved by Monday.
We want to thank you for all the incredible support and understanding that we have been receiving even under such difficulties. We’ve not ignored the functionality hiccups you’ve been reporting, we already solved some in parallel to the performance debugging. Please rest assured that we continue to work relentlessly to address all your support requests and feedback, and to address all outstanding issues in a timely manner to help you publish/access your content.
We apologise for all the inconvenience it has caused. We hope once we're over the hiccups the new Zenodo will serve you and your communities well!
Final lesson learnt: Friday 13th[**] might not have been a good day for a major release after all.
(**) We chose a Friday as Zenodo has considerable less traffic during weekends and thus as risk mitigation measure we would impact less users if there were problems (as there unfortunately turned out to be).