Dataverse: v5.5 Release

Release date: May 19, 2021
Previous version: v5.4.1 (released April 13, 2021)
Magnitude: 3,935 Diff Delta
Contributors: 16 total committers

49 Features Released with v5.5

Top Contributors in v5.5

pdurbin
sekmiller
landreev
pkiraly
qqmyers
djbrooke
rtreacy
donsizemore
scolapasta
jggautier

Release Notes Published

Dataverse Software 5.5

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Note: this release changes the default value of the :ZipDownloadLimit setting from 100 MB to 0 bytes. If you have not previously adjusted this setting from the default, your Dataverse installation will no longer generate zip files once v5.5 is installed, because the limit will now be 0 bytes. This behavior will be revisited in a later release.
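Installations that relied on the old default may want to set :ZipDownloadLimit explicitly. The Python sketch below builds (but does not send) such a request. It assumes the admin settings API is reachable at http://localhost:8080 and accepts a PUT with the value as the request body; the helper name build_setting_request and the value 100,000,000 bytes are illustrative choices, not part of the release notes.

```python
import urllib.request


def build_setting_request(base_url, name, value):
    """Build a PUT request that sets one database setting.

    Hypothetical helper: the URL layout (/api/admin/settings/<name>)
    is an assumption to verify against your installation's guides.
    """
    url = f"{base_url.rstrip('/')}/api/admin/settings/{name}"
    body = str(value).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


# Illustrative value only: 100,000,000 bytes.
request = build_setting_request(
    "http://localhost:8080", ":ZipDownloadLimit", 100_000_000
)
print(request.get_method(), request.full_url)

# To actually apply the setting, send the request from the server itself:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The request is only constructed here so the sketch can be reviewed before anything is changed on a live installation.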

Release Highlights

Auxiliary Files Accessible Through the UI

Auxiliary Files can now be downloaded from the web interface. Auxiliary files uploaded as type=DP appear under "Differentially Private Statistics" under file level download. The rest appear under "Other Auxiliary Files".

Please note that the auxiliary files feature is experimental and is designed to support integration with tools from the OpenDP Project. If the API endpoints are not needed they can be blocked.

Improved Workflow for Downloading Large Zip Files

Users trying to download a zip file larger than the Dataverse installation's :ZipDownloadLimit will now see a message that the zip file is too large, and will be presented with alternate access options. Previously, the zip file would download, and files above the :ZipDownloadLimit would be excluded and noted in a MANIFEST.TXT file.
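The difference between the two behaviors can be illustrated with a small Python sketch. This is not the Dataverse implementation: the function names are hypothetical, and the sketch assumes that the old behavior skipped any file that would push the running total over the limit, which is one plausible reading of the description above.

```python
def old_zip_selection(files, limit):
    """Earlier behavior (as assumed here): zip what fits, skip the rest.

    `files` is a list of (name, size_in_bytes) pairs. Returns the names
    that go into the zip and the names that would only be noted in
    MANIFEST.TXT.
    """
    included, skipped, total = [], [], 0
    for name, size in files:
        if total + size <= limit:
            included.append(name)
            total += size
        else:
            skipped.append(name)
    return included, skipped


def new_zip_allowed(files, limit):
    """v5.5 behavior (as assumed here): refuse the request if it is too large."""
    return sum(size for _, size in files) <= limit


files = [("a.csv", 60), ("b.csv", 50), ("c.csv", 30)]
print(old_zip_selection(files, 100))
print(new_zip_allowed(files, 100))
```

With the old approach a user could receive a zip that silently lacked some files; with the new approach the user is told up front and pointed to other download options.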

Guidelines on Depositing Code

The Software Metadata Working Group has created guidelines on depositing research code in a Dataverse installation. Learn more in the Dataset Management section of the Dataverse Guides.

New Metrics API

Users can retrieve new types of metrics and per-collection metrics. The new capabilities are described in the guides. A new version of the Dataverse Metrics web app adds interactive graphs to display these metrics. Anyone running the existing Dataverse Metrics app will need to upgrade or apply a small patch to continue retrieving metrics from Dataverse instances upgrading to this release.
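As a rough illustration of a per-collection metrics request, the Python sketch below only builds the request URL. The path prefix /api/info/metrics, the metric name "datasets", and the parentAlias query parameter are assumptions that should be checked against the Metrics API section of the guides; metrics_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def metrics_url(base_url, metric, parent_alias=None):
    """Build a metrics URL, optionally scoped to one collection.

    Assumed layout: <base>/api/info/metrics/<metric>[?parentAlias=<alias>]
    """
    url = f"{base_url.rstrip('/')}/api/info/metrics/{metric}"
    if parent_alias:
        url += "?" + urlencode({"parentAlias": parent_alias})
    return url


# Installation-wide metric, then the same metric for one collection.
print(metrics_url("https://yourhost.edu", "datasets"))
print(metrics_url("https://yourhost.edu", "datasets", parent_alias="mycollection"))
```

The resulting URL can be fetched with any HTTP client once the endpoint and parameter names have been confirmed for your version.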

Major Use Cases

Newly-supported major use cases in this release include:

  • Users can now select and download auxiliary files through the UI. (Issue #7400, PR #7729)
  • Users attempting to download zip files above the installation's size limit will receive better messaging and be directed to other download options. (Issue #7714, PR #7806)
  • Superusers can now sort users on the Dashboard. (Issue #7814, PR #7815)
  • Users can now access expanded and new metrics through a new API. (Issue #7177, PR #7178)
  • Dataverse collection administrators can now add a search facet on their collection pages for the Geospatial metadatablock's "Other" field, so that others can narrow searches in their collections using the values entered in that "Other" field. (Issue #7399, PR #7813)
  • Depositors can now receive guidance about depositing code into a Dataverse installation. (PR #7717)

Notes for Dataverse Installation Administrators

Simple Search Fix for Solr Configuration

The introduction in v4.17 of a schema_dv_mdb_copies.xml file as part of the Solr configuration accidentally removed the contents of most metadata fields from the index used for simple searches in Dataverse (i.e. when one types a word in the normal search box without indicating which field to search). This was partly hidden by the fact that many common fields, such as description, were still included by other means.

This release removes the schema_dv_mdb_copies.xml file and includes the updates needed in the schema.xml file. Installations with no custom metadata blocks can simply replace their current schema.xml file for Solr, restart Solr, and run a 'Reindex in Place' as described in the guides.

Installations using custom metadata blocks should manually copy the contents of their schema_dv_mdb_copies.xml file (excluding the enclosing <schema> element and only including the <copyField> elements) into their schema.xml file, replacing the section between

<!-- Dataverse copyField from http://localhost:8080/api/admin/index/solr/schema -->

and

<!-- End: Dataverse-specific -->.

In existing schema.xml files, this section currently includes only one line:

<xi:include href="schema_dv_mdb_copies.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />.

In this release, that line has already been replaced with the default set of <copyField> elements. It doesn't matter whether schema_dv_mdb_copies.xml was originally created manually or via the recommended updateSchemaMDB.sh script; this fix will work with all prior versions of Dataverse from v4.17 on. If you make further changes to metadata blocks in your installation, you can repeat this process (i.e. run updateSchemaMDB.sh, copy the entries in schema_dv_mdb_copies.xml into the same section of schema.xml, restart Solr, and reindex).

Once schema.xml is updated, Solr should be restarted and a 'Reindex in Place' will be required. (Future Dataverse Software versions will avoid this manual copy step.)
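For installations with custom metadata blocks, the manual copy step described above can be sketched in Python. This is a minimal illustration, not an official tool: the function names are hypothetical, and it assumes that the <copyField> entries in schema_dv_mdb_copies.xml are self-closing elements and that both marker comments appear verbatim in schema.xml.

```python
import re

START_MARKER = (
    "<!-- Dataverse copyField from "
    "http://localhost:8080/api/admin/index/solr/schema -->"
)
END_MARKER = "<!-- End: Dataverse-specific -->"


def extract_copy_fields(copies_xml):
    """Return every self-closing <copyField .../> element, in file order."""
    return re.findall(r"<copyField\b[^>]*/>", copies_xml)


def merge_copy_fields(schema_xml, copies_xml):
    """Replace the section between the two markers with the copyField lines."""
    fields = extract_copy_fields(copies_xml)
    if not fields:
        raise ValueError("no <copyField> elements found in the copies file")
    # str.index raises ValueError if a marker is missing, which is the
    # desired outcome: do not write a schema we could not patch.
    start = schema_xml.index(START_MARKER) + len(START_MARKER)
    end = schema_xml.index(END_MARKER, start)
    section = "\n" + "\n".join("    " + field for field in fields) + "\n    "
    return schema_xml[:start] + section + schema_xml[end:]


# Example usage (paths are placeholders; keep a backup of schema.xml):
# with open("schema.xml") as f:
#     schema = f.read()
# with open("schema_dv_mdb_copies.xml") as f:
#     copies = f.read()
# with open("schema.xml.new", "w") as f:
#     f.write(merge_copy_fields(schema, copies))
```

Writing the result to a separate file lets you compare it with the original before replacing schema.xml, restarting Solr, and reindexing.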

Geospatial Metadata Block Updated

The Geospatial metadata block (geospatial.tsv) was updated. Dataverse collection administrators can now add a search facet on their collection pages for the metadata block's "Other" field, so that people searching in their collections can narrow searches using the values entered in that field.

Extended support for S3 Download Redirects ("Direct Downloads")

If your installation uses S3 for storage and you have "direct downloads" enabled, please note that redirects now also cover the following download types, which earlier versions did not handle by redirect: saved originals of tabular data files, cached RData frames, resized thumbnails for image files, and other auxiliary files. In other words, all forms of the file download API that take extra arguments, such as "format" or "imageThumb", are now redirected. For example:

/api/access/datafile/12345?format=original

/api/access/datafile/:persistentId?persistentId=doi:1234/ABCDE/FGHIJ&imageThumb=true

Since browsers follow redirects automatically, this change should not affect web GUI users. However, some API users may experience problems if their clients do not expect a redirect response. For example, if a user has a script that downloads a saved original of an ingested tabular file with the following command:

curl "https://yourhost.edu/api/access/datafile/12345?format=original" > orig.dta

it will fail to save the file, because the server now returns a 303 (redirect) response instead of 200. They will need to add "-L" to the command line above to instruct curl to follow redirects:

curl -L "https://yourhost.edu/api/access/datafile/12345?format=original" > orig.dta

Most of your API users have likely adapted already, since you enabled S3 redirects for "straightforward" downloads in your installation. Still, we felt it was worth a heads-up, just in case.

Authenticated User Deactivated Field Updated

The "deactivated" field on the Authenticated User table is now non-nullable. When the field was added in version 5.3, an update script set it to 'false'. If for whatever reason that update failed during your 5.3 deployment, you will need to re-run it before deploying 5.5. The update query you may need to run is:

UPDATE authenticateduser SET deactivated = false WHERE deactivated IS NULL;

Notes for Tool Developers and Integrators

S3 Download Redirects

See the note above about download redirects. If your application integrates with the Dataverse software using the APIs, you may need to change how redirects are handled in your tool or integration.
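For integrations built on a low-level HTTP client that does not follow redirects on its own, the handling might look like the Python sketch below. It is an illustration under stated assumptions, not a recommended client: redirect_target and fetch are hypothetical names, the set of redirect status codes is a general HTTP choice rather than something these notes specify, and no API token or other header is sent.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(status, location, current_url):
    """Return the next URL to fetch, or None if the response is final."""
    if status in REDIRECT_STATUSES and location:
        # Location may be relative, so resolve it against the current URL.
        return urljoin(current_url, location)
    return None


def fetch(url, max_redirects=5):
    """GET a URL, following up to max_redirects redirects by hand."""
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc)
        else:
            conn = http.client.HTTPConnection(parts.netloc)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("GET", path)
            response = conn.getresponse()
            target = redirect_target(
                response.status, response.getheader("Location"), url
            )
            if target is None:
                if response.status != 200:
                    raise RuntimeError(f"unexpected status {response.status}")
                return response.read()
            url = target
        finally:
            conn.close()
    raise RuntimeError("too many redirects")


# Example (placeholder host from the notes above):
# data = fetch("https://yourhost.edu/api/access/datafile/12345?format=original")
```

Higher-level clients often follow redirects by default, in which case no change is needed; the point of the sketch is only to show where a 303 response has to be handled when they do not.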

Complete List of Changes

For the complete list of code changes in this release, see the 5.5 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.5.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

  • $PAYARA/bin/asadmin list-applications
  • $PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

  • service payara stop
  • rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

  • service payara start

4. Deploy this version.

  • $PAYARA/bin/asadmin deploy dataverse-5.5.war

5. Restart Payara

  • service payara stop
  • service payara start

Additional Release Steps

1. Follow the steps to update your Solr configuration, found in the "Notes for Dataverse Installation Administrators" section above. Note that there are different instructions for Dataverse installations running with custom metadata blocks and those without.

2. Update Geospatial Metadata Block (if used)

  • wget https://github.com/IQSS/dataverse/releases/download/v5.5/geospatial.tsv
  • curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @geospatial.tsv -H "Content-type: text/tab-separated-values"