Dataverse: v5.13 Release

Release date:
February 14, 2023
Previous version:
v5.12.1 (released November 4, 2022)
Magnitude:
11,014 Diff Delta
Contributors:
24 total committers

101 Features Released with v5.13

Top Contributors in v5.13

landreev
qqmyers
pdurbin
rtreacy
sekmiller
poikilotherm
eryk-k
JayanthyChengan
jggautier
scolapasta

Release Notes Published

Dataverse Software 5.13

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Schema.org Improvements (Some Backward Incompatibility)

The Schema.org metadata used as an export format and also embedded in dataset pages has been updated to improve compliance with Schema.org's schema and Google's recommendations for Google Dataset Search.

Please be advised that these improvements may break integrations that rely on the old, less compliant structure. For details, see the "Backward Incompatibilities" section below. (Issue #7349)

Folder Uploads via Web UI (dvwebloader, S3 only)

For installations using S3 for storage and with direct upload enabled, a new tool called DVWebloader can be enabled that allows web users to upload a folder with a hierarchy of files and subfolders while retaining the relative paths of files (similarly to how the DVUploader tool does it on the command line, but with the convenience of using the browser UI). See Folder Upload in the User Guide for details. (PR #9096)

Long Descriptions of Collections (Dataverses) are Now Truncated

Like datasets, long descriptions of collections (dataverses) are now truncated by default but can be expanded with a "read full description" button. (PR #9222)

License Sorting

The licenses shown in the dropdown in the UI can now be sorted by superusers. See the Sorting Licenses section of the Installation Guide for details. (PR #8697)

Metadata Field Production Location Now Repeatable, Facetable, and Enabled for Advanced Search

Depositors can now click the plus sign to enter multiple instances of the metadata field "Production Location" in the citation metadata block. Additionally, this field now appears on the Advanced Search page and can be added to the list of search facets. (PR #9254)

Support for NetCDF and HDF5 Files

NetCDF and HDF5 files are now detected based on their content rather than just their file extension. Both "classic" NetCDF 3 files and more modern NetCDF 4 files are detected based on content. Detection for older HDF4 files is only done through the file extension ".hdf", as before.

For NetCDF and HDF5 files, an attempt will be made to extract metadata in NcML (XML) format and save it as an auxiliary file. There is a new NcML previewer available in the dataverse-previewers repo.

An extractNcml API endpoint has been added, especially for installations with existing NetCDF and HDF5 files. After upgrading, they can iterate through these files and try to extract an NcML file.

See the NetCDF and HDF5 section of the User Guide for details. (PR #9239)
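
For installations with many existing files, the iteration described above could be scripted. The following is a minimal sketch, not an official tool: the helper name is hypothetical, and it assumes the endpoint is reached with a POST to /api/files/{id}/extractNcml using an API token in the X-Dataverse-key header. Check the API Guide for the exact path and required permissions before relying on it.

```python
def extract_ncml_commands(server_url, file_ids):
    """Build one curl command per file ID for the extractNcml endpoint.

    Assumption: the endpoint path is /api/files/{id}/extractNcml and the
    API token is supplied through the $API_TOKEN shell variable.
    """
    base = server_url.rstrip("/")
    return [
        f'curl -H "X-Dataverse-key:$API_TOKEN" -X POST "{base}/api/files/{fid}/extractNcml"'
        for fid in file_ids
    ]


if __name__ == "__main__":
    # The file IDs here are placeholders, not real database IDs.
    for command in extract_ncml_commands("http://localhost:8080", [24, 25]):
        print(command)
```

Printing the commands rather than executing them makes it possible to review the list before running anything against a production server.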

Support for .eln Files (Electronic Laboratory Notebooks)

The .eln file format is used by Electronic Laboratory Notebooks as an exchange format for experimental protocols, results, sample descriptions, etc.

Improved Security for External Tools

External tools can now be configured to use signed URLs to access the Dataverse API as an alternative to API tokens. This eliminates the need for tools to have access to the user's API token in order to access draft or restricted datasets and datafiles. Signed URLs can be transferred via POST or via a callback when triggering a tool via GET. See Authorization Options in the External Tools documentation for details. (PR #9001)

Geospatial Search (API Only)

Geospatial search is supported via the Search API using two new parameters: geo_point and geo_radius.

The fields that are geospatially indexed are "West Longitude", "East Longitude", "North Latitude", and "South Latitude" from the "Geographic Bounding Box" field in the geospatial metadata block. (PR #8239)
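
As a rough illustration, a Search API request using the two new parameters might be assembled as shown below. This is a sketch only: the helper name and the localhost base URL are hypothetical, and it assumes that geo_point takes the form latitude,longitude and that geo_radius is a distance in kilometers. See the Search API documentation for the authoritative parameter descriptions.

```python
from urllib.parse import urlencode


def build_geo_search_url(base_url, lat, lon, radius_km, query="*"):
    """Return a Search API URL limited to a circle around (lat, lon).

    Assumptions: geo_point is "latitude,longitude" and geo_radius is
    expressed in kilometers.
    """
    params = {
        "q": query,
        "geo_point": f"{lat},{lon}",
        "geo_radius": str(radius_km),
    }
    # Keep "," and "*" readable instead of percent-encoding them.
    query_string = urlencode(params, safe=",*")
    return f"{base_url.rstrip('/')}/api/search?{query_string}"


if __name__ == "__main__":
    print(build_geo_search_url("http://localhost:8080", 42.3, -71.1, 1.5))
```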

Reproducibility and Code Execution with Binder

Binder has been added to the list of external tools that can be added to a Dataverse installation. From the dataset page, you can launch Binder, which spins up a computational environment in which you can explore the code and data in the dataset, or write new code, such as a Jupyter notebook. (PR #9341)

CodeMeta (Software) Metadata Support (Experimental)

Experimental support for research software metadata deposits has been added.

By adding a metadata block for CodeMeta, we take another step toward first-class support for diverse FAIR objects, such as research software and computational workflows.

There is more work underway to make Dataverse installations around the world "research software ready."

Note: Like the metadata block for computational workflows before, CodeMeta is listed under Experimental Metadata in the guides. Experimental means it's brand new, opt-in, and might need future tweaking based on experience of usage in the field. We hope for feedback from installations on the new metadata block to optimize and lift it from the experimental stage. (PR #7877)

Mechanism Added for Stopping a Harvest in Progress

It is now possible for a sysadmin to stop a long-running harvesting job. See Harvesting Clients in the Admin Guide for more information. (PR #9187)

API Endpoint Listing Metadata Block Details has been Extended

The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:

  • controlledVocabularyValues - All possible values for fields with a controlled vocabulary. For example, the values "Agricultural Sciences", "Arts and Humanities", etc. for the "Subject" field.
  • isControlledVocabulary - Whether or not this field has a controlled vocabulary.
  • multiple - Whether or not the field supports multiple values.

See Metadata Blocks in the API Guide for details. (PR #9213)
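
A client might use the new fields to collect the allowed values for each controlled-vocabulary field. The sketch below is illustrative only: the function name is hypothetical, and it assumes the response wraps the block in a "data" object whose "fields" member is keyed by field name. Verify that layout against a real response from your installation.

```python
def controlled_vocabulary_fields(block_json):
    """Map field name -> allowed values for controlled-vocabulary fields.

    Assumption: the response looks like
    {"data": {"fields": {<field name>: {...field details...}}}}.
    """
    fields = block_json.get("data", {}).get("fields", {})
    return {
        name: info.get("controlledVocabularyValues", [])
        for name, info in fields.items()
        if info.get("isControlledVocabulary")
    }


if __name__ == "__main__":
    # Hypothetical, heavily trimmed response used only for illustration.
    sample = {
        "data": {
            "fields": {
                "title": {"isControlledVocabulary": False, "multiple": False},
                "subject": {
                    "isControlledVocabulary": True,
                    "multiple": True,
                    "controlledVocabularyValues": [
                        "Agricultural Sciences",
                        "Arts and Humanities",
                    ],
                },
            }
        }
    }
    print(controlled_vocabulary_fields(sample))
```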

Advanced Database Settings

You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be sslmode=require. See the new Database Persistence section of the Installation Guide for details. (PR #8915)

Support for Cleaning up Leftover Files in Dataset Storage

Experimental feature: leftover files in a dataset's storage location that are not in that dataset's file list, but are named following the Dataverse technical convention for dataset files, can be removed with the new Cleanup Storage of a Dataset API endpoint.

OAI Server Bug Fixed

A bug introduced in 5.12 was preventing the Dataverse OAI server from serving incremental harvesting requests from clients. It was fixed in this release (PR #9316).

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release not already mentioned above include:

  • Administrators can configure an alternative storage location where files uploaded via the UI are temporarily stored during the transfer from client to server. (PR #8983; see also the Configuration Guide)
  • To improve performance, Dataverse estimates download counts. This release includes an update that makes the estimate more accurate. (PR #8972)
  • Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files). (PR #9018)
  • A persistent identifier, CSRT, is added to the Related Publication field's ID Type child field. For datasets published with CSRT IDs, Dataverse will also include them in the datasets' Schema.org metadata exports. (Issue #8838)
  • Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections.

New JVM Options and MicroProfile Config Options

The following JVM option is now available:

  • dataverse.personOrOrg.assumeCommaInPersonName - the default is false

The following MicroProfile Config options are now available (these can be treated as JVM options):

  • dataverse.files.uploads - alternative storage location of generated temporary files for UI file uploads
  • dataverse.api.signing-secret - used by signed URLs
  • dataverse.solr.host
  • dataverse.solr.port
  • dataverse.solr.protocol
  • dataverse.solr.core
  • dataverse.solr.path
  • dataverse.rserve.host

The following existing JVM options are now available via MicroProfile Config:

  • dataverse.siteUrl
  • dataverse.fqdn
  • dataverse.files.directory
  • dataverse.rserve.host
  • dataverse.rserve.port
  • dataverse.rserve.user
  • dataverse.rserve.password
  • dataverse.rserve.tempdir
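
Options read through MicroProfile Config can generally also be supplied as environment variables. The MicroProfile Config specification looks up a property by replacing every character that is not a letter, digit, or underscore with an underscore, and also tries the upper-cased form of that result. The sketch below applies that rule. The function name is hypothetical, and whether environment variables are a convenient choice depends on how your installation is deployed.

```python
import re


def mp_config_env_name(property_name):
    """Return the upper-cased environment variable name for a property.

    Follows the MicroProfile Config mapping rule: characters other than
    letters, digits, and "_" become "_", then the result is upper-cased.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", property_name).upper()


if __name__ == "__main__":
    for option in ("dataverse.solr.host", "dataverse.api.signing-secret"):
        print(option, "->", mp_config_env_name(option))
```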

Notes for Developers and Integrators

See the "Backward Incompatibilities" section below.

Backward Incompatibilities

Schema.org

The following changes have been made to Schema.org exports (necessary for the improvements mentioned above):

  • Descriptions are now joined and truncated to less than 5K characters.
  • The "citation"/"text" key has been replaced by a "citation"/"name" key.
  • File entries now have the mimetype reported as 'encodingFormat' rather than 'fileFormat' to better conform with the Schema.org specification for DataDownload entries. Download URLs are now sent for all files unless the dataverse.files.hide-schema-dot-org-download-urls setting is set to true.
  • Author/creators now have an @type of Person or Organization and any affiliation (affiliation for Person, parentOrganization for Organization) is now an object of @type Organization
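
Integrations that must read exports produced both before and after this release could tolerate either structure. The following is a minimal sketch of that idea, not a complete Schema.org parser: the helper names are hypothetical, and it assumes that older exports stored an affiliation as a plain string.

```python
def file_mimetype(file_entry):
    """Return the MIME type of a file entry from either export structure."""
    # 5.13 and later use "encodingFormat"; earlier exports used "fileFormat".
    return file_entry.get("encodingFormat", file_entry.get("fileFormat"))


def citation_label(citation_entry):
    """Return the citation string from either export structure."""
    # "citation"/"name" replaces the older "citation"/"text" key.
    return citation_entry.get("name", citation_entry.get("text"))


def affiliation_name(author):
    """Return the affiliation name for a Person or Organization entry."""
    affiliation = author.get("affiliation") or author.get("parentOrganization")
    if isinstance(affiliation, dict):
        # New structure: an object of @type Organization.
        return affiliation.get("name")
    # Assumed old structure: a plain string, or None when absent.
    return affiliation
```

Checking the new key first and falling back to the old one keeps the same client working across an upgrade.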

License Files

License files are now required to contain the new "sortOrder" column. Attempting to create a new license without this field returns an error. See the Configuring Licenses section of the Installation Guide for reference.

Complete List of Changes

For the complete list of code changes in this release, see the 5.13 milestone on GitHub.

Installation

If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!

After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from version 4.x to 5.0 of the Dataverse software following the instructions in the release notes for version 5.0. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.13.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

  • $PAYARA/bin/asadmin list-applications
  • $PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

  • service payara stop
  • rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

  • service payara start

4. Deploy this version.

  • $PAYARA/bin/asadmin deploy dataverse-5.13.war

5. Restart Payara

  • service payara stop
  • service payara start

6. Reload citation metadata block

  • wget https://github.com/IQSS/dataverse/releases/download/v5.13/citation.tsv
  • curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place it in the dataverse.lang.directory.

  • wget https://github.com/IQSS/dataverse/releases/download/v5.13/citation.properties
  • cp citation.properties /home/dataverse/langBundles

7. Replace Solr schema.xml to allow multiple production locations and to enable geospatial indexing. See the specific instructions below for installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).

Note: with this release support for indexing of the experimental workflow metadata block has been removed from the standard schema.xml. If you are using the workflow metadata block be sure to follow the instructions in step 7b) below to maintain support for indexing workflow metadata.

7a. For installations without custom or experimental metadata blocks:

  • Stop Solr instance (usually service solr stop, depending on Solr installation/OS; see the Installation Guide)

  • Replace schema.xml

    • cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
  • Start Solr instance (usually service solr start, depending on Solr/OS)

7b. For installations with custom or experimental metadata blocks:

  • Stop Solr instance (usually service solr stop, depending on Solr installation/OS; see the Installation Guide)

  • Edit the following line in your schema.xml (to indicate that productionPlace is now multiValued="true"):

    <field name="productionPlace" type="string" stored="true" indexed="true" multiValued="true"/>

  • Add the following lines to your schema.xml to add support for geospatial indexing:

    <!-- Dataverse geospatial search -->
    <!-- https://solr.apache.org/guide/8_11/spatial-search.html#rpt -->
    <field name="geolocation" type="location_rpt" multiValued="true" stored="true" indexed="true"/>
    <!-- https://solr.apache.org/guide/8_11/spatial-search.html#bboxfield -->
    <field name="boundingBox" type="bbox" multiValued="true" stored="true" indexed="true"/>
    <!-- Dataverse - per GeoBlacklight, adding field type for bboxField that enables, among other things, overlap ratio calculations -->
    <fieldType name="bbox" class="solr.BBoxField" geo="true" distanceUnits="kilometers" numberType="pdouble" />

  • Restart Solr instance (usually service solr start, depending on solr/OS)
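
Before restarting Solr, it may help to confirm that a hand-edited schema.xml contains the changes from step 7b. The following is a sketch of such a check, not part of the official upgrade procedure. The function name is hypothetical, and it only looks for the field and fieldType names listed above.

```python
import xml.etree.ElementTree as ET


def missing_schema_changes(schema_xml):
    """Return a list of 5.13 schema changes not found in schema_xml."""
    root = ET.fromstring(schema_xml)
    fields = {f.get("name"): f for f in root.iter("field")}
    field_types = {t.get("name") for t in root.iter("fieldType")}

    problems = []
    production = fields.get("productionPlace")
    if production is None or production.get("multiValued") != "true":
        problems.append("productionPlace is not multiValued")
    for name in ("geolocation", "boundingBox"):
        if name not in fields:
            problems.append(f"field {name} is missing")
    if "bbox" not in field_types:
        problems.append("fieldType bbox is missing")
    return problems
```

To use it, pass the contents of your schema.xml as a string. An empty list means all four changes were found.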

Optional Upgrade Step: Reindex Linked Dataverse Collections

Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections. To fix the display of collections that have already been linked, you must re-index the linked collections. This query will provide a list of commands to re-index the affected collections:

select 'curl http://localhost:8080/api/admin/index/dataverses/' || tmp.dvid
from (select distinct dataverse_id as dvid from dataverselinkingdataverse) as tmp;

The result of the query will be a list of re-index commands such as:

curl http://localhost:8080/api/admin/index/dataverses/633

where '633' is the id of the linked collection.
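
If the collection IDs are already known, the same command list can be produced without SQL string concatenation. This is a small sketch under that assumption. The function name is hypothetical, and the IDs would still come from the dataverselinkingdataverse table.

```python
def reindex_commands(collection_ids, base_url="http://localhost:8080"):
    """Build one re-index curl command per distinct collection ID."""
    base = base_url.rstrip("/")
    # sorted(set(...)) mirrors the "select distinct" in the query above.
    return [
        f"curl {base}/api/admin/index/dataverses/{cid}"
        for cid in sorted(set(collection_ids))
    ]


if __name__ == "__main__":
    # 633 is the example ID used above; 640 is a placeholder.
    for command in reindex_commands([633, 640, 633]):
        print(command)
```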

Optional Upgrade Step: Run File Detection on .eln Files

Now that .eln files are recognized, you can run the Redetect File Type API on them to switch them from "unknown" to "ELN Archive". Afterward, you can reindex these files to make them appear in search facets.