Dataverse: v6.2 Release

Release date:
April 1, 2024
Previous version:
v6.1 (released December 12, 2023)
Magnitude:
22,130 Diff Delta
Contributors:
26 total committers
Data confidence:
Commits:

133 Features Released with v6.2

Top Contributors in v6.2

qqmyers
steven-winship-8c8c
landreev
jp-tosca
pdurbin
GPortas
sekmiller
poikilotherm
luddaniel
vera

Directory Browser for v6.2

We haven't yet finished calculating and confirming the files and directories changed in this release. Please check back soon.

Release Notes Published

Dataverse 6.2

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.2 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

<a id="table-of-contents"></a>

Table of Contents

  • πŸ’‘ Release Highlights
  • [πŸͺ² Bug fixes](#-bug-fixes)
  • [πŸ’Ύ Persistence](#-persistence)
  • [🌐 API](#-api)
  • [⚠️ Backward Incompatibilities](#-backward-incompatibilities)
  • [πŸ“– Guides](#-guides)
  • [βš™οΈ New Settings](#-new-settings)
  • [πŸ“‹ Complete List of Changes](#-complete-list-of-changes)
  • [πŸ›Ÿ Getting Help](#-getting-help)
  • [πŸ’» Upgrade instructions](#-upgrade-instructions)

<a id="release-highlights"></a>

πŸ’‘Release Highlights

Search and Facet by License

License have been added to the search facets in the search side panel to filter datasets by license (e.g. CC0).

Datasets with Custom Terms are aggregated under the "Custom Terms" value of this facet. See the Licensing section of the guide for more details on configured Licenses and Custom Terms.

For more information, see #9060.

Licenses can also be used to filter the Search API results using the fq parameter, for example : /api/search?q=*&fq=license%3A%22CC0+1.0%22 for CC0 1.0, see the Search API guide for more examples.

For more information, see #10204.

When Returning Datasets to Authors, Reviewers Can Add a Note to the Author

The Popup for returning to author now allows to type in a message to explain the reasons of return and potential edits needed, that will be sent by email to the author.

Please note that this note is mandatory, but that you can still type a creative and meaningful comment such as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email" that suits your situation.

For more information, see #10137.

<a id="multiple-pid-sup"></a>

Support for Using Multiple PID Providers

This release adds support for using multiple PID (DOI, Handle, PermaLink) providers, multiple PID provider accounts (managing a given protocol, authority, separator, shoulder combination), assigning PID provider accounts to specific collections, and supporting transferred PIDs (where a PID is managed by an account when its authority, separator, and/or shoulder don't match the combination where the account can mint new PIDs). It also adds the ability for additional provider services beyond the existing DataCite, EZId, Handle, and PermaLink providers to be dynamically added as separate jar files.

These changes require per-provider settings rather than the global PID settings previously supported. While backward compatibility for installations using a single PID Provider account is provided, updating to use the new microprofile settings is highly recommended and will be required in a future version.

For more information check the PID settings on this link.

New microprofile settings

Rate Limiting

The option to rate limit has been added to prevent users from over taxing the system either deliberately or by runaway automated processes. Rate limiting can be configured on a tier level with tier 0 being reserved for guest users and tiers 1-any for authenticated users. Superuser accounts are exempt from rate limiting.

Rate limits can be imposed on command APIs by configuring the tier, the command, and the hourly limit in the database. Two database settings configure the rate limiting :RateLimitingDefaultCapacityTiers and RateLimitingCapacityByTierAndAction, If either of these settings exist in the database rate limiting will be enabled and If neither setting exists rate limiting is disabled.

For more details check the detailed guide on this link.

Simplified SMTP Configuration

With this release, we deprecate the usage of asadmin create-javamail-resource to configure Dataverse to send mail using your SMTP server and provide a simplified, standard alternative using JVM options or MicroProfile Config.

At this point, no action is required if you want to keep your current configuration. Warnings will show in your server logs to inform and remind you about the deprecation. A future major release of Dataverse may remove this way of configuration.

Please do take the opportunity to update your SMTP configuration. Details can be found in section of the Installation Guide starting with the SMTP/Email Configuration section of the Installation Guide.

Once reconfiguration is complete, you should remove legacy, unused config. First, run asadmin delete-javamail-resource mail/notifyMailSession as described in the 6.2 guides. Then run curl -X DELETE http://localhost:8080/api/admin/settings/:SystemEmail as this database setting has been replace with dataverse.mail.system-email as described below.

Please note: as there have been problems with email delivered to SPAM folders when the "From" within mail envelope and the mail session configuration didn't match (#4210), as of this version the sole source for the "From" address is the setting dataverse.mail.system-email once you migrate to the new way of configuration.

New SMTP settings:

Binder Redirect

If your installation is configured to use Binder, you should remove the old "girder_ythub" tool and replace it with the tool described at https://github.com/IQSS/dataverse-binder-redirect

For more information, see #10360.

Optional Croissant πŸ₯ Exporter Support

When a Dataverse installation is configured to use a metadata exporter for the Croissant format, the content of the JSON-LD in the <head> of dataset landing pages will be replaced with that format. However, both JSON-LD and Croissant will still be available for download from the dataset page and API.

For more information, see #10382.

Harvesting Handle Missing Controlled Values

Allows datasets to be harvested with Controlled Vocabulary Values that existed in the originating Dataverse installation but are not in the harvesting Dataverse installation. For more information, view the changes to the endpoint here.

Add .QPJ and .QMD Extensions to Shapefile Handling

Support for .qpj and .qmd files in shapefile uploads has been introduced, ensuring that these files are properly recognized and handled as part of geospatial datasets in Dataverse.

For more information, see #10305.

Ingested Tabular Data Files Can Be Stored Without the Variable Name Header

Tabular Data Ingest can now save the generated archival files with the list of variable names added as the first tab-delimited line.

Access API will be able to take advantage of Direct Download for .tab files saved with these headers on S3 - since they no longer have to be generated and added to the streamed content on the fly.

This behavior is controlled by the new setting :StoreIngestedTabularFilesWithVarHeaders. It is false by default, preserving the legacy behavior. When enabled, Dataverse will be able to handle both the newly ingested files, and any already-existing legacy files stored without these headers transparently to the user. E.g. the access API will continue delivering tab-delimited files with this header line, whether it needs to add it dynamically for the legacy files, or reading complete files directly from storage for the ones stored with it.

We are planning to add an API for converting existing legacy tabular files in a future release.

For more information, see #10282.

Uningest/Reingest Options Available in the File Page Edit Menu

New Uningest/Reingest options are available in the File Page Edit menu. Ingest errors can be cleared by users who can published the associated dataset and by superusers, allowing for a successful ingest to be undone or retried (e.g. after a Dataverse version update or if ingest size limits are changed).

The /api/files/<id>/uningest api also now allows users who can publish the dataset to undo an ingest failure.

For more information, see #10319.

Sphinx Guides Now Support Markdown Format and Tabs

Our guides now support the Markdown format with the extension .md. Additionally, an option to create tabs in the guides using Sphinx Tabs has been added. (You can see the tabs in action in the "dev usage" page of the Container Guide.) To continue building the guides, you will need to install this new dependency by re-running:
pip install -r requirements.txt

For more information, see #10111.

Number of Concurrent Indexing Operations Now Configurable

A new MicroProfile setting called dataverse.solr.concurrency.max-async-indexes has been added that controls the maximum number of simultaneously running asynchronous dataset index operations (defaults to 4).

For more information, see #10388.

⬆️

<a id="-bug-fixes"></a>

πŸͺ² Bug fixes

Publication Status Facet Restored

In version 6.1, the publication status facet location was unintentionally moved to the bottom. In this version, we have restored the original order.

Assign a Role With Higher Permissions Than Its Own Role Has Been Fixed

The permissions required to assign a role have been fixed. It is no longer possible to assign a role that includes permissions that the assigning user doesn't have.

Geospatial Metadata Block Fields for North and South Renamed

The Geospatial metadata block fields for north and south were labeled incorrectly as longitudes, as reported in #5645. After updating to this version of Dataverse, users will need to update any API client code used "northLongitude" and "southLongitude" to "northLatitude" and "southLatitude", respectively, as mentioned on the mailing list. Also, we have updated the tooltips in the Geospatial metadata block, where the use of commas instead of dots in coordinate values was incorrectly suggested.

OAI-PMH Error Handling Has Been Improved

OAI-PMH error handling has been improved to display a machine-readable error in XML rather than a 500 error with no further information.

  • /oai?foo=bar will show "No argument 'verb' found"
  • /oai?verb=foo&verb=bar will show "Verb must be singular, given: '[foo, bar]'"

Granting File Access Without Access Request

A bug introduced with the guestbook-at-request, requests are not deleted when granted, they are now given the state granted.

Harvesting redirects fixed

Redirects from search cards back to the original source for datasets harvested from "Generic OAI Archives", i.e. non-Dataverse OAI servers, have been fixed.

⬆️

<a id="-persistence"></a>

πŸ’Ύ Persistence

Missing Database Constraints

This release adds two missing database constraints that will assure that the externalvocabularyvalue table only has one entry for each uri and that the oaiset table only has one set for each spec. (In the very unlikely case that your existing database has duplicate entries now, install would fail. This can be checked by running the following commands:

SELECT uri, count(*) FROM externalvocabularyvalue group by uri;

And: SELECT spec, count(*) FROM oaiset group by spec; Then removing any duplicate rows (where count>1).

Universe Field in Variablemetadata Table Changed

Universe field in variablemetadata table was changed from varchar(255) to text. The change was made to support longer strings in "universe" metadata field, similar to the rest of text fields in variablemetadata table.

PostgreSQL Versions

This release adds install script support for the new permissions model in PostgreSQL versions 15+, and bumps Flyway to support PostgreSQL 16.

PostgreSQL 13 remains the version used with automated testing.

⬆️

<a id="-api"></a>

🌐 API

Listing Collection/Dataverse API

Listing collection/dataverse role assignments via API still requires ManageDataversePermissions, but listing dataset role assignments via API now requires only ManageDatasetPermissions.

New API Endpoint for Clearing an Individual Dataset From Solr

A new Index API endpoint has been added allowing an admin to clear an individual dataset from Solr.

For more information visit the documentation on this link

New Accounts Metrics API

Users can retrieve new types of metrics related to user accounts. The new capabilities are described in the guides.

New canDownloadAtLeastOneFile Endpoint

The /api/datasets/{id}/versions/{versionId}/canDownloadAtLeastOneFile endpoint has been created.

This API endpoint indicates if the calling user can download at least one file from a dataset version. Note that Shibboleth group permissions are not considered.

Harvesting Client Endpoint Extended

The API endpoint api/harvest/clients/{harvestingClientNickname} has been extended to include the following fields:

  • allowHarvestingMissingCVV: enable/disable allowing datasets to be harvested with controlled vocabulary values that exist in the originating Dataverse server but are not present in the harvesting Dataverse server. The default is false.

Note: This setting is only available to the API and not currently accessible/settable via the UI.

Version Files Endpoint Extended

The response for getVersionFiles /api/datasets/{id}/versions/{versionId}/files endpoint has been modified to include a total count of records available totalCount:x. This will aid in pagination by allowing the caller to know how many pages can be iterated through. The existing API (getVersionFileCounts) to return the count will still be available.

Metadata Blocks Endpoint Extended

The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:

  • isRequired: Whether or not this field is required
  • displayOrder: The display order of the field in create/edit forms
  • typeClass: The type class of this field ("controlledVocabulary", "compound", or "primitive")

Get File Citation as JSON

It is now possible to retrieve via API the file citation as it appears on the file landing page. It is formatted in HTML and encoded in JSON.

This API is not for downloading various citation formats such as EndNote XML, RIS, or BibTeX.

For more information check the documentation on this link

Files Endpoint Extended

The API endpoint api/files/{id} has been extended to support the following optional query parameters:

  • includeDeaccessioned: Indicates whether or not to consider deaccessioned dataset versions in the latest file search. (Default: false).
  • returnDatasetVersion: Indicates whether or not to include the dataset version of the file in the response. (Default: false).

A new endpoint api/files/{id}/versions/{datasetVersionId} has been created. This endpoint returns the file metadata present in the requested dataset version. To specify the dataset version, you can use :latest-published, :latest, :draft or 1.0 or any other available version identifier.

The endpoint supports the includeDeaccessioned and returnDatasetVersion optional query parameters, as does the api/files/{id} endpoint.

api/files/{id}/draft endpoint is no longer available in favor of the new endpoint api/files/{id}/versions/{datasetVersionId}, which can use the version identifier :draft (api/files/{id}/versions/:draft) to obtain the same result.

Datasets, Dataverse Collections, and Datafiles Endpoints Extended

The API endpoints for getting datasets, Dataverse collections, and datafiles have been extended to support the following optional 'returnOwners' query parameter.

Including the parameter and setting it to true will add a hierarchy showing which dataset and dataverse collection(s) the object is part of to the json object returned.

For more information visit the full native API guide on this link

Endpoint Fixed: Datasets Metadata

The API endpoint api/datasets/{id}/metadata has been changed to default to the latest version of the dataset to which the user has access.

Experimental Make Data Count processingState API

An experimental Make Data Count processingState API has been added. For now it has been documented in the (developer guide)[https://guides.dataverse.org/en/6.2/developers/make-data-count.html#processing-archived-logs].

⬆️ <a id="-backward-incompatibilities"></a>

⚠️ Backward Incompatibilities

To view a list of changes that can be impactful to your implementation please visit our detailed list of changes to the API.

⬆️

<a id="-guides"></a>

πŸ“– Guides

Container Guide, Documentation for Faster Redeploy

In the Container Guide, documentation for developers on how to quickly redeploy code has been added for Netbeans and improved for IntelliJ.

Also in the context of containers, a new option to skip deployment has been added and the war file is now consistently named "dataverse.war" rather than having a version in the filename, such as "dataverse-6.1.war". This predictability makes tooling easier.

Evaluation Version Tutorial on the Containers Guide

The Container Guide now containers a tutorial for running Dataverse in containers for demo or evaluation purposes: https://guides.dataverse.org/en/6.2/container/running/demo.html

New QA Guide

A new QA Guide is intended mostly for the core development team but may be of interest to contributors on: https://guides.dataverse.org/en/6.2/develop/qa

⬆️

<a id="-new-settings"></a>

βš™οΈ New Settings

<a id="microprofile-settings"></a>

MicroProfile Settings

The * indicates a provider id indicating which provider the setting is for

  • dataverse.pid.providers
  • dataverse.pid.default-provider
  • dataverse.pid.*.type
  • dataverse.pid.*.label
  • dataverse.pid.*.authority
  • dataverse.pid.*.shoulder
  • dataverse.pid.*.identifier-generation-style
  • dataverse.pid.*.datafile-pid-format
  • dataverse.pid.*.managed-list
  • dataverse.pid.*.excluded-list
  • dataverse.pid.*.datacite.mds-api-url
  • dataverse.pid.*.datacite.rest-api-url
  • dataverse.pid.*.datacite.username
  • dataverse.pid.*.datacite.password
  • dataverse.pid.*.ezid.api-url
  • dataverse.pid.*.ezid.username
  • dataverse.pid.*.ezid.password
  • dataverse.pid.*.permalink.base-url
  • dataverse.pid.*.permalink.separator
  • dataverse.pid.*.handlenet.index
  • dataverse.pid.*.handlenet.independent-service
  • dataverse.pid.*.handlenet.auth-handle
  • dataverse.pid.*.handlenet.key.path
  • dataverse.pid.*.handlenet.key.passphrase
  • dataverse.spi.pidproviders.directory
  • dataverse.solr.concurrency.max-async-indexes

<a id="smtp-settings"></a>

SMTP Settings:

  • dataverse.mail.system-email
  • dataverse.mail.mta.host
  • dataverse.mail.mta.port
  • dataverse.mail.mta.ssl.enable
  • dataverse.mail.mta.auth
  • dataverse.mail.mta.user
  • dataverse.mail.mta.password
  • dataverse.mail.mta.allow-utf8-addresses
  • Plus many more for advanced usage and special provider requirements. See configuration guide for a full list.

Database Settings:

  • :RateLimitingDefaultCapacityTiers
  • :RateLimitingCapacityByTierAndAction
  • :StoreIngestedTabularFilesWithVarHeaders

<a id="-complete-list-of-changes"></a>

πŸ“‹ Complete List of Changes

For the complete list of code changes in this release, see the 6.2 Milestone in GitHub.

⬆️

<a id="-getting-help"></a>

πŸ›Ÿ Getting Help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email [email protected].

⬆️

<a id="-upgrade-instructions"></a>

πŸ’» Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.1.

0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. Usually, when a Solr schema update is released, we recommend deploying the new version of Dataverse, then updating the schema.xml on the solr side. With 6.2, we recommend to install the base schema first. Without it Dataverse 6.2 is not going to be able to show any results after the initial deployment. If your instance is using any custom metadata blocks, you will need to further modify the schema, see the last step of this instruction (step 8).

  • Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)

  • Replace schema.xml

    • wget https://raw.githubusercontent.com/IQSS/dataverse/v6.2/conf/solr/9.3.0/schema.xml
    • cp schema.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf
  • Start Solr instance (usually service solr start, depending on Solr/OS)

2. Undeploy the previous version.

  • $PAYARA/bin/asadmin undeploy dataverse-6.1

3. Stop Payara and remove the generated directory

  • service payara stop
  • rm -rf $PAYARA/glassfish/domains/domain1/generated

4. Start Payara

  • service payara start

5. Deploy this version.

  • $PAYARA/bin/asadmin deploy dataverse-6.2.war

As noted above, deployment of the war file might take several minutes due a database migration script required for the new storage quotas feature.

6. Restart Payara

  • service payara stop
  • service payara start

7. Update the following Metadata Blocks to reflect the incremental improvements made to the handling of core metadata fields:

 wget https://github.com/IQSS/dataverse/releases/download/v6.2/geospatial.tsv

 curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file geospatial.tsv

wget https://github.com/IQSS/dataverse/releases/download/v6.2/citation.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv

wget https://github.com/IQSS/dataverse/releases/download/v6.2/astrophysics.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file astrophysics.tsv

wget https://github.com/IQSS/dataverse/releases/download/v6.2/biomedical.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file biomedical.tsv

8. For installations with custom or experimental metadata blocks:

  • Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)

  • Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your solr installation): wget https://raw.githubusercontent.com/IQSS/dataverse/v6.2/conf/solr/9.3.0/update-fields.sh chmod +x update-fields.sh curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.3.0/server/solr/collection1/conf/schema.xml

  • Restart Solr instance (usually service solr restart depending on solr/OS)

9. Reindex Solr:

For details, see https://guides.dataverse.org/en/6.2/admin/solr-search-index.html but here is the reindex command:

   curl http://localhost:8080/api/admin/index

⬆️