Département de Biologie, Université Laval, Québec, Canada
Type of resources
Available actions
Topics
Keywords
Contact for the resource
Provided by
Years
Formats
-
Dataset summary Plankton and detritus are essential components of the Earth’s oceans influencing biogeochemical cycles and carbon sequestration. Climate change impacts their composition and marine ecosystems as a whole. To improve our understanding of these changes, standardized observation methods and integrated global datasets are needed to enhance the accuracy of ecological and climate models. Here, we present a global dataset for plankton and detritus obtained by two versions of the Underwater Vision Profiler 5 (UVP5). This release contains the images classified in 33 homogenized categories, as well as the metadata associated with them, reaching 3,114 profiles and ca. 8 million objects acquired between 2008-2018 at global scale. The geographical distribution of the dataset is unbalanced, with the Equatorial region (30° S - 30° N) being the most represented, followed by the high latitudes in the northern hemisphere and lastly the high latitudes in the Southern Hemisphere. Detritus is the most abundant category in terms of concentration (90%) and biovolume (95%), although its classification in different morphotypes is still not well established. Copepoda was the most abundant taxa within the plankton, with Trichodesmium colonies being the second most abundant. The two versions of UVP5 (SD and HD) have different imagers, resulting in a different effective size range to analyse plankton and detritus from the images (HD objects >600 µm, SD objects >1 mm) and morphological properties (grey levels, etc.) presenting similar patterns, although the ranges may differ. A large number of images of plankton and detritus will be collected in the future by the UVP5, and the public availability of this dataset will help it being utilized as a training set for machine learning and being improved by the scientific community. This will reduce uncertainty by classifying previously unclassified objects and expand the classification categories, ultimately enhancing biodiversity quantification. Data tables The data set is organised according to: - samples : Underwater Vision Profiler 5 profiles, taken at a given point in space and time. - objects : individual UVP images, taken at a given depth along the each profile, on which various morphological features were measured and that where then classified taxonomically in EcoTaxa. samples and objects have unique identifiers. The sample_id is used to link the different tables of the data set together. All files are Tab separated values, UTF8 encoded, gzip compressed. samples.tsv.gz - sample_id <int> unique sample identifier - sample_name <text> original sample identifier - project <text> EcoPart project title - lat, lon <float> location [decimal degrees] - datetime <text> date and time of start of profile [ISO 8601: YYYY-MM-DDTHH:MM:SSZ] - pixel_size <float> size of one pixel [mm] - uvp_model <text> version of the UVP: SD: standard definition, ZD: zoomed, HD: high definition samples_volume.tsv.gz Along a profile, the UVP takes many images, each of a fixed volume. The profiles are cut into 5 m depth bins in which the number of images taken is recorded and hence the imaged volume is known. This is necessary to compute concentrations. - sample_id <int> unique sample identifier - mid_depth_bin <float> middle of the depth bin (2.5 = from 0 to 5 m depth) [m] - water_volume_imaged <float> volume imaged = number of full images × unit volume [L] objects.tsv.gz - object_id <int> unique object identifier - object_name <text> original object identifier - sample_id <int> unique sample identifier - depth <float> depth at which the image was taken [m] - mid_depth_bin <float> corresponding depth bin [m]; to match with samples_volumes - taxon <text> original taxonomic name as in EcoTaxa; is not consistent across projects - lineage <text> taxonomic lineage corresponding to that name - classif_author <text> unique, anonymised identifier of the user who performed this classification - classif_datetime <text> date and time at which the classification was - group <text> broader taxonomic name, for which the identification is consistent over the whole dataset - group_lineage <text> taxonomic lineage corresponding to this broader group - area_mm2 <float> measurements on the object, in real worl units (i.e. comparable across the whole dataset) … - major_mm <float> - area <float> measurements on the objet, in [pixels] and therefore not directly comparable among the different UVP models and units - mean <float> … - skeleton_area <float> properties_per_bin.tsv.gz The information above allows to compute concentrations, biovolumes, and average grey level within a given depth bin. The code to do so is in `summarise_objects_properties.R`. - sample_id <int> unique sample identifier - depth_range <text> range of depth over which the concentration/biovolume are computed: (start,end], in [m] where `(` means not including, `]` means including - group <text> broad taxonomic group - concentration <float> concentration [ind/L] - biovolume <float> biovolume [mm3/L] - avg_grey <float> average grey level of particles [no unit; 0 is black, 255 is white] ODV_biovolumes.txt, ODV_concentrations.txt, ODV_grey_levels.txt This is the same information as above, formatted in a way that Ocean Data View https://odv.awi.de can read. In ODV, go to Import > ODV Spreadsheet and accept all default choices. Images The images are provided in a separate, much larger, zip file. They are stored with the format `sample_id/object_id.jpg`, where `sample_id` and `object_id` are the integer identifiers used in the data tables above.