PlanktoScope_reference : plankton images captured with the PlanktoScope

Plankton was imaged with the PlanktoScope in different oceanic regions, using different nets and sample preservation protocols. This dataset aims to serve as a reference for taxonomic identification with the PlanktoScope across 256 plankton taxa from 20 µm to 300 µm. It can also serve as a learning set for prediction in EcoTaxa (https://ecotaxa.obs-vlfr.fr/prj/15535). The full images were processed with the PlanktoScope and segmented around each individual object. A set of associated features was measured on the objects with skimage.measure. All objects were classified into 256 classes using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). The dataset comprises 169,149 objects and their calculated features. The files provide the features of the objects, their taxonomic identification, and the raw images.

taxa.csv.gz
Table of the classification of each object in the dataset, with columns:
- object_id: unique object identifier in EcoTaxa
- annotation_category: taxonomic name corresponding to the last level of the hierarchy
- annotation_hierarchy: taxonomic lineage leading to the category
- set: class of the image corresponding to the taxon
- img_file_name: local path of the image corresponding to the taxon, named according to the object id

features_native.csv.gz
Table of morphological features computed by the PlanktoScope. All features are computed on the object only, not the background. All area/length measures are in pixels.
- object_id: unique object identifier in EcoTaxa
And 33 features:
- width: width of the smallest rectangle enclosing the object (pixel)
- height: height of the smallest rectangle enclosing the object (pixel)
- bx: X coordinate of the top left point of the smallest rectangle enclosing the object (pixel)
- by: Y coordinate of the top left point of the smallest rectangle enclosing the object (pixel)
- circ.: circularity of the object, (4 * π * area) / perim^2 [0-1]
- area_exc: surface area of the object excluding holes (pixel²)
- area: surface area of the object (pixel²)
- %area: percentage of the object's surface area that is made up of holes
- major: length of the primary axis of the best fitting ellipse for the object (pixel)
- minor: length of the secondary axis of the best fitting ellipse for the object (pixel)
- y: Y position of the center of gravity of the object (pixel)
- x: X position of the center of gravity of the object (pixel)
- convex_area: area of the smallest polygon within which all points in the object fit (pixel²)
- perim.: length of the outside boundary of the object (pixel)
- elongation: elongation index (major/minor)
- perimareaexc: index of the relative complexity of the perimeter (perim/area_exc)
- perimmajor: index of the relative complexity of the perimeter (perim/major)
- circex: circularity of the object excluding white pixels, (4 * π * area_exc) / perim^2
- angle: angle between the primary axis and a line parallel to the x-axis of the image
- bounding_box_area: area of the smallest box containing the object (pixel²)
- eccentricity: eccentricity of the ellipse that has the same second-moments as the region; ratio of the focal distance of the ellipse over the major axis length [0-1]
- equivalent_diameter: diameter of a circle with the same area as the object (pixel)
- euler_number: Euler characteristic of the set of non-zero pixels, computed as the number of connected components minus the number of holes
- extent: ratio of pixels in the object to pixels in the total bounding box
- local_centroid_col: horizontal coordinate of the center of mass of the object (pixel)
- local_centroid_row: vertical coordinate of the center of mass of the object (pixel)
- solidity: ratio of pixels in the object to pixels of the convex hull image (area / convex_area)
- meanhue: mean base color of the object on the hue scale (0-360)
- meansaturation: mean saturation of the object [0-100]
- meanvalue: mean brightness of the object [0-100]
- stdhue: standard deviation of base color
- stdsaturation: standard deviation of saturation
- stdvalue: standard deviation of brightness

inventory.tsv
Tree view of the taxonomy and number of images in each taxon, displayed as text, with columns:
- annotation_hierarchy: taxonomic lineage
- annotation_category: name of the taxon
- n: number of objects in each taxon group

map.png
Map of the sampling locations, to give an idea of the diversity sampled in this dataset.

imgs
Directory containing images of each object, named according to the object id object_id and sorted in subdirectories according to their taxon.
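As a worked illustration of the ratio-type features defined above, the sketch below recomputes circ., elongation, solidity and equivalent_diameter from basic measurements. It relies only on the formulas quoted in the feature list; the function name shape_indices and its argument names are hypothetical, and the PlanktoScope's own implementation may differ (for instance in how the perimeter is estimated on a pixel grid).

```python
import math


def shape_indices(area, perim, major, minor, convex_area):
    """Derive ratio-type shape features from basic measurements (in pixels).

    Hypothetical helper: the formulas follow the feature list above,
    not the PlanktoScope source code.
    """
    return {
        # circ. = (4 * pi * area) / perim^2, equal to 1 for a perfect disc
        "circ": 4 * math.pi * area / perim ** 2,
        # elongation = major / minor
        "elongation": major / minor,
        # solidity = area / convex_area
        "solidity": area / convex_area,
        # diameter of the circle that has the same area as the object
        "equivalent_diameter": 2 * math.sqrt(area / math.pi),
    }


# Example: an ideal disc of radius 10 pixels.
r = 10
disc = shape_indices(
    area=math.pi * r ** 2,
    perim=2 * math.pi * r,
    major=2 * r,
    minor=2 * r,
    convex_area=math.pi * r ** 2,
)
```

For an ideal disc, circularity, elongation and solidity all evaluate to 1 and the equivalent diameter equals the true diameter, which gives a quick sanity check before applying the same formulas to the measured columns.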

ZooCAMNet : plankton images captured with the ZooCAM

Plankton was sampled in the Bay of Biscay with a Continuous Underway Fish Egg Sampler (CUFES, 315 µm mesh size) at 4 m below the surface, and with a WP2 net (200 µm mesh size) from 100 m to the surface, or from 5 m above the sea floor to the surface when the depth was < 100 m. The full images were processed with the ZooCAM software and the embedded Matrox Imaging Library (Colas et al., 2018), which generated regions of interest (ROIs) around each individual object and a set of features measured on the object. The same objects were re-processed to compute features with the scikit-image library http://scikit-image.org. The 1,286,590 resulting objects were sorted by a limited number of operators, following a common taxonomic guide, into 93 taxa, using the web application EcoTaxa http://ecotaxa.obs-vlfr.fr. For the purpose of training machine learning classifiers, the images in each class were split into training, validation, and test sets, with proportions 70%, 15% and 15%.

The archive contains:

taxa.csv.gz
Table of the classification of each object in the dataset, with columns:
- objid : unique object identifier in EcoTaxa (integer number)
- taxon_level1 : taxonomic name corresponding to the level 1 classification
- lineage_level1 : taxonomic lineage corresponding to the level 1 classification
- taxon_level2 : name of the taxon corresponding to the level 2 classification
- plankton : whether the object is plankton or not (boolean)
- set : class of the image corresponding to the taxon (train : training, val : validation, or test)
- img_path : local path of the image corresponding to the taxon (of level 1), named according to the object id

features_native.csv.gz
Table of morphological features computed by ZooCAM. All features are computed on the object only, not the background. All area/length measures are in pixels. All grey levels are encoded in 8 bits (0=black, 255=white). With columns:
- area : object's surface
- area_exc : object surface excluding white pixels
- area_based_diameter : object's area based diameter: 2 * (object_area/pi)^(1/2)
- meangreyobjet : mean image grey level
- modegreyobjet : modal object grey level
- sigmagrey : object grey level standard deviation
- mingrey : minimum object grey level
- maxgrey : maximum object grey level
- sumgrey : object grey level integrated density: object_mean*object_area
- breadth : breadth of the object along the best fitting ellipsoid minor axis
- length : length of the object along the best fitting ellipsoid major axis
- elongation : elongation index: object_length/object_breadth
- perim : object's perimeter
- minferetdiam : minimum object's Feret diameter
- maxferetdiam : maximum object's Feret diameter
- meanferetdiam : average object's Feret diameter
- feretelongation : elongation index: object_maxferetdiam/object_minferetdiam
- compactness : isoperimetric quotient: the ratio of the object's area to the area of a circle having the same perimeter
- intercept0, intercept45, intercept90, intercept135 : the number of times that a transition from background to foreground occurs at the angles 0°, 45°, 90° and 135° for the entire object
- convexhullarea : area of the convex hull of the object
- convexhullfillratio : ratio object_area/convexhullarea
- convexperimeter : perimeter of the convex hull of the object
- n_number_of_runs : number of horizontal strings of consecutive foreground pixels in the object
- n_chained_pixels : number of chained pixels in the object
- n_convex_hull_points : number of vertices of the object's convex hull polygon
- n_number_of_holes : number of holes (as closed white pixel area) in the object
- roughness : measure of small scale variations of amplitude in the object's grey levels
- rectangularity : ratio of the object's area over its best bounding rectangle's area
- skewness : skewness of the object's grey level distribution
- kurtosis : kurtosis of the object's grey level distribution
- fractal_box : fractal dimension of the object's perimeter
- hist25, hist50, hist75 : grey level value at quantile 0.25, 0.5 and 0.75 of the object's grey levels normalized cumulative histogram
- valhist25, valhist50, valhist75 : sum of grey levels at quantile 0.25, 0.5 and 0.75 of the object's grey levels normalized cumulative histogram
- nobj25, nobj50, nobj75 : number of objects after thresholding at the object_valhist25, object_valhist50 and object_valhist75 grey level
- symetrieh : index of horizontal symmetry
- symetriev : index of vertical symmetry
- skelarea : area of the object skeleton
- thick_r : maximum object's thickness/mean object's thickness
- cdist : distance between the mass and the grey level object's centroids

features_skimage.csv.gz
Table of morphological features recomputed with skimage.measure.regionprops on the ROIs produced by ZooCAM. See http://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops for documentation.

inventory.tsv
Tree view of the taxonomy and number of images in each taxon, displayed as text. With columns:
- lineage_level1 : taxonomic lineage corresponding to the level 1 classification
- taxon_level1 : name of the taxon corresponding to the level 1 classification
- n : number of objects in each taxon group

map.png
Map of the sampling locations, to give an idea of the diversity sampled in this dataset.

imgs
Directory containing images of each object, named according to the object id objid and sorted in subdirectories according to their taxon.
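The per-class 70% / 15% / 15% split described above can be illustrated with a short sketch. This is not the procedure the dataset authors used (the `set` column in taxa.csv.gz already records their assignment); the function name split_per_class, the fixed random seed, the rounding rule and the example labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_per_class(labels, seed=0):
    """Assign each object to 'train', 'val' or 'test', class by class.

    Hypothetical helper: within each class, 70% of the objects go to
    training, 15% to validation and the remainder to test.
    """
    rng = random.Random(seed)

    # Collect the positions of the objects belonging to each class.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    assignment = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(indices) * 70 // 100
        n_val = len(indices) * 15 // 100
        for rank, index in enumerate(indices):
            if rank < n_train:
                assignment[index] = "train"
            elif rank < n_train + n_val:
                assignment[index] = "val"
            else:
                assignment[index] = "test"
    return assignment


# Example with two illustrative classes of very different sizes.
labels = ["Copepoda"] * 100 + ["detritus"] * 20
sets = split_per_class(labels)
```

Splitting within each class, rather than over the pooled objects, keeps rare taxa represented in all three sets.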

A global consistent database of plankton and detritus from in situ imaging by the Underwater Vision Profiler 5

Dataset summary

Plankton and detritus are essential components of the Earth's oceans, influencing biogeochemical cycles and carbon sequestration. Climate change impacts their composition and marine ecosystems as a whole. To improve our understanding of these changes, standardized observation methods and integrated global datasets are needed to enhance the accuracy of ecological and climate models. Here, we present a global dataset for plankton and detritus obtained by two versions of the Underwater Vision Profiler 5 (UVP5). This release contains the images classified in 33 homogenized categories, as well as the metadata associated with them, reaching 3,114 profiles and ca. 8 million objects acquired between 2008 and 2018 at global scale. The geographical distribution of the dataset is unbalanced, with the Equatorial region (30° S - 30° N) being the most represented, followed by the high latitudes in the Northern Hemisphere and lastly the high latitudes in the Southern Hemisphere. Detritus is the most abundant category in terms of concentration (90%) and biovolume (95%), although its classification in different morphotypes is still not well established. Copepoda was the most abundant taxon within the plankton, with Trichodesmium colonies being the second most abundant. The two versions of UVP5 (SD and HD) have different imagers, resulting in different effective size ranges for analysing plankton and detritus from the images (HD objects >600 µm, SD objects >1 mm) and in morphological properties (grey levels, etc.) that present similar patterns, although their ranges may differ. A large number of images of plankton and detritus will be collected in the future by the UVP5, and the public availability of this dataset will allow it to be used as a training set for machine learning and to be improved by the scientific community. This will reduce uncertainty by classifying previously unclassified objects and expand the classification categories, ultimately enhancing biodiversity quantification.

Data tables

The data set is organised according to:
- samples : Underwater Vision Profiler 5 profiles, taken at a given point in space and time.
- objects : individual UVP images, taken at a given depth along each profile, on which various morphological features were measured and that were then classified taxonomically in EcoTaxa.

samples and objects have unique identifiers. The sample_id is used to link the different tables of the data set together. All files are tab-separated values, UTF-8 encoded, gzip compressed.

samples.tsv.gz
- sample_id unique sample identifier
- sample_name original sample identifier
- project EcoPart project title
- lat, lon location [decimal degrees]
- datetime date and time of start of profile [ISO 8601: YYYY-MM-DDTHH:MM:SSZ]
- pixel_size size of one pixel [mm]
- uvp_model version of the UVP: SD: standard definition, ZD: zoomed, HD: high definition

samples_volume.tsv.gz
Along a profile, the UVP takes many images, each of a fixed volume. The profiles are cut into 5 m depth bins in which the number of images taken is recorded and hence the imaged volume is known. This is necessary to compute concentrations.
- sample_id unique sample identifier
- mid_depth_bin middle of the depth bin (2.5 = from 0 to 5 m depth) [m]
- water_volume_imaged volume imaged = number of full images × unit volume [L]

objects.tsv.gz
- object_id unique object identifier
- object_name original object identifier
- sample_id unique sample identifier
- depth depth at which the image was taken [m]
- mid_depth_bin corresponding depth bin [m]; to match with samples_volume.tsv.gz
- taxon original taxonomic name as in EcoTaxa; is not consistent across projects
- lineage taxonomic lineage corresponding to that name
- classif_author unique, anonymised identifier of the user who performed this classification
- classif_datetime date and time at which the classification was performed
- group broader taxonomic name, for which the identification is consistent over the whole dataset
- group_lineage taxonomic lineage corresponding to this broader group
- area_mm2 measurements on the object, in real world units (i.e. comparable across the whole dataset)
  …
- major_mm
- area measurements on the object, in [pixels] and therefore not directly comparable among the different UVP models and units
- mean
  …
- skeleton_area

properties_per_bin.tsv.gz
The information above makes it possible to compute concentrations, biovolumes, and average grey level within a given depth bin. The code to do so is in `summarise_objects_properties.R`.
- sample_id unique sample identifier
- depth_range range of depth over which the concentration/biovolume are computed: (start,end], in [m] where `(` means not including, `]` means including
- group broad taxonomic group
- concentration concentration [ind/L]
- biovolume biovolume [mm3/L]
- avg_grey average grey level of particles [no unit; 0 is black, 255 is white]

ODV_biovolumes.txt, ODV_concentrations.txt, ODV_grey_levels.txt
This is the same information as above, formatted in a way that Ocean Data View https://odv.awi.de can read. In ODV, go to Import > ODV Spreadsheet and accept all default choices.

Images

The images are provided in a separate, much larger, zip file. They are stored with the format `sample_id/object_id.jpg`, where `sample_id` and `object_id` are the integer identifiers used in the data tables above.
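To make the link between the objects and samples_volume tables concrete, the sketch below counts objects per sample, depth bin and group, and divides by the imaged water volume to obtain a concentration in ind/L. It is a minimal Python illustration under stated assumptions, not a translation of `summarise_objects_properties.R`: the function name concentrations and the example rows are hypothetical, rows are assumed to be already parsed into dictionaries, and bins that contain no object of a given group are simply absent from the result rather than reported as zero.

```python
from collections import Counter


def concentrations(objects, volumes):
    """Concentration [ind/L] per (sample_id, mid_depth_bin, group).

    objects: iterable of dicts with keys sample_id, mid_depth_bin, group
    volumes: iterable of dicts with keys sample_id, mid_depth_bin,
             water_volume_imaged (in L)
    """
    # Imaged volume of each depth bin of each profile.
    volume_by_bin = {
        (v["sample_id"], v["mid_depth_bin"]): v["water_volume_imaged"]
        for v in volumes
    }
    # Number of objects of each group in each depth bin.
    counts = Counter(
        (o["sample_id"], o["mid_depth_bin"], o["group"]) for o in objects
    )
    # Concentration = number of individuals / imaged volume.
    return {key: n / volume_by_bin[key[:2]] for key, n in counts.items()}


# Made-up example rows, for illustration only.
objects = [
    {"sample_id": 1, "mid_depth_bin": 2.5, "group": "Copepoda"},
    {"sample_id": 1, "mid_depth_bin": 2.5, "group": "Copepoda"},
    {"sample_id": 1, "mid_depth_bin": 2.5, "group": "Copepoda"},
    {"sample_id": 1, "mid_depth_bin": 7.5, "group": "detritus"},
]
volumes = [
    {"sample_id": 1, "mid_depth_bin": 2.5, "water_volume_imaged": 6.0},
    {"sample_id": 1, "mid_depth_bin": 7.5, "water_volume_imaged": 4.0},
]
result = concentrations(objects, volumes)
```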

UVP5 data sorted with EcoTaxa and MorphoCluster

Here, we provide plankton image data that was sorted with the web applications EcoTaxa and MorphoCluster. The data set was used for image classification tasks as described in Schröder et al. (in preparation) and does not include any geospatial or temporal metadata. Plankton was imaged using the Underwater Vision Profiler 5 (Picheral et al. 2010) in various regions of the world's oceans between 2012-10-24 and 2017-08-08. This data publication consists of an archive containing "training.csv" (list of 392k training images for classification, validated using EcoTaxa), "validation.csv" (list of 196k validation images for classification, validated using EcoTaxa), "unlabeled.csv" (list of 1M unlabeled images), "morphocluster.csv" (1.2M objects validated using MorphoCluster, a subset of "unlabeled.csv" and "validation.csv") and the image files themselves. The CSV files each contain the columns "object_id" (a unique ID), "image_fn" (the relative filename), and "label" (the assigned name). The training and validation sets were sorted into 65 classes using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). These data show a severe class imbalance: the 10% most populated classes contain more than 80% of the objects, and the class sizes span four orders of magnitude. The validation set and a set of 1M additional unlabeled images were sorted during the first trial of MorphoCluster (https://github.com/morphocluster). The images in this data set were sampled during RV Meteor cruises M92, M93, M96, M97, M98, M105, M106, M107, M108, M116, M119, M121, M130, M131, M135, M136, M137 and M138, during RV Maria S Merian cruises MSM22, MSM23, MSM40 and MSM49, during the RV Polarstern cruise PS88b and during the FLUXES1 experiment with RV Sarmiento de Gamboa.
The following people have contributed to the sorting of the image data on EcoTaxa: Rainer Kiko, Tristan Biard, Benjamin Blanc, Svenja Christiansen, Justine Courboules, Charlotte Eich, Jannik Faustmann, Christine Gawinski, Augustin Lafond, Aakash Panchal, Marc Picheral, Akanksha Singh and Helena Hauss. In Schröder et al. (in preparation), the training set serves as a source for knowledge transfer in the training of the feature extractor. The classification using MorphoCluster was conducted by Rainer Kiko. The labels used are operational and not yet matched to the respective EcoTaxa classes.
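The class imbalance statement above (share of objects held by the 10% most populated classes) can be checked on any list of labels with a few lines of code. The sketch below is one possible way to compute it; the function name top_class_share, the choice of rounding the number of top classes up, and the toy labels are assumptions, and the values of the "label" column would be passed in for a real check.

```python
import math
from collections import Counter


def top_class_share(labels, top_fraction=0.1):
    """Fraction of objects belonging to the most populated classes.

    Hypothetical helper: the number of 'top' classes is top_fraction of
    the number of classes, rounded up (and at least one class).
    """
    counts = sorted(Counter(labels).values(), reverse=True)
    n_top = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:n_top]) / sum(counts)


# Toy example: 10 classes, one of which holds 91 of the 100 objects.
labels = ["class_0"] * 91 + [f"class_{i}" for i in range(1, 10)]
share = top_class_share(labels)
```

In this toy example the single largest class (10% of the classes) holds 91% of the objects.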

Morphometry and elemental composition of planktonic Rhizaria

This dataset contains the pictures used for morphometric measurements, as well as the elemental composition and production rate data, of planktonic Rhizaria. Specimens were collected in the Bay of Villefranche-sur-Mer in May 2019 and during the P2107 cruise in the California Current in July-August 2021. Analyses of the data can be found at https://github.com/MnnLgt/Elemental_composition_Rhizaria.

ZooScanNet: plankton images captured with the ZooScan

Plankton was sampled with various nets, from the bottom or 500 m depth to the surface, in many oceans of the world. Samples were imaged with a ZooScan. The full images were processed with ZooProcess, which generated regions of interest (ROIs) around each individual object and a set of associated features measured on the object (see Gorsky et al., 2010 for more information). The same objects were re-processed to compute features with the scikit-image toolbox http://scikit-image.org. The 1,451,745 resulting objects were sorted by a limited number of operators, following a common taxonomic guide, into 98 taxa, using the web application EcoTaxa http://ecotaxa.obs-vlfr.fr. For the purpose of training machine learning classifiers, the images in each class were split into training, validation, and test sets, with proportions 70%, 15% and 15%.

1. The archive ZooScanNet_data.tar contains:

taxa.csv.gz
Table of the classification of each object in the dataset, with columns:
- objid: unique object identifier in EcoTaxa (integer number)
- taxon_level1: taxonomic name corresponding to the level 1 classification
- lineage_level1: taxonomic lineage corresponding to the level 1 classification
- taxon_level2: name of the taxon corresponding to the level 2 classification
- plankton: whether the object is plankton or not (boolean)
- set: class of the image corresponding to the taxon (train : training, val : validation, or test)
- img_path: local path of the image corresponding to the taxon (of level 1), named according to the object id

features_native.csv.gz
Table of metadata of each object, including the different features processed by ZooProcess. All features are computed on the object only, not the background. All area/length measures are in pixels. All grey levels are encoded in 8 bits (0=black, 255=white). With columns:
- objid: unique object identifier in EcoTaxa (integer number)
And 48 features:
- area
- mean
- stddev
- mode
- min/max
- perim.
- width,height
- major,minor
- circ.
- feret
- intden
- median
- skew,kurt
- %area
- area_exc
- fractal
- skelarea
- slope
- histcum1,2,3
- nb1,2,3
- symetrieh,symetriev
- symetriehc,symetrievc
- convperim,convarea
- fcons
- thickr
- esd
- elongation
- range
- centroids
- sr
- perimareaexc
- feretareaexc
- perimferet/perimmajor
- circex
- cdexc
See the "ZooScan" sheet (OBJECT metadata, annotation and measurements) at https://doi.org/10.5281/zenodo.14704250 for definitions.

features_skimage.csv.gz
Table of morphological features recomputed with skimage.measure.regionprops on the ROIs produced by ZooProcess. See http://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops for documentation.

inventory.tsv
Tree view of the taxonomy and number of images in each taxon, displayed as text. With columns:
- lineage_level1: taxonomic lineage corresponding to the level 1 classification
- taxon_level1: name of the taxon corresponding to the level 1 classification
- n: number of objects in each taxon class

2. The archive ZooScanNet_imgs.tar contains:

imgs
Directory containing images of each object, named according to the object id objid and sorted in subdirectories according to their taxon.

3. And:

map.png
Map of the sampling locations, to give an idea of the diversity sampled in this dataset.
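As an illustration of how the `set` and `img_path` columns of taxa.csv.gz can be used to assemble training, validation and test lists, the sketch below reads the table with the Python standard library only. It assumes the file is comma-separated, UTF-8 encoded, and has a header row with the column names listed above; the function name image_paths_by_set is hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def image_paths_by_set(taxa_path):
    """Group (img_path, taxon_level1) pairs by the value of the `set` column.

    Hypothetical helper; assumes a comma-separated, gzip-compressed table
    with a header row containing set, img_path and taxon_level1.
    """
    groups = defaultdict(list)
    with gzip.open(taxa_path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            groups[row["set"]].append((row["img_path"], row["taxon_level1"]))
    return dict(groups)


# Usage, once the data archive has been extracted:
# groups = image_paths_by_set("taxa.csv.gz")
# training_examples = groups["train"]
```

Reading the file row by row avoids loading the full table of about 1.45 million objects into a data frame when only the paths and labels are needed.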

EurOBIS

The European Ocean Biogeographic Information System (EurOBIS) is an online marine biogeographic database compiling data on all living marine creatures. The principal aims of EurOBIS are to centralize the largely scattered biogeographic data on marine species collected by European institutions and to make these data freely available and easily accessible. All data go through a number of quality control procedures before they are made available online, assuring the minimum level of quality necessary to put the data to good use. The available data are either collected within European marine waters or by European researchers and institutes outside Europe. The database focuses on taxonomy and distribution records in space and time; all data can be searched and visualised through a set of online mapping tools. All data are freely available online and easily accessible, without requiring a login or password.

EMODnet Biology

EMODnet Biology provides three key services and products to users. 1) The data download toolbox allows users to explore available datasets by searching by source, geographical area, and/or time period. Datasets can be narrowed down using taxonomic criteria, whether by species group (e.g. benthos, fish, algae, pigments) or by scientific or common name. 2) The data catalogue is the easiest way to access the nearly 1000 datasets available through EMODnet Biology. Datasets can be filtered by multiple parameters via the advanced search, from taxon, to institute, to geographic region. Each of the resulting datasets then links to a detailed fact sheet containing a link to the original data provider, the recommended citation, the data policy and other relevant information. 3) Data products: EMODnet Biology combines data from datasets with overlapping geographic scope and produces dynamic maps of the abundance of selected species. The first products are already available; they focus on species whose data records are the most complete and span the longest periods.

ICES

The International Council for the Exploration of the Sea (ICES) is a global organization that develops science and advice to support the sustainable use of the oceans. ICES is a network of more than 5,000 scientists from over 690 marine institutes in 20 member countries and beyond. 1,500 scientists participate in our activities annually. ICES has a well-established Data Centre, which manages a number of large dataset collections related to the marine environment. The majority of data, covering the Northeast Atlantic, Baltic Sea, Greenland Sea, and Norwegian Sea, originate from national institutes that are part of the ICES network. The ICES Data Centre provides marine data services to ICES member countries, expert groups, world data centres, regional seas conventions (HELCOM and OSPAR), the European Environment Agency (EEA), Eurostat, and various other European projects and biodiversity portals. ICES aims to provide all data collections online and in accordance with the ICES Data policy, which enables open access to all data that do not fall under specific commercial or personal privacy concerns.