Matches in Nanopublications for { ?s <http://schema.org/description> ?o ?g. }
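Read as a SPARQL quad pattern, the query behind this listing would look roughly like the sketch below; the GRAPH form and the LIMIT are illustrative assumptions, not the exact query that produced these matches.

```
# A minimal sketch of the query behind this listing; the GRAPH form and
# LIMIT are assumptions, not the exact query used.
SELECT ?s ?o ?g
WHERE {
  GRAPH ?g {
    ?s <http://schema.org/description> ?o .
  }
}
LIMIT 100
```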
- fefac0dc-152e-4acb-9d10-00e755a59cee description "This is a definition file to illustrate the use of an OpenSUSE/Leap:15.3 container with MPICH 4.1.2 and a 'dummy' LibFabric 1.14.0 to run the OSU Micro-Benchmarks tests, version 7.3" assertion.
- 7407i2444l6p705u84397342_3980ij4j02pp2589r087x3wb9887cd5f description "Data collected in 2021 and 2022 by CNR-ISMAR VE within the MAELSTROM project (ask the author for the password)" assertion.
- 7f467b48-95e1-4c0f-93c6-2cfda36a8600 description "Bathymetric data collected in three different surveys (October 2021, May 2022 and November 2022) in Sacca Fisola, Venice, Italy, by CNR-ISMAR VE within the MAELSTROM project." assertion.
- 1151 description "" assertion.
- c1dff77c-6c59-46c8-8f5e-025155da31e9 description "Data collected in November 2022 by CNR-ISMAR VE within the MAELSTROM project" assertion.
- dd465b46-0217-426a-ba81-4acadf0d12b9 description "Data collected in October 2021 by CNR-ISMAR VE within the MAELSTROM project" assertion.
- dd465b46-0217-426a-ba81-4acadf0d12b9 description "Data collected in October 2021 by CNR-ISMAR VE within the MAELSTROM project with the aim to map marine litter on the seafloor" assertion.
- dd465b46-0217-426a-ba81-4acadf0d12b9 description "Bathymetry metadata description" assertion.
- green-ai-sustainable-shipping-gass-project-funded-green-platform-initiative description "GASS description from Simula website" assertion.
- 012027 description "Using digital twins in voyage performance evaluation is becoming critical for ocean vessels to reduce GHG emissions. A novel GBM approach is proposed in this paper to establish a digital twin model for voyage performance prediction. The weather hindcast data are introduced to enrich noon report (NR) and automatic identification system (AIS) datasets, which are split into training and validation sets to develop the GBM. The NR and AIS datasets collected from a 57,000 DWT bulk carrier are used to demonstrate the fidelity and capability of the proposed GBM. The voyage performance prediction from the GBM shows better accuracy than those from a pure WBM or pure BBMs. An arrival time forecast and a weather routing showcase are also presented to demonstrate the application effects of the GBM. The proposed GBM provides a satisfactory prediction of ship speed and fuel consumption without mandatory sensor-collected data, and is thus applicable to a variety of vessels. In cases where more sensors are available onboard, the proposed approach can incorporate sensor data to improve the model accuracy further." assertion.
- 412dedcc-3783-4f13-ab34-4031e684eb71 description "Private Research Object containing documentation about the GASS project." assertion.
- a9ba4018-6136-4986-bc6a-77ab1f4e279c description "AI-enhanced technology and Go2Market services for enabling sustainable shipping." assertion.
- caf6570d-f6b9-4778-84f0-c7c218da956d description "Folder containing documents about the GASS project." assertion.
- 09f604ba-1d4b-45fe-8816-3333fd3534d0 description "Photo by Ian Taylor on Unsplash: https://unsplash.com/photos/blue-and-red-cargo-ship-on-sea-during-daytime-jOqJbvo1P9g" assertion.
- 978-3-031-51819-5 description "This book is open access, which means that you have free and unlimited access. A one-stop reference on Digital Twin, covering basics, essential topics and future directions. Provides the design and implementation of Digital Twin for 6G and Internet of Things." assertion.
- 3ec0b51b-0346-4ed8-b731-de744dd3bee2 description "Datasets used in the manuscript CMST 29(1-4) 37-44 (2023) DOI:10.12921/cmst.2023.0000023" assertion.
- 3b70d877-b722-4108-9f2c-13ffded4a078 description "This data sheet contains data used in the manuscript. The data concerns systems of f.c.c. hard spheres with size C1 [001]-nanochannel filled with hard dimers formed by spheres of different diameter (equal to sigma’). The length of the dimers is equal to sigma (L=sigma)." assertion.
- 50315613-3195-4d1a-ac95-7906506466e8 description "This data sheet contains reference data used in the manuscript. The data concerns systems of f.c.c. hard spheres with size C2 [001]-nanochannel filled with hard spheres of different diameter (equal to sigma’)" assertion.
- 83cf1e74-8269-4e22-ab04-f0e823186300 description "This data sheet contains data used in the manuscript. The data concerns systems of f.c.c. hard spheres with size C2 [001]-nanochannel filled with hard dimers formed by spheres of different diameter (equal to sigma’). The length of the dimers is equal to sigma (L=sigma)." assertion.
- a566b59a-255d-4983-ab55-3ae88c3f9356 description "This data sheet contains reference data used in the manuscript. The data concerns systems of f.c.c. hard spheres with size C1 [001]-nanochannel filled with hard spheres of different diameter (equal to sigma’)" assertion.
- ac305d86-0cb4-4281-b8ee-5e87946b2fbf description "This data sheet contains data used in the manuscript. The data concerns systems of f.c.c. hard spheres with size C2 [001]-nanochannel filled with hard dimers formed by spheres of different diameter (equal to sigma’). The length of the dimers is equal to sigma’ (L=sigma’)." assertion.
- cd6cbce7-1d8e-49ea-b91e-b4047ddcbeec description "This data sheet contains data used in the manuscript. The data concerns systems of f.c.c. hard spheres with size C1 [001]-nanochannel filled with hard dimers formed by spheres of different diameter (equal to sigma’). The length of the dimers is equal to sigma’ (L=sigma’)." assertion.
- 13dfbe3b-3132-4089-9ec9-5ae100fb143c description "While computer science papers frequently include their associated code repositories, establishing a clear link between papers and their corresponding implementations can be challenging due to the number of code repositories used by research publications. In this paper we describe a lightweight method for effectively identifying bidirectional links between papers and repositories from both LaTeX and PDF sources. We have used our approach to analyze more than 14000 PDF and LaTeX files in the Software Engineering category of arXiv, generating a dataset of more than 1400 paper-code implementations and assessing current citation practices on it." assertion.
- f7046e84-ab9e-445c-b361-f145e123fa42 description "A transformative three-day hackathon titled 'Harnessing Digital Twinning for Sustainable Agriculture: Predictive Characterisation and Conservation of Crop Wild Relatives' takes place from January 23rd to 25th, 2024 in Oslo, Norway. The event promises to unravel the immense potential of digital twinning in revolutionizing global agricultural practices. Dates: 23 January 2024 - 25 January 2024 Location: Oslo, Norway Home page: https://biodt.eu/events/biodt-hackathon-bring-your-own-data-byod-second-end-users-workshop" assertion.
- 17ab9a92-c813-4a43-a7c6-1dc03062a870 description "A test RO" assertion.
- 49f1aee4-f2ff-4dc2-b48b-396343d6fcb0 description "readings" assertion.
- ee8895fd-fe5a-46d7-9228-9d98a2d3a205 description "The Ohio State University (OSU) Micro Benchmarks (OMB) are a widely used suite of benchmarks for measuring and evaluating the performance of MPI operations for point-to-point, multi-pair, and collective communications. These benchmarks are often used for comparing different Message Passing Interface (MPI) implementations and the underlying network interconnect. Here we use the OSU micro-benchmark (version 7.2) to assess the performance in terms of bandwidth achieved with an Apptainer container between 2 processors on different nodes with OpenMPI (version 4.1.6) on the Norwegian academic High Performance Computers (HPC) located in Tromsø (Fram) and Trondheim (Betzy)." assertion.
- 8d871c23-a90b-47a0-8e62-2bbb450f8054 description "Plot showing the bandwidth as a function of the message size on Fram and Betzy" assertion.
- c1d71471-3c76-43a4-a1fa-2cbe6ee7a84a description "Output of the OSU MPI Get Bandwidth Test with OpenMPI 4.1.6 on Fram and Betzy" assertion.
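As a rough illustration of the benchmark runs described in the three entries above, a two-node bandwidth test might be launched as sketched below; the container image name, module name and benchmark install path are assumptions for illustration, not values taken from the Research Object.

```
# Hypothetical two-node OSU bandwidth run inside an Apptainer container;
# the image name, module name and install path are placeholders.
module load OpenMPI/4.1.6
mpirun -np 2 --map-by node \
  apptainer exec osu-benchmarks.sif \
  /opt/osu/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
```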
- eace71f0-71ce-448e-b417-35ee83993516 description "Research Object for teaching FAIR practices with ROHub at the Digital Scholarship Days (2024). This Research Object contains some resources that are not FAIR and it is used to learn how to check the FAIRness of a Research Object and to improve it." assertion.
- 6a3b5d64-72d8-42c6-9e00-156a82ef3ac0 description "GitHub repository containing the source of the training material for 'FAIR - More than just a buzzword', given at the Digital Scholarship Days in 2024." assertion.
- e0196100-a91f-4345-8b94-6332c5ef9eb5 description "Website that is rendered automatically from the corresponding GitHub repository." assertion.
- intro.html description "Website that is rendered automatically from the corresponding GitHub repository." assertion.
- DSD_FAIR description "GitHub repository containing the source of the training material for 'FAIR - More than just a buzzword', given at the Digital Scholarship Days in 2024." assertion.
- e3872496-7fef-4456-8608-8d7b094d7d05 description "Research Object for teaching FAIR practices with ROHub at the Digital Scholarship Days (2024). This Research Object contains some resources that are not FAIR and it is used to learn how to check the FAIRness of a Research Object and to improve it." assertion.
- 97b0167c-0cb4-457d-abe8-41d1a9d1b981 description "The Ohio State University (OSU) Micro Benchmarks (OMB) are a widely used suite of benchmarks for measuring and evaluating the performance of MPI operations for point-to-point, multi-pair, and collective communications. These benchmarks are often used for comparing different Message Passing Interface (MPI) implementations and the underlying network interconnect. Here we use the OSU micro-benchmark (version 7.2) to assess the performance in terms of bandwidth achieved with an Apptainer container between 2 processors on different nodes with OpenMPI (version 4.1.6) on the Norwegian academic High Performance Computers (HPC) located in Tromsø (Fram) and Trondheim (Betzy)." assertion.
- 38ca3097-841d-44cf-b55e-b614cf4c353f description "Abstract—Open MPI is an open-source implementation of the MPI-3 standard that is developed and maintained by collaborators from academia, industry, and national laboratories. Oak Ridge National Laboratory (ORNL) and Los Alamos National Laboratory (LANL) are collaborating on porting and optimizing Open MPI and related components for use on HPE Cray EX systems, with a focus on the DOE Frontier and Aurora exascale systems. A key component of this effort involves development of a new LinkX Open Fabrics Interface (OFI) provider. In this paper, we describe enhancements to Open MPI, OpenPMIx runtime components, and the LinkX OFI provider. Performance results are presented for point-to-point and collective communication operations using both the vendor CXI provider and the LinkX provider, including results obtained using GPU accelerators. Recommended deployment options for EX systems will be discussed, along with future work." assertion.
- 55d5e2f5-c395-4da5-a7d1-9621c480d0ef description "Plot showing the bandwidth as a function of the message size on Fram and Betzy" assertion.
- abbe45f6-a4c4-4ec4-af82-79bb0a95440e description "Output of the OSU MPI Get Bandwidth Test with OpenMPI 4.1.6 on Fram and Betzy" assertion.
- s11390-023-2907-5 description "Abstract The Slingshot interconnect designed by HPE/Cray is becoming more relevant in high-performance computing with its deployment on the upcoming exascale systems. In particular, it is the interconnect empowering the first exascale and highest-ranked supercomputer in the world, Frontier. It offers various features such as adaptive routing, congestion control, and isolated workloads. The deployment of newer interconnects sparks interest related to performance, scalability, and any potential bottlenecks as they are critical elements contributing to the scalability across nodes on these systems. In this paper, we delve into the challenges the Slingshot interconnect poses with current state-of-the-art MPI (message passing interface) libraries. In particular, we look at the scalability performance when using Slingshot across nodes. We present a comprehensive evaluation using various MPI and communication libraries including Cray MPICH, OpenMPI + UCX, RCCL, and MVAPICH2 on CPUs and GPUs on the Spock system, an early access cluster deployed with Slingshot-10, AMD MI100 GPUs and AMD Epyc Rome CPUs to emulate the Frontier system. We also evaluate preliminary CPU-based support of MPI libraries on the Slingshot-11 interconnect." assertion.
- 3125b7be-03f9-447e-806f-20beb66f7949 description "The Ohio State University (OSU) Micro Benchmarks (OMB) are a widely used suite of benchmarks for measuring and evaluating the performance of MPI operations for point-to-point, multi-pair, and collective communications. These benchmarks are often used for comparing different Message Passing Interface (MPI) implementations and the underlying network interconnect. Here we use the OSU micro-benchmark (version 7.2) to assess the performance in terms of bandwidth achieved with an Apptainer container between 2 processors on different nodes with OpenMPI (version 4.1.6) on the Norwegian academic High Performance Computers (HPC) located in Tromsø (Fram) and Trondheim (Betzy)." assertion.
- bdf5c934-6836-4f3c-a2f1-3438b7cd91ae description "Output of the OSU MPI Get Bandwidth Test with OpenMPI 4.1.6 on Fram and Betzy" assertion.
- ecc14c39-87d8-4cda-8e54-2e5b5a6bd9cc description "Plot showing the bandwidth as a function of the message size on Fram and Betzy" assertion.
- global-fish-tracking-system-gfts description "Link to the official GFTS DESP use case." assertion.
- gfts.minrk.net description "Link to the Pangeo JupyterHub we are using for developing Pangeo Fish. Only users from GFTS can register and authenticate to this JupyterHub." assertion.
- jupyter.central.data.destination-earth.eu description "JupyterHub on Destination Earth Data Lake" assertion.
- DestinE_ESA_GFTS description "These webpages are rendered from the GitHub repository and contain all the information about the GFTS project. This includes an internal description of the use case, technical documentation, progress, presentations, etc." assertion.
- 2edcfa66-0f59-42f4-aa29-1c5681466424 description "**Use Case topic**: The goal of this use case is the development and implementation of the Global Fish Tracking System (GFTS) to enhance understanding and management of wild fish stocks **Scale of the Use Case (Global/Regional/National)**: Local to Global (various locations worldwide) **Policy addressed**: Fisheries Management Policy **Data Sources used**: Climate Change Adaptation (Climate DT: Routine and On-Demand for some higher resolution tracking), Sea Temperature observation (Satellite, in-situ), Copernicus Marine services (Sea temperature and associated value), Bathymetry (GEBCO), biologging fish data **GitHub Repository**: [https://github.com/destination-earth/DestinE_ESA_GFTS.git](https://github.com/destination-earth/DestinE_ESA_GFTS.git)" assertion.
- e0dc162e-b276-4ba6-ac7d-8122e3cf8daa description "This folder contains presentations or other kinds of material (such as training material) developed and presented during events." assertion.
- f4351d12-996f-42c0-a920-dbc4513691c5 description "This folder contains project documents such as the DMP, links to the website and GitHub repository, etc." assertion.
- 9b1c9bc2-f28a-449d-be32-0b36fe29ab1c description "This picture shows Tina Odaka presenting the Global Fish Tracking System (GFTS) DestinE DESP Use Case at the 8th International Bio-logging Science Symposium, Tokyo, Japan (4-8 March 2024)." assertion.
- a09a17f7-75cb-4d19-8166-aea9308ce506 description "Slide extracted from the presentation to the 3rd Destination Earth User eXchange (2024)." assertion.
- egusphere-egu24-10741 description "Poster presented at EGU 2024." assertion.
- egusphere-egu24-15500 description "Presentation given at EGU 2024." assertion.
- zenodo.10213946 description "Presentation given at the kick-off meeting of the GFTS project." assertion.
- zenodo.10372387 description "Slides presented by Mathieu Woillez at the Roadshow Webinar: DestinE in action – meet the first DESP use cases (13 December 2023)" assertion.
- zenodo.10809819 description "Poster presented at the 8th International Bio-logging Science Symposium by Tina Odaka, March 2024." assertion.
- zenodo.11185948 description "Project Management Plan for the Global Fish Tracking System Use Case on the DestinE Platform." assertion.
- zenodo.11186084 description "Deliverable 5.2 - Use Case Descriptor for the Global Fish Tracking System Use Case on the DestinE Platform." assertion.
- zenodo.11186123 description "The Global Fish Tracking System Use Case Application on the DestinE Platform" assertion.
- zenodo.11186179 description "Deliverable 5.5 corresponding to the Global Fish Tracking System Use Case Promotion Package" assertion.
- zenodo.11186191 description "This report corresponds to the Software Reuse File for the GFTS DestinE Platform Use Case. A new version will be uploaded regularly." assertion.
- zenodo.11186227 description "The Software Release Plan for the Global Fish Tracking System DestinE Use Case." assertion.
- zenodo.11186257 description "The Software Requirement Specifications for the Global Fish Tracking System DestinE Use Case." assertion.
- zenodo.11186288 description "The Software Verification and Validation Plan for the Global Fish Tracking System DestinE Use Case." assertion.
- zenodo.11186318 description "The Software Verification and Validation Report from the Global Fish Tracking System DestinE Use Case." assertion.
- zenodo.13908850 description "Poster presented at the 2nd DestinE User eXchange Conference." assertion.
- pangeo_openeo.png description "Image used to illustrate the joint collaboration between Pangeo and OpenEO." assertion.
- pangeo-openeo-BiDS-2023 description "It points to the rendered version of the Pangeo & OpenEO training material that was generated with Jupyter Book. Please bear in mind that it links to the latest version and not necessarily to the exact version used during the training. The bids2023 release is the source code (Markdown & Jupyter Book) used for the actual training." assertion.
- 80e2215b-49cb-456e-9ae9-803f3bcdbba3 description "BiDS - Big Data from Space. Big Data from Space 2023 (BiDS) brings together key actors from industry, academia, EU entities and government to reveal user needs, exchange ideas and showcase the latest technical solutions and applications touching all aspects of space and big data technologies. The 2023 edition of BiDS will focus not only on the technologies enabling insight and foresight inferable from big data, but will also emphasize how these technologies impact society. More information can be found on the BiDS’23 website. ## Pangeo & OpenEO tutorial The tutorials are divided into 3 parts: - Introduction to Pangeo - Introduction to OpenEO - Unlocking the Power of Space Data with Pangeo & OpenEO The workshop timelines, setup and content are accessible online at [https://pangeo-data.github.io/pangeo-openeo-BiDS-2023](https://pangeo-data.github.io/pangeo-openeo-BiDS-2023)." assertion.
- 0da58fff-a17b-4ec7-97d1-3eb9cb89e1cf description "Jupyter Book source code that was collaboratively developed to deliver the Pangeo & OpenEO training at BiDS 2023. All the Jupyter Notebooks are made available under the MIT license. The tutorial is written in Markdown and can be rendered in HTML using Jupyter Book." assertion.
- dd5c3d62-b632-46a1-99e4-761f2e6cb60d description "dd5c3d62-b632-46a1-99e4-761f2e6cb60d" assertion.
- dd5c3d62-b632-46a1-99e4-761f2e6cb60d description "## Summary HPPIDiscovery is a scientific workflow to augment, predict and perform an in silico curation of host-pathogen Protein-Protein Interactions (PPIs), using graph theory to build new candidate PPIs and machine learning to predict and evaluate them by combining multiple PPI detection methods across three categories: structural, primary amino-acid sequence based, and functional annotations.<br> HPPIDiscovery contains three main steps: (i) acquisition of pathogen and host protein information from seed PPIs provided by HPIDB search methods, (ii) model training and generation of new candidate PPIs from the HPIDB seed proteins' partners, and (iii) evaluation of the new candidate PPIs and export of the results. (i) The first step identifies the taxonomy IDs of the host and pathogen organisms in the result files. It then parses and cleans the HPIDB results and downloads the protein interactions of the identified organisms from the STRING database. The STRING protein identifiers are also mapped using the ID mapping tool of the UniProt API, retrieving the UniProt entry IDs along with the functional annotations, sequence, domains and KEGG enzymes. (ii) The second step builds the training dataset using the non-redundant HPIDB-validated interactions of each genome as the positive set and random low-confidence STRING PPIs from each genome as the negative set. Then the PredPrin tool is executed in training mode to obtain the model that will evaluate the new candidate PPIs. The new PPIs are then generated by pairwise combination of the STRING partners of the host and pathogen HPIDB proteins. Finally, (iii) in the third step, the PredPrin tool is used in test mode to evaluate the new PPIs and generate the reports and the list of positively predicted PPIs. The figure below illustrates the steps of this workflow. ## Requirements: * Edit the configuration file (config.yaml) according to your own data, filling out the following fields: - base_data: location of the organism folders directory, for example /home/user/data/genomes - parameters_file: since this workflow may process multiple organisms in parallel, you must prepare a tabulated file containing the genome folder names located in base_data, where the HPIDB files are located. Example: /home/user/data/params.tsv. It must have the following columns: genome (folder name), hpidb_seed_network (the result exported by one of the search methods available in the HPIDB database), hpidb_search_method (the type of search used to generate the results) and target_taxon (the target taxon ID). The column hpidb_source may have two values: keyword or homology. In keyword mode, you provide a taxonomy, protein name, publication ID or detection method, and save all results (mitab.zip) in the genome folder. The homology mode allows the user to search for host-pathogen PPIs giving as input FASTA sequences of a set of proteins of the target pathogen for enrichment (so you have to select the search for a pathogen set), and save the zipped results (interaction data) in the genome folder. This option is extremely useful when you are not sure that your organism has validated protein interactions, as it finds validated interactions from the closest proteins in the database. When using the homology mode, the identifiers of the pathogen's query FASTA sequences must be UniProt IDs. All the query protein IDs must belong to the same target organism (taxon ID). - model_file: path to a previously trained model in joblib format (to train from the known validated PPIs given as seeds, just put a 'None' value) ## Usage Instructions The steps below assume the creation of an SQLite database file with all the task events, which can later be used to retrieve the execution time taken by the tasks. It is also possible to run locally (see Luigi's documentation to change the running command). <br><br> * Preparation: 1. ````git clone https://github.com/YasCoMa/hppidiscovery.git```` 2. ````cd hppidiscovery```` 3. ````mkdir luigi_log```` 4. ````luigid --background --logdir luigi_log```` (start the Luigi server) 5. ````conda env create -f hp_ppi_augmentation.yml```` 6. ````conda activate hp_ppi_augmentation```` 6.1. (execute ````pip3 install wget```` since it is not installed in the environment) 7. run the ````pwd```` command and get the full path 8. substitute the full path obtained in the previous step into config_example.yaml 9. download the SPRINT pre-computed similarities from https://www.csd.uwo.ca/~ilie/SPRINT/precomputed_similarities.zip and unzip them inside workflow_hpAugmentation/predprin/core/sprint/HSP/ 10. ````cd workflow_hpAugmentation/predprin/```` 11. uncompress annotation_data.zip 12. uncompress sequence_data.zip 13. ````cd ../../```` 14. ````cd workflow_hpAugmentation```` 15. ````snakemake -n```` (check the plan of jobs; it should return no errors or exceptions) 16. ````snakemake -j 4```` (change this number according to the number of genomes to analyse and the number of cores available on your machine)" assertion.
- 13c69a83-de3f-4379-b137-6a12d45bf6e7 description "## Summary HPPIDiscovery is a scientific workflow to augment, predict and perform an in silico curation of host-pathogen Protein-Protein Interactions (PPIs), using graph theory to build new candidate PPIs and machine learning to predict and evaluate them by combining multiple PPI detection methods across three categories: structural, primary amino-acid sequence based, and functional annotations.<br> HPPIDiscovery contains three main steps: (i) acquisition of pathogen and host protein information from seed PPIs provided by HPIDB search methods, (ii) model training and generation of new candidate PPIs from the HPIDB seed proteins' partners, and (iii) evaluation of the new candidate PPIs and export of the results. (i) The first step identifies the taxonomy IDs of the host and pathogen organisms in the result files. It then parses and cleans the HPIDB results and downloads the protein interactions of the identified organisms from the STRING database. The STRING protein identifiers are also mapped using the ID mapping tool of the UniProt API, retrieving the UniProt entry IDs along with the functional annotations, sequence, domains and KEGG enzymes. (ii) The second step builds the training dataset using the non-redundant HPIDB-validated interactions of each genome as the positive set and random low-confidence STRING PPIs from each genome as the negative set. Then the PredPrin tool is executed in training mode to obtain the model that will evaluate the new candidate PPIs. The new PPIs are then generated by pairwise combination of the STRING partners of the host and pathogen HPIDB proteins. Finally, (iii) in the third step, the PredPrin tool is used in test mode to evaluate the new PPIs and generate the reports and the list of positively predicted PPIs. The figure below illustrates the steps of this workflow. ## Requirements: * Edit the configuration file (config.yaml) according to your own data, filling out the following fields: - base_data: location of the organism folders directory, for example /home/user/data/genomes - parameters_file: since this workflow may process multiple organisms in parallel, you must prepare a tabulated file containing the genome folder names located in base_data, where the HPIDB files are located. Example: /home/user/data/params.tsv. It must have the following columns: genome (folder name), hpidb_seed_network (the result exported by one of the search methods available in the HPIDB database), hpidb_search_method (the type of search used to generate the results) and target_taxon (the target taxon ID). The column hpidb_source may have two values: keyword or homology. In keyword mode, you provide a taxonomy, protein name, publication ID or detection method, and save all results (mitab.zip) in the genome folder. The homology mode allows the user to search for host-pathogen PPIs giving as input FASTA sequences of a set of proteins of the target pathogen for enrichment (so you have to select the search for a pathogen set), and save the zipped results (interaction data) in the genome folder. This option is extremely useful when you are not sure that your organism has validated protein interactions, as it finds validated interactions from the closest proteins in the database. When using the homology mode, the identifiers of the pathogen's query FASTA sequences must be UniProt IDs. All the query protein IDs must belong to the same target organism (taxon ID). - model_file: path to a previously trained model in joblib format (to train from the known validated PPIs given as seeds, just put a 'None' value) ## Usage Instructions The steps below assume the creation of an SQLite database file with all the task events, which can later be used to retrieve the execution time taken by the tasks. It is also possible to run locally (see Luigi's documentation to change the running command). <br><br> * Preparation: 1. ````git clone https://github.com/YasCoMa/hppidiscovery.git```` 2. ````cd hppidiscovery```` 3. ````mkdir luigi_log```` 4. ````luigid --background --logdir luigi_log```` (start the Luigi server) 5. ````conda env create -f hp_ppi_augmentation.yml```` 6. ````conda activate hp_ppi_augmentation```` 6.1. (execute ````pip3 install wget```` since it is not installed in the environment) 7. run the ````pwd```` command and get the full path 8. substitute the full path obtained in the previous step into config_example.yaml 9. download the SPRINT pre-computed similarities from https://www.csd.uwo.ca/~ilie/SPRINT/precomputed_similarities.zip and unzip them inside workflow_hpAugmentation/predprin/core/sprint/HSP/ 10. ````cd workflow_hpAugmentation/predprin/```` 11. uncompress annotation_data.zip 12. uncompress sequence_data.zip 13. ````cd ../../```` 14. ````cd workflow_hpAugmentation```` 15. ````snakemake -n```` (check the plan of jobs; it should return no errors or exceptions) 16. ````snakemake -j 4```` (change this number according to the number of genomes to analyse and the number of cores available on your machine)" assertion.
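To make the configuration described in the two HPPIDiscovery entries above concrete, here is a minimal sketch of the config.yaml fields and the params.tsv layout they name; every path and value is a hypothetical placeholder, not taken from the project.

```
# Hypothetical config.yaml for HPPIDiscovery; all paths are placeholders.
base_data: /home/user/data/genomes           # directory with one folder per genome
parameters_file: /home/user/data/params.tsv  # tab-separated list of genomes to process
model_file: None                             # or the path to a trained joblib model
# params.tsv would then hold one tab-separated row per genome, e.g.:
#   genome    hpidb_seed_network   hpidb_search_method   target_taxon
#   genomeA   mitab.zip            keyword               9606
```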
- edit?usp=sharing description "This file uses new_matrix.csv as a starting point, filtered by dates, e.g. from 1 January 2022 to 31 December 2022. Then a pivot table is created where sort, q and reslabel were selected for the columns and c for the rows." assertion.
- edit?usp=sharing description "This Google Sheet contains the FIP convergence matrix for the year 2022. The 'matrix' tab is the main tab, while the FERs tab is the unique list of FERs and the Communities tab the list of unique community names." assertion.
- 073ab8fc-67b3-4ec7-915e-17ffb47f09c5 description "This Research Object contains all the data used for producing a FIP convergence matrix for the year 2022. The raw data was fetched from [https://github.com/peta-pico/dsw-nanopub-api](https://github.com/peta-pico/dsw-nanopub-api) on Thursday 5 October 2023. The original matrix called new_matrix.csv ([https://github.com/peta-pico/dsw-nanopub-api/blob/main/tables/new_matrix.csv](https://github.com/peta-pico/dsw-nanopub-api/blob/main/tables/new_matrix.csv)) is stored in the raw data folder for reference. The methodology used to create the FIP convergence matrix is detailed in the presentation from Barbara Magagna [https://osf.io/de6su/](https://osf.io/de6su/)." assertion.
- 7683f508-3363-4c9a-8eb2-12d31d7e5a4a description "Partial view of the FIP convergence matrix for illustration purposes only." assertion.
- a55b4924-4d0b-4a17-8a5c-22dce26fcf6c description "This CSV file is the result of a SPARQL query executed by a GitHub action." assertion.
- bc3f8893-19ec-4d72-a1fb-20fc92244634 description "PDF file generated from the FIP convergence matrix Google Sheet." assertion.
- 0a79368c-5f22-42b6-b315-3e76354918f9 description "This repository contains the Python code to reproduce the experiments in Dłotko, Gurnari, 'Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems'" assertion.
- ed0379fa-4990-4eb0-8375-3c3572847495 description "This repository contains the Python code to reproduce the experiments in Dłotko, Gurnari, 'Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems'" assertion.
- content description "This Notebook provides a workflow of ArcGIS toolboxes to identify ML targets from bathymetry." assertion.
- ro-id.EHMJMDN68Q description "It reads and processes ASCII files produced with FM Midwater in order to calculate some statistics (mean and standard deviation of backscatter, depth and net angle for each ping) and to produce a graph with the track of the net" assertion.
- ro-id.IKMY8URJ9Q description "It calculates the sink velocity of a net floating in water starting from water column data." assertion.
- 058215c9-d7d3-4e78-8f49-5655f899d5e3 description "A dedicated workflow in ArcGIS (shared as a Jupyter notebook) was developed to identify targets from the bathymetry within the MAELSTROM Project - Smart technology for MArinE Litter SusTainable RemOval and Management. In that framework, the workflow identified marine litter on the seafloor starting from a bathymetric surface collected in Sacca Fisola (Venice Lagoon) in 2021." assertion.
- 2ca9dde1-bb6c-4a17-8a9c-121bbf83ac7b description "Marine Litter Identification from Bathymetry" assertion.
- 515fd558-2e2e-41e6-a613-71146ba0866c description "EASME/EMFF/2017/1.2.1.12/S2/05/SI2.789314 MarGnet official document with description of ROs developed within MarGnet Project" assertion.
- 51d98bb1-2fc4-4468-8f17-c9c05edb7b57 description "Schema of the ArcGIS tool nested in the Jupyter Notebook" assertion.
- ce7f4ce9-0a6e-41a1-b1ec-f95d1f883c5a description "EASME/EMFF/2017/1.2.1.12/S2/05/SI2.789314 MarGnet official document with description of the tool" assertion.
- df584059-663d-4da3-8eec-d01467544383 description "Workflow requirements" assertion.
- fc64e1c2-a7de-41db-a2b2-372b81cda150 description "EASME/EMFF/2017/1.2.1.12/S2/05/SI2.789314 MarGnet official document with description of the application of the tool" assertion.
- 1873-0604.2012018 description "This paper presents a semi-automated method to recognize, spatially delineate and morphometrically characterise pockmarks on the seabed" assertion.
- b9f63328-264e-4d28-94b7-397e50cf2dad description "The shape files contain the marine litter targets both manually identified and obtained as output of the ArcGIS workflow applied to the input bathymetry" assertion.
- b9f63328-264e-4d28-94b7-397e50cf2dad description "ArcGis workflow output metadata" assertion.
- 7019724e-b5a0-4f7e-a7d6-a1baacac85df description "# ERGA Protein-coding gene annotation workflow. Adapted from the work of Sagane Joye: https://github.com/sdind/genome_annotation_workflow ## Prerequisites The following programs are required to run the workflow, and the listed versions were tested. It should be noted that older versions of snakemake are not compatible with newer versions of singularity, as noted here: [https://github.com/nextflow-io/nextflow/issues/1659](https://github.com/nextflow-io/nextflow/issues/1659). `conda v 23.7.3` `singularity v 3.7.3` `snakemake v 7.32.3` You will also need to acquire a licence key for GeneMark and place it in your home directory with the name `~/.gm_key`. The key file can be obtained from the following location, where the licence should be read and agreed to: http://topaz.gatech.edu/GeneMark/license_download.cgi ## Workflow The pipeline is based on Braker3 and was tested on the following dataset from Drosophila melanogaster: [https://doi.org/10.5281/zenodo.8013373](https://doi.org/10.5281/zenodo.8013373) ### Input data - Reference genome in FASTA format - RNAseq data in paired-end zipped FASTQ format - UniProt FASTA sequences in zipped FASTA format ### Pipeline steps - **Repeat Model and Mask** Run RepeatModeler using the genome as input, filter out any repeats also annotated as protein sequences in the UniProt database, and use this filtered library to mask the genome with RepeatMasker - **Map RNAseq data** Trim any remaining adapter sequences and map the trimmed reads to the input genome - **Run gene prediction software** Use the mapped RNAseq reads and the UniProt sequences to create hints for gene prediction using Braker3 on the masked genome - **Evaluate annotation** Run BUSCO to evaluate the completeness of the annotation produced ### Output data - FastQC reports for the input RNAseq data before and after adapter trimming - RepeatMasker report containing the quantity of masked sequence and its distribution among TE families - Protein-coding gene annotation file in gff3 format - BUSCO summary of annotated sequences ## Setup Your data should be placed in the `data` folder, with the reference genome in the folder `data/ref` and the transcript data in the folder `data/rnaseq`. The config file requires the following to be given: ``` asm: 'absolute path to reference fasta' snakemake_dir_path: 'path to snakemake working directory' name: 'name for project, e.g. mHomSap1' RNA_dir: 'absolute path to rnaseq directory' busco_phylum: 'busco database to use for evaluation e.g. mammalia_odb10' ```" assertion.
- 11f3e069-8d1d-48da-876f-52fd6d255223 description "# ERGA Protein-coding gene annotation workflow. Adapted from the work of Sagane Joye: https://github.com/sdind/genome_annotation_workflow ## Prerequisites The following programs are required to run the workflow, and the listed versions were tested. It should be noted that older versions of snakemake are not compatible with newer versions of singularity, as noted here: [https://github.com/nextflow-io/nextflow/issues/1659](https://github.com/nextflow-io/nextflow/issues/1659). `conda v 23.7.3` `singularity v 3.7.3` `snakemake v 7.32.3` You will also need to acquire a licence key for GeneMark and place it in your home directory with the name `~/.gm_key`. The key file can be obtained from the following location, where the licence should be read and agreed to: http://topaz.gatech.edu/GeneMark/license_download.cgi ## Workflow The pipeline is based on Braker3 and was tested on the following dataset from Drosophila melanogaster: [https://doi.org/10.5281/zenodo.8013373](https://doi.org/10.5281/zenodo.8013373) ### Input data - Reference genome in FASTA format - RNAseq data in paired-end zipped FASTQ format - UniProt FASTA sequences in zipped FASTA format ### Pipeline steps - **Repeat Model and Mask** Run RepeatModeler using the genome as input, filter out any repeats also annotated as protein sequences in the UniProt database, and use this filtered library to mask the genome with RepeatMasker - **Map RNAseq data** Trim any remaining adapter sequences and map the trimmed reads to the input genome - **Run gene prediction software** Use the mapped RNAseq reads and the UniProt sequences to create hints for gene prediction using Braker3 on the masked genome - **Evaluate annotation** Run BUSCO to evaluate the completeness of the annotation produced ### Output data - FastQC reports for the input RNAseq data before and after adapter trimming - RepeatMasker report containing the quantity of masked sequence and its distribution among TE families - Protein-coding gene annotation file in gff3 format - BUSCO summary of annotated sequences ## Setup Your data should be placed in the `data` folder, with the reference genome in the folder `data/ref` and the transcript data in the folder `data/rnaseq`. The config file requires the following to be given: ``` asm: 'absolute path to reference fasta' snakemake_dir_path: 'path to snakemake working directory' name: 'name for project, e.g. mHomSap1' RNA_dir: 'absolute path to rnaseq directory' busco_phylum: 'busco database to use for evaluation e.g. mammalia_odb10' ```" assertion.
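For completeness, a typical launch for a snakemake-plus-singularity workflow like the annotation pipeline above might look as follows; the environment name is a placeholder and the entry itself does not document an exact run command.

```
# Hypothetical invocation; the entry does not specify the exact command.
conda activate erga-annotation    # placeholder environment name
snakemake -n                      # dry run: check the plan of jobs
snakemake --use-singularity -j 8  # run with containers on 8 cores
```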