Welcome to the kmviz instance of Ocean Read Atlas (ORA). ORA is a web service for exploring the biogeography of one or several sequences in a marine environmental context. ORA lets you query DNA sequences against an index of read sequences and determine their similarity to each indexed sample.

ORA contains 1,393 read samples from the Tara Oceans project, together with the metadata collected during the sampling campaigns. See the Data section for a description of the dataset.

Interface

This new query interface is based on kmviz, a generic interface to query sequence indexes and explore associated metadata.

Before submitting a query, you have to select a database, here TaraOcean. Two parameters are available:
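  • z: The findere parameter of kmindex, which reduces the false-positive rate of k-mer lookups. We recommend 3. See the kmindex documentation for details.
  • coverage: The minimum ratio of k-mers shared between a query sequence and an indexed sample for that sample to be considered a positive hit.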
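
As an illustration of the coverage parameter, here is a minimal Python sketch of a shared k-mer ratio and its threshold test. It is not the kmindex implementation: it uses exact Python sets instead of Bloom filters, ignores the z parameter, and counts distinct k-mers, which is an assumption. The function names (kmers, shared_ratio, is_hit) are hypothetical and chosen only for this example.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers of a sequence (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_ratio(query, sample_kmers, k):
    """Fraction of the query's distinct k-mers that are found in the sample."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        # Query shorter than k: nothing to compare.
        return 0.0
    return len(query_kmers & sample_kmers) / len(query_kmers)


def is_hit(query, sample_kmers, k, coverage):
    """A sample is a positive hit when the shared ratio reaches the coverage threshold."""
    return shared_ratio(query, sample_kmers, k) >= coverage


# Toy example with k = 3 (illustration only; real indexes use much larger k).
sample = kmers("TTACGTGG", 3)
ratio = shared_ratio("ACGTAC", sample, 3)
print(ratio)                              # 0.75 (3 of the 4 query k-mers are shared)
print(is_hit("ACGTAC", sample, 3, 0.5))   # True
print(is_hit("ACGTAC", sample, 3, 0.9))   # False
```

With this toy data, a coverage of 0.5 reports the sample as a hit while a coverage of 0.9 does not: raising the threshold trades sensitivity for specificity.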

The documentation is available here: kmviz documentation.

Data

Environmental context files

The registry of all the samples from the Tara Oceans Expedition (2009–2013) has been deposited at PANGAEA. The environmental variables were retrieved from the following databases:

Sequencing reads

Shotgun metagenomic sequences of all the samples from the Tara Oceans Expedition (2009–2013) are available at the European Nucleotide Archive (ENA) under global accession number PRJEB402 (PRJEB1787 and PRJEB9740 for bacteria and archaea, PRJEB1788 for giant viruses, PRJEB4352 and PRJEB9691 for protists, and PRJEB4419 and PRJEB9742 for DNA viruses).

References

  • T. Lemane, N. Lezzoche, J. Lecubin, E. Pelletier, M. Lescot, R. Chikhi, P. Peterlongo (2024). Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA. Nature Computational Science. doi: 10.1038/s43588-024-00596-6

  • T. Lemane, N. Lezzoche, J. Lecubin, E. Pelletier, M. Lescot, R. Chikhi, P. Peterlongo (2023). kmindex and ORA: indexing and real-time user-friendly queries in terabytes-sized complex genomic datasets. bioRxiv 2023.05.31.543043. doi: 10.1101/2023.05.31.543043

  • C. Vernette, J. Lecubin, P. Sanchez, Tara Oceans Coordinators, S. Sunagawa, T.O. Delmont, S.G. Acinas, E. Pelletier, P. Hingamp, M. Lescot (2022). The Ocean Gene Atlas v2.0: online exploration of the biogeography and phylogeny of plankton genes. Nucleic Acids Research. doi: 10.1093/nar/gkac420

  • E. Villar, T. Vannier, C. Vernette, M. Lescot, M. Cuenca, A. Alexandre, P. Bachelerie, T. Rosnet, E. Pelletier, S. Sunagawa and P. Hingamp (2018). The Ocean Gene Atlas: exploring the biogeography of plankton genes online. Nucleic Acids Research. doi: 10.1093/nar/gky376

  • L. Robidou, P. Peterlongo (2021). findere: fast and precise approximate membership query. String Processing and Information Retrieval: 28th International Symposium, SPIRE 2021.

  • T. Lemane, P. Medvedev, R. Chikhi, P. Peterlongo (2022). kmtricks: efficient and flexible construction of Bloom filters for large sequencing data collections. Bioinformatics Advances.
