User Guide
Update: This is the new version of run_dbCAN. It adds multiple new features and improves the performance of the pipeline, making it more user-friendly and more efficient. We recommend that users switch to the new version of run_dbCAN. If you have any questions or suggestions, please feel free to contact us.
All conda environments can be found at https://github.com/bcb-unl/run_dbcan_new/tree/master/envs
Added a function for downloading database files, which is simpler than the previous procedure.
Replaced prodigal with pyrodigal (https://pyrodigal.readthedocs.io/en/stable/) for input processing and added a function for data preprocessing. run_dbCAN now supports prodigal, JGI, and NCBI formats, selected by setting a parameter.
Replaced HMMER with pyHMMER (https://pyhmmer.readthedocs.io/en/stable/), which is faster and more efficient than HMMER. Redesigned memory usage: the pipeline can now run either with less memory or with high memory for higher efficiency.
Reorganized the logic and structure of run_dbCAN. Functions are now split into modules and handled by classes, which makes the code easier to update and control. In addition, almost all non-Python code has been rewritten in Python, which makes it more readable. A config is used to organize all parameters.
Data processing now uses pandas.
Added coverage justifications and location information in dbCAN-sub.
Added CAZyme justification in the final result (an extra column called “Best Results”).
Added extensive logging and time reporting to make the pipeline more user-friendly.
Redesigned CGCFinder (it now supports JGI, NCBI, and prodigal formats, and can directly search eukaryotic genomes such as fungal genomes).
Changed the blastp search to a DIAMOND search in the substrate prediction part, which is faster and more efficient.
Updated the steps for metagenomic data protocols.
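As a rough illustration of the coverage and location information mentioned above for dbCAN-sub hits, the following is a minimal sketch. It is not taken from the run_dbCAN source code. The `HmmHit` record, the function names, the example family name `GH5_e1`, the inclusive-coordinate coverage formula, and the threshold values are all assumptions made for this example. Check the pipeline's config for the values it actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HmmHit:
    """One HMM-to-protein alignment (hypothetical record layout)."""
    protein_id: str
    hmm_name: str
    hmm_length: int   # total length of the HMM profile
    hmm_from: int     # 1-based, inclusive start on the profile
    hmm_to: int       # 1-based, inclusive end on the profile
    target_from: int  # 1-based, inclusive start on the protein
    target_to: int    # 1-based, inclusive end on the protein
    evalue: float


def hmm_coverage(hit: HmmHit) -> float:
    """Fraction of the HMM profile covered by the aligned region."""
    return (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_length


def location(hit: HmmHit) -> str:
    """Position of the aligned region on the protein, as 'start-end'."""
    return f"{hit.target_from}-{hit.target_to}"


def filter_hits(hits, max_evalue=1e-15, min_coverage=0.35):
    """Keep hits that pass both thresholds (illustrative values only)."""
    return [
        h for h in hits
        if h.evalue <= max_evalue and hmm_coverage(h) >= min_coverage
    ]


if __name__ == "__main__":
    hits = [
        HmmHit("prot1", "GH5_e1", 200, 11, 190, 35, 214, 1e-40),
        HmmHit("prot2", "GH5_e1", 200, 1, 40, 5, 44, 1e-20),
    ]
    for h in filter_hits(hits):
        print(h.protein_id, h.hmm_name, f"{hmm_coverage(h):.2f}", location(h))
```

In this sketch, a hit is kept only if it is statistically significant and also spans a large enough part of the profile. The location string records where on the protein the domain was found.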
Hint: If you want to start from raw metagenomic reads, please refer to Run from Raw Reads: Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol. Otherwise, please refer to any of the following instructions. Please note that some of the precomputed results have different names from those in the previous version.
- Database Installation Command
- Quick Start
- Process the input fasta file
- CAZyme Annotation
- Prepare the CGC annotation information
- Run with CGCFinder
- Substrate prediction with CGCs
- CGC substrate visualization
- Run from Raw Reads (Cater 2023): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Amelia 2024): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Priest 2023): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Wastyk 2021): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Emilson 2024): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Cater 2023): Supplementary Protocol for co-assembly
- Run from Raw Reads (Cater 2023): Supplementary Protocol for subsample
- Run from Raw Reads (Cater 2023): Supplementary Protocol for assembly-free