User Guide

Update: This is the new version of run_dbCAN. We add multiple new features and improve the performance of the pipeline. The new version is more user-friendly and more efficient. We recommend users to use the new version of run_dbCAN. If you have any questions or suggestions, please feel free to contact us.

All conda environments can be found at https://github.com/bcb-unl/run_dbcan_new/tree/master/envs

  1. Add a function for downloading database files which is simpler than before.

  2. Import pyrodigal (https://pyrodigal.readthedocs.io/en/stable/) instead of prodigal for input processing, besides, add function for data preprocessing, and now run_dbCAN could support prodigal format, JGI format, and NCBI format with setting parameter.

  3. Import pyHMMER (https://pyhmmer.readthedocs.io/en/stable/) instead of HMMER, which is more efficient and speeds up than HMMER. Redesigned memory usage, now can use less memory, or high memory + high efficiency.

  4. Re-organized the logic and structure of run_dbCAN. Now we split functions into each module and use “CLASS” to handle it, which is easier to update and control. Besides, use python to rewrite almost non-python codes and it’s more readable. Use config to organize all parameters.

  5. Use pandas for data processing.

  6. Add coverage justifications and location information in dbCAN-sub.

  7. Add CAZyme justification in the final result (extra column called “Best Results).

  8. Added a lot of log processing and time reporting, making it more user-friendly.

  9. Re-design the CGCFinder (Now support JGI, NCBI, prodigal formats, and could directly search eukaryotes such as fungi genomes).

  10. Change the blastp search to DIAMOND search in substrate prediction part, which is faster and more efficient.

  11. Update steps for metagenomic data protocols.

Hint:If you want to run from raw reads from metagenome, please refer to Run from Raw Reads: Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol. Otherwise, please refer to any following instruction. Please note that some of the precomputed results have different names from the previous version.