The four commands provided by the Scrapple CLI are genconfig, generate, run and web.
The genconfig command creates a skeleton configuration file, which serves as a starting point for writing the full configuration file. This makes it easier to understand the structure of the key/value-based configuration file and to fill in the necessary options.
The two positional arguments for the genconfig command are the project name and the base URL.
The genconfig command creates a basic configuration file with the provided base URL and saves it as “<project_name>.json”.
The two optional arguments for the genconfig command are --type and --selector.
The extractor can be a scraper or a crawler, which is specified with the --type argument. By default, it is a scraper. When the crawler option is provided, the “next” parameter is added to the skeleton configuration file.
The --selector option specifies the selector type to be used. This can be “css” or “xpath”. By default, it is “xpath”.
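The behaviour described above can be sketched in Python. This is only an illustration, not Scrapple's actual implementation: the function name `build_skeleton` and the key names inside the dictionary (`project_name`, `selector_type`, `scraping`, `data`) are assumptions made for the example. Only the “next” parameter, the defaults and the “<project_name>.json” file name come from the text.

```python
import json


def build_skeleton(project_name, base_url,
                   extractor_type="scraper", selector_type="xpath"):
    """Return a hypothetical skeleton configuration as a dictionary."""
    if extractor_type not in ("scraper", "crawler"):
        raise ValueError("type must be 'scraper' or 'crawler'")
    if selector_type not in ("css", "xpath"):
        raise ValueError("selector must be 'css' or 'xpath'")

    # Key names below are assumed for illustration only.
    scraping = {
        "url": base_url,
        "data": [{"field": "", "selector": ""}],
    }
    if extractor_type == "crawler":
        # Crawlers additionally get an empty "next" parameter to fill in.
        scraping["next"] = []

    return {
        "project_name": project_name,
        "selector_type": selector_type,
        "scraping": scraping,
    }


def write_skeleton(config):
    """Write the skeleton to '<project_name>.json' and return the file name."""
    file_name = config["project_name"] + ".json"
    with open(file_name, "w") as handle:
        json.dump(config, handle, indent=4)
    return file_name
```

Calling `build_skeleton("demo", "http://example.com", extractor_type="crawler")` returns a dictionary that contains the “next” key, while the default scraper skeleton does not.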
Examples :
The generate command generates the Python script corresponding to the specifications in the configuration file. The generated script replicates the operation of the run command.
The two positional arguments for the generate command are the project name and the output file name.
The project name is the name of the configuration file to be used, i.e., “<project_name>.json” is the configuration file used as the specification. The command creates “<output_file_name>.py” as the generated Python script.
The one available optional argument specifies the output format in which the extracted content is to be stored. This can be “csv” or “json”. By default, it is “json”.
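As a rough illustration of generating a script from a configuration, the sketch below fills a text template and writes it to “<output_file_name>.py”. It is not how Scrapple itself generates scripts: the template, the function name `generate_script` and the configuration keys (`scraping`, `data`, `field`) are assumptions, and the generated script is a placeholder that only lists the fields it would extract.

```python
import json
from string import Template

# Hypothetical template for the generated script; a real one would
# fetch the page and apply each selector.
SCRIPT_TEMPLATE = Template('''\
import json

CONFIG = json.loads($config_literal)


def main():
    fields = [item["field"] for item in CONFIG["scraping"]["data"]]
    print("Would extract fields:", fields)


if __name__ == "__main__":
    main()
''')


def generate_script(config, output_file_name):
    """Write '<output_file_name>.py' embedding the configuration."""
    # repr() of the JSON text gives a valid Python string literal.
    source = SCRIPT_TEMPLATE.substitute(
        config_literal=repr(json.dumps(config)))
    path = output_file_name + ".py"
    with open(path, "w") as handle:
        handle.write(source)
    return path
```

Embedding the configuration in the script keeps the generated file self-contained, so it can be run without the original JSON file.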
Examples :
The run command is used to run the extractor corresponding to the specifications in the configuration file. This command runs the extractors and stores the extracted content for later use.
The two positional arguments for the run command are the project name and the output file name.
The project name is the name of the configuration file to be used, i.e., “<project_name>.json” is the configuration file used as the specification. The command creates “<output_file_name>.json” or “<output_file_name>.csv”, which contains the extracted content.
The one available optional argument specifies the output format in which the extracted content is to be stored. This can be “csv” or “json”. By default, it is “json”.
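The choice between the two output formats can be sketched as follows. This is a minimal illustration using Python's standard library, not Scrapple's own code: the function name `store_records` and the shape of the extracted content (a list of dictionaries) are assumptions.

```python
import csv
import json


def store_records(records, output_file_name, output_type="json"):
    """Store extracted records as JSON (default) or CSV; return the path."""
    if output_type == "json":
        path = output_file_name + ".json"
        with open(path, "w") as handle:
            json.dump(records, handle, indent=4)
    elif output_type == "csv":
        path = output_file_name + ".csv"
        # Use the union of all keys so that no field is dropped.
        fieldnames = sorted({key for record in records for key in record})
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError("output type must be 'json' or 'csv'")
    return path
```

In the CSV case, a record that lacks a field gets an empty cell in that column, which is the default behaviour of `csv.DictWriter`.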
Examples :
The web command is an added feature that makes it easier to edit the configuration file. It provides a web interface containing a form in which the configuration file can be filled out. It currently supports editing configuration files for scrapers only. Future work includes support for editing configuration files for link crawlers.
The web interface can be opened with the command
$ scrapple web
This starts a Flask web app, which is served on port 5000 on localhost.