Tutorial: Python
Prerequisites
Overview
You can use Ibis Birdbrain as a Python library, but it is often used through an interactive Python session. You can see how to start one with the birdbrain
CLI with Ibis Birdbrain ready to use.
If you use the CLI, you’ll have access to the ai
/bot
/birdbrain
object, which is an instance of the ibis_birdbrain.ai.Bot
class. This bot has access to numerous tools and can be used to interact with data via Ibis.
Getting help
You can ask the bot for help with its usage:
"how do I use you?") bot(
birdbrain:
I’m birdbrain, your assistant for data analysis projects using Ibis, AI, and many other data tools. Here’s how you can interact with me:
Ask Questions: You can ask me anything related to Ibis, AI, and data analysis. For example, you can ask about how to use certain features in Ibis, how to code in Python, or ask for help in debugging your code.
Run Analysis: You can also ask me to run data analysis for you. You can give me instructions to query data, analyze it, and even generate visualizations.
Learning Resources: If you’re looking for resources to learn more about Ibis, AI, and data analysis, I can provide you with relevant links to documentation, tutorials, and more.
File Operations: I can help with file operations such as reading a file, writing to a file, listing files in a directory, etc.
Translation & Summarization: If you have a large piece of text or a document that you want to summarize, or if you want to translate text from one language to another, I can help with that too.
Remember, I can only provide accurate information and perform tasks based on the data I have access to. I’m here to assist you. Let’s explore the world of data together!
Using a tool
Interact via natural language to have the bot use tools on your behalf.
Interacting with a data
Ibis Birdbrain can read data files (limited to Delta Lake tables and Excel files, currently), registering them as tables in your Ibis backend. If you connect to a data platform with existing tables via Ibis, you will have access to them through the bot.
In this example, we’ll read in a local penguins.xlsx
file and ask the bot about it:
"read in the penguins excel") bot(
birdbrain:
I’ve successfully read in the penguins Excel file. Here’s a preview of the data:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year Adelie Torgersen 39.1 18.7 181 3750 male 2007 Adelie Torgersen 39.5 17.4 186 3800 female 2007 Adelie Torgersen 40.3 18.0 195 3250 female 2007 Adelie Torgersen NaN NaN - - - 2007 Adelie Torgersen 36.7 19.3 193 3450 female 2007 (and so on…) ) Note: Null values are represented as NaN (Not a Number) or ‘-’ for numerical columns, and as NULL for string columns.
"what tables do you have access to? give an overview of the data") bot(
birdbrain:
Currently, I have access to one table named ‘penguins’.
Here’s an overview of the data in it:
- species: This is a string column that gives the species of the penguin.
- island: This is a string column that mentions the island the penguin is from.
- bill_length_mm: This is a float64 column that specifies the length of the penguin’s bill in millimeters. 4. bill_depth_mm: This is a float64 column that specifies the depth of the penguin’s bill in millimeters.
- flipper_length_mm: This is an int64 column that provides the length of the penguin’s flipper in millimeters.
- body_mass_g: This is an int64 column that gives the body mass of the penguin in grams.
- sex: This is a string column that mentions the sex of the penguin.
- year: This is an int64 column that gives the year of observation.
Please note that a string field can contain text, a float64 field can contain any real number (including decimal values), and an int64 field can contain any integer.
Interacting with the Internet via search
"what is this dataset? search the internet and find out more about its origin for me") bot(
birdbrain:
The penguins dataset, a possible alternative to the classic Iris dataset, contains data on three species of penguins from three islands in the Palmer Archipelago, Antarctica [1][2][3].
The data comprises measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex [3]. It provides a good mix of numeric and categorical data for various data exploration and machine learning tasks.
The dataset was originally collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER) [2].
The purpose of the Palmer LTER is to study the marine ecosystem in the western Antarctic Peninsula region as a microcosm of global ocean ecosystems, focusing on the processes that link the extent of annual sea ice to the biological dynamics of different trophic levels [2].
Sources: [1] Penguin dataset : The new Iris | Kaggle [2] (https://allisonhorst.github.io/palmerpenguins/) [3] Penguins Dataset Overview — iris alternative | by AbdulMajedRaja RS | Towards Data Science
Interacting with the filesystem
"read in the readme; I don't think it contians much, draft a better version for me based on this session") bot(
birdbrain:
The ‘penguins’ project now has a new and improved README:
Penguins Data Analysis Project
This repository contains an exploration and analysis of the
penguins
dataset. The data was originally collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER).The
penguins
dataset provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica. It includes measurements such as bill length, bill depth, flipper length, body mass, along with categorical data such as species, island, and sex. The dataset presents a rich mix of numeric and categorical data, making it an excellent resource for various data exploration and machine learning tasks.Data
The data for this project is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python. The data is then available for analysis and visualization.
Analysis
Detailed analysis on this data is yet to be performed. The objective is to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.
Contributing
Contributions are welcome. Please read the CONTRIBUTING.md for details.
License
This project is licensed under the terms of the MIT License.
This summary gives a good overview of the data involved in the project, the purpose of the project, and how others can contribute.
"only capitalize the first word in headings; remove the contributing/license stuff, this is buried deep in a repo. simplify and just include basic info, then write the file out for me") bot(
birdbrain:
Here’s a simplified version of the README:
Penguins data analysis project
This project involves an exploration and analysis of the
penguins
dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.Dataset
The
penguins
dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.Analysis
The project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.
I’ll now write this content to the README file.
"write away!") bot(
birdbrain:
I’ve successfully written the updated README to the file
./README.md
.
cat README.md
Penguins data analysis project
This project involves an exploration and analysis of the
penguins
dataset. The dataset, originally collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), provides detailed data on three species of penguins from three islands in the Palmer Archipelago, Antarctica.Dataset
The
penguins
dataset includes measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.Analysis
The project aims to use this data to derive meaningful insights about penguin species, their physical characteristics, and their distribution across the islands.
Saving and loading conversation history
From a new session:
1]: bot("read in the readme") [ins] In [
birdbrain:
The README file contains information about a data analysis project involving the
penguins
dataset. This dataset, collected by Dr. Kristen Gorman and the Palmer Station, Antarctica, Long Term Ecological Research (LTER), includes details on three species of penguins from three islands in the Palmer Archipelago, Antarctica.The dataset contains measurements such as bill length, bill depth, flipper length, body mass, and categorical data such as species, island, and sex. The data is stored in an Excel file named ‘penguins.xlsx’ and is loaded into the project using Python.
The project aims to use this data to derive meaningful insights about the penguin species, their physical characteristics, and their distribution across the islands.
2]: bot.save_history()
[ins] In [
3]: bb_con = ibis.connect("duckdb://birdbrain.ddb")
[ins] In [
4]: bb_con.list_tables()
[ins] In [4]: ['history']
Out[
5]: t = bb_con.table("history")
[ins] In [
6]: t [ins] In [
Out[6]:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ convo_id ┃ convo_timestamp ┃ bot_name ┃ message_content ┃ message_timestamp ┃ function_name ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ string │ timestamp │ string │ string │ timestamp('UTC') │ string │
├────────────────────────────┼────────────────────────────┼───────────┼───────────────────────────┼──────────────────────────────────┼─────────────────────┤
│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ read in the readme │ 2023-10-01 16:40:31.961856+00:00 │ ~ │
│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ ['./requirements.txt', '… │ 2023-10-01 16:40:34.391470+00:00 │ list_files_and_dirs │
│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ # Penguins data analysis… │ 2023-10-01 16:40:36.201490+00:00 │ read_file │
│ cc25b8fd-74b0-4184-bfde-7… │ 2023-10-01 09:40:47.736675 │ birdbrain │ The README file contains… │ 2023-10-01 16:40:43.166277+00:00 │ ~ │ └────────────────────────────┴────────────────────────────┴───────────┴───────────────────────────┴──────────────────────────────────┴─────────────────────┘
Understanding and accessing Bot objects
You can ask the bot what tools it has access to and you should get an accurate answer, but you can also directly check. The Bot
object is a very thin wrapper around a Marvin AI Application. You can access the Bot.ai
attribute to get the underlying AIApplication
object.
1]: print(bot.ai.tools)
[ins] In [
[
Tool(='open_browser',
name='Opens the URL in a web browser.',
description=<function open_browser at 0x11f22fc40>
fn
),
Tool(='search_internet',
name='Searches the internet for the given query.',
description=<function search_internet at 0x104890400>
fn
),
Tool(='webpage_to_str',
name='Reads a webpage link into a string. Useful for summarizing
descriptionwebpages.',
=<function webpage_to_str at 0x11f22fba0>
fn
),
Tool(='summarize_text',
name='Summarizes text',
description=<function summarize_text at 0x11f22fe20>
fn
),
Tool(='translate_text',
name='Translates text',
description=<function translate_text at 0x11f22fec0>
fn
),
Tool(='read_file',
name='Reads a file and returns its content.',
description=<function read_file at 0x11f25d260>
fn
),
Tool(='list_files_and_dirs',
name='Lists all files and directories in a directory.',
description=<function list_files_and_dirs at 0x11f25d440>
fn
),
Tool(='write_file',
name='Writes a file.',
description=<function write_file at 0x11f25d4e0>
fn
),
Tool(='text_to_python',
name='Returns a Python code given text.',
description=<function text_to_python at 0x11f22e3e0>
fn
),
Tool(='fix_python_error',
name='Fixes a Python error in the code.',
description=<function fix_python_error at 0x11f25cb80>
fn
),
Tool(='run_python_code',
name='Execute Python code as a string and return the output',
description=<function run_python_code at 0x11f25ce00>
fn
),
Tool(='list_tables',
name='Returns a list of available tables to query',
description=<function list_tables at 0x12a16e840>
fn
),
Tool(='query_table',
name='Queries the table in the database to answer the question',
description=<function query_table at 0x12a16e980>
fn
),
Tool(='get_table_schema',
name='Returns the schema of a table',
description=<function get_table_schema at 0x12a16e5c0>
fn
),
Tool(='read_delta_table',
name='Reads a Delta Lake table directory from the full filepath\n\n
descriptionfilepath should be of the form: <path>/<table_name>\n ',
=<function read_delta_table at 0x12a16e7a0>
fn
),
Tool(='read_excel_file',
name='Reads an Excel file from the full filepath\n\n filepath
descriptionshould be of the form: <path>/<table_name>.<extension>\n ',
=<function read_excel_file at 0x12a16e8e0>
fn
),
Tool(='read_files_and_summarize',
name='Read a list of files and summarize them.',
description=<function read_files_and_summarize at 0x12a16eb60>
fn
),
Tool(='read_webpage_and_summarize',
name='Read a webpage and summarize it.',
description=<function read_webpage_and_summarize at 0x12a16ea20>
fn
) ]
Next steps
Explore some data with Ibis Birdbrain and let us know how it goes!