Different ways to create Pandas Dataframe
Last Updated: 05 Jul, 2024
A DataFrame is the most commonly used Pandas object: a two-dimensional, labeled table. The pd.DataFrame() constructor creates a DataFrame in Pandas, and it accepts several kinds of input data. You can create a DataFrame with the following methods:
Different Ways to Create Dataframe in Python
Pandas Create Dataframe Syntax
pandas.DataFrame(data, index, columns)
Parameters:
- data : The dataset from which the DataFrame is created. It can be a list, dictionary, scalar value, Series, NumPy array, etc.
- index : Optional. Defines the row labels explicitly. By default, the rows are labeled 0 to n-1, where n is the number of rows.
- columns : Optional. Provides the column names of the DataFrame. If no column names are given and the data does not carry any (as dictionary keys do), the columns are labeled 0 to n-1, where n is the number of columns.
Returns: A DataFrame object.
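As a minimal sketch of how the index and columns parameters behave, the snippet below builds the same data twice: once with the default labels and once with explicit ones. The array values and the labels r1, r2, x, y, z are illustrative assumptions, not part of the original article.

```python
import numpy as np
import pandas as pd

# A 2x3 NumPy array used as the data argument (illustrative values).
data = np.array([[1, 2, 3], [4, 5, 6]])

# No index or columns given: both default to 0..n-1.
df_default = pd.DataFrame(data)

# Explicit row and column labels (hypothetical names).
df_labeled = pd.DataFrame(data, index=['r1', 'r2'], columns=['x', 'y', 'z'])

print(df_default)
print(df_labeled)
```

The data is identical in both cases; only the row and column labels differ.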
Now that we have discussed the DataFrame() function, let’s look at different ways to create a DataFrame:
Create an Empty DataFrame using DataFrame() Method
An empty DataFrame can be created with the DataFrame() constructor of the Pandas library. Just call the constructor with no arguments.
Example : Creating an empty DataFrame using the DataFrame() function in Python
Python
# Importing Pandas to create DataFrame
import pandas as pd
# Creating Empty DataFrame and Storing it in variable df
df = pd.DataFrame()
# Printing Empty DataFrame
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
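An empty DataFrame is often given column names up front and filled later. The following is a small sketch of that pattern, assuming the illustrative columns Name and Age; it checks emptiness with the empty attribute and appends one row with loc.

```python
import pandas as pd

# Empty DataFrame that already has column names (illustrative names).
df = pd.DataFrame(columns=['Name', 'Age'])

was_empty = df.empty   # True: there are no rows yet
print(was_empty)
print(df.shape)        # (0, 2): zero rows, two columns

# Append one row by assigning to the next integer label.
df.loc[len(df)] = ['tom', 10]
print(df)
```

Adding rows one at a time is fine for small examples; for larger data it is usually simpler to collect the rows in a list first and build the DataFrame once.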
Create DataFrame from lists of lists
To create a Pandas DataFrame from a list of lists, pass it to the pd.DataFrame() function. Each inner list becomes one row, so the DataFrame has as many rows as there are inner lists and as many columns as each inner list has elements.
Example : Creating DataFrame from lists of lists using the DataFrame() method
Python
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])
# print dataframe.
print(df)
Output:
Name Age
0 tom 10
1 nick 15
2 juli 14
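To confirm how the inner lists map to rows and columns, a quick check of the shape and of one column's inferred type can help. This sketch reuses the data from the example above.

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Three inner lists of two elements each give three rows and two columns.
print(df.shape)          # (3, 2)

# The second inner list is the row with label 1.
print(df.loc[1, 'Name'])  # nick

# Pandas infers an integer type for the Age column.
print(df['Age'].dtype)   # int64
```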
Create DataFrame from Dictionary of ndArray/Lists
To create a DataFrame from a dictionary of ndarrays/lists, all the arrays must be of the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the index defaults to range(n), where n is the array length.
Example : Creating DataFrame from a dictionary of ndarray/lists
Python
# Python code demonstrating how to create a
# DataFrame from a dict of ndarrays / lists
# with the default index.
import pandas as pd
# initialize data of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
'Age': [20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
Output:
Name Age
0 Tom 20
1 nick 21
2 krish 19
3 jack 18
Note: While creating a DataFrame from a dictionary, the keys of the dictionary become the column names by default. The columns parameter can be used to select and order those keys explicitly; it does not rename them.
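The rules for dictionaries of arrays can be sketched as follows: an explicit index of matching length, the columns parameter used to order the keys, and the error raised when the arrays differ in length. The data and the index labels a, b, c are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Dictionary of equal-length ndarrays (illustrative data).
data = {'Name': np.array(['Tom', 'nick', 'krish']),
        'Age': np.array([20, 21, 19])}

# An explicit index must have the same length as the arrays.
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# With dictionary input, columns selects and orders the keys.
df_reordered = pd.DataFrame(data, columns=['Age', 'Name'])

# Arrays of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({'Name': ['Tom', 'nick'], 'Age': [20, 21, 19]})
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print('ValueError:', err)
```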
Create DataFrame from List of Dictionaries
A Pandas DataFrame can be created by passing a list of dictionaries as input data. By default, the dictionary keys are taken as column names, and each dictionary becomes one row.
Python
# Python code demonstrating how to create a
# Pandas DataFrame from a list of dicts.
import pandas as pd
# Initialize the data as a list of dicts.
data = [{'a': 1, 'b': 2, 'c': 3},
{'a': 10, 'b': 20, 'c': 30}]
# Creates DataFrame.
df = pd.DataFrame(data)
# Print the data
print(df)
Output:
a b c
0 1 2 3
1 10 20 30
Another example creates a Pandas DataFrame by passing a list of dictionaries together with row index labels. The first dictionary has no key 'a', so that cell is filled with NaN.
Python
# Python code demonstrating how to create a
# Pandas DataFrame by passing a list of
# dictionaries and row indices.
import pandas as pd
# Initialize the data as a list of dicts.
data = [{'b': 2, 'c': 3}, {'a': 10, 'b': 20, 'c': 30}]
# Creates pandas DataFrame by passing
# Lists of dictionaries and row index.
df = pd.DataFrame(data, index=['first', 'second'])
# Print the data
print(df)
Output:
b c a
first 2 3 NaN
second 20 30 10.0
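The NaN in the output above also explains why column a is shown as 10.0: a column holding NaN is stored as floating point. The sketch below checks that, and shows two possible ways to handle it, filling the gap or selecting only some keys with the columns parameter. Whether filling with 0 is appropriate depends on the data; it is used here purely for illustration.

```python
import pandas as pd

data = [{'b': 2, 'c': 3}, {'a': 10, 'b': 20, 'c': 30}]
df = pd.DataFrame(data, index=['first', 'second'])

# The missing key makes column 'a' a float column containing NaN.
print(df['a'].dtype)   # float64

# One option: replace NaN with 0 and convert back to integers.
filled = df.fillna(0).astype(int)
print(filled)

# Another option: keep only selected keys as columns.
subset = pd.DataFrame(data, columns=['a', 'b'])
print(subset)
```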
Create DataFrame from a dictionary of Series
To create a DataFrame in Python from a dictionary of Series, pass the dictionary to pd.DataFrame(). The resulting index is the union of the indexes of all the Series passed.
Example: Creating a DataFrame from a dictionary of series.
Python
# Python code demonstrating how to create a
# Pandas DataFrame from a dict of Series.
import pandas as pd
# Initialize the data as a dict of Series.
d = {'one': pd.Series([10, 20, 30, 40],
index=['a', 'b', 'c', 'd']),
'two': pd.Series([10, 20, 30, 40],
index=['a', 'b', 'c', 'd'])}
# creates Dataframe.
df = pd.DataFrame(d)
# print the data.
print(df)
Output:
one two
a 10 10
b 20 20
c 30 30
d 40 40
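In the example above both Series share the same index, so the union is not visible. The following sketch uses Series with different indexes (illustrative values) to show the union and the NaN that fills the gap.

```python
import pandas as pd

# 'one' has no entry for label 'd' (illustrative data).
d = {'one': pd.Series([10, 20, 30], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# The index is the union a, b, c, d; 'one' is NaN at 'd'.
print(df)
```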
Create DataFrame using the zip() function
Two lists can be merged element by element with the zip() function, which pairs the first items, the second items, and so on. Passing the resulting list of tuples to pd.DataFrame() creates one row per tuple.
Example: Creating DataFrame using zip() function.
Python
# Python program to demonstrate creating
# pandas Dataframe from lists using zip.
import pandas as pd
# List1
Name = ['tom', 'krish', 'nick', 'juli']
# List2
Age = [25, 30, 26, 22]
# Pair up the two lists element by element with zip()
# to get a list of (name, age) tuples.
list_of_tuples = list(zip(Name, Age))
# Converting lists of tuples into
# pandas Dataframe.
df = pd.DataFrame(list_of_tuples,
columns=['Name', 'Age'])
# Print data.
print(df)
Output:
Name Age
0 tom 25
1 krish 30
2 nick 26
3 juli 22
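The same approach extends to more than two lists. One detail worth knowing is that zip() stops at the shortest input, so a shorter list silently drops rows. The sketch below illustrates this; the cities list is an illustrative assumption added for the example.

```python
import pandas as pd

names = ['tom', 'krish', 'nick', 'juli']
ages = [25, 30, 26, 22]
cities = ['Pune', 'Delhi', 'Agra']  # hypothetical data, one item short

# zip() yields only three tuples because cities has three items.
rows = list(zip(names, ages, cities))

df = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])
print(df)
print(len(df))  # 3
```

If dropped rows would be a problem, checking that all lists have the same length before zipping is a simple safeguard.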
Create a DataFrame by providing the index label explicitly
To create a DataFrame by providing the index label explicitly, you can use the index parameter of the pd.DataFrame() constructor. The index parameter takes a list of index labels as input, and the DataFrame will use these labels for the rows of the DataFrame.
Example: Creating a DataFrame by providing the index label explicitly
Python
# Python code demonstrating how to create a
# pandas DataFrame with explicit index labels
# from a dictionary of lists.
import pandas as pd
# initialize data of lists.
data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
'marks': [99, 98, 95, 90]}
# Creates pandas DataFrame.
df = pd.DataFrame(data, index=['rank1',
'rank2',
'rank3',
'rank4'])
# print the data
print(df)
Output:
Name marks
rank1 Tom 99
rank2 Jack 98
rank3 nick 95
rank4 juli 90
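Once the rows carry explicit labels, they can be looked up by label with loc, while iloc still works by position. The short sketch below reuses the data from the example above; the column name rank given to the restored index is an illustrative choice.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
        'marks': [99, 98, 95, 90]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

# Label-based lookup uses the explicit index.
print(df.loc['rank2', 'Name'])   # Jack

# Position-based lookup is unaffected by the labels.
print(df.iloc[1]['Name'])        # Jack

# Turn the labels back into an ordinary column.
df_reset = df.reset_index().rename(columns={'index': 'rank'})
print(df_reset)
```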
Conclusion
A Pandas DataFrame is similar to a table with rows and columns. It is a two-dimensional data structure that is very useful for data analysis and data manipulation.
In this tutorial, we have discussed multiple ways of creating a Pandas DataFrame: from an empty constructor call, lists of lists, dictionaries of arrays or Series, lists of dictionaries, zipped lists, and with explicit index labels. These cover the most common requirements for creating a DataFrame.
Different ways to create Pandas Dataframe – FAQs
What are the methods for DataFrame in Python?
Some common methods for pandas DataFrame include:
- head() : Returns the first n rows.
- tail() : Returns the last n rows.
- info() : Provides a summary of the DataFrame.
- describe() : Generates descriptive statistics.
- sort_values() : Sorts the DataFrame by specified columns.
- groupby() : Groups the DataFrame using a mapper or by a series of columns.
- merge() : Merges DataFrame or named Series objects with a database-style join.
- apply() : Applies a function along an axis of the DataFrame.
- drop() : Removes specified labels from rows or columns.
- pivot_table() : Creates a pivot table.
- fillna() : Fills NA/NaN values.
- isnull() : Detects missing values.
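A few of these methods can be sketched together on a small DataFrame. The column names team and score and all values are illustrative assumptions.

```python
import pandas as pd

# Small DataFrame with one missing score (illustrative data).
df = pd.DataFrame({'team': ['x', 'y', 'x', 'y'],
                   'score': [10.0, None, 30.0, 45.0]})

print(df.head(2))            # first two rows
print(df.isnull().sum())     # count of missing values per column

# Fill the missing score, then sort and aggregate.
filled = df.fillna({'score': 0.0})
ranked = filled.sort_values('score', ascending=False)
totals = filled.groupby('team')['score'].sum()

print(ranked)
print(totals)
```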
Which data types can be used to create DataFrame?
DataFrames can be created using various data types including:
- Dictionaries of arrays, lists, or series.
- Lists of dictionaries.
- 2D NumPy arrays.
- Series.
- Another DataFrame
import pandas as pd
import numpy as np
# From a dictionary of lists
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# From a list of dictionaries
df2 = pd.DataFrame([{'A': 1, 'B': 4}, {'A': 2, 'B': 5}, {'A': 3, 'B': 6}])
# From a 2D NumPy array
df3 = pd.DataFrame(np.array([[1, 4], [2, 5], [3, 6]]), columns=['A', 'B'])
# From a dictionary of Series
df4 = pd.DataFrame({'A': pd.Series([1, 2, 3]), 'B': pd.Series([4, 5, 6])})
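The last two items in the list, a single Series and another DataFrame, can be sketched as follows. The variable names are illustrative; copy=True is used so that the new DataFrame does not share data with the original.

```python
import pandas as pd

# From a single named Series: the name becomes the column label.
s = pd.Series([1, 2, 3], name='A')
df_from_series = pd.DataFrame(s)

# From another DataFrame, as an independent copy.
base = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_copy = pd.DataFrame(base, copy=True)

# Changing the copy leaves the original untouched.
df_copy.loc[0, 'A'] = 100
print(base.loc[0, 'A'])     # 1
print(df_copy.loc[0, 'A'])  # 100
```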
How many data types are there in a pandas DataFrame?
A pandas DataFrame can contain multiple data types across its columns, such as:
- int64 : Integer values.
- float64 : Floating-point values.
- object : Text or mixed types.
- datetime64[ns] : Date and time values.
- bool : Boolean values.
You can check the data types of a DataFrame using the dtypes attribute.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.1, 6.2], 'C': ['x', 'y', 'z']})
print(df.dtypes)
# Output:
# A int64
# B float64
# C object
# dtype: object
Why use DataFrame instead of a dataset?
DataFrames are specifically designed for data manipulation and analysis, offering several advantages over general datasets:
- Integrated handling of missing data.
- Label-based indexing for rows and columns.
- Powerful data alignment and broadcasting.
- Extensive functionality for data manipulation, aggregation, and transformation.
- Better performance for operations on structured data, since column-wise (vectorized) operations are typically faster than looping over rows in plain Python.
- Integration with a variety of data sources and file formats.
What type is a DataFrame in pandas?
In pandas, a DataFrame is of the type pandas.core.frame.DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(type(df)) # Output: <class 'pandas.core.frame.DataFrame'>