Different ways to create Pandas Dataframe

Last Updated : 05 Jul, 2024

The DataFrame is the most commonly used Pandas object. It is created with the pd.DataFrame() constructor, which accepts input data in several forms. The sections below walk through the most common ways to create a Pandas DataFrame in Python.

Pandas Create Dataframe Syntax

pandas.DataFrame(data, index, columns)

Parameters:

  • data : The data from which the DataFrame is built. It can be a list, a dictionary, a Series, a NumPy array, another DataFrame, or a scalar value (a scalar requires index and columns to be given as well).
  • index : Optional row labels. If omitted, the rows are labeled 0 to n-1, where n is the number of rows.
  • columns : Optional column labels. If omitted and the data carries no column names, the columns are labeled 0 to n-1, where n is the number of columns.

Returns:

  • DataFrame object
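As a minimal sketch of how the three parameters interact, the snippet below builds the same data twice: once with default labels and once with explicit ones. The sample values and the labels r1, r2, x, and y are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: two rows, two columns.
rows = [[1, 2], [3, 4]]

# Without index/columns, both axes are labeled 0..n-1.
default_df = pd.DataFrame(rows)

# With explicit row labels (index) and column labels (columns).
labeled_df = pd.DataFrame(rows, index=['r1', 'r2'], columns=['x', 'y'])

print(default_df)
print(labeled_df)
```

The data is identical in both frames; only the axis labels differ.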

Now that we have discussed the DataFrame() function, let’s look at different ways to create a DataFrame:

Create an Empty DataFrame using DataFrame() Method

An empty DataFrame can be created by calling the DataFrame() constructor of the Pandas library with no arguments.

Example : Creating an empty DataFrame using the DataFrame() function in Python

Python
# Importing Pandas to create DataFrame
import pandas as pd

# Creating Empty DataFrame and Storing it in variable df
df = pd.DataFrame()

# Printing Empty DataFrame
print(df)

Output:

Empty DataFrame
Columns: []
Index: []

Create DataFrame from lists of lists

To create a Pandas DataFrame from a list of lists, pass the list to the pd.DataFrame() function. Each inner list becomes one row of the DataFrame, and the optional columns argument supplies the column names.

Example : Creating DataFrame from lists of lists using the DataFrame() method

Python
# Import pandas library
import pandas as pd

# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

# print dataframe.
print(df)

Output:

   Name  Age
0   tom   10
1  nick   15
2  juli   14

Create DataFrame from Dictionary of ndarrays/Lists

To create a DataFrame from a dictionary of ndarrays/lists, all the arrays must be of the same length. If an index is passed, its length must equal the length of the arrays.

If no index is passed, then by default, the index will be range(n) where n is the array length.

Example : Creating DataFrame from a dictionary of ndarray/lists

Python
# Create a DataFrame from a dict of ndarrays/lists
# using the default integer index.

import pandas as pd

# initialize data of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)

Output:

    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18
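The length requirement stated at the start of this section can be checked with a short sketch. The two-row data below is made up for illustration; the snippet catches the ValueError that pandas raises when the index length or the array lengths do not match.

```python
import pandas as pd

data = {'Name': ['Tom', 'nick'], 'Age': [20, 21]}

# Index length matches the array length (2): this works.
ok_df = pd.DataFrame(data, index=['a', 'b'])

# Index length (3) differs from the array length (2): ValueError.
try:
    pd.DataFrame(data, index=['a', 'b', 'c'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# Arrays of unequal length also raise ValueError.
try:
    pd.DataFrame({'Name': ['Tom', 'nick'], 'Age': [20]})
    ragged_error = None
except ValueError as exc:
    ragged_error = exc

print(ok_df)
print(mismatch_error)
print(ragged_error)
```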

Note: When creating a DataFrame from a dictionary, the keys of the dictionary become the column names by default. We can also choose and order the columns explicitly using the columns parameter.
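A small sketch of passing columns together with a dictionary follows. With dictionary input, columns selects and orders the keys rather than renaming them. The 'City' label is a hypothetical name that is deliberately absent from the data, to show what happens to unmatched labels.

```python
import pandas as pd

data = {'Name': ['Tom', 'nick'], 'Age': [20, 21]}

# columns picks dictionary keys and fixes their order.
# 'City' is not a key in data, so that column is filled with NaN.
df = pd.DataFrame(data, columns=['Age', 'Name', 'City'])

print(df)
```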

Create DataFrame from List of Dictionaries

A Pandas DataFrame can be created by passing a list of dictionaries as input data. Each dictionary becomes one row, and by default the dictionary keys are taken as column names.

Python
# Create a Pandas DataFrame from
# a list of dicts.
import pandas as pd

# Initialize data to lists.
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': 20, 'c': 30}]

# Creates DataFrame.
df = pd.DataFrame(data)

# Print the data
print(df)

Output:

    a   b   c
0   1   2   3
1  10  20  30

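Another example is to create a Pandas DataFrame by passing a list of dictionaries together with row index labels. Here the first dictionary has no key 'a', so that cell is filled with NaN and column 'a' is stored as floating point.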

Python
# Create a Pandas DataFrame from a list of
# dictionaries with explicit row index labels.
import pandas as pd

# Initialize data of lists
data = [{'b': 2, 'c': 3}, {'a': 10, 'b': 20, 'c': 30}]

# Creates pandas DataFrame by passing
# Lists of dictionaries and row index.
df = pd.DataFrame(data, index=['first', 'second'])

# Print the data
print(df)

Output:

         b   c     a
first    2   3   NaN
second  20  30  10.0

Create DataFrame from a dictionary of Series

To create a DataFrame from a dictionary of Series, pass the dictionary to pd.DataFrame(). Each Series becomes a column, and the resulting index is the union of the indexes of all the Series passed.

Example: Creating a DataFrame from a dictionary of series.

Python
# Create a Pandas DataFrame from
# a dict of Series.

import pandas as pd

# Initialize data to Dicts of series.
d = {'one': pd.Series([10, 20, 30, 40],
                      index=['a', 'b', 'c', 'd']),
     'two': pd.Series([10, 20, 30, 40],
                      index=['a', 'b', 'c', 'd'])}

# creates Dataframe.
df = pd.DataFrame(d)

# print the data.
print(df)

Output:

   one  two
a   10   10
b   20   20
c   30   30
d   40   40
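The example above uses identical indexes for both Series, so the union is not visible. The following sketch, with made-up values, uses two Series whose indexes only partly overlap; labels missing from one Series show up as NaN in that column.

```python
import pandas as pd

# Two Series with partly overlapping index labels.
d = {'one': pd.Series([10, 20, 30], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3], index=['b', 'c', 'd'])}

# The resulting index is the union of both indexes: a, b, c, d.
df = pd.DataFrame(d)

print(df)
```

Because NaN is a floating-point value, both columns are stored as floats here even though the inputs were integers.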

Create DataFrame using the zip() function

Two lists can be merged element by element with the zip() function, which yields one tuple per position. Passing the resulting list of tuples to pd.DataFrame() creates a DataFrame with one row per tuple.

Example: Creating DataFrame using zip() function.

Python
# Python program to demonstrate creating
# pandas Dataframe from lists using zip.

import pandas as pd

# List1
Name = ['tom', 'krish', 'nick', 'juli']

# List2
Age = [25, 30, 26, 22]

# get the list of tuples from two lists.
# and merge them by using zip().
list_of_tuples = list(zip(Name, Age))

# Converting lists of tuples into
# pandas Dataframe.
df = pd.DataFrame(list_of_tuples,
                  columns=['Name', 'Age'])

# Print data.
print(df)

Output:

    Name  Age
0    tom   25
1  krish   30
2   nick   26
3   juli   22
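The same approach extends to more than two lists, with one caveat worth seeing in code: zip() stops at the shortest input. In this sketch the cities list is a hypothetical third column that is deliberately one item short, so the last name and age are silently dropped.

```python
import pandas as pd

names = ['tom', 'krish', 'nick']
ages = [25, 30, 26]
cities = ['Pune', 'Delhi']  # hypothetical data, one item short on purpose

# zip() pairs items by position and stops at the shortest list.
rows = list(zip(names, ages, cities))

df = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])

# Only two rows survive; 'nick' has no matching city and is dropped.
print(df)
```

On Python 3.10 and later, zip(names, ages, cities, strict=True) raises a ValueError instead of truncating, which can be a safer choice when the lists are expected to have equal length.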

Create a DataFrame by providing the index label explicitly

To create a DataFrame by providing the index label explicitly, you can use the index parameter of the pd.DataFrame() constructor. The index parameter takes a list of index labels as input, and the DataFrame will use these labels for the rows of the DataFrame.

Example: Creating a DataFrame by providing the index label explicitly

Python
# Create a pandas DataFrame from a dict of lists
# with explicit row index labels.
import pandas as pd

# initialize data of lists.
data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
        'marks': [99, 98, 95, 90]}

# Creates pandas DataFrame.
df = pd.DataFrame(data, index=['rank1',
                               'rank2',
                               'rank3',
                               'rank4'])

# print the data
print(df)

Output:

       Name  marks
rank1   Tom     99
rank2  Jack     98
rank3  nick     95
rank4  juli     90
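One reason to set explicit index labels is that rows can then be looked up by label. The sketch below rebuilds the DataFrame from the example above and reads from it with .loc; the variable names second_row and top_mark are illustrative only.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
        'marks': [99, 98, 95, 90]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

# Select a whole row by its index label (returns a Series).
second_row = df.loc['rank2']

# Select a single cell by row label and column name.
top_mark = df.loc['rank1', 'marks']

print(second_row)
print(top_mark)
```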

Conclusion

Python Pandas DataFrame is similar to a table with rows and columns. It is a two-dimensional data structure and is very useful for data analysis and data manipulation.

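In this tutorial, we have discussed multiple ways of creating a Pandas DataFrame: from lists of lists, dictionaries of arrays or lists, lists of dictionaries, dictionaries of Series, and zipped lists, with default or explicit index labels. These cover the most common starting points for building a DataFrame.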

Different ways to create Pandas Dataframe – FAQs

What are the methods for DataFrame in Python?

Some common methods for pandas DataFrame include:

  • head() : Returns the first n rows.
  • tail() : Returns the last n rows.
  • info() : Provides a summary of the DataFrame.
  • describe() : Generates descriptive statistics.
  • sort_values() : Sorts the DataFrame by specified columns.
  • groupby() : Groups the DataFrame using a mapper or by series of columns.
  • merge() : Merges DataFrame or named series objects with a database-style join.
  • apply() : Applies a function along the axis of the DataFrame.
  • drop() : Removes specified labels from rows or columns.
  • pivot_table() : Creates a pivot table.
  • fillna() : Fills NA/NaN values.
  • isnull() : Detects missing values.
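A few of the methods listed above can be tried on a small frame. This is a brief sketch with made-up team and score data, not a full tour of the API.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
df = pd.DataFrame({'team': ['a', 'b', 'a', 'b'],
                   'score': [10.0, np.nan, 30.0, 40.0]})

# head(n): first n rows.
first_two = df.head(2)

# isnull(): boolean mask marking missing values.
missing_mask = df['score'].isnull()

# fillna(): replace missing values, here with 0.0 in the 'score' column.
filled = df.fillna({'score': 0.0})

# sort_values(): order rows by a column, highest score first.
sorted_df = filled.sort_values('score', ascending=False)

# groupby(): aggregate per group, here the total score per team.
totals = filled.groupby('team')['score'].sum()

print(sorted_df)
print(totals)
```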

Which data types can be used to create DataFrame?

DataFrames can be created using various data types including:

  • Dictionaries of arrays, lists, or series.
  • Lists of dictionaries.
  • 2D NumPy arrays.
  • Series.
  • Another DataFrame.

import pandas as pd
import numpy as np

# From a dictionary of lists
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# From a list of dictionaries
df2 = pd.DataFrame([{'A': 1, 'B': 4}, {'A': 2, 'B': 5}, {'A': 3, 'B': 6}])

# From a 2D NumPy array
df3 = pd.DataFrame(np.array([[1, 4], [2, 5], [3, 6]]), columns=['A', 'B'])

# From a dictionary of Series
df4 = pd.DataFrame({'A': pd.Series([1, 2, 3]), 'B': pd.Series([4, 5, 6])})
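The last two items in the list, a single Series and another DataFrame, can be sketched as follows. The names df_from_series and df_from_frame are illustrative, and copy=True is passed explicitly so that the new frame does not share data with the original.

```python
import pandas as pd

# A named Series becomes a one-column DataFrame;
# the Series name is used as the column label.
s = pd.Series([1, 2, 3], name='A')
df_from_series = pd.DataFrame(s)

# An existing DataFrame can be passed as data.
# copy=True requests an independent copy of the underlying values.
df_from_frame = pd.DataFrame(df_from_series, copy=True)

# Changing the copy leaves the original frame untouched.
df_from_frame.loc[0, 'A'] = 100

print(df_from_series)
print(df_from_frame)
```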

How many data types are there in a pandas DataFrame?

A pandas DataFrame can contain multiple data types across its columns, such as:

  • int64 : Integer values.
  • float64 : Floating-point values.
  • object : Text or mixed types.
  • datetime64[ns] : Date and time values.
  • bool : Boolean values.

You can check the data types of a DataFrame using the dtypes attribute.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.1, 6.2], 'C': ['x', 'y', 'z']})
print(df.dtypes)
# Output:
# A int64
# B float64
# C object
# dtype: object
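The date and boolean types from the list can be inspected in the same way. This sketch uses made-up column names; the exact dtype labels printed may vary between pandas versions, so the type-checking helpers in pd.api.types are used instead of comparing strings.

```python
import pandas as pd

df = pd.DataFrame({
    'when': pd.to_datetime(['2024-01-01', '2024-01-02']),
    'flag': [True, False],
    'count': [1, 2],
})
print(df.dtypes)

# Helpers in pd.api.types test a column's dtype family.
is_datetime = pd.api.types.is_datetime64_any_dtype(df['when'])
is_bool = pd.api.types.is_bool_dtype(df['flag'])

# astype() converts a column to another dtype.
df['count'] = df['count'].astype('float64')
print(df.dtypes)
```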

Why use DataFrame instead of a dataset?

DataFrames are specifically designed for data manipulation and analysis, offering several advantages over holding a dataset in plain Python structures such as lists or dictionaries:

  • Integrated handling of missing data.
  • Label-based indexing for rows and columns.
  • Powerful data alignment and broadcasting.
  • Extensive functionality for data manipulation, aggregation, and transformation.
  • Vectorized column operations, which are typically faster than explicit Python loops over structured data.
  • Integration with a variety of data sources and file formats.

What type is a DataFrame in pandas?

In pandas, a DataFrame is of the type pandas.core.frame.DataFrame.

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(type(df)) # Output: <class 'pandas.core.frame.DataFrame'>

