When it comes to selecting data from a DataFrame, Pandas loc and iloc are two top favorites. They are fast, easy to read, and sometimes interchangeable.
In this post, we’ll explore the differences between loc and iloc, take a look at their similarities, and see how to perform data selection with them. We will go over the following topics:
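- Differences between loc and iloc
- Selecting via a single value
- Selecting via a list of values
- Selecting a range of data via slice
- Selecting via conditions and callable
- loc and iloc are interchangeable when labels are 0-based integers
Differences between loc and iloc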
The main distinction between loc and iloc is:
- loc is label-based, which means that you have to specify rows and columns based on their row and column labels.
- iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
The sections below walk through the differences and similarities between loc and iloc.
For demonstration, we create a DataFrame and load it with the Day column as the index.
import pandas as pd
df = pd.read_csv('data/data.csv', index_col=['Day'])
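The contents of data/data.csv are not listed in this post. For readers who want to follow along without the file, here is a minimal sketch that builds a similar DataFrame in memory. The Temperature column, Friday's row, and Tuesday's weather match outputs shown in this post; every other Weather, Wind, and Humidity value is a made-up placeholder.

```python
import pandas as pd

# Stand-in for data/data.csv. Only the Temperature column, Friday's row,
# and Tuesday's weather come from outputs in this post; the remaining
# values are placeholders invented for illustration.
df = pd.DataFrame(
    {
        'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
        'Weather': ['Sunny', 'Sunny', 'Sunny', 'Cloudy', 'Shower', 'Shower', 'Sunny'],
        'Temperature': [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        'Wind': [13, 28, 16, 11, 26, 27, 20],
        'Humidity': [30, 96, 20, 22, 79, 62, 10],
    }
).set_index('Day')  # same effect as index_col=['Day'] in read_csv

print(df)
```

With the placeholders in place, the column order (Weather, Temperature, Wind, Humidity) and the row order (Mon to Sun) line up with the positions used in the iloc examples.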
Both loc and iloc allow input to be a single value. We can use the following syntax for data selection:
loc[row_label, column_label]
iloc[row_position, column_position]
For example, let’s say we would like to retrieve Friday’s temperature value.
With loc, we can pass the row label 'Fri' and the column label 'Temperature'.
# To get Friday's temperature
df.loc['Fri', 'Temperature']
10.51
The equivalent iloc statement takes the row position 4 and the column position 1.
# The equivalent `iloc` statement
df.iloc[4, 1]
10.51
We can also use : to select everything along an axis. For example, to get all rows:
# To get all rows
df.loc[:, 'Temperature']
Day
Mon 12.79
Tue 19.67
Wed 17.51
Thu 14.44
Fri 10.51
Sat 11.07
Sun 17.50
Name: Temperature, dtype: float64
# The equivalent `iloc` statement
df.iloc[:, 1]
And to get all columns:
# To get all columns
df.loc['Fri', :]
Weather Shower
Temperature 10.51
Wind 26
Humidity 79
Name: Fri, dtype: object
# The equivalent `iloc` statement
df.iloc[4, :]
Note that the above 2 outputs are Series. loc and iloc return a Series when the result is 1-dimensional data.
We can pass a list of labels to loc to select multiple rows or columns:
# Multiple rows
df.loc[['Thu', 'Fri'], 'Temperature']
Day
Thu 14.44
Fri 10.51
Name: Temperature, dtype: float64
# Multiple columns
df.loc['Fri', ['Temperature', 'Wind']]
Temperature 10.51
Wind 26
Name: Fri, dtype: object
Similarly, a list of integer values can be passed to iloc to select multiple rows or columns. Here are the equivalent statements using iloc:
df.iloc[[3, 4], 1]
Day
Thu 14.44
Fri 10.51
Name: Temperature, dtype: float64
df.iloc[4, [1, 2]]
Temperature 10.51
Wind 26
Name: Fri, dtype: object
All the above outputs are Series because their results are 1-dimensional data.
The output will be a DataFrame when the result is 2-dimensional data, for example, when accessing multiple rows and columns:
# Multiple rows and columns
rows = ['Thu', 'Fri']
cols = ['Temperature', 'Wind']
df.loc[rows, cols]
The equivalent iloc statement is:
rows = [3, 4]
cols = [1, 2]
df.iloc[rows, cols]
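One way to confirm the Series versus DataFrame rule is to check the result types directly. The sketch below uses a small two-row frame built from Thursday's and Friday's Temperature and Wind values shown above; the variable names are only illustrative.

```python
import pandas as pd

# Two-row frame using the Thu/Fri values that appear in the outputs above.
df = pd.DataFrame(
    {'Temperature': [14.44, 10.51], 'Wind': [11, 26]},
    index=pd.Index(['Thu', 'Fri'], name='Day'),
)

scalar = df.loc['Fri', 'Temperature']                    # single cell -> scalar
one_row = df.loc['Fri', ['Temperature', 'Wind']]         # 1-dimensional -> Series
block = df.loc[['Thu', 'Fri'], ['Temperature', 'Wind']]  # 2-dimensional -> DataFrame

# Wrapping a single position in a list keeps the result 2-dimensional.
one_row_frame = df.iloc[[1], :]

print(type(one_row).__name__)        # Series
print(type(block).__name__)          # DataFrame
print(type(one_row_frame).__name__)  # DataFrame
```

Passing a one-element list instead of a bare label or position is a handy way to get a DataFrame back when later code expects one.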
Slice (written as start:stop:step) is a powerful technique that allows selecting a range of data. It is very useful when we want to select everything in between two items.
loc with slice
With loc, we can use the syntax A:B to select data from label A to label B (both A and B are included):
# Slicing column labels
rows = ['Thu', 'Fri']
df.loc[rows, 'Temperature':'Humidity']
# Slicing row labels
cols = ['Temperature', 'Wind']
df.loc['Mon':'Thu', cols]
We can use the syntax A:B:S to select data from label A to label B with step size S (A is included, and B is included when the step lands on it):
# Slicing with step
df.loc['Mon':'Fri':2, :]
iloc with slice
With iloc, we can also use the syntax n:m to select data from position n (included) to position m (excluded). The main difference from loc is that the endpoint m is excluded from the iloc result.
For example, selecting columns from positions 0 up to 3 (excluded):
df.iloc[[1, 2], 0:3]
Similarly, we can use the syntax n:m:s to select data from position n (included) to position m (excluded) with step size s. Note that the endpoint m is excluded.
df.iloc[0:4:2, :]
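To make the endpoint rule concrete, here is a small sketch that runs a label slice and a position slice side by side. It uses only the Mon to Fri temperatures shown earlier; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Temperature': [12.79, 19.67, 17.51, 14.44, 10.51]},
    index=pd.Index(['Mon', 'Tue', 'Wed', 'Thu', 'Fri'], name='Day'),
)

by_label = df.loc['Mon':'Thu']  # the end label 'Thu' is included
by_position = df.iloc[0:3]      # the end position 3 ('Thu') is excluded

print(list(by_label.index))     # ['Mon', 'Tue', 'Wed', 'Thu']
print(list(by_position.index))  # ['Mon', 'Tue', 'Wed']

# The same rule applies with a step.
label_step = df.loc['Mon':'Fri':2]  # Mon, Wed, Fri
position_step = df.iloc[0:4:2]      # Mon, Wed
```

To get the same rows from iloc as from a loc slice, the stop position has to be one past the last row wanted.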
loc with conditions
Often we would like to filter the data based on conditions. For example, we may need to find the rows where humidity is greater than 50.
With loc, we just need to pass the condition to the loc statement.
# One condition
df.loc[df.Humidity > 50, :]
Sometimes, we may need to use multiple conditions to filter our data. For example, find all the rows where humidity is more than 50 and the weather is Shower:
# Multiple conditions
df.loc[
    (df.Humidity > 50) & (df.Weather == 'Shower'),
    ['Temperature', 'Wind'],
]
iloc with conditions
For iloc, we will get a ValueError if we pass the condition straight into the statement:
# Getting ValueError
df.iloc[df.Humidity > 50, :]
We get the error because iloc does not accept a boolean Series as a mask. It accepts a boolean list or array instead, so we can use the list() function to convert the Series into a boolean list.
# Single condition
df.iloc[list(df.Humidity > 50)]
Similarly, we can use list() to convert the output of multiple conditions into a boolean list:
# Multiple conditions
df.iloc[
    list((df.Humidity > 50) & (df.Weather == 'Shower')),
    :,
]
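The following sketch reproduces the error and shows two ways around it. Besides list(), a boolean NumPy array from Series.to_numpy() should also work as an iloc mask. The three-row frame is an assumption for illustration: Friday's humidity of 79 comes from this post, while the other two values are placeholders.

```python
import pandas as pd

# Placeholder humidity values, except Friday's 79 which appears in this post.
df = pd.DataFrame(
    {'Humidity': [30, 96, 79]},
    index=pd.Index(['Mon', 'Tue', 'Fri'], name='Day'),
)
mask = df['Humidity'] > 50  # boolean Series indexed by Day

# Passing the Series itself is rejected by iloc.
try:
    df.iloc[mask]
    raised = False
except (ValueError, NotImplementedError) as exc:
    raised = True
    print(type(exc).__name__)

via_list = df.iloc[list(mask)]         # plain Python list of booleans
via_array = df.iloc[mask.to_numpy()]   # boolean NumPy array
via_loc = df.loc[mask]                 # loc takes the Series directly

print(list(via_array.index))  # ['Tue', 'Fri']
```

All three selections return the same rows; the list and array forms simply drop the index that iloc refuses to align on.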
loc with callable
loc accepts a callable as an indexer. The callable must be a function with one argument (the DataFrame being indexed) that returns valid output for indexing.
For example, to select columns:
# Selecting columns
df.loc[:, lambda df: ['Humidity', 'Wind']]
And to filter data with a callable:
# With condition
df.loc[lambda df: df.Humidity > 50, :]
iloc with callable
iloc can also take a callable as an indexer.
df.iloc[lambda df: [0, 1], :]
To filter data with a callable, iloc requires list() to convert the output of the condition into a boolean list:
df.iloc[lambda df: list(df.Humidity > 50), :]
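Callable indexers tend to be most useful in method chains, where the intermediate DataFrame has no variable name to refer to. The sketch below is one hedged illustration: the HumidityRatio column and the three-row frame are hypothetical, and the humidity values other than Friday's 79 are placeholders.

```python
import pandas as pd

# Illustrative frame; Mon/Tue humidity values are placeholders.
df = pd.DataFrame(
    {'Temperature': [12.79, 19.67, 10.51], 'Humidity': [30, 96, 79]},
    index=pd.Index(['Mon', 'Tue', 'Fri'], name='Day'),
)

# The lambda receives the DataFrame produced by assign(), so it can
# filter on a column that does not exist on the original df.
result = (
    df.assign(HumidityRatio=lambda d: d['Humidity'] / 100)
      .loc[lambda d: d['HumidityRatio'] > 0.5, ['Temperature', 'HumidityRatio']]
)

# The iloc equivalent needs a boolean array (or list) rather than a Series.
result_iloc = (
    df.assign(HumidityRatio=lambda d: d['Humidity'] / 100)
      .iloc[lambda d: (d['HumidityRatio'] > 0.5).to_numpy(), [0, 2]]
)

print(list(result.index))  # ['Tue', 'Fri']
```

Without the callable, the chain would have to be broken into two statements so the intermediate frame could be named.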
loc and iloc are interchangeable when labels are 0-based integers
For demonstration, let’s create a DataFrame with 0-based integers as headers and index labels.
df = pd.read_csv(
    'data/data.csv',
    header=None,
    skiprows=[0],
)
With header=None, Pandas generates 0-based integer values as headers. With skiprows=[0], the header row with Weather, Temperature, etc. that we have been using is skipped. Because index_col is not set, the row labels are 0-based integers as well.
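To see what these two options do without the file at hand, the sketch below reads a hypothetical two-row excerpt from an in-memory string. Tuesday's Sunny and 19.67 match the output in this post; the remaining values in csv_text are placeholders.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for data/data.csv.
csv_text = (
    'Day,Weather,Temperature,Wind,Humidity\n'
    'Mon,Sunny,12.79,13,30\n'
    'Tue,Sunny,19.67,28,96\n'
)

# header=None: do not treat any row as column names.
# skiprows=[0]: drop the first line, which holds the original names.
df = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=[0])

print(list(df.columns))  # [0, 1, 2, 3, 4]
print(list(df.index))    # [0, 1]
print(df.loc[1, 2])      # 19.67
```

Note that Day is now an ordinary column at position 0, so Temperature moves to column 2.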
Now, loc, a label-based data selector, can accept a single integer and a list of integer values. For example:
df.loc[1, 2]
19.67
df.loc[1, [1, 2]]
1 Sunny
2 19.67
Name: 1, dtype: object
These statements work because the integer values (1 and 2) are interpreted as labels of the index and columns. They are not integer positions along the index, which can be a bit confusing.
In this case, loc and iloc are interchangeable when selecting via a single value or a list of values.
df.loc[1, 2] == df.iloc[1, 2]
True
df.loc[1, [1, 2]] == df.iloc[1, [1, 2]]
1 True
2 True
Name: 1, dtype: bool
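The comparison above covers single values and lists. As a sketch of where the equivalence stops, the following uses a small frame with the default 0-based integer index (the Mon to Fri temperatures from this post, without the Day labels) and compares a slice.

```python
import pandas as pd

# Default index: labels 0..4 coincide with positions 0..4.
df = pd.DataFrame({'Temperature': [12.79, 19.67, 17.51, 14.44, 10.51]})

# A single value agrees between the two accessors.
same_value = df.loc[1, 'Temperature'] == df.iloc[1, 0]

# A slice does not: loc includes the end label, iloc excludes the end position.
loc_rows = list(df.loc[1:3].index)
iloc_rows = list(df.iloc[1:3].index)

print(loc_rows)   # [1, 2, 3]
print(iloc_rows)  # [1, 2]
```

So even with integer labels, 1:3 means three rows to loc and two rows to iloc.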
Note that loc and iloc behave differently when selecting via slice and conditions:
- The slice endpoint is excluded from the iloc result, but included in loc.
- loc accepts a boolean Series, but iloc only accepts a boolean list or array.
Finally, here is a summary.
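loc is label based and allowed inputs are:
- A single label, 'A' or 2 (Note that 2 is interpreted as a label of the index.)
- A list of labels, ['A', 'B', 'C'] or [1, 2, 3] (Note that 1, 2, 3 are interpreted as labels of the index.)
- A slice of labels, 'A':'C' (Both are included)
- Conditions, given as a boolean Series or a boolean list or array
- A callable function with one argument
iloc is integer position based and allowed inputs are:
- A single integer, 2.
- A list of integers, [1, 2, 3].
- A slice of integers, 1:7 (the endpoint 7 is excluded)
- Conditions, given as a boolean list or array (not a boolean Series)
- A callable function with one argument
loc and iloc are interchangeable when the labels of the Pandas DataFrame are 0-based integers and we select via a single value or a list of values.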
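One caveat worth keeping in mind: integer labels only match positions while the rows stay in their original order. The sketch below, using the Mon to Fri temperatures from this post on a default integer index, shows the two drifting apart after a sort and one way to realign them.

```python
import pandas as pd

df = pd.DataFrame({'Temperature': [12.79, 19.67, 17.51, 14.44, 10.51]})

# Sorting keeps each row's original label but changes its position.
by_temp = df.sort_values('Temperature')

label_zero = by_temp.loc[0, 'Temperature']  # row labeled 0
position_zero = by_temp.iloc[0, 0]          # first row after sorting

print(label_zero)     # 12.79
print(position_zero)  # 10.51

# reset_index(drop=True) relabels the rows 0..n-1 so labels match positions again.
realigned = by_temp.reset_index(drop=True)
```

The same drift happens after filtering or dropping rows, so it is safer to pick loc or iloc based on what is meant (a label or a position) rather than on what happens to work.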
I hope this article helps you save time in learning Pandas data selection. I recommend checking out the documentation to learn about other things you can do.
Source:
https://towardsdatascience.com/how-to-use-loc-and-iloc-for-selecting-data-in-pandas-bd09cb4c3d79