To solve this error, check the shape of the object you’re trying to assign to the df columns (using np.shape). The second (or last) dimension must match the number of columns you’re assigning to. For example, if you try to assign a 2-column numpy array to 3 columns, you’ll see this error.
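As a minimal sketch of that check, you could compare the last dimension reported by np.shape with the number of target columns before assigning. The helper name last_dim_matches is hypothetical (it is not part of pandas or NumPy), and the sketch only covers 2-D values:

```python
import numpy as np

def last_dim_matches(cols, vals):
    """Hypothetical helper: does the last dimension of 2-D `vals` equal len(cols)?"""
    shape = np.shape(vals)
    return len(shape) == 2 and shape[-1] == len(cols)

vals = np.array([[1, 2], [3, 4]])                    # shape (2, 2)
ok_two = last_dim_matches(['B', 'C'], vals)          # 2 labels vs last dimension 2
ok_three = last_dim_matches(['B', 'C', 'D'], vals)   # 3 labels vs last dimension 2
print(ok_two, ok_three)
```

Running a check like this before the assignment tells you which side needs to change, rather than leaving you to read it off the traceback.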
A general workaround (for case 1 and case 2 below) is to cast the object you’re trying to assign to a DataFrame and join() it to df, i.e. instead of (1), use (2).
df[cols] = vals # (1)
df = df.join(vals) if isinstance(vals, pd.DataFrame) else df.join(pd.DataFrame(vals)) # (2)
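For illustration, here is a small self-contained version of (2). The data and the column names 'B' and 'C' are made up for the example; naming the new columns explicitly is an assumption on my part to keep the result readable, not something the workaround requires:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1]})
vals = np.array([[2, 3], [4, 5]])   # two columns of new data

# Wrap the array in a DataFrame that shares df's index, then join on that index
new_cols = pd.DataFrame(vals, columns=['B', 'C'], index=df.index)
joined = df.join(new_cols)
print(joined)
```

Because join() simply appends whatever columns the right-hand frame has, there is no list of target labels that could disagree with the data.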
If you’re trying to replace values in an existing column and got this error (case 3(a) below), convert the object to list and assign.
df[cols] = vals.values.tolist()
If you have duplicate columns (case 3(b) below), then there’s no easy fix. You’ll have to make the dimensions match manually.
This error occurs in 3 cases:
Case 1: When you try to assign a list-like object (e.g. lists, tuples, sets, numpy arrays, and pandas Series) to a list of DataFrame column(s) as new arrays (see footnote 1) but the number of columns doesn’t match the second (or last) dimension (found using np.shape) of the list-like object. So the following reproduces this error:
df = pd.DataFrame({'A': [0, 1]})
cols, vals = ['B'], [[2], [4, 5]]
df[cols] = vals # number of columns is 1 but the list has shape (2,)
Note that if the columns are not given as a list, pandas Series, numpy array or pandas Index, this error won’t occur. So the following doesn’t reproduce the error:
df[('B',)] = vals # the column is given as a tuple
One interesting edge case occurs when the list-like object is multi-dimensional (but not a numpy array). In that case, under the hood, the object is first cast to a pandas DataFrame, and pandas then checks whether its last dimension matches the number of columns. This produces the following behaviour:
# the error occurs below because pd.DataFrame(vals1) has shape (2, 2) and len(['B']) != 2
vals1 = [[[2], [3]], [[4], [5]]]
df[cols] = vals1
# no error below because pd.DataFrame(vals2) has shape (2, 1) and len(['B']) == 1
vals2 = [[[[2], [3]]], [[[4], [5]]]]
df[cols] = vals2
Case 2: When you try to assign a DataFrame to a list (or pandas Series or numpy array or pandas Index) of columns but the respective numbers of columns don’t match. This case is what caused the error in the OP. The following reproduce the error:
df = pd.DataFrame({'A': [0, 1]})
df[['B']] = pd.DataFrame([[2, 3], [4]]) # a 2-column df is trying to be assigned to a single column
df[['B', 'C']] = pd.DataFrame([[2], [4]]) # a single column df is trying to be assigned to 2 columns
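A minimal sketch of Case 2 that you can run end to end: it catches the exception instead of crashing, then repeats the assignment with a matching number of labels. The variable names two_col and message are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1]})
two_col = pd.DataFrame([[2, 3], [4, 5]])

try:
    df[['B']] = two_col          # 1 column label, 2 columns of data
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# Matching the number of labels to the number of columns works
df[['B', 'C']] = two_col
print(df)
```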
Case 3: When you try to replace the values of existing column(s) by a DataFrame (or a list-like object) whose number of columns doesn’t match the number of columns it’s replacing. So the following reproduce the error:
# case 3(a)
df1 = pd.DataFrame({'A': [0, 1]})
df1['A'] = pd.DataFrame([[2, 3], [4, 5]]) # df1 has a single column named 'A' but a 2-column-df is trying to be assigned
# case 3(b): duplicate column names matter too
df2 = pd.DataFrame([[0, 1], [2, 3]], columns=['A','A'])
df2['A'] = pd.DataFrame([[2], [4]]) # df2 has 2 columns named 'A' but a single column df is being assigned
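For case 3(a), one way to make the dimensions match (a sketch, assuming you only need one of the columns on the right-hand side) is to select that single column before assigning. The name replacement is made up for the example:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0, 1]})
replacement = pd.DataFrame([[2, 3], [4, 5]])

try:
    df1['A'] = replacement       # 1 existing column, 2 columns of data
    raised = False
except ValueError:
    raised = True

# Selecting one column yields a Series, which fits a single column
df1['A'] = replacement[0]
print(raised, df1['A'].tolist())
```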
1: df.loc[:, cols] = vals may overwrite data in place, so this won’t produce the error but will create columns of NaN values.
Are you getting the “ValueError: columns must be same length as key” error in Python while working with pandas DataFrames?
The “ValueError: Columns must be same length as key” error is raised when you assign data to pandas DataFrame columns and the number of column labels you provide doesn’t match the number of columns in the data, for example when converting a list into DataFrame columns.
In this article, we’ll discuss why we get the ValueError: Columns must be same length as key and how to fix it, along with practical examples. Furthermore, we’ll discuss converting a list into a DataFrame column in Python.
Let’s get straight into the topic and see how it works!
We get this error when the number of column names doesn’t match the number of columns in the data. Say you have a list of students and you try to turn that single list into two columns, Students and Teachers. That isn’t valid: one flat list supplies one column, so it cannot be given two names.
For further understanding, let’s have a look at a practical example.
Code
import pandas as pd

# create a list of students
Students = ["James", "John", "Jenny", "Marry"]

# convert the list to a DataFrame column
df1 = pd.DataFrame(Students, columns=['Students'])
print(df1)
Output
  Students
0    James
1     John
2    Jenny
3    Marry
Now let’s try to split the above column Students into two and see how it works:
Code
import pandas as pd

# create a flat list of students
Students = ["James", "John", "Jenny", "Marry"]

# try to convert the single list into two DataFrame columns, 'Students' and 'Teachers'
df = pd.DataFrame(Students, columns=['Students', 'Teachers'])
print(df)
Output
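See, we get a ValueError. When the mismatch happens inside the DataFrame constructor like this, recent pandas versions word it as “Shape of passed values is (4, 1), indices imply (4, 2)”, while “Columns must be same length as key” is the wording used when you assign to DataFrame columns. The reason is the same in both cases: a single column of data cannot be given two names in a pandas DataFrame.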
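If you want to see the message your own pandas version produces, a small sketch like the following catches the exception and prints it. The variable name error_text is illustrative, and the exact wording of the message is not guaranteed across versions:

```python
import pandas as pd

students = ["James", "John", "Jenny", "Marry"]

try:
    # one flat list (one column of data) but two column names
    pd.DataFrame(students, columns=['Students', 'Teachers'])
    error_text = None
except ValueError as exc:
    error_text = str(exc)    # wording depends on the pandas version

print(error_text)
```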
How to Fix the “ValueError: Columns Must Be Same Length as Key” Error in Python?
To fix the ValueError, provide exactly as many column names as the data has columns. A flat list represents a single column, whereas a nested list represents one row per inner list, so every inner list needs as many items as there are column names.
Let’s see an example for further understanding:
Code
import pandas as pd

# create a nested list (each inner list is one row)
Data = [["Albert", "Jasica"], ["Heisenberg", "Shane"]]

# convert the nested list to the DataFrame columns 'Students' and 'Teachers'
df = pd.DataFrame(Data, columns=['Students', 'Teachers'])
print(df)
Output
     Students Teachers
0      Albert   Jasica
1  Heisenberg    Shane
In the above example, we have created a nested list that can be represented in multiple columns in a DataFrame. Each inner list has two items, so we can label the data with two column names; the two inner lists become the two rows.
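If your data starts out as one list per column rather than one list per row, a possible sketch is to pair the items up with zip first. The lists below reuse names from this article, and the variable rows is illustrative:

```python
import pandas as pd

students = ["James", "John", "Jenny", "Marry"]
teachers = ["Albert", "Jasica", "Tom", "Heisenberg"]

# zip pairs the i-th student with the i-th teacher, giving one row per pair
rows = list(zip(students, teachers))
df = pd.DataFrame(rows, columns=['Students', 'Teachers'])
print(df)
```

Each row now has two items, which matches the two column names.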
Furthermore, we can also use a dictionary to convert it into a DataFrame column in Python. Let’s see an example of it:
Code
import pandas as pd

# create a data dictionary
dic = {'Students': ["James", "John", "Jenny", "Marry"],
       'Teachers': ["Albert", "Jasica", "Tom", "Heisenberg"]}

# create a DataFrame
df = pd.DataFrame(dic, columns=['Students', 'Teachers'])
print("********** Data **********")
print(df)
Output
********** Data **********
  Students    Teachers
0    James      Albert
1     John      Jasica
2    Jenny         Tom
3    Marry  Heisenberg
In the above example, we’ve created a dictionary with two key-value pairs, each representing a column. In the next step, it is converted into a DataFrame with the column names Students and Teachers.
Conclusion
To conclude this article on how to fix the “ValueError: Columns must be same length as key” error, we’ve discussed why the ValueError is raised and how to fix it, along with practical examples. As part of the solution, we’ve seen how to turn a nested list into a DataFrame with labelled columns.
To extend the solution, we’ve also discussed how to convert a dictionary into a DataFrame, with the dictionary keys serving as user-defined column names.
Time to explore more 💡; Can we represent a nested dictionary into a data frame? If yes, please comment down the code for it.
Are you learning Python, running into this error, and trying to understand why it occurs?
In essence, this usually occurs when you are working with more than one data frame and there is a mismatch between the number of columns on each side of an assignment, which pandas cannot process until it is fixed.
Two common scenarios where this happens are joining data frames and splitting out data; both are demonstrated below.
Scenario 1 – Joining data frames
Where we have df1[['a']] = df2, the columns named on the left side of the equals sign receive the values on the right.
The right-hand side has three columns, but the left-hand side names only one.
As a result, the error “ValueError: Columns must be same length as key” will appear for the code below.
import pandas as pd
list1 = [1,2,3]
list2 = [[4,5,6],[7,8,9]]
df1 = pd.DataFrame(list1,columns=['column1'])
df2 = pd.DataFrame(list2,columns=['column2','column3','column4'])
df1[['a']] = df2
The above code throws the error ValueError: Columns must be same length as key.
The objective here is to have all the columns from the right-hand side placed beside the column from the left-hand side.
To do that, we make both sides equal with regard to the number of columns: we keep the column from df1 and bring in the three columns from df2.
The names columna, columnb and columnc below correspond to the three columns in df2 and will store the data from them.
The fix for this issue is:
df1[['columna','columnb','columnc']] = df2
print (df1)
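Put together as one runnable sketch, using the same data as above, the fix looks like this. Note that df2 has only two rows while df1 has three, so the last row of the new columns has nothing to align with:

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3], columns=['column1'])
df2 = pd.DataFrame([[4, 5, 6], [7, 8, 9]],
                   columns=['column2', 'column3', 'column4'])

# Three labels on the left for the three columns on the right
df1[['columna', 'columnb', 'columnc']] = df2

# df2 has no row with index 2, so that row is filled with NaN
print(df1)
```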
Scenario 2 – Splitting out data
There may be an occasion when you have a python list, and you need to split out the values of that list into separate columns.
new_list1 = ['1 2 3']
df1_newlist = pd.DataFrame(new_list1,columns=['column1'])
In the above, we have created a list, with three values that are part of one string. Here what we are looking to do is create a new column with the below code:
df1_newlist[["column1"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.
print(df1_newlist)
When we run the above, it throws the ValueError: Columns must be same length as key.
The reason it throws the error is that the split produces three values that need three columns, but we have only named one column in df1_newlist[["column1"]].
To fix this, we run the below code:
df1_newlist[["column1","column2","column3"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.
print(df1_newlist)
This returns the following output, with the problem fixed!
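When you don’t know in advance how many pieces the split will produce, one possible sketch is to split first and derive the column names from the result, so the two counts cannot disagree. The column name raw and the part1, part2, … naming scheme are assumptions made for this example:

```python
import pandas as pd

df = pd.DataFrame(['1 2 3', '4 5 6'], columns=['raw'])

# Split first, then name the columns after however many parts came back
parts = df['raw'].str.split(' ', expand=True)
parts.columns = [f'part{i + 1}' for i in range(parts.shape[1])]

# Join the named parts back onto the original frame by index
result = df.join(parts)
print(result)
```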
To fix the ValueError: Columns must be same length as key error in pandas, make sure that the number of column labels you assign to matches the number of columns in the value you assign.
pandas raises a “ValueError: Columns must be same length as key” error when you assign to a list of DataFrame columns and the number of column labels (the key) doesn’t match the number of columns in the assigned value.
Why does the ValueError occur in pandas?
- When you attempt to assign a list-like object (for example lists, tuples, sets, numpy arrays, and pandas Series) to a list of DataFrame columns as new arrays but the number of columns doesn’t match the second (or last) dimension (found using np.shape) of the list-like object.
- When you attempt to assign a DataFrame to a list (or pandas Series or numpy array or pandas Index) of columns but the respective numbers of columns don’t match.
- When you attempt to replace the values of an existing column with a DataFrame (or a list-like object) whose number of columns doesn’t match the number of columns it’s replacing.
Python code that generates the error
import pandas as pd
list1 = [11, 21, 19]
list2 = [[46, 51, 61], [71, 81, 91]]
df1 = pd.DataFrame(list1, columns=['column1'])
df2 = pd.DataFrame(list2, columns=['column2', 'column3', 'column4'])
df1[['a']] = df2
Output
In the above code example, pandas raised a ValueError: Columns must be same length as key error because df2 has 3 columns while the key on the left-hand side, ['a'], names only 1 column.
Code that fixes the error
pandas requires the number of column labels in the key to match the number of columns in the value being assigned.
import pandas as pd
list1 = [11, 21, 19]
list2 = [[46, 51, 61], [71, 81, 91]]
df1 = pd.DataFrame(list1, columns=['column1'])
# Repeat df1 once per row of list2 (this only adds rows; it is not what prevents the error)
df1 = pd.concat([df1] * len(list2), ignore_index=True)
df2 = pd.DataFrame(list2, columns=['column2','column3','column4'])
df1[['column2', 'column3', 'column4']] = df2
print(df1)
Output
column1 column2 column3 column4
0 11 46.0 51.0 61.0
1 21 71.0 81.0 91.0
2 19 NaN NaN NaN
3 11 NaN NaN NaN
4 21 NaN NaN NaN
5 19 NaN NaN NaN
In this code example, the error is avoided because three column labels are given on the left-hand side for the three columns of df2. Concatenating df1 with itself only repeats its rows (6 rows instead of 3); it is not what prevents the ValueError.
The values from df2 are aligned on the index, so rows of df1 that have no matching row in df2 are filled with NaN.
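If the goal is simply to place the two frames side by side, an alternative sketch that skips the row repetition is pd.concat with axis=1. This is a different technique from the assignment above, shown here with the same data; the name combined is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame([11, 21, 19], columns=['column1'])
df2 = pd.DataFrame([[46, 51, 61], [71, 81, 91]],
                   columns=['column2', 'column3', 'column4'])

# Place the frames side by side; rows are matched on the index
combined = pd.concat([df1, df2], axis=1)
print(combined)
```

The result keeps the three original rows of df1, with NaN in the df2 columns for the row that df2 does not have.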
You can also check the shape of the object you’re trying to assign to the df columns using np.shape.
The second (or the last) dimension must match the number of columns you’re trying to assign to. For example, if you try to assign a 2-column numpy array to 3 columns, you’ll see the ValueError.
I hope this article helped you resolve your error.
This error happens when you try to assign a data frame as columns to another data frame, and the number of column names provided is not equal to the number of columns of the data frame being assigned. For example, given a simple data frame as follows:
df = pd.DataFrame({'x': [1,2,3]})
df
#   x
#0  1
#1  2
#2  3
And a second data frame:
df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]])
df1
#   0  1
#0  1  2
#1  3  4
#2  5  6
It is possible to assign df1 to df as columns as follows:
df[['a', 'b']] = df1
df
#   x  a  b
#0  1  1  2
#1  2  3  4
#2  3  5  6
Notice how the content of df1 has been assigned as two separate columns to df.
Now given a third data frame with only one column, on the other hand:
df2 = pd.DataFrame([[1], [3], [5]])
df2
#   0
#0  1
#1  3
#2  5
If you do df[['a', 'b']] = df2, you will get an error:
ValueError: Columns must be same length as key
The reason is that df2 has only one column, but on the left side of the assignment you are expecting two columns to be assigned, hence the error.
Solution:
Whenever you encounter this error, check that the number of columns being assigned to is the same as the number of columns in the data frame being assigned. In our above example, for instance, if you have ['a', 'b'] on the left side as columns, then make sure you also have a data frame that has two columns on the right side.
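That check can be sketched as a small wrapper that compares the two counts before assigning and reports both numbers if they differ. The function name assign_columns and its error message are hypothetical, not part of pandas:

```python
import pandas as pd

def assign_columns(df, names, other):
    """Hypothetical helper: add `other`'s columns to a copy of `df` under `names`."""
    if len(names) != other.shape[1]:
        raise ValueError(
            f"{len(names)} column name(s) given but the data has {other.shape[1]} column(s)"
        )
    out = df.copy()
    out[list(names)] = other
    return out

df = pd.DataFrame({'x': [1, 2, 3]})
df2 = pd.DataFrame([[1], [3], [5]])

result = assign_columns(df, ['a'], df2)   # one name for one column
print(result)
```

Calling assign_columns(df, ['a', 'b'], df2) would raise the helper’s own ValueError, stating that 2 names were given for 1 column.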