Pandas query function not working with spaces in column names
- Bhushan Pant
- 2018-06-05 10:11
- 5
I have a DataFrame with spaces in its column names. I am trying to use the query
method to get results. It works fine with the 'c' column, but I get an error for 'a b':
import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,4]
a["c"] = [5,6,7,8]
a.query('a b==5')
For this I am getting this error:
a b ==5
  ^
SyntaxError: invalid syntax
I don't want to replace the spaces with other characters such as '_'.
There is a hack using pandasql that puts the column name inside brackets, for example: [a b]
5 Answers
Pandas 0.25+
As described here:
DataFrame.query() and DataFrame.eval() now supports quoting column names with backticks to refer to names with spaces (GH6508)
So you can use:
a.query('`a b`==5')
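A minimal sketch of the backtick syntax, assuming pandas 0.25 or later. The frame mirrors the one in the question, except that the last value of "a b" is changed to 5 (an assumption made here so that the filter actually matches a row):

```python
import pandas as pd

# Same shape as the question's frame; the 5 is added so the filter has a match.
a = pd.DataFrame({"a b": [1, 2, 3, 5], "c": [5, 6, 7, 8]})

# Backticks quote the column name that contains a space (pandas 0.25+).
result = a.query("`a b` == 5")

# Backtick-quoted and plain column names can be mixed in one expression.
both = a.query("`a b` > 1 and c < 8")

print(result)
print(both)
```

Names without spaces, such as c, need no quoting.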
Pandas pre-0.25
You cannot use pd.DataFrame.query if you have whitespace in your column name. Consider what would happen if you had columns named a, b and a b; there would be ambiguity as to what you require.
Instead, you can use pd.DataFrame.loc:
df = df.loc[df['a b'] == 5]
Since you are only filtering rows, you can omit the .loc accessor altogether:
df = df[df['a b'] == 5]
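As a rough illustration of why the two forms above are interchangeable for row filtering, the sketch below builds the boolean mask once and applies it both ways. The sample data and the names mask, via_loc, via_brackets and only_c are made up for this example:

```python
import pandas as pd

# Illustrative data; "a b" contains a space, so it is accessed with brackets.
df = pd.DataFrame({"a b": [1, 2, 3, 5], "c": [5, 6, 7, 8]})

# Boolean Series with one True/False entry per row.
mask = df["a b"] == 5

via_loc = df.loc[mask]       # explicit row selection
via_brackets = df[mask]      # shorthand for the same row filter

# .loc remains useful when rows and columns are selected together.
only_c = df.loc[mask, "c"]

print(via_loc.equals(via_brackets))
```

Both selections return the same rows; .loc mainly earns its place once a column selection is added.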
jpp
2019-05-21 10:42
It is not possible yet. Check github issue #6508:
Note that in reality .query is just a nice-to-have interface, in fact it has very specific guarantees, meaning its meant to parse like a query language, and not a fully general interface.
The reason is that query needs the string to be a valid Python expression, so column names must be valid Python identifiers.
The solution is boolean indexing:
df = df[df['a b'] == 5]
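One rough way to see which column names meet the "valid Python identifier" requirement is the built-in str.isidentifier(). This is only a sketch: the check is approximate, because Python keywords such as class also pass it yet still cannot be used bare in an expression. The name usable_bare is made up for this example:

```python
import pandas as pd

df = pd.DataFrame({"a b": [1, 2, 3, 5], "c": [5, 6, 7, 8]})

# str.isidentifier() reports whether a string has the form of a Python name.
# Note: keywords (e.g. "class") also return True, so this is only a rough check.
usable_bare = {col: col.isidentifier() for col in df.columns}

print(usable_bare)
```

Columns that fail the check, like "a b" here, are the ones that need boolean indexing.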
jezrael
2018-06-05 10:22
I am afraid that the query method does not accept column names containing spaces. In any case, you can filter the DataFrame this way:
import pandas as pd
a = pd.DataFrame({'a b':[1,2,3,4], 'c':[5,6,7,8]})
a[a['a b']==1]
DTT
2018-06-05 10:31
Instead of using DataFrame.query, I would create a boolean condition in this case and look up the values where the condition is True. For example:
import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,5]
a["c"] = [5,6,7,8]
# a.query('a b==5')  # removed: query cannot look up columns with spaces in the name
condition = a['a b'] == 5
print(a['a b'][condition])
output: 3 5
We see that the condition evaluates to True at index 3 (useful if you want the specific index rather than a Series of Boolean values).
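If only the matching index labels are wanted, or the complete rows, the same condition can be reused. The following is a sketch under the same sample data; the names matching_values, matching_index and matching_rows are made up for illustration:

```python
import pandas as pd

a = pd.DataFrame({"a b": [1, 2, 3, 5], "c": [5, 6, 7, 8]})

# Boolean Series: True where the value in "a b" equals 5.
condition = a["a b"] == 5

matching_values = a["a b"][condition]   # Series holding the matching values
matching_index = a.index[condition]     # only the row labels where it is True
matching_rows = a[condition]            # full rows, all columns

print(matching_index.tolist())
print(matching_rows)
```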
Simeon Ikudabo
2019-11-20 10:41
From pandas 0.25 onward you will be able to escape column names with backticks, so you can do a.query('`a b` == 5').
Jarno
2019-05-20 06:16