Question:

How to create a new column in a pandas DataFrame?

Problem:

I have a pandas DataFrame like the one below. The column my_date is in datetime format, and my real DataFrame has many more columns:

id	my_date	col1
111	2023-05-15	1
111	2023-05-14	11
111	2023-05-13	2
222	2023-10-11	3
222	2023-10-12	55

I need to create a new column named col_x that numbers the rows within each id (1, 2, 3, and so on) in order of the date in my_date. So, for each id, col_x has to be 1 for the earliest date in my_date, 2 for the next date, and so on.

So, as a result, I need something like below:

id	my_date	col1	col_x
111	2023-05-15	1	3
111	2023-05-14	11	2
111	2023-05-13	2	1
222	2023-10-11	3	1
222	2023-10-12	55	2

How can I do that in Python Pandas?

Solution:

Looks like you could use groupby.rank:

df['col_x'] = df.groupby('id')['my_date'].rank('dense').astype(int)

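This is more or less equivalent to finding the sorting order (e.g., using numpy.argsort or rank('first')), except that rank('dense') gives duplicate values the same rank. The argsort variant is shown next; note that argsort returns the positions that would sort each group, which coincide with the ranks for the sample data but not for every ordering (applying argsort twice gives ranks in general):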

df['col_x'] = df.groupby('id')['my_date'].transform(np.argsort).add(1)

Output:

    id    my_date  col1  col_x
0  111 2023-05-15     1      3
1  111 2023-05-14    11      2
2  111 2023-05-13     2      1
3  222 2023-10-11     3      1
4  222 2023-10-12    55      2
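
To reproduce the output above end to end, here is a minimal sketch. It assumes the sample data from the question and builds the frame by hand; the variable name df and the explicit method= keyword are just illustrative choices.

```python
import pandas as pd

# Sample data copied from the question; my_date is parsed to datetime.
df = pd.DataFrame({
    "id": [111, 111, 111, 222, 222],
    "my_date": pd.to_datetime(
        ["2023-05-15", "2023-05-14", "2023-05-13", "2023-10-11", "2023-10-12"]
    ),
    "col1": [1, 11, 2, 3, 55],
})

# Rank the dates within each id; the earliest date in a group gets 1.
# rank() returns floats, so cast to int for clean labels.
df["col_x"] = df.groupby("id")["my_date"].rank(method="dense").astype(int)

print(df)
```

Because the rank is computed per group, the original row order is preserved and any other columns in the frame are left untouched.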

Example with duplicate values:

    id    my_date  col1  rank('dense')  argsort+1  rank('first')
0  111 2023-05-15     1              3          4              4
1  111 2023-05-14    11              2          2              2
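
The difference between the ranking methods on ties can be checked with a small sketch. The data here is hypothetical (a single id where the date 2023-05-14 appears twice), and the column names dense and first are made up for the illustration.

```python
import pandas as pd

# Hypothetical data: id 111 has the date 2023-05-14 twice.
dup = pd.DataFrame({
    "id": [111, 111, 111, 111],
    "my_date": pd.to_datetime(
        ["2023-05-15", "2023-05-14", "2023-05-14", "2023-05-13"]
    ),
})

grouped = dup.groupby("id")["my_date"]

# 'dense': tied dates share one rank, and no rank number is skipped.
dup["dense"] = grouped.rank(method="dense").astype(int)

# 'first': ties are broken by order of appearance, so every row is unique.
dup["first"] = grouped.rank(method="first").astype(int)

print(dup)
```

If each row needs its own number, method='first' is probably the better fit; if rows sharing a date should share a number, method='dense' does that.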