Question:
How does .transform handle the split groups?

Solution: 

.transform does not pass each split group to your function as a whole; it calls the function once per column of each group, and it also runs an internal check to determine the type of the output. Use .transform to return something with the same shape as the input, not for its side effects (for those, use .apply, as shown later).
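
As a minimal sketch of the intended use, the snippet below broadcasts the first non-null level of each subject back to every row of that subject. The sample data mirrors the example in this answer, and choosing 'first' as the function is only an illustration, not something the question asked for.

```python
import pandas as pd

# Sample data assumed to match the example in this answer.
df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

# 'first' takes the first non-null value in each group; transform then
# repeats that value for every row of the group.
result = df.groupby('subject')['level'].transform('first')

print(result)
print(len(result) == len(df))
```

The result has one value per input row and keeps the original index, which is why it can be assigned straight back as a new column.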


A custom function that prints what it receives makes this clearer:


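import pandas as pd

# sample data, reconstructed from the output shown below
df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

def f(group):
    print('---')
    print(group.name)  # with `transform` this prints the column name, not the group name
    print(group)
    print('===')

df.groupby('subject').transform(f)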



Output:

---                           # first group
level
0    hard
1    None
Name: level, dtype: object
===
---                           # internal pandas check (not a real group)
a
  level
0  hard
1  None
===
---                           # second group
level
2    None
3    easy
Name: level, dtype: object
===
---                           # third group
level
4    None
Name: level, dtype: object
===
---                           # fourth group
level
5    medium
Name: level, dtype: object
===


In comparison, .apply passes each group as a DataFrame, so group.name is the group name, which you can use for this kind of operation:


df.groupby('subject').apply(f)


---
a
  subject level
0       a  hard
1       a  None
===
---
b
  subject level
2       b  None
3       b  easy
===
---
c
  subject level
4       c  None
===
---
d
  subject   level
5       d  medium
===


Avoid using .transform to manually work on groups.
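
If the goal is to inspect or process each group by hand, one option (a sketch, not the only approach) is to iterate over the groupby object directly, which yields the group name together with the group as a DataFrame. The sample data below is assumed to match the example above.

```python
import pandas as pd

# Sample data assumed to match the example in this answer.
df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

seen = []
# Grouping by a single column label yields (group name, group DataFrame) pairs.
for name, group in df.groupby('subject'):
    seen.append((name, group.shape))
    print(name, group.shape)
```

This keeps side effects such as printing or logging out of .transform entirely.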


Here is another example. In .transform, group.name returns the name of the current Series (the column); observe what happens when there are multiple columns:


import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium'],
                   'level2': ['hard', None, None, 'easy', None, 'medium']
                  })
df.groupby('subject').transform(lambda g: print(g.name))

  

level    # first group, column "level"
level2   # first group, column "level2"
a        # some internal check run only once
level    # second group, column "level"
level2   # second group, column "level2"
level    # etc.
level2
level
level2


In comparison, .apply passes each group as a DataFrame, so g.name is the group name:


df.groupby('subject').apply(lambda g: print(g.name))


a
b
c
d
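
When the group name is needed inside a computation rather than just printed, a sketch like the following can work. It assumes the same sample data and selects the single column level first, so each group arrives as a Series whose name is the group key; the label format is purely illustrative.

```python
import pandas as pd

# Sample data assumed to match the example in this answer.
df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

# For each subject, build a label from the group name and the number of
# non-null levels in that group.
labels = df.groupby('subject')['level'].apply(
    lambda s: f"{s.name}: {s.notna().sum()}"
)

print(labels)
```

The function returns one value per group, so the result is indexed by subject rather than by the original rows.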



Ritu Singh
