Pandas Introduction 3
Concat and Append
import pandas as pd
def make_df(cols, ind):
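The body of make_df is not shown in these notes. A minimal sketch, assuming the helper fills each cell with the column label followed by the index label (for example 'A1'), could look like the following. The cell format and the `example` variable are assumptions for illustration.

```python
import pandas as pd

def make_df(cols, ind):
    """Build a small DataFrame whose cells are '<column><index>' strings (assumed format)."""
    data = {c: [f"{c}{i}" for i in ind] for c in cols}
    return pd.DataFrame(data, index=ind)

# Hypothetical usage: three columns, index labels 0..2
example = make_df('ABC', range(3))
print(example)
```

Labelled cells like these make it easy to see where each value ends up after a concat or merge.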
review: numpy.concatenate
x = [[1, 2],
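As a quick review of numpy.concatenate, here is a small sketch. The second row of `x` is cut off in the notes, so the value `[3, 4]` is an assumption.

```python
import numpy as np

x = [[1, 2],
     [3, 4]]  # second row assumed; the notes only show the first

# Default axis=0 stacks the inputs row-wise
joined_rows = np.concatenate([x, x])
# axis=1 places the inputs side by side
joined_cols = np.concatenate([x, x], axis=1)

print(joined_rows.shape)  # (4, 2)
print(joined_cols.shape)  # (2, 4)
```

pd.concat follows the same idea (a list of objects plus an axis) but also has to decide what to do with index labels.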
use pd.concat to combine Series or DataFrame objects:
ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
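A short sketch of concatenating two Series. `ser2` and `combined` are hypothetical names and values chosen for illustration, since the notes only show `ser1`.

```python
import pandas as pd

ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])  # assumed second Series

# pd.concat takes a list of objects and stacks them along axis 0 by default
combined = pd.concat([ser1, ser2])
print(combined)
```

The index labels of both inputs are carried over unchanged into the result.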
and higher-dimensional objects such as DataFrames:
df1 = make_df('AB', [1, 2])
when the indexes are the same, pd.concat keeps the duplicate labels:
x = make_df('AB', [0, 1])
ignore the original index with ignore_index=True:
print(x); print(y); print(pd.concat([x, y], ignore_index=True))
add an extra index level with keys:
print(x); print(y); print(pd.concat([x, y], keys=['x', 'y']))
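The three ways of handling a shared index can be compared side by side. This is a sketch that builds `x` and `y` directly instead of through make_df; the cell values and the names `dup`, `fresh` and `keyed` are assumptions.

```python
import pandas as pd

x = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
y = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[0, 1])  # same labels as x

dup = pd.concat([x, y])                       # duplicates kept: index 0, 1, 0, 1
fresh = pd.concat([x, y], ignore_index=True)  # new integer index: 0, 1, 2, 3
keyed = pd.concat([x, y], keys=['x', 'y'])    # outer level 'x'/'y' on top of the old labels

print(dup)
print(fresh)
print(keyed)
```

With keys, a row is addressed by a tuple such as `keyed.loc[('y', 1)]`, so the source of each row stays visible.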
join:
df5 = make_df('ABC', [1, 2])
to deal with the NaN entries that appear when the column names only partly match, choose join='inner' or join='outer':
print(df5); print(df6);
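A sketch of the two join modes, assuming `df6` has columns B, C and D so that it overlaps `df5` only partly. The definition of `df6` is not visible in the notes, so its contents here are hypothetical.

```python
import pandas as pd

df5 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2'], 'C': ['C1', 'C2']},
                   index=[1, 2])
df6 = pd.DataFrame({'B': ['B3', 'B4'], 'C': ['C3', 'C4'], 'D': ['D3', 'D4']},
                   index=[3, 4])  # assumed contents

# join='outer' (the default): union of columns, missing cells become NaN
outer = pd.concat([df5, df6])
# join='inner': only the columns present in every input
inner = pd.concat([df5, df6], join='inner')

print(outer)
print(inner)
```

The inner join drops columns A and D entirely, which removes every NaN at the cost of losing data.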
axes:
print(df5); print(df6);
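append: df1.append(df2) is equivalent to pd.concat([df1, df2]). DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat in current versions.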
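A minimal sketch of the pd.concat form that replaces append. The frames are built directly with assumed cell values, and `result` is a hypothetical name.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2']}, index=[1, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4'], 'B': ['B3', 'B4']}, index=[3, 4])

# Returns a new object; df1 and df2 are left unchanged
result = pd.concat([df1, df2])
print(result)
```

Because every call builds a new DataFrame, it is usually cheaper to collect the pieces in a list and call pd.concat once than to grow a frame step by step.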
Merge
pd.merge:
one to one:
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
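A sketch of a one-to-one merge. Only the 'employee' column of `df1` is visible in the notes; the 'group' values, the whole of `df2`, and the name `merged` are assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})  # 'group' assumed
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})  # assumed

# pd.merge finds the shared 'employee' column and matches rows on it,
# regardless of the row order in each input
merged = pd.merge(df1, df2)
print(merged)
```

Each employee appears exactly once in both inputs, so the result has one row per employee.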
many to one:
df4 = pd.DataFrame({'group': ['Accounting', 'Engineering', 'HR'],
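A sketch of a many-to-one merge, where the key column has repeated values on one side. The `employees` frame and the 'supervisor' column are hypothetical; the notes only show the 'group' values of `df4`.

```python
import pandas as pd

employees = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                          'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})  # assumed
df4 = pd.DataFrame({'group': ['Accounting', 'Engineering', 'HR'],
                    'supervisor': ['Carly', 'Guido', 'Steve']})  # 'supervisor' assumed

# 'Engineering' occurs twice on the left, so its supervisor is repeated in the result
many_to_one = pd.merge(employees, df4)
print(many_to_one)
```

The row count follows the "many" side: four employees in, four rows out.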
many to many:
df5 = pd.DataFrame({'group': ['Accounting', 'Accounting',
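A sketch of a many-to-many merge, where the key repeats on both sides. The remaining 'group' entries, the 'skills' column and the `employees` frame are assumptions, since the definition of `df5` is truncated in the notes.

```python
import pandas as pd

employees = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                          'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})  # assumed
df5 = pd.DataFrame({'group': ['Accounting', 'Accounting',
                              'Engineering', 'Engineering', 'HR', 'HR'],
                    'skills': ['math', 'spreadsheets', 'coding', 'linux',
                               'spreadsheets', 'organization']})  # assumed beyond the first two entries

# Every left row is paired with every right row that has the same 'group'
many_to_many = pd.merge(employees, df5)
print(many_to_many)
```

Each of the four employees matches two skill rows, so the result has eight rows.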
use of on to name the key column explicitly:
print(df1); print(df2); print(pd.merge(df1, df2, on='employee'))
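A small sketch showing that on='employee' gives the same result as the automatic key detection when 'employee' is the only shared column. The frame contents and the names `explicit` and `implicit` are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})  # assumed
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})  # assumed

explicit = pd.merge(df1, df2, on='employee')  # key named explicitly
implicit = pd.merge(df1, df2)                 # key inferred from the shared column name

print(explicit.equals(implicit))
```

The on argument only works when the named column exists in both inputs.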
when two dataframes use different column names for the same data, we can use left_on and right_on:
df3 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'], 'salary': [70000, 80000, 120000, 90000]})
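A sketch of merging on differently named key columns. `df3` is taken from the notes; the 'group' values in `df1` and the names `merged` and `tidy` are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})  # 'group' assumed
df3 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'salary': [70000, 80000, 120000, 90000]})

# Match df1['employee'] against df3['name']
merged = pd.merge(df1, df3, left_on='employee', right_on='name')
# Both key columns survive the merge, so drop the redundant one
tidy = merged.drop('name', axis=1)
print(tidy)
```

Dropping 'name' afterwards is optional, but it avoids carrying two columns with identical content.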
left_index and right_index, to merge on the index instead of a column:
df1a = df1.set_index('employee')
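A sketch of an index-on-index merge. `df2a` and `by_index` are hypothetical names, and the frame contents are assumed as in the earlier examples.

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})  # assumed
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})  # assumed

df1a = df1.set_index('employee')
df2a = df2.set_index('employee')

# Use the index of both frames as the merge key
by_index = pd.merge(df1a, df2a, left_index=True, right_index=True)
print(by_index)
```

The result keeps 'employee' as its index, so rows can be looked up directly with `by_index.loc['Lisa']`.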
merge of dataframes whose key values only partly overlap:
df6 = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],
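Assuming this part is about keys that appear in only one of the two frames, the how argument of pd.merge controls which rows survive. The 'food' column and all of `df7` are hypothetical; the notes only show the 'name' values of `df6`.

```python
import pandas as pd

df6 = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],
                    'food': ['fish', 'beans', 'bread']})  # 'food' assumed
df7 = pd.DataFrame({'name': ['Mary', 'Joseph'],
                    'drink': ['wine', 'beer']})  # assumed

inner = pd.merge(df6, df7)               # how='inner' (default): names present in both
outer = pd.merge(df6, df7, how='outer')  # names present in either, gaps filled with NaN
left = pd.merge(df6, df7, how='left')    # every name from df6
right = pd.merge(df6, df7, how='right')  # every name from df7

print(inner)
print(outer)
```

Only 'Mary' is in both frames, so the inner merge returns a single row while the outer merge returns four.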
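suffixes: when both dataframes have a non-key column with the same name, merge appends _x and _y to tell them apart; the suffixes argument sets custom endings: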
df8 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
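A sketch of the default and custom suffixes. The 'rank' column, the whole of `df9`, and the names `default` and `custom` are assumptions, since the definition of `df8` is truncated in the notes.

```python
import pandas as pd

df8 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'rank': [1, 2, 3, 4]})  # 'rank' assumed
df9 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'rank': [3, 1, 4, 2]})  # assumed

# Both frames have a 'rank' column that is not the key
default = pd.merge(df8, df9, on='name')                         # rank_x, rank_y
custom = pd.merge(df8, df9, on='name', suffixes=('_L', '_R'))   # rank_L, rank_R

print(default.columns.tolist())
print(custom.columns.tolist())
```

The first suffix is applied to the left frame's column and the second to the right frame's.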