Pandas - Manipulating data row-wise
import pandas as pd
iris = pd.read_csv("archive.ics.uci.edu/ml/machine-learning-dat..")
df = iris.copy()
df.columns = ['sl', 'sw', 'pl', 'pw', 'flower_type']
1) deleting a particular row
df.drop(0, inplace = True)
df.drop(3, inplace = True)
df.head()
 | sl | sw | pl | pw | flower_type |
1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
5 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
6 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
These two calls delete the rows with the labels 0 and 3. By default, pandas labels the rows from 0 to n - 1, and these labels are easily mistaken for positions. In reality, each one is just a label attached to a row. Hence, if we try to run the same code again, it will raise a KeyError, because the rows with labels 0 and 3 were already deleted in the first run.
If we don't use the inplace = True argument, drop returns a new DataFrame with the rows removed and leaves the original df unchanged.
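The behaviour described above can be checked on a small frame. This is a minimal sketch: the `toy` frame and its values are hypothetical stand-ins for the iris data, and `errors="ignore"` is shown only as one way to make a repeated drop safe.

```python
import pandas as pd

# Hypothetical toy data standing in for the iris frame.
toy = pd.DataFrame({"sl": [5.1, 4.9, 4.7, 4.6], "sw": [3.5, 3.0, 3.2, 3.1]})

# Without inplace=True, drop returns a new frame; toy itself is unchanged.
trimmed = toy.drop(0)
rows_before = len(toy)
rows_trimmed = len(trimmed)

# With inplace=True, toy itself is modified and the call returns None.
toy.drop(0, inplace=True)

# Dropping the same label again fails because label 0 no longer exists.
try:
    toy.drop(0, inplace=True)
    second_drop_failed = False
except KeyError:
    second_drop_failed = True

# errors="ignore" skips labels that are missing instead of raising.
toy.drop(0, inplace=True, errors="ignore")
remaining_labels = toy.index.tolist()
print(rows_before, rows_trimmed, second_drop_failed, remaining_labels)
```

Reassigning the result (`toy = toy.drop(0)`) is an equivalent alternative to `inplace = True`.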
2) see all the labels
print(df.index)
Int64Index([ 1, 2, 4, 5, 6, 7, 8, 9, 10, 11,
...
139, 140, 141, 142, 143, 144, 145, 146, 147, 148],
dtype='int64', length=147)
3) deleting a particular row based on its position rather than its label
df.drop(df.index[0], inplace = True)
df.head()
 | sl | sw | pl | pw | flower_type |
2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
5 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
6 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
7 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
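To see why `df.index[0]` works here, it helps to look at what it returns: the label stored at position 0, which `drop` then removes. A minimal sketch, assuming a hypothetical `toy` frame whose labels no longer start at 0:

```python
import pandas as pd

# Hypothetical frame whose labels have gaps, as after earlier drops.
toy = pd.DataFrame({"sl": [4.7, 4.6, 5.4, 4.6]}, index=[1, 2, 4, 5])

# index[0] is the label at position 0, not the number 0.
first_label = toy.index[0]

# drop receives that label, so the first row is removed
# regardless of what its label happens to be.
toy = toy.drop(toy.index[0])
labels_after = toy.index.tolist()
print(first_label, labels_after)
```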
4) deleting multiple rows at the same time
df.drop(df.index[[0, 1]], inplace = True)
df.head()
 | sl | sw | pl | pw | flower_type |
5 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
6 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
7 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
8 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
9 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa |
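Indexing `df.index` with a list of positions returns the labels at those positions, so the call above is equivalent to passing the labels directly. A short sketch on a hypothetical `toy` frame:

```python
import pandas as pd

# Hypothetical frame with non-contiguous labels.
toy = pd.DataFrame({"sl": [4.6, 5.4, 4.6, 5.0]}, index=[2, 4, 5, 6])

# Drop the first two rows by position ...
by_position = toy.drop(toy.index[[0, 1]])

# ... or drop the same rows by naming their labels.
by_label = toy.drop([2, 4])

same_result = by_position.equals(by_label)
print(by_position.index.tolist(), same_result)
```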
5) running a particular condition
df.sl > 5
5 False
6 False
7 False
8 False
9 True
...
144 True
145 True
146 True
147 True
148 True
Name: sl, Length: 144, dtype: bool
This returns a boolean Series: True for the rows that satisfy the condition and False for the rest.
Better representation method
df[df.sl > 5]
 | sl | sw | pl | pw | flower_type |
9 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa |
13 | 5.8 | 4.0 | 1.2 | 0.2 | Iris-setosa |
14 | 5.7 | 4.4 | 1.5 | 0.4 | Iris-setosa |
15 | 5.4 | 3.9 | 1.3 | 0.4 | Iris-setosa |
16 | 5.1 | 3.5 | 1.4 | 0.3 | Iris-setosa |
... | ... | ... | ... | ... | ... |
144 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
145 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
146 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
147 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
148 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
116 rows × 5 columns
This method returns only those rows that satisfy the given condition.
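The difference between the boolean Series and the filtered frame can be sketched on a small example. The `toy` frame, its values, and the threshold are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical rows; the labels mimic a frame with earlier rows dropped.
toy = pd.DataFrame(
    {"sl": [4.6, 5.0, 5.4, 5.8],
     "flower_type": ["Iris-setosa"] * 4},
    index=[5, 6, 9, 13],
)

# The comparison alone yields one True/False value per row.
mask = toy.sl > 5

# Indexing the frame with the mask keeps only the True rows.
selected = toy[mask]

print(mask.tolist())
print(selected.index.tolist())
```

Note that 5.0 is excluded, since the comparison is strict.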
df[df.flower_type == 'Iris-setosa']
 | sl | sw | pl | pw | flower_type |
5 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
6 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
7 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
8 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
9 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa |
10 | 4.8 | 3.4 | 1.6 | 0.2 | Iris-setosa |
11 | 4.8 | 3.0 | 1.4 | 0.1 | Iris-setosa |
12 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa |
13 | 5.8 | 4.0 | 1.2 | 0.2 | Iris-setosa |
14 | 5.7 | 4.4 | 1.5 | 0.4 | Iris-setosa |
15 | 5.4 | 3.9 | 1.3 | 0.4 | Iris-setosa |
16 | 5.1 | 3.5 | 1.4 | 0.3 | Iris-setosa |
17 | 5.7 | 3.8 | 1.7 | 0.3 | Iris-setosa |
18 | 5.1 | 3.8 | 1.5 | 0.3 | Iris-setosa |
19 | 5.4 | 3.4 | 1.7 | 0.2 | Iris-setosa |
20 | 5.1 | 3.7 | 1.5 | 0.4 | Iris-setosa |
21 | 4.6 | 3.6 | 1.0 | 0.2 | Iris-setosa |
22 | 5.1 | 3.3 | 1.7 | 0.5 | Iris-setosa |
23 | 4.8 | 3.4 | 1.9 | 0.2 | Iris-setosa |
24 | 5.0 | 3.0 | 1.6 | 0.2 | Iris-setosa |
25 | 5.0 | 3.4 | 1.6 | 0.4 | Iris-setosa |
26 | 5.2 | 3.5 | 1.5 | 0.2 | Iris-setosa |
27 | 5.2 | 3.4 | 1.4 | 0.2 | Iris-setosa |
28 | 4.7 | 3.2 | 1.6 | 0.2 | Iris-setosa |
29 | 4.8 | 3.1 | 1.6 | 0.2 | Iris-setosa |
30 | 5.4 | 3.4 | 1.5 | 0.4 | Iris-setosa |
31 | 5.2 | 4.1 | 1.5 | 0.1 | Iris-setosa |
32 | 5.5 | 4.2 | 1.4 | 0.2 | Iris-setosa |
33 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
34 | 5.0 | 3.2 | 1.2 | 0.2 | Iris-setosa |
35 | 5.5 | 3.5 | 1.3 | 0.2 | Iris-setosa |
36 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
37 | 4.4 | 3.0 | 1.3 | 0.2 | Iris-setosa |
38 | 5.1 | 3.4 | 1.5 | 0.2 | Iris-setosa |
39 | 5.0 | 3.5 | 1.3 | 0.3 | Iris-setosa |
40 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa |
41 | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa |
42 | 5.0 | 3.5 | 1.6 | 0.6 | Iris-setosa |
43 | 5.1 | 3.8 | 1.9 | 0.4 | Iris-setosa |
44 | 4.8 | 3.0 | 1.4 | 0.3 | Iris-setosa |
45 | 5.1 | 3.8 | 1.6 | 0.2 | Iris-setosa |
46 | 4.6 | 3.2 | 1.4 | 0.2 | Iris-setosa |
47 | 5.3 | 3.7 | 1.5 | 0.2 | Iris-setosa |
48 | 5.0 | 3.3 | 1.4 | 0.2 | Iris-setosa |
Generating summary statistics for the matching rows
df[df.flower_type == 'Iris-setosa'].describe()
 | sl | sw | pl | pw |
count | 44.000000 | 44.000000 | 44.000000 | 44.000000 |
mean | 5.013636 | 3.422727 | 1.465909 | 0.245455 |
std | 0.362543 | 0.389313 | 0.179071 | 0.110925 |
min | 4.300000 | 2.300000 | 1.000000 | 0.100000 |
25% | 4.800000 | 3.175000 | 1.400000 | 0.200000 |
50% | 5.000000 | 3.400000 | 1.500000 | 0.200000 |
75% | 5.200000 | 3.700000 | 1.600000 | 0.300000 |
max | 5.800000 | 4.400000 | 1.900000 | 0.600000 |
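Two details of the table above are worth noting: `count` reflects only the filtered rows, and the text column `flower_type` is left out because `describe()` summarizes numeric columns by default. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical mix of two flower types.
toy = pd.DataFrame({
    "sl": [5.0, 5.2, 6.3],
    "flower_type": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# Filter first, then summarize only the matching rows.
summary = toy[toy.flower_type == "Iris-setosa"].describe()

setosa_count = summary.loc["count", "sl"]
setosa_mean = summary.loc["mean", "sl"]
summary_columns = summary.columns.tolist()
print(summary)
```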
6) checking a particular row
print(df.iloc[0]) # position based
print(df.loc[5]) # label based
sl 4.6
sw 3.4
pl 1.4
pw 0.3
flower_type Iris-setosa
Name: 5, dtype: object
sl 4.6
sw 3.4
pl 1.4
pw 0.3
flower_type Iris-setosa
Name: 5, dtype: object
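Both calls print the same row because the row at position 0 happens to carry the label 5. A small sketch of the distinction, using a hypothetical `toy` frame:

```python
import pandas as pd

# Hypothetical frame whose first row has the label 5.
toy = pd.DataFrame({"sl": [4.6, 5.0, 4.4]}, index=[5, 6, 7])

by_position = toy.iloc[0]   # first row, whatever its label
by_label = toy.loc[5]       # row whose label is 5

same_row = by_position.equals(by_label)

# Label 0 does not exist in this frame, so loc[0] raises KeyError,
# while iloc[0] still works.
try:
    toy.loc[0]
    missing_label_failed = False
except KeyError:
    missing_label_failed = True

print(same_row, by_position.name, missing_label_failed)
```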
7) adding a row
df.loc[0] = [1, 2, 3, 4, 'Iris-sertosa']
df.tail()
 | sl | sw | pl | pw | flower_type |
145 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
146 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
147 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
148 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
0 | 1.0 | 2.0 | 3.0 | 4.0 | Iris-sertosa |
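This adds a row at the end of the frame with the label 0 and the provided data. A new row is created because the label 0 was dropped earlier; if the label already existed, the assignment would overwrite that row instead of adding one.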
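Whether assigning through `loc` appends or overwrites depends on whether the label already exists. A minimal sketch, with a hypothetical two-column `toy` frame and made-up values:

```python
import pandas as pd

# Hypothetical tail of a frame; label 0 is absent.
toy = pd.DataFrame(
    {"sl": [6.2, 5.9],
     "flower_type": ["Iris-virginica", "Iris-virginica"]},
    index=[147, 148],
)

# Label 0 does not exist, so a new row is appended at the end.
toy.loc[0] = [1.0, "Iris-setosa"]
labels_after_add = toy.index.tolist()

# Label 148 exists, so the existing row is overwritten in place.
toy.loc[148] = [9.9, "Iris-setosa"]
rows_after_overwrite = len(toy)
overwritten_sl = toy.loc[148, "sl"]

print(labels_after_add, rows_after_overwrite, overwritten_sl)
```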