Pandas - Manipulating data row-wise

import pandas as pd

iris = pd.read_csv("archive.ics.uci.edu/ml/machine-learning-dat..")

df = iris.copy()

df.columns = ['sl', 'sw', 'pl', 'pw', 'flower_type']

1) deleting a particular row

df.drop(0, inplace = True)

df.drop(3, inplace = True)

df.head()

	sl	sw	pl	pw	flower_type
1	4.7	3.2	1.3	0.2	Iris-setosa
2	4.6	3.1	1.5	0.2	Iris-setosa
4	5.4	3.9	1.7	0.4	Iris-setosa
5	4.6	3.4	1.4	0.3	Iris-setosa
6	5.0	3.4	1.5	0.2	Iris-setosa

The two calls delete the rows labeled 0 and 3. By default, pandas adds an index running from 0 to n - 1, which is easily mistaken for row positions. In reality, these values are just labels, and they are not renumbered when rows are removed. Hence, if we run the same code again, it will raise a KeyError, because the rows labeled 0 and 3 were already deleted in the first run.
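A minimal sketch of this behaviour, using a hypothetical four-row frame named demo in place of the iris data (the values are made up for illustration). It shows that the labels stay as they are after a drop and that dropping a missing label raises a KeyError unless errors="ignore" is passed.

```python
import pandas as pd

# Hypothetical four-row frame standing in for the iris data.
demo = pd.DataFrame({"sl": [5.1, 4.9, 4.7, 4.6]})

demo.drop(0, inplace=True)            # removes the row labeled 0
remaining_labels = list(demo.index)   # labels are kept, not renumbered

try:
    demo.drop(0, inplace=True)        # label 0 no longer exists
    second_drop_failed = False
except KeyError:
    second_drop_failed = True

# errors="ignore" skips labels that are missing instead of raising.
demo.drop(0, inplace=True, errors="ignore")
```

Passing errors="ignore" is one way to make a cell safe to re-run; whether silently skipping missing labels is desirable depends on the use case.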

If we don't pass inplace = True, drop returns a new DataFrame with the rows removed and leaves the original df unchanged.
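The difference can be sketched on a small hypothetical frame (demo and its values are assumptions, not part of the iris data). Without inplace, the result has to be captured in a variable or assigned back.

```python
import pandas as pd

# Hypothetical three-row frame for illustration.
demo = pd.DataFrame({"sl": [5.1, 4.9, 4.7]})

trimmed = demo.drop(1)                 # returns a new DataFrame
labels_after_copy = list(demo.index)   # demo itself still has all rows

demo = demo.drop(1)                    # reassigning has the same effect as inplace=True
labels_after_reassign = list(demo.index)
```

Reassigning the result is a common alternative to inplace = True and makes it explicit which variable holds the modified data.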

2) see all the labels

print(df.index)

Int64Index([  1,   2,   4,   5,   6,   7,   8,   9,  10,  11,
            ...
            139, 140, 141, 142, 143, 144, 145, 146, 147, 148],
           dtype='int64', length=147)

3) deleting a particular row based on its position rather than its label

df.drop(df.index[0], inplace = True)

df.head()

	sl	sw	pl	pw	flower_type
2	4.6	3.1	1.5	0.2	Iris-setosa
4	5.4	3.9	1.7	0.4	Iris-setosa
5	4.6	3.4	1.4	0.3	Iris-setosa
6	5.0	3.4	1.5	0.2	Iris-setosa
7	4.4	2.9	1.4	0.2	Iris-setosa
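Why this works: df.index[0] looks up the label stored at position 0, and drop then removes the row with that label. A small sketch with a hypothetical frame whose labels are not contiguous (the name demo and its values are assumptions):

```python
import pandas as pd

# Hypothetical frame whose labels no longer start at 0.
demo = pd.DataFrame({"sl": [4.7, 4.6, 5.4]}, index=[1, 2, 4])

first_label = demo.index[0]        # the label at position 0, here 1
without_first = demo.drop(first_label)

# Slicing by position gives the same rows without going through labels.
also_without_first = demo.iloc[1:]
```

Note that drop works on labels, so if the index contained duplicate labels, every row sharing that label would be removed.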

4) deleting multiple rows at the same time

df.drop(df.index[[0, 1]], inplace = True)

df.head()

	sl	sw	pl	pw	flower_type
5	4.6	3.4	1.4	0.3	Iris-setosa
6	5.0	3.4	1.5	0.2	Iris-setosa
7	4.4	2.9	1.4	0.2	Iris-setosa
8	4.9	3.1	1.5	0.1	Iris-setosa
9	5.4	3.7	1.5	0.2	Iris-setosa
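Indexing df.index with a list of positions returns the labels at those positions, and drop accepts several labels at once. A sketch on a hypothetical frame (demo and its values are assumptions):

```python
import pandas as pd

# Hypothetical frame with the same kind of gaps in its labels.
demo = pd.DataFrame({"sl": [4.6, 5.4, 4.6, 5.0]}, index=[2, 4, 5, 6])

labels_to_drop = demo.index[[0, 1]]      # labels at positions 0 and 1
by_position = demo.drop(labels_to_drop)

# If the labels are already known, a plain list works as well.
by_label = demo.drop([2, 4])
```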

5) filtering rows with a condition

df.sl > 5

5      False
6      False
7      False
8      False
9       True
       ...  
144     True
145     True
146     True
147     True
148     True
Name: sl, Length: 144, dtype: bool

This returns a boolean Series: True for each row that satisfies the condition and False otherwise.

Better representation method

df[df.sl > 5]

	sl	sw	pl	pw	flower_type
9	5.4	3.7	1.5	0.2	Iris-setosa
13	5.8	4.0	1.2	0.2	Iris-setosa
14	5.7	4.4	1.5	0.4	Iris-setosa
15	5.4	3.9	1.3	0.4	Iris-setosa
16	5.1	3.5	1.4	0.3	Iris-setosa
...	...	...	...	...	...
144	6.7	3.0	5.2	2.3	Iris-virginica
145	6.3	2.5	5.0	1.9	Iris-virginica
146	6.5	3.0	5.2	2.0	Iris-virginica
147	6.2	3.4	5.4	2.3	Iris-virginica
148	5.9	3.0	5.1	1.8	Iris-virginica

116 rows × 5 columns

This method returns only those rows which satisfy the given condition.
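The filtering step can be broken into two parts: build a boolean Series, then use it as a mask. The sketch below uses a hypothetical four-row frame (demo and its values are assumptions) and also shows one way to combine two conditions.

```python
import pandas as pd

# Hypothetical frame with two of the iris columns.
demo = pd.DataFrame({
    "sl": [4.6, 5.4, 5.8, 6.3],
    "flower_type": ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

mask = demo.sl > 5            # boolean Series, one value per row
long_sepals = demo[mask]      # keeps only the rows where mask is True

# Combine conditions with & (and) or | (or); each condition needs parentheses.
long_setosa = demo[(demo.sl > 5) & (demo.flower_type == "Iris-setosa")]
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.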

df[df.flower_type == 'Iris-setosa']

	sl	sw	pl	pw	flower_type
5	4.6	3.4	1.4	0.3	Iris-setosa
6	5.0	3.4	1.5	0.2	Iris-setosa
7	4.4	2.9	1.4	0.2	Iris-setosa
8	4.9	3.1	1.5	0.1	Iris-setosa
9	5.4	3.7	1.5	0.2	Iris-setosa
10	4.8	3.4	1.6	0.2	Iris-setosa
11	4.8	3.0	1.4	0.1	Iris-setosa
12	4.3	3.0	1.1	0.1	Iris-setosa
13	5.8	4.0	1.2	0.2	Iris-setosa
14	5.7	4.4	1.5	0.4	Iris-setosa
15	5.4	3.9	1.3	0.4	Iris-setosa
16	5.1	3.5	1.4	0.3	Iris-setosa
17	5.7	3.8	1.7	0.3	Iris-setosa
18	5.1	3.8	1.5	0.3	Iris-setosa
19	5.4	3.4	1.7	0.2	Iris-setosa
20	5.1	3.7	1.5	0.4	Iris-setosa
21	4.6	3.6	1.0	0.2	Iris-setosa
22	5.1	3.3	1.7	0.5	Iris-setosa
23	4.8	3.4	1.9	0.2	Iris-setosa
24	5.0	3.0	1.6	0.2	Iris-setosa
25	5.0	3.4	1.6	0.4	Iris-setosa
26	5.2	3.5	1.5	0.2	Iris-setosa
27	5.2	3.4	1.4	0.2	Iris-setosa
28	4.7	3.2	1.6	0.2	Iris-setosa
29	4.8	3.1	1.6	0.2	Iris-setosa
30	5.4	3.4	1.5	0.4	Iris-setosa
31	5.2	4.1	1.5	0.1	Iris-setosa
32	5.5	4.2	1.4	0.2	Iris-setosa
33	4.9	3.1	1.5	0.1	Iris-setosa
34	5.0	3.2	1.2	0.2	Iris-setosa
35	5.5	3.5	1.3	0.2	Iris-setosa
36	4.9	3.1	1.5	0.1	Iris-setosa
37	4.4	3.0	1.3	0.2	Iris-setosa
38	5.1	3.4	1.5	0.2	Iris-setosa
39	5.0	3.5	1.3	0.3	Iris-setosa
40	4.5	2.3	1.3	0.3	Iris-setosa
41	4.4	3.2	1.3	0.2	Iris-setosa
42	5.0	3.5	1.6	0.6	Iris-setosa
43	5.1	3.8	1.9	0.4	Iris-setosa
44	4.8	3.0	1.4	0.3	Iris-setosa
45	5.1	3.8	1.6	0.2	Iris-setosa
46	4.6	3.2	1.4	0.2	Iris-setosa
47	5.3	3.7	1.5	0.2	Iris-setosa
48	5.0	3.3	1.4	0.2	Iris-setosa

Generate summary statistics for the selected rows

df[df.flower_type == 'Iris-setosa'].describe()

	sl	sw	pl	pw
count	44.000000	44.000000	44.000000	44.000000
mean	5.013636	3.422727	1.465909	0.245455
std	0.362543	0.389313	0.179071	0.110925
min	4.300000	2.300000	1.000000	0.100000
25%	4.800000	3.175000	1.400000	0.200000
50%	5.000000	3.400000	1.500000	0.200000
75%	5.200000	3.700000	1.600000	0.300000
max	5.800000	4.400000	1.900000	0.600000
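By default describe summarizes only the numeric columns, which is why flower_type is absent from the table above. A sketch on a hypothetical three-row frame (demo and its values are assumptions) shows how to pull out a single statistic and how to include the non-numeric column.

```python
import pandas as pd

# Hypothetical frame with two numeric columns and the label column.
demo = pd.DataFrame({
    "sl": [4.6, 5.0, 6.3],
    "sw": [3.4, 3.4, 2.5],
    "flower_type": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

setosa = demo[demo.flower_type == "Iris-setosa"]
summary = setosa.describe()                   # numeric columns only by default

mean_sl = summary.loc["mean", "sl"]           # one statistic for one column
summary_all = setosa.describe(include="all")  # also summarizes flower_type
```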

6) checking a particular row

print(df.iloc[0]) # position based

print(df.loc[5]) # label based


sl                     4.6
sw                     3.4
pl                     1.4
pw                     0.3
flower_type    Iris-setosa
Name: 5, dtype: object
sl                     4.6
sw                     3.4
pl                     1.4
pw                     0.3
flower_type    Iris-setosa
Name: 5, dtype: object
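Both calls return the same row here because the row at position 0 happens to carry the label 5. A sketch on a hypothetical frame (demo and its values are assumptions) makes the distinction explicit, including what happens when a label or position does not exist.

```python
import pandas as pd

# Hypothetical frame whose labels start at 5, as in the output above.
demo = pd.DataFrame({"sl": [4.6, 5.0, 4.4]}, index=[5, 6, 7])

by_position = demo.iloc[0]   # first row, whatever its label is
by_label = demo.loc[5]       # row whose label is 5

try:
    demo.loc[0]              # there is no row labeled 0
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

try:
    demo.iloc[5]             # there are only three positions: 0, 1, 2
    missing_position_raises = False
except IndexError:
    missing_position_raises = True
```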

7) adding a row

df.loc[0] = [1, 2, 3, 4, 'Iris-setosa']

df.tail()

	sl	sw	pl	pw	flower_type
145	6.3	2.5	5.0	1.9	Iris-virginica
146	6.5	3.0	5.2	2.0	Iris-virginica
147	6.2	3.4	5.4	2.3	Iris-virginica
148	5.9	3.0	5.1	1.8	Iris-virginica
0	1.0	2.0	3.0	4.0	Iris-setosa

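This appends a row at the end with the label 0 and the provided data. It creates a new row only because label 0 was dropped earlier; if a row with that label already existed, the assignment would overwrite it instead.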
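Assigning through loc behaves differently depending on whether the label is already present. A sketch on a hypothetical one-row frame (demo and its values are assumptions):

```python
import pandas as pd

# Hypothetical one-row frame with a numeric column and the label column.
demo = pd.DataFrame(
    {"sl": [5.9], "flower_type": ["Iris-virginica"]},
    index=[148],
)

demo.loc[0] = [1.0, "Iris-setosa"]        # label 0 is new: a row is appended at the end
demo.loc[148] = [6.0, "Iris-virginica"]   # label 148 exists: that row is overwritten
```

Checking `label in demo.index` before assigning is one way to avoid overwriting an existing row by accident.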