Python Pandas
Pandas is a Python package for managing and analyzing tabular data.
Its main contribution is two data types for storing data: Series and DataFrame.
A pandas DataFrame is like an Excel spreadsheet that stores data in rows and columns.
A DataFrame is made up of several Series: each column of a DataFrame is a Series. We can name each column and row of a DataFrame.
A pandas DataFrame is very similar to a data frame in R.
Like a NumPy array, a DataFrame is a more robust data type for storing data than a list of lists, and DataFrames are more flexible than NumPy arrays.
A NumPy array holds a matrix in which all entries share one data type, whereas in a DataFrame each column can have its own data type.
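The difference in data types can be seen with a small sketch. The column names `count`, `label`, and `score` are made up for illustration and are not part of any dataset used in this tutorial.

```python
import numpy as np
import pandas as pd

# A NumPy array has one dtype for every entry; mixing numbers and
# strings makes NumPy store the whole array as strings.
arr = np.array([[1, "a"], [2, "b"]])
print(arr.dtype)  # a single Unicode string dtype for all entries

# A DataFrame keeps a separate dtype for each column.
df = pd.DataFrame({"count": [1, 2], "label": ["a", "b"], "score": [0.5, 1.5]})
print(df.dtypes)  # integer, text, and float columns side by side
```

Here the integer and float columns keep their numeric types even though a text column sits next to them.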
The name pandas is derived from "panel data", and the package is a core library for data manipulation and data analysis in Python.
It provides single-dimensional and multi-dimensional data structures for data manipulation.
Pandas Data Structures
1. Single-dimensional: Series object
2. Multi-dimensional: DataFrame
Pandas Series Object
A Series object is a one-dimensional labeled array.
import numpy as np
import pandas as pd
s1=pd.Series([10,20,30,40,50])
s1
0 10
1 20
2 30
3 40
4 50
dtype: int64
type(s1)
pandas.core.series.Series
Changing Index
import pandas as pd
s1=pd.Series([10,20,30,40,50],index=['a','b','c','d','e'])
s1
a 10
b 20
c 30
d 40
e 50
dtype: int64
Extracting Individual Elements
Extracting a single element
import pandas as pd
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1[5]
60
Extracting elements from the end
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1[-5:]
4 50
5 60
6 70
7 80
8 90
dtype: int64
Extracting a sequence of elements
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1[:5]
0 10
1 20
2 30
3 40
4 50
dtype: int64
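The bracket lookups above work on the default integer index. When a Series has custom labels, pandas also offers the explicit accessors `.loc` (by label) and `.iloc` (by position). The following is a minimal sketch; the variable names are chosen here for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = s.loc["c"]          # label-based lookup
by_position = s.iloc[2]        # position-based lookup
last_two = s.iloc[-2:]         # positional slice, end excluded as in Python lists
label_slice = s.loc["b":"d"]   # label slice, both end points included

print(by_label, by_position)   # 30 30
print(last_two.tolist())       # [40, 50]
print(label_slice.tolist())    # [20, 30, 40]
```

Using `.loc` and `.iloc` makes it explicit whether a number refers to a label or a position, which matters once the index is no longer 0, 1, 2, and so on.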
Basic Math Operations on Series
Adding a scalar value to Series elements
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1+1
0 11
1 21
2 31
3 41
4 51
5 61
6 71
7 81
8 91
dtype: int64
Multiplying Series elements by a scalar value
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1*2
0 20
1 40
2 60
3 80
4 100
5 120
6 140
7 160
8 180
dtype: int64
Subtracting a scalar value from Series elements
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1-2
0 8
1 18
2 28
3 38
4 48
5 58
6 68
7 78
8 88
dtype: int64
Adding two Series objects
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s2=pd.Series([2,4,6,8,10,12,14,16,18])
s1+s2
0 12
1 24
2 36
3 48
4 60
5 72
6 84
7 96
8 108
dtype: int64
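In the example above both Series share the same index, so the sum looks element-wise. When the indexes differ, pandas matches values by label rather than by position. The sketch below uses made-up labels to show this.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

total = a + b                      # matched by label; only "y" and "z" exist in both
filled = a.add(b, fill_value=0)    # treat a missing label as 0 instead of NaN

print(total)    # NaN for "w" and "x", 12.0 for "y", 23.0 for "z"
print(filled)   # 30.0 for "w", 1.0 for "x", 12.0 for "y", 23.0 for "z"
```

Labels present in only one Series produce NaN with `+`; `add` with `fill_value` is one way to avoid that.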
Pandas DataFrame
A DataFrame is a two-dimensional labeled data structure.
A DataFrame consists of rows and columns.
Creating DataFrame
This is how we can create a DataFrame.
import pandas as pd
df=pd.DataFrame({"name":['John','Bob','Anne'],"Marks":[75,74,70]})
df
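To connect this with the earlier point that each column is a Series and that rows can be named, here is a small sketch built on the same data. The row labels `r1`, `r2`, `r3` are hypothetical and added only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["John", "Bob", "Anne"], "Marks": [75, 74, 70]},
    index=["r1", "r2", "r3"],  # hypothetical row labels
)

marks = df["Marks"]            # selecting one column returns a Series
print(type(marks).__name__)    # Series
print(df.shape)                # (3, 2): three rows, two columns
print(df.loc["r2", "name"])    # Bob
```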
DataFrame In-Built Functions
head()
shape (an attribute, so it is used without parentheses)
describe()
tail()
import pandas as pd
iris=pd.read_csv('iris.csv')
iris.head()
iris.shape
iris.describe()
iris.tail()
Selecting rows and columns with iloc and loc
iris.iloc[0:3,0:2]
iris.loc[0:3,["sepal_length","petal_length"]]
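The two selections above differ in how they treat the end of a slice: `iloc` works by position and excludes the end, while `loc` works by label and includes it. The sketch below uses a small stand-in table with illustrative values, since `iris.csv` may not be at hand.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of a table such as iris.csv
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
})

by_position = df.iloc[0:3, 0:2]                            # rows 0, 1, 2 and the first two columns
by_label = df.loc[0:3, ["sepal_length", "petal_length"]]   # rows labelled 0 through 3

print(by_position.shape)  # (3, 2)
print(by_label.shape)     # (4, 2)
```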
Dropping columns and rows
iris.drop('sepal_length',axis=1)
iris.drop([1,2,3],axis=0)
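Note that `drop` returns a new DataFrame and leaves the original untouched unless the result is assigned back. A minimal sketch with a made-up three-column table:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

no_b = df.drop("b", axis=1)       # drop a column
fewer = df.drop([1, 2], axis=0)   # drop the rows labelled 1 and 2

print(list(df.columns))    # ['a', 'b', 'c'], the original is unchanged
print(list(no_b.columns))  # ['a', 'c']
print(list(fewer.index))   # [0, 3]
```

To keep the change, assign the result to a name, for example `df = df.drop("b", axis=1)`.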
Pandas Functions
More Pandas Functions
mean()
iris.mean(numeric_only=True)
sepal_length 5.843333
sepal_width 3.054000
petal_length 3.758667
petal_width 1.198667
dtype: float64
min()
iris.min()
sepal_length 4.3
sepal_width 2
petal_length 1
petal_width 0.1
species setosa
dtype: object
median()
iris.median(numeric_only=True)
sepal_length 5.80
sepal_width 3.00
petal_length 4.35
petal_width 1.30
dtype: float64
max()
iris.max()
sepal_length 7.9
sepal_width 4.4
petal_length 6.9
petal_width 2.5
species virginica
dtype: object
iris['species'].value_counts()
setosa 50
versicolor 50
virginica 50
Name: species, dtype: int64
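The summary functions above can be tried without the iris file. The sketch below uses a hypothetical mini-table with one text column; passing `numeric_only=True` restricts `mean` and `median` to the numeric columns, which recent pandas releases require when a text column is present.

```python
import pandas as pd

# Hypothetical mini-table with two numeric columns and one text column
df = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 6.0],
    "width": [0.5, 0.5, 1.0, 2.0],
    "kind": ["a", "b", "a", "a"],
})

means = df.mean(numeric_only=True)      # skips the text column
medians = df.median(numeric_only=True)
counts = df["kind"].value_counts()      # frequency of each label

print(means["length"], means["width"])      # 3.0 1.0
print(medians["length"], medians["width"])  # 2.5 0.75
print(counts["a"], counts["b"])             # 3 1
```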
Python Matplotlib
Matplotlib is a plotting library used for 2D graphics in the Python programming language.
We can create bar plots, scatter plots, histograms, and a lot more with Matplotlib.
Line plot
import numpy as np
from matplotlib import pyplot as plt
x=np.arange(5,50,5)
x
array([ 5, 10, 15, 20, 25, 30, 35, 40, 45])
y=2*x
y
array([10, 20, 30, 40, 50, 60, 70, 80, 90])
plt.plot(x,y)
plt.show()
Adding Title and labels
plt.plot(x,y)
plt.title("Line plot")
plt.xlabel("x-label")
plt.ylabel("y-label")
plt.show()
Changing Line Aesthetics
plt.plot(x,y,color='r',linestyle=':',linewidth=2)
plt.show()
Adding two lines in the same plot
x=np.arange(1,11)
y1=2*x
y2=3*x
plt.plot(x,y1,color='g',linestyle=':',linewidth=2)
plt.plot(x,y2,color='r',linestyle='-',linewidth=2)
plt.title("Line plot")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.grid(True)
plt.show()
Adding sub-plots
import numpy as np
from matplotlib import pyplot as plt
x=np.arange(1,10)
x
y1=2*x
y2=3*x
plt.subplot(2,1,1)
plt.plot(x,y1,color='purple',linestyle=':')
plt.subplot(2,1,2)
plt.plot(x,y2,color='black',linestyle='-')
plt.show()
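The same two-panel layout can also be written with `plt.subplots`, which returns the figure and its axes as objects. This is an alternative sketch, not the method used above; the panel titles are made up, and the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import numpy as np
from matplotlib import pyplot as plt

x = np.arange(1, 10)
y1 = 2 * x
y2 = 3 * x

fig, axes = plt.subplots(2, 1, sharex=True)  # two rows, one column, shared x-axis
axes[0].plot(x, y1, color="purple", linestyle=":")
axes[0].set_title("y = 2x")
axes[1].plot(x, y2, color="black", linestyle="-")
axes[1].set_title("y = 3x")
fig.tight_layout()   # keep the titles from overlapping
fig.canvas.draw()    # render the figure; call plt.show() in an interactive session
```

Working with the axes objects makes it clear which panel each call affects, which helps once a figure has more than two panels.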
Bar plot
from matplotlib import pyplot as plt
import numpy as np
student={'sam':79,'john':35,'Bob':60,'smith':54,'virat':69}
names=list(student.keys())
values=list(student.values())
plt.bar(names, values)
plt.title('student')
plt.xlabel('Names')
plt.ylabel('Marks')
plt.show()
Horizontal Bar plot
plt.barh(names,values,color='purple')
plt.title('student')
plt.xlabel('Marks')
plt.ylabel('Names')
plt.show()
Scatter plot
x=[5,10,15,20,25,30,35,40,45,]
y=[5,4,1,2,9,8,6,3,7]
plt.scatter(x,y)
plt.show()
Changing marker aesthetics
plt.scatter(x,y,marker='*',color='purple')
plt.show()
Adding two markers in the same plot
x=[5,10,15,20,25,30,35,40,45,]
y1=[5,4,1,2,9,8,6,3,7]
y2=[9,7,5,3,1,2,4,6,8]
plt.scatter(x,y1,marker='*',color='purple',s=200)
plt.scatter(x,y2,marker='.',color='black',s=300)
plt.show()
Adding sub-plots
plt.subplot(2,1,1)
plt.scatter(x,y1,marker='*',color='purple',s=200)
plt.subplot(2,1,2)
plt.scatter(x,y2,marker='.',color='black',s=300)
plt.show()
Histogram
Creating data
data=[1,2,3,4,3,3,5,5,3,9,9,5,5,5,5,8,8,8,6,6,6,]
Making histogram
plt.hist(data)
plt.show()
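A histogram is a count of how many values fall into each bin. To see the numbers behind the bars, the counts can be computed directly with `np.histogram`. The bin edges below (one bin per integer) are an assumption chosen for readability; `plt.hist` uses 10 equal-width bins by default.

```python
import numpy as np

data = [1, 2, 3, 4, 3, 3, 5, 5, 3, 9, 9, 5, 5, 5, 5, 8, 8, 8, 6, 6, 6]

# One bin per integer value: edges 1, 2, ..., 10 (the last bin also includes 10)
counts, edges = np.histogram(data, bins=np.arange(1, 11))
print(counts)  # [1 1 4 1 6 3 0 3 2]
```

The tallest bar belongs to the value 5, which occurs six times, and the empty bin corresponds to 7, which does not occur in the data.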
Working with a dataset
import pandas as pd
iris=pd.read_csv('iris.csv')
iris
plt.hist(iris['petal_length'],bins=40,color="purple")
plt.show()
Box-plot and Violin plot
Box plot
Creating data
one=[1,2,3,4,5,6,7,8,9]
two=[1,2,3,4,5,4,3,2,1]
three=[6,7,8,9,8,7,6,5,4]
data=[one,two,three]
plt.boxplot(data)
plt.show()
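Each box summarizes a list by its quartiles: the box spans the first to the third quartile and the line inside marks the median. As a rough sketch, the numbers for the first list can be computed with `np.percentile`; the whisker limits assume Matplotlib's default factor of 1.5 times the interquartile range.

```python
import numpy as np

one = [1, 2, 3, 4, 5, 6, 7, 8, 9]

q1, median, q3 = np.percentile(one, [25, 50, 75])
iqr = q3 - q1                      # interquartile range, the height of the box

# Points beyond these limits would be drawn as outliers (assumes whis=1.5)
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

print(q1, median, q3)              # 3.0 5.0 7.0
print(lower_limit, upper_limit)    # -3.0 13.0
```

All values in `one` lie between the two limits, so its box is drawn without outlier markers.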
Violin plot
Creating data
one=[1,2,3,4,5,6,7,8,9]
two=[1,2,3,4,5,4,3,2,1]
three=[6,7,8,9,8,7,6,5,4]
data=[one,two,three]
plt.violinplot(data)
plt.show()
Pie chart and Doughnut chart
Pie chart
Creating data
fruit=['Apple','Orange','Mango','Guava']
quantity=[25,50,75,100]
plt.pie(quantity,labels=fruit)
plt.show()
Doughnut chart
Creating data
fruit=['Apple','Orange','Mango','Guava']
quantity=[50,25,75,100]
Making plot
# Draw the full pie first
plt.pie(quantity,labels=fruit,radius=2)
# Draw a smaller white pie on top to hollow out the centre
plt.pie([1],colors=['w'],radius=1)
plt.show()
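A doughnut can also be drawn in a single call by giving the wedges a width through the `wedgeprops` argument, instead of covering the centre with a white circle. This is an alternative sketch; the width of 0.4 and the title are arbitrary choices, and the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
from matplotlib import pyplot as plt

fruit = ["Apple", "Orange", "Mango", "Guava"]
quantity = [50, 25, 75, 100]

fig, ax = plt.subplots()
# width=0.4 keeps only the outer ring of each wedge (pie radius is 1 by default)
ax.pie(quantity, labels=fruit, wedgeprops=dict(width=0.4))
ax.set_title("Fruit quantities")
fig.canvas.draw()  # render the figure; call plt.show() in an interactive session
```

With this approach the hole is transparent, so the chart still looks right on a non-white background.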