Python Pandas and Matplotlib

Python Pandas

Pandas is a Python package for managing and analyzing data.
Its main contribution is two data types for storing data: Series and DataFrame.
A pandas DataFrame is like an Excel spreadsheet that stores data in rows and columns.
A DataFrame is made up of several Series: each column of a DataFrame is a Series. We can name each column and row of a DataFrame.
A pandas DataFrame is very similar to a data frame in R.
Like a NumPy array, a DataFrame is a more robust data type for storing data than a list of lists, and a DataFrame is more flexible than a NumPy array.
A NumPy array stores a matrix whose entries all share one data type, whereas in a DataFrame each column can have its own data type.
The name pandas is derived from "panel data", and the package is a core library for data manipulation and data analysis.
It provides single-dimensional and multi-dimensional data structures for data manipulation.
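
To make the point about column data types concrete, here is a minimal sketch that builds a NumPy array and a DataFrame from the same mixed values. The names and marks are made-up sample data, not taken from a real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: names are strings, marks are integers.
names = ["John", "Bob", "Anne"]
marks = [75, 74, 70]

# A NumPy array holds one dtype, so the integers are converted to strings.
arr = np.array([names, marks])

# A DataFrame keeps a separate dtype for each column.
df = pd.DataFrame({"name": names, "marks": marks})

print(arr.dtype.kind)  # 'U' stands for a Unicode string dtype
print(df.dtypes)       # one dtype per column
```

Because the array has been coerced to strings, arithmetic on the marks would need a conversion first, while `df["marks"]` stays numeric.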

Pandas Data Structures 

1.Single-dimensional: Series object
2.Multi-dimensional: DataFrame

Pandas Series Object

Series Object is a one-dimensional labeled array.
import numpy as np
import pandas as pd
s1=pd.Series([10,20,30,40,50])
s1
0    10
1    20
2    30
3    40
4    50
dtype: int64
type(s1)
pandas.core.series.Series

Changing Index
import pandas as pd
s1=pd.Series([10,20,30,40,50],index=['a','b','c','d','e'])
s1
a    10
b    20
c    30
d    40
e    50
dtype: int64

Extracting Individual Elements

Extracting a single element
import pandas as pd
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1[5]
60
Extracting elements from the end
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1[-5:]
4    50
5    60
6    70
7    80
8    90
dtype: int64
Extracting a sequence of elements
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1[:5]
0    10
1    20
2    30
3    40
4    50
dtype: int64
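
The examples above rely on the default integer index. When a Series has custom labels, it is usually clearer to say whether a label or a position is meant. The following sketch, using a small made-up Series, shows `.loc` for labels and `.iloc` for positions.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = s.loc["c"]          # select by index label
by_position = s.iloc[2]        # select by integer position
last_two = s.iloc[-2:]         # the last two elements
label_slice = s.loc["b":"d"]   # label slices include both end labels

print(by_label, by_position)
print(last_two)
print(label_slice)
```

Note that a label slice with `.loc` includes its end label, unlike a positional slice.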

Basic Math Operations on Series

Adding a scalar value to Series elements
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1+1
0    11
1    21
2    31
3    41
4    51
5    61
6    71
7    81
8    91
dtype: int64
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1*2
0     20
1     40
2     60
3     80
4    100
5    120
6    140
7    160
8    180
dtype: int64
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s1-2
0     8
1    18
2    28
3    38
4    48
5    58
6    68
7    78
8    88
dtype: int64
Adding two Series objects
s1=pd.Series([10,20,30,40,50,60,70,80,90])
s2=pd.Series([2,4,6,8,10,12,14,16,18])
s1+s2
0     12
1     24
2     36
3     48
4     60
5     72
6     84
7     96
8    108
dtype: int64
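
The two Series above share the same index, so the values line up one to one. In general, pandas matches elements by index label, not by position. The sketch below uses two small made-up Series with partly different labels to show what happens then.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Labels present in only one Series produce NaN in the result.
total = a + b

# add() with fill_value treats a missing label as 0 instead.
filled = a.add(b, fill_value=0)

print(total)
print(filled)
```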

Pandas DataFrame

A DataFrame is a two-dimensional labeled data structure.
A DataFrame consists of rows and columns.
Creating a DataFrame
This is how we can create a DataFrame.
import pandas as pd
df=pd.DataFrame({"name":['John','Bob','Anne'],"Marks":[75,74,70]})
df
DataFrame In-Built Functions
head()
shape (an attribute, so it is written without parentheses)
describe()
tail()
import pandas as pd
iris=pd.read_csv('iris.csv')
iris.head()
iris.tail()
iris.describe()
iris.shape
Selecting rows and columns by position with iloc
iris.iloc[0:3,0:2]
Selecting rows and columns by label with loc
iris.loc[0:3,["sepal_length","petal_length"]]
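
The calls above need an `iris.csv` file on disk. As a self-contained sketch of the same functions, the example below uses a small in-memory DataFrame with made-up names and marks.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Bob", "Anne", "Sam"],
    "Marks": [75, 74, 70, 81],
})

first_two = df.head(2)          # first two rows
last_two = df.tail(2)           # last two rows
n_rows, n_cols = df.shape       # shape is an attribute, not a method
summary = df.describe()         # statistics for the numeric columns

top_left = df.iloc[0:2, 0:1]       # positions: end of the slice is excluded
by_label = df.loc[0:2, ["name"]]   # labels: end of the slice is included

print(n_rows, n_cols)
print(summary)
```

The last two lines show the main difference between `iloc` and `loc`: the same `0:2` slice gives two rows by position but three rows by label.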

Dropping columns and rows
iris.drop('sepal_length',axis=1)
iris.drop([1,2,3],axis=0)
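
One detail worth knowing is that `drop` returns a new DataFrame and leaves the original unchanged, so the result has to be assigned to a name to be kept. A minimal sketch with the small made-up DataFrame from earlier:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Bob", "Anne"], "Marks": [75, 74, 70]})

without_marks = df.drop("Marks", axis=1)   # drop a column by name
without_first = df.drop([0], axis=0)       # drop a row by index label

print(without_marks.columns.tolist())
print(without_first.index.tolist())
print(df.shape)   # the original DataFrame still has every row and column
```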

Pandas Functions

More Pandas Functions
mean()
iris.mean(numeric_only=True)
sepal_length    5.843333
sepal_width     3.054000
petal_length    3.758667
petal_width     1.198667
dtype: float64
min()
iris.min()
sepal_length       4.3
sepal_width          2
petal_length         1
petal_width        0.1
species         setosa
dtype: object
median()
iris.median(numeric_only=True)
sepal_length    5.80
sepal_width     3.00
petal_length    4.35
petal_width     1.30
dtype: float64
max()
iris.max()
sepal_length          7.9
sepal_width           4.4
petal_length          6.9
petal_width           2.5
species         virginica
dtype: object
iris['species'].value_counts()
setosa        50
versicolor    50
virginica     50
Name: species, dtype: int64
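
The same functions can be tried without the iris file. The sketch below uses a tiny made-up DataFrame with two numeric columns and one text column. In recent pandas versions, `mean()` and `median()` need `numeric_only=True` when a text column is present, while `min()` and `max()` also work on text by comparing strings alphabetically.

```python
import pandas as pd

# Hypothetical measurements; the column names are illustrative only.
df = pd.DataFrame({
    "length": [1.0, 2.0, 6.0],
    "width": [0.5, 1.5, 1.0],
    "kind": ["a", "b", "a"],
})

means = df.mean(numeric_only=True)      # skips the text column
medians = df.median(numeric_only=True)
smallest = df.min()                     # text compared alphabetically
counts = df["kind"].value_counts()      # how often each value occurs

print(means)
print(medians)
print(smallest)
print(counts)
```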

Python Matplotlib

Matplotlib is a plotting library used for 2D graphics in the Python programming language.
We can create bar plots, scatter plots, histograms, and a lot more with Matplotlib.

Line plot

import numpy as np
from matplotlib import pyplot as plt
x=np.arange(5,50,5)
x
array([ 5, 10, 15, 20, 25, 30, 35, 40, 45])
y=2*x
y
array([10, 20, 30, 40, 50, 60, 70, 80, 90])
plt.plot(x,y)
plt.show()
Adding Title and labels
plt.plot(x,y)
plt.title("Line plot")
plt.xlabel("x-label")
plt.ylabel("y-label")
plt.show()
Changing Line Aesthetics
plt.plot(x,y,color='r',linestyle=':',linewidth=2)
plt.show()
Adding two lines in the same plot
x=np.arange(1,11)
y1=2*x
y2=3*x
plt.plot(x,y1,color='g',linestyle=':',linewidth=2)
plt.plot(x,y2,color='r',linestyle='-',linewidth=2)
plt.title("Line plot")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.grid(True)
plt.show()
Adding sub-plots
import numpy as np
from matplotlib import pyplot as plt
x=np.arange(1,10)
y1=2*x
y2=3*x
plt.subplot(2,1,1)
plt.plot(x,y1,color='purple',linestyle=':')
plt.subplot(2,1,2)
plt.plot(x,y2,color='black',linestyle='-')
plt.show()
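
The same two-panel figure can also be built with `plt.subplots`, which returns the figure and its axes as objects. This is a sketch of that alternative style, not a replacement for the code above. The `Agg` backend line is an assumption so that the script runs without a display; remove it and call `plt.show()` to open a window instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import numpy as np
from matplotlib import pyplot as plt

x = np.arange(1, 10)
y1 = 2 * x
y2 = 3 * x

# Two rows, one column; both panels share the same x-axis.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)

ax_top.plot(x, y1, color="purple", linestyle=":")
ax_top.set_title("y = 2x")

ax_bottom.plot(x, y2, color="black", linestyle="-")
ax_bottom.set_title("y = 3x")

fig.tight_layout()   # keep the titles from overlapping the panels
fig.canvas.draw()    # render the figure
```

Working with the axes objects makes it explicit which panel each call changes.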

Bar plot

from matplotlib import pyplot as plt
import numpy as np
student={'sam':79,'john':35,'Bob':60,'smith':54,'virat':69}
names=list(student.keys())
values=list(student.values())
plt.bar(names, values)
plt.title('student')
plt.xlabel('Names')
plt.ylabel('Marks')
plt.show()
Horizontal Bar plot
plt.barh(names,values,color='purple')
plt.title('student')
plt.xlabel('Marks')
plt.ylabel('Names')
plt.show()
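
Bar plots are often easier to read when the bars are ordered by value. As a small optional sketch, the dictionary from above can be sorted by marks before plotting. The `Agg` backend line is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
from matplotlib import pyplot as plt

student = {'sam': 79, 'john': 35, 'Bob': 60, 'smith': 54, 'virat': 69}

# Sort the (name, mark) pairs from the highest mark to the lowest.
ordered = sorted(student.items(), key=lambda item: item[1], reverse=True)
names = [name for name, _ in ordered]
values = [mark for _, mark in ordered]

fig, ax = plt.subplots()
bars = ax.bar(names, values)
ax.set_title('student')
ax.set_xlabel('Names')
ax.set_ylabel('Marks')
fig.canvas.draw()    # render the figure
```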

Scatter plot

x=[5,10,15,20,25,30,35,40,45,]
y=[5,4,1,2,9,8,6,3,7]
plt.scatter(x,y)
plt.show()
Changing marker aesthetics
plt.scatter(x,y,marker='*',color='purple')
plt.show()
Adding two markers in the same plot
x=[5,10,15,20,25,30,35,40,45,]
y1=[5,4,1,2,9,8,6,3,7]
y2=[9,7,5,3,1,2,4,6,8]
plt.scatter(x,y1,marker='*',color='purple',s=200)
plt.scatter(x,y2,marker='.',color='black',s=300)
plt.show()
Adding sub-plots
plt.subplot(2,1,1)
plt.scatter(x,y1,marker='*',color='purple',s=200)
plt.subplot(2,1,2)
plt.scatter(x,y2,marker='.',color='black',s=300)
plt.show()

Histogram

Creating data
data=[1,2,3,4,3,3,5,5,3,9,9,5,5,5,5,8,8,8,6,6,6,]
Making histogram
plt.hist(data)
plt.show()
Working with a dataset
import pandas as pd
iris=pd.read_csv('iris.csv')
iris
plt.hist(iris['petal_length'],bins=40,color="purple")
plt.show()
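
A histogram splits the range of the data into equal-width bins and counts the values in each bin. To see those numbers directly, the sketch below computes them with `np.histogram` for the small list used earlier and compares them with what `hist` returns. The `Agg` backend line is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import numpy as np
from matplotlib import pyplot as plt

data = [1, 2, 3, 4, 3, 3, 5, 5, 3, 9, 9, 5, 5, 5, 5, 8, 8, 8, 6, 6, 6]

# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, edges = np.histogram(data, bins=10)

# hist() draws the bars and also returns the counts and bin edges it used.
fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, bins=10)

print(counts)
print(edges)
```

With 10 bins there are 11 edges, and the counts always add up to the number of data points.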

Box-plot and Violin plot

Creating data
one=[1,2,3,4,5,6,7,8,9]
two=[1,2,3,4,5,4,3,2,1]
three=[6,7,8,9,8,7,6,5,4]
data=list([one,two,three])
plt.boxplot(data)
plt.show()
Violin plot
Creating data
one=[1,2,3,4,5,6,7,8,9]
two=[1,2,3,4,5,4,3,2,1]
three=[6,7,8,9,8,7,6,5,4]
data=list([one,two,three])
plt.violinplot(data)
plt.show()
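
In a box plot, the box spans the first to the third quartile and the line inside it marks the median. As a sketch of what is being drawn, the quartiles of the three lists above can be computed with `np.percentile`.

```python
import numpy as np

one = [1, 2, 3, 4, 5, 6, 7, 8, 9]
two = [1, 2, 3, 4, 5, 4, 3, 2, 1]
three = [6, 7, 8, 9, 8, 7, 6, 5, 4]

# For each list: first quartile, median, third quartile.
quartiles = {}
for label, values in [("one", one), ("two", two), ("three", three)]:
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    quartiles[label] = (float(q1), float(median), float(q3))
    print(label, quartiles[label])
```

A violin plot shows the same data as a smoothed density, so it is wider where values are more frequent.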

Piechart and Doughnut chart

Pie chart
Creating data
fruit=['Apple','Orange','Mango','guava']
Quantity=[25,50,75,100]
plt.pie(Quantity,labels=fruit)
plt.show()
Doughnut chart
Creating data
fruit =['Apple','orange','mango','guava']
quantity=[50,25,75,100]
Making plot
plt.pie(quantity,labels=fruit,radius=2)
plt.pie([1],colors=['w'],radius=1)
plt.show()
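
The code above makes the hole by drawing a smaller white pie on top of the first one. An alternative sketch is to give each wedge a width through the `wedgeprops` argument, so only a ring is drawn and no second pie is needed. The width of 0.4 is an arbitrary choice for illustration, and the `Agg` backend line is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
from matplotlib import pyplot as plt

fruit = ['Apple', 'Orange', 'Mango', 'Guava']
quantity = [50, 25, 75, 100]

fig, ax = plt.subplots()

# width is measured from the outer edge inwards, so 0.4 leaves a hole
# in the middle of a pie with the default radius of 1.
ax.pie(quantity, labels=fruit, wedgeprops=dict(width=0.4))
ax.set_title('Fruit quantities')
fig.canvas.draw()    # render the figure
```

This version also works on a non-white background, because the hole is empty and not painted white.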