Amazon 50 Best Sellers — Data Analysis

An analysis of the Amazon Top 50 Bestselling Books dataset (2009 to 2019) using Plotly, Seaborn and Matplotlib.

The dataset is available on Kaggle.

Import all the libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.offline as pyo
import plotly.graph_objs as go
import plotly.express as px
%matplotlib inline

Download the data from Kaggle if you want to practice with it.

Now we read the CSV file with pandas.

df=pd.read_csv("amazon-top-50-bestselling-books-2009-2019/bestsellers.csv")
# show only the first 10 rows
df.head(10)
# show info about the data
df.info()

# Reviews per year for Fiction vs Non Fiction books
sns.lineplot(x='Year',y='Reviews',hue='Genre',data=df)

Output:

<AxesSubplot:xlabel='Year', ylabel='Reviews'>

The graph above indicates that the Fiction genre had more reviews than Non Fiction, which suggests that Fiction is the more popular genre.
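To back up what the line plot shows with numbers, you can build a small Year by Genre table of mean review counts. This is only a sketch: the `toy` rows below are made up to mimic the dataset's columns and are not taken from the Kaggle file, and `mean_reviews_by_year` is a hypothetical helper name. Pass the real `df` to the helper to check the actual data.

```python
import pandas as pd

# Toy rows that only mimic the dataset's columns; the values are made up
# for illustration and are NOT taken from the Kaggle file.
toy = pd.DataFrame({
    "Year":    [2009, 2009, 2010, 2010],
    "Genre":   ["Fiction", "Non Fiction", "Fiction", "Non Fiction"],
    "Reviews": [1000, 400, 3000, 800],
})

def mean_reviews_by_year(frame):
    """Return a Year x Genre table of mean review counts (hypothetical helper)."""
    return frame.pivot_table(index="Year", columns="Genre",
                             values="Reviews", aggfunc="mean")

review_table = mean_reviews_by_year(toy)
print(review_table)
```

With the real data, `mean_reviews_by_year(df)` gives one row per year, so you can compare the two genres year by year instead of reading values off the plot.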

#Prices of books across years for Fiction vs Non Fiction
g=sns.FacetGrid(data=df,col='Genre')
g.map(sns.lineplot,'Year','Price')

Output:

<seaborn.axisgrid.FacetGrid at 0x7f7bb8d8fa10>

Average price of books.

# Average price of books per genre
df.groupby('Genre').agg({"Price":"mean"})

# Compare the user rating distribution for each genre
g=sns.FacetGrid(data=df,col='Genre')
g.map(sns.histplot,'User Rating',kde=False,bins=50)

Output:

<seaborn.axisgrid.FacetGrid at 0x7f7bb8cdb9d0>

There are no ratings below 3 for Non Fiction.
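A claim like this is easy to confirm without a plot by looking at the minimum and maximum rating per genre. The sketch below uses made-up toy rows (not values from the Kaggle file) and a hypothetical helper `rating_range_by_genre`; run the helper on the real `df` to check the actual numbers.

```python
import pandas as pd

# Toy rows with made-up ratings, used only to show the shape of the result.
toy = pd.DataFrame({
    "Genre":       ["Fiction", "Fiction", "Non Fiction", "Non Fiction"],
    "User Rating": [3.5, 4.8, 4.0, 4.6],
})

def rating_range_by_genre(frame):
    """Return the lowest and highest user rating for each genre (hypothetical helper)."""
    return frame.groupby("Genre")["User Rating"].agg(["min", "max"])

rating_range = rating_range_by_genre(toy)
print(rating_range)
```

If the `min` column for Non Fiction is 3 or higher on the real data, the observation from the histogram holds.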

rating=df[df['User Rating']>4]
print("Number of Books with Rating 4 + is",rating['Name'].count())

Output:

Number of Books with Rating 4 + is 529

# Get each author's average rating for the books published over the years
avg_rating=df.groupby(['Genre','Author']).agg({"User Rating":"mean","Reviews":"sum","Price":"mean"}).sort_values(["User Rating","Reviews"]).reset_index()
# taking the mean of user rating over the year, we'll find the count of authors by their user rating
avg_rating['User Rating']=avg_rating['User Rating'].round(1)
sns.histplot(avg_rating["User Rating"],kde=False)

Output:

<AxesSubplot:xlabel='User Rating', ylabel='Count'>

The histogram above suggests that an average rating of 4.8 is the most common among authors.
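You can read the same information as exact counts by tallying the rounded ratings. This is a minimal sketch on made-up toy rows (the author names and ratings are invented for illustration); with the real data, replace `toy_avg` with the `avg_rating` frame built above.

```python
import pandas as pd

# Toy per-author averages; names and values are made up for illustration.
toy_avg = pd.DataFrame({
    "Author":      ["A", "B", "C", "D"],
    "User Rating": [4.8, 4.8, 4.6, 4.5],
})

# Count how many authors fall on each rounded rating value.
counts = toy_avg["User Rating"].round(1).value_counts().sort_index()

# The rating value shared by the largest number of authors.
most_common = counts.idxmax()
print(counts)
print("Most common average rating:", most_common)
```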

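# Rank authors within each genre by average rating (dense rank: ties share a rank)
avg_rating["Rank"]=avg_rating.groupby('Genre')["User Rating"].rank(method="dense",ascending=False)
# Top authors with the highest rating in each genre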
rank1=avg_rating[avg_rating.Rank==1]
fig=px.bar(rank1,x="Author",y="User Rating",color="Genre",barmode="group",title="Authors with Highest Rating")
fig.show()
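The ranking step above relies on a dense rank computed within each genre, so every author tied for the best rating gets rank 1. The sketch below shows that behaviour on made-up toy rows (authors and ratings are invented for illustration, not taken from the dataset).

```python
import pandas as pd

# Toy rows with made-up authors and ratings.
toy = pd.DataFrame({
    "Genre":       ["Fiction", "Fiction", "Fiction", "Non Fiction", "Non Fiction"],
    "Author":      ["A", "B", "C", "D", "E"],
    "User Rating": [4.9, 4.9, 4.7, 4.8, 4.6],
})

# Dense rank inside each genre: tied ratings share a rank and no ranks are skipped.
toy["Rank"] = toy.groupby("Genre")["User Rating"].rank(method="dense", ascending=False)

# Keep every author who holds rank 1 in their genre.
top = toy[toy["Rank"] == 1]
print(top)
```

Because ties share rank 1, the bar chart can show several authors per genre rather than a single winner.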

OK, now it's your turn to play with the dataset.
