# Author : Apolinario (Sam) Ortega - founder of invbat.com-A.I + chatbot <admin@invbat.com>
# Date created: 6/15/2020

# license : BSD 3 clause

# comment : Lesson 19 is my attempt to understand how pandas implements its data input/output (I/O) API.
# Readers are top-level functions, pd.read_*(); writers are DataFrame methods, DataFrame.to_*().
# pd.read_csv() function converts .csv data input to a pandas DataFrame [this is a data reader function]
# DataFrame.to_csv() method converts a pandas DataFrame to .csv format output [this is a data writer function]
#
# pd.read_excel() function converts .xlsx data input to a pandas DataFrame [this is a data reader function]
# DataFrame.to_excel() method converts a pandas DataFrame to .xlsx format output [this is a data writer function]
#
# pd.read_json() function converts .json data input to a pandas DataFrame [this is a data reader function]
# DataFrame.to_json() method converts a pandas DataFrame to .json format output [this is a data writer function]
#
# pd.read_html() function converts the tables in .html data input to a list of pandas DataFrames [this is a data reader function]
# DataFrame.to_html() method converts a pandas DataFrame to .html format output [this is a data writer function]
#
# for HDF5 (Hierarchical Data Format) files; this format is not related to Hadoop
# pd.read_hdf() function converts .h5 / .hdf data input to a pandas DataFrame [this is a data reader function]
# DataFrame.to_hdf() method converts a pandas DataFrame to HDF5 format output [this is a data writer function]
#
# for Feather files, a binary columnar format that can be shared between Python and R
# pd.read_feather() function converts .feather data input to a pandas DataFrame [this is a data reader function]
# DataFrame.to_feather() method converts a pandas DataFrame to .feather format output [this is a data writer function]
#
# for Parquet files, a columnar format widely used in the Hadoop ecosystem
# pd.read_parquet() function converts .parquet data input to a pandas DataFrame [this is a data reader function]
# DataFrame.to_parquet() method converts a pandas DataFrame to .parquet format output [this is a data writer function]
#
# for Stata software - general purpose statistical package for research, economics, political science
# pd.read_stata() function converts .dta data input to a pandas DataFrame [this is a data reader function]
# DataFrame.to_stata() method converts a pandas DataFrame to .dta format output [this is a data writer function]
#
# for SAS data input (pandas has a reader only, no writer)
# pd.read_sas() function converts SAS XPORT (.xpt) or SAS7BDAT (.sas7bdat) data input to a pandas DataFrame [this is a data reader function]
#
# for IBM SPSS data input (pandas has a reader only, no writer)
# pd.read_spss() function converts .sav data input to a pandas DataFrame [this is a data reader function]
#
# for Python format
# pd.read_pickle() function converts .pickle data input to a pandas DataFrame [this is a data reader function]
# DataFrame.to_pickle() method converts a pandas DataFrame to .pickle format output [this is a data writer function]
#
# for SQL databases
# pd.read_sql() function converts the result of a SQL query or a database table to a pandas DataFrame [this is a data reader function]
# DataFrame.to_sql() method writes a pandas DataFrame to a database table [this is a data writer function]
#
# for Google BigQuery tables
# pd.read_gbq() function converts the result of a BigQuery query to a pandas DataFrame [this is a data reader function]
# DataFrame.to_gbq() method writes a pandas DataFrame to a BigQuery table [this is a data writer function]
#
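# comment : A minimal sketch of the reader/writer pairing listed above, using CSV and JSON only.
# The table, its column names (item, price) and the variable names are hypothetical, and
# io.StringIO stands in for a file on disk so that nothing is written to the project folder.

```python
import io

import pandas as pd

# Hypothetical three-row table, used only for this illustration.
df = pd.DataFrame({"item": ["pen", "ink", "pad"], "price": [1.5, 4.0, 2.25]})

# Writers are DataFrame methods: df.to_csv(), df.to_json(), ...
csv_text = df.to_csv(index=False)         # returns a string when no path is given
json_text = df.to_json(orient="records")  # one JSON object per row

# Readers are top-level functions: pd.read_csv(), pd.read_json(), ...
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

# comment : Both readers should give back the same three rows that were written.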

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# comment # Do shift + enter

# comment : Show me how to use pd.read_csv() function

# pd.read_csv('/Users/invbat/projects/tips.csv')
List_All = pd.read_csv('/Users/invbat/projects/tips.csv')

# Give me the list of the first 10 rows using the .head(10) function; .head(20) and .head(60) work the same way.
# Most business questions ask for a report of the top 10 products or the top 60 products.
# head(n) can return any number of rows; 60 is only the default number of rows pandas displays before it truncates the output.

List_All.head(10)
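
# comment : A small sketch of the difference between what head(n) returns and what pandas displays.
# The 100-row table and the names demo and top_70 are hypothetical stand-ins for tips.csv.

```python
import pandas as pd

# Hypothetical 100-row table standing in for tips.csv.
demo = pd.DataFrame({"row_id": range(100)})

top_70 = demo.head(70)  # head(n) is not capped at 60 rows
print(len(top_70))

# The row limit only applies to printed output and is a display option.
print(pd.get_option("display.max_rows"))

# Temporarily raise the limit so that all 70 rows are printed in full.
with pd.option_context("display.max_rows", 100):
    print(top_70)
```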

# comment # Do shift + enter

# comment : I want you to show me the 10 least saleable products. It means from your sorted list, report the last
# 10 records. pandas has the .tail(10) function that can do that data extraction.
# Like .head(), .tail(n) accepts any n; by default up to 60 rows are displayed in full.
List_All.tail(10)
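
# comment : A minimal sketch of two ways to get the lowest values, assuming a hypothetical sales table.
# The column names product and units are made up for this example; nsmallest() is a pandas
# alternative that does not need a separate sort step.

```python
import pandas as pd

# Hypothetical sales figures, used only for illustration.
sales = pd.DataFrame({
    "product": ["A", "B", "C", "D", "E"],
    "units": [50, 5, 20, 1, 12],
})

# Option 1: sort from highest to lowest, then take the last rows.
bottom_sorted = sales.sort_values(by="units", ascending=False).tail(2)

# Option 2: nsmallest() picks the lowest values directly.
bottom_direct = sales.nsmallest(2, "units")

print(bottom_sorted)
print(bottom_direct)
```

# comment : Both options select the same two products; only the order of the two rows differs.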

# comment # Do shift + enter
# .head() and .tail() only mean "top" and "bottom" after the DataFrame is sorted by the relevant column. See next code, which sorts total_bill in descending order.

# comment : I want to sort by total bill and then by sex, both in descending order. Show me how to do it

List_All.sort_values(by=['total_bill', 'sex'], ascending=False).head()
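
# comment : A short sketch of how the second sort key only matters when the first key is tied,
# and how ascending= can take one flag per key. The four rows below are hypothetical.

```python
import pandas as pd

# Hypothetical rows with a tie on total_bill.
tips = pd.DataFrame({
    "total_bill": [20.0, 35.5, 20.0, 12.0],
    "sex": ["Male", "Female", "Female", "Male"],
})

# One flag per key: total_bill descending, sex ascending to break ties.
ordered = tips.sort_values(by=["total_bill", "sex"], ascending=[False, True])
print(ordered)
```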

# comment # Do shift + enter
# comment : Show me only the top 5 total bills generated by male customers. Solution: use .head(); 5 records is the default.
# comment : For the top 10 total bills use .head(10); for the top 20 use .head(20). See next code

# comment : Sort by total bill and then by sex, both in descending order.
# comment : Show me the top 20 total bills generated by male customers. See the code below
List_All.sort_values(by=['total_bill', 'sex'], ascending=False).head(20)

# comment # Do shift + enter

# comment : the code did not meet the requirement (top 20 highest total bills for male customers) because the result also contains Female rows. See next code

# comment : Show me the top 20 total bills generated by male customers. Sorting by sex first in descending order puts the Male rows on top, because 'Male' sorts after 'Female'. See the code below
List_All.sort_values(by=[ 'sex','total_bill'], ascending=False).head(20)
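
# comment : Sorting by sex works here only because the table has more than 20 Male rows. A sketch of an
# alternative that filters first, assuming the same column names as tips.csv; the six rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows using the same column names as tips.csv.
tips = pd.DataFrame({
    "total_bill": [16.99, 50.81, 44.30, 48.33, 10.34, 35.83],
    "sex": ["Female", "Male", "Female", "Male", "Male", "Female"],
})

# Keep only the Male rows, then take the largest total_bill values.
male_only = tips[tips["sex"] == "Male"]
top_male = male_only.nlargest(2, "total_bill")
print(top_male)
```

# comment : With the real table, nlargest(20, 'total_bill') would play the role of .head(20).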

# comment # Do shift + enter

# comment : the code now complies with the requirements specified above.

# I want you to rename the columns of the CSV to 'Check_Bill' , 'Tax' , 'Gender' ,'Smoker',
# 'Day', 'Meal_Time' ,'Table_Size'
# Show me how to do it. names= supplies the new column names and header=0 tells pandas to discard the original header row.

pd.read_csv('/Users/invbat/projects/tips.csv', names=[ 'Check_Bill' , 'Tax' , 'Gender' ,'Smoker',
                                                    'Day', 'Meal_Time' ,'Table_Size' ], header=0)
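
# comment : A minimal sketch comparing renaming while reading with renaming after reading.
# The two-row CSV text is a hypothetical stand-in for tips.csv, and DataFrame.rename() is an
# alternative that is not used elsewhere in this lesson.

```python
import io

import pandas as pd

# Hypothetical two-row CSV with the same header as tips.csv.
csv_text = (
    "total_bill,tip,sex,smoker,day,time,size\n"
    "16.99,1.01,Female,No,Sun,Dinner,2\n"
    "10.34,1.66,Male,No,Sun,Dinner,3\n"
)

new_names = ["Check_Bill", "Tax", "Gender", "Smoker", "Day", "Meal_Time", "Table_Size"]

# Option 1: replace the header while reading (header=0 discards the old one).
renamed_on_read = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

# Option 2: read normally, then map each old name to its new name.
original = pd.read_csv(io.StringIO(csv_text))
renamed_after = original.rename(columns=dict(zip(original.columns, new_names)))

print(list(renamed_on_read.columns))
print(list(renamed_after.columns))
```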

# comment # Do shift + enter

# I want you to rename the columns of the CSV to 'Check_Bill' , 'Tax' , 'Gender' ,'Smoker',
# 'Day', 'Meal_Time' ,'Table_Size'
# Show me how to do it. Next I want you to show me how to save this to tip_new.csv

tip_new = pd.read_csv('/Users/invbat/projects/tips.csv', names=[ 'Check_Bill' , 'Tax' , 'Gender' ,'Smoker',
                                                    'Day', 'Meal_Time' ,'Table_Size' ], header=0)
# index=True is used here on purpose to show the problem: the row index is written as an extra first column,
# which appears as 'Unnamed: 0' when tip_new.csv is read back. Saving with index=False avoids this.
tip_new.to_csv('tip_new.csv', index=True)  # index=True is the default, even when index is not specified.

# comment # Do shift + enter
# check in the project folder (assumed to be the notebook's working directory) if tip_new.csv was stored. Yes, I verified it was stored - good job.
# Now use that new file tip_new.csv and show me the new column names. See next line of code

# comment: show me the tip_new table
tip_new = pd.read_csv('/Users/invbat/projects/tip_new.csv')
tip_new.head()

# comment # Do shift + enter
# comment # How to remove the Unnamed column? Answer : save tip_new again, but this time add index=False

# This is the solution code to remove the unnamed column

tip_new = pd.read_csv('/Users/invbat/projects/tips.csv', names=[ 'Check_Bill' , 'Tax' , 'Gender' ,'Smoker',
                                                    'Day', 'Meal_Time' ,'Table_Size' ], header=0)
# index=False keeps the row index out of the file, so no 'Unnamed: 0' column appears when tip_new2.csv is read back
tip_new.to_csv('tip_new2.csv', index=False)

# read the new table tip_new2
tip_new2 = pd.read_csv('/Users/invbat/projects/tip_new2.csv')

tip_new2.head()
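
# comment : A small sketch of where the 'Unnamed: 0' column comes from, and two ways to avoid it.
# The two-row table is hypothetical, io.StringIO stands in for the CSV files, and index_col=0
# is an alternative fix on the reading side that this lesson does not use.

```python
import io

import pandas as pd

# Hypothetical two-row table, used only for illustration.
df = pd.DataFrame({"Check_Bill": [16.99, 10.34], "Tax": [1.01, 1.66]})

with_index = df.to_csv(index=True)      # the default: row labels are written too
without_index = df.to_csv(index=False)  # only the data columns are written

# Reading the first text back produces the extra 'Unnamed: 0' column.
round_trip = pd.read_csv(io.StringIO(with_index))
print(list(round_trip.columns))

# Fix A: write without the index. Fix B: read the first column back as the index.
clean_a = pd.read_csv(io.StringIO(without_index))
clean_b = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(clean_a.columns), list(clean_b.columns))
```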

# comment # Do shift + enter
# check in the project folder if tip_new2.csv was stored. Yes, I verified it was stored - good job.
# Now use that new file tip_new2.csv and show me the new column names. The unnamed column is now removed.

# comment : Show me how to use pd.read_excel() function

# pd.read_excel('/Users/invbat/projects/mpg.xlsx')
mpg = pd.read_excel('/Users/invbat/projects/mpg.xlsx')
mpg.head()

# comment # Do shift + enter

# show me the index of mpg table. see code below
mpg.index

# comment # Do shift + enter
# comment : the index starts at 0 and the last index is 397; stop=398 is exclusive and equals the total number of records
# comment : So if I want to know quickly how many rows of data my table has, I can just use .index (or len(mpg))

# Output:
# RangeIndex(start=0, stop=398, step=1)
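
# comment : A minimal sketch of reading the row count from the index, assuming a default RangeIndex.
# The 398-row table named demo is hypothetical and only mimics the shape of the mpg index.

```python
import pandas as pd

# Hypothetical 398-row table with the same default index as mpg.
demo = pd.DataFrame({"value": range(398)})

print(demo.index)      # a RangeIndex whose stop value is exclusive
print(demo.index[-1])  # last index label
print(len(demo))       # number of rows
print(demo.shape)      # (rows, columns)
```

# comment : len() and .shape keep working when the index is not a simple 0..n-1 range, for example after sorting or filtering.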

# show me the list of fieldnames (column names) of the mpg table. see code below
mpg.columns

# comment # Do shift + enter
# comment : list of column names from left to right in sequential order.

# Output:
# Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
#        'acceleration', 'model_year', 'origin', 'name'],
#       dtype='object')

# show me how to transpose the table so that the column names become rows. see code below
mpg.T

# comment # Do shift + enter

# Show the descriptive statistical summary of your mpg table. see code below
mpg.describe()

# comment # Do shift + enter
# The .describe() function can show which numeric field has missing data: look at the count row of the summary.
# The count of observations for horsepower is 392 while the table has 398 rows, which means 6 values are missing.
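
# comment : A short sketch of counting missing values directly, assuming a hypothetical four-row table.
# isna().sum() is an alternative to reading the count row of describe(), and it also covers text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical table with two missing horsepower values.
cars = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 21.0],
    "horsepower": [130.0, np.nan, np.nan, 90.0],
})

# describe() reports the non-missing count for each numeric column.
counts = cars.describe().loc["count"]
print(counts)

# isna().sum() reports the missing count for every column.
missing = cars.isna().sum()
print(missing)
```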

# show me how to sort the column names in descending (reverse alphabetical) order. see solution below
mpg.sort_index(axis=1, ascending=False)

# comment # Do shift + enter

# show me how to sort the rows by index label in descending order. see solution below
mpg.sort_index(axis=0, ascending=False)  # default is ascending order.

# comment # Do shift + enter

# show me how to sort by the column (fieldname) 'name'. see solution below
sorted_name = mpg.sort_values(by='name')  # ascending order is the default.
sorted_name
# comment # Do shift + enter

# Show me the top 10 sorted names. see solution below
sorted_name.head(10)  # solution 1. Do shift + enter to run
# sorted_name[0:10]   # solution 2. To enable, remove the leading number sign #. The stop value 10 is excluded, so [0:9] would return only 9 rows.
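
# comment : A minimal sketch checking that .head(10) and the slice [0:10] select the same rows.
# The 12-row table of single-letter names is hypothetical.

```python
import pandas as pd

# Hypothetical 12-row table of names in reverse alphabetical order.
names = pd.DataFrame({"name": list("lkjihgfedcba")})
sorted_names = names.sort_values(by="name")

first_ten = sorted_names.head(10)
same_by_slice = sorted_names[0:10]  # stop value 10 is excluded, so 10 rows
one_short = sorted_names[0:9]       # stop value 9 is excluded, so 9 rows

print(len(first_ten), len(same_by_slice), len(one_short))
print(first_ten.equals(same_by_slice))
```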

# comment # Do shift + enter

# show me origin, name, and horsepower.  see solution below
name_country = mpg.loc[:, ['origin', 'name','horsepower']]
name_country.head(10)

# comment # Do shift + enter

# show me how to extract specific rows and display only the mpg to horsepower columns. Column positions start at 0:
# 0 for mpg, 1 for cylinders, 2 for displacement, and 3 for horsepower. The stop value is excluded, so use 0:4 for
# columns 0 to 3, and 1:3 for rows 1 and 2.
mpg.iloc[1:3, 0:4]

# show me origin, name, and horsepower, from index label 2 to 5. Unlike .iloc, .loc includes the stop label, so this returns rows 2, 3, 4 and 5. see solution below
name_country = mpg.loc[2:5, ['origin', 'name','horsepower']]
name_country.head()
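
# comment : A small sketch contrasting position-based .iloc slices with label-based .loc slices.
# The six-row table is hypothetical and reuses three of the mpg column names.

```python
import pandas as pd

# Hypothetical six-row table with the default 0..5 index.
cars = pd.DataFrame({
    "mpg": [18, 15, 18, 16, 17, 15],
    "cylinders": [8, 8, 8, 8, 8, 8],
    "origin": ["usa"] * 6,
})

by_position = cars.iloc[1:3, 0:2]            # rows 1-2, columns 0-1 (stop excluded)
by_label = cars.loc[2:5, ["mpg", "origin"]]  # rows 2-5 (stop included)

print(by_position.shape)
print(by_label.shape)
```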

# comment # Do shift + enter

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4
5	25.29	4.71	Male	No	Sun	Dinner	4
6	8.77	2.00	Male	No	Sun	Dinner	2
7	26.88	3.12	Male	No	Sun	Dinner	4
8	15.04	1.96	Male	No	Sun	Dinner	2
9	14.78	3.23	Male	No	Sun	Dinner	2

	total_bill	tip	sex	smoker	day	time	size
234	15.53	3.00	Male	Yes	Sat	Dinner	2
235	10.07	1.25	Male	No	Sat	Dinner	2
236	12.60	1.00	Male	Yes	Sat	Dinner	2
237	32.83	1.17	Male	Yes	Sat	Dinner	2
238	35.83	4.67	Female	No	Sat	Dinner	3
239	29.03	5.92	Male	No	Sat	Dinner	3
240	27.18	2.00	Female	Yes	Sat	Dinner	2
241	22.67	2.00	Male	Yes	Sat	Dinner	2
242	17.82	1.75	Male	No	Sat	Dinner	2
243	18.78	3.00	Female	No	Thur	Dinner	2

	total_bill	tip	sex	smoker	day	time	size
170	50.81	10.00	Male	Yes	Sat	Dinner	3
212	48.33	9.00	Male	No	Sat	Dinner	4
59	48.27	6.73	Male	No	Sat	Dinner	4
156	48.17	5.00	Male	No	Sun	Dinner	6
182	45.35	3.50	Male	Yes	Sun	Dinner	3

	total_bill	tip	sex	smoker	day	time	size
170	50.81	10.00	Male	Yes	Sat	Dinner	3
212	48.33	9.00	Male	No	Sat	Dinner	4
59	48.27	6.73	Male	No	Sat	Dinner	4
156	48.17	5.00	Male	No	Sun	Dinner	6
182	45.35	3.50	Male	Yes	Sun	Dinner	3
102	44.30	2.50	Female	Yes	Sat	Dinner	3
197	43.11	5.00	Female	Yes	Thur	Lunch	4
142	41.19	5.00	Male	No	Thur	Lunch	5
184	40.55	3.00	Male	Yes	Sun	Dinner	2
95	40.17	4.73	Male	Yes	Fri	Dinner	4
23	39.42	7.58	Male	No	Sat	Dinner	4
207	38.73	3.00	Male	Yes	Sat	Dinner	4
112	38.07	4.00	Male	No	Sun	Dinner	3
56	38.01	3.00	Male	Yes	Sat	Dinner	4
238	35.83	4.67	Female	No	Sat	Dinner	3
11	35.26	5.00	Female	No	Sun	Dinner	4
85	34.83	5.17	Female	No	Thur	Lunch	4
52	34.81	5.20	Female	No	Sun	Dinner	4
180	34.65	3.68	Male	Yes	Sun	Dinner	4
179	34.63	3.55	Male	Yes	Sun	Dinner	2

	total_bill	tip	sex	smoker	day	time	size
170	50.81	10.00	Male	Yes	Sat	Dinner	3
212	48.33	9.00	Male	No	Sat	Dinner	4
59	48.27	6.73	Male	No	Sat	Dinner	4
156	48.17	5.00	Male	No	Sun	Dinner	6
182	45.35	3.50	Male	Yes	Sun	Dinner	3
142	41.19	5.00	Male	No	Thur	Lunch	5
184	40.55	3.00	Male	Yes	Sun	Dinner	2
95	40.17	4.73	Male	Yes	Fri	Dinner	4
23	39.42	7.58	Male	No	Sat	Dinner	4
207	38.73	3.00	Male	Yes	Sat	Dinner	4
112	38.07	4.00	Male	No	Sun	Dinner	3
56	38.01	3.00	Male	Yes	Sat	Dinner	4
180	34.65	3.68	Male	Yes	Sun	Dinner	4
179	34.63	3.55	Male	Yes	Sun	Dinner	2
141	34.30	6.70	Male	No	Thur	Lunch	6
175	32.90	3.11	Male	Yes	Sun	Dinner	2
237	32.83	1.17	Male	Yes	Sat	Dinner	2
83	32.68	5.00	Male	Yes	Thur	Lunch	2
47	32.40	6.00	Male	No	Sun	Dinner	4
173	31.85	3.18	Male	Yes	Sun	Dinner	2


Copyright 2021 INVBAT.COM - A.I. + Chatbot The Personal Memory Assistant Company

INVenting Brain Assistant Tools (INVBAT)

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320
2	18.0	8	318.0	150.0	3436	11.0	70	usa	plymouth satellite
3	16.0	8	304.0	150.0	3433	12.0	70	usa	amc rebel sst
4	17.0	8	302.0	140.0	3449	10.5	70	usa	ford torino

	0	1	2	3	4	5	6	7	8	9	...	388	389	390	391	392	393	394	395	396	397
mpg	18	15	18	16	17	15	14	14	14	15	...	26	22	32	36	27	27	44	32	28	31
cylinders	8	8	8	8	8	8	8	8	8	8	...	4	6	4	4	4	4	4	4	4	4
displacement	307	350	318	304	302	429	454	440	455	390	...	156	232	144	135	151	140	97	135	120	119
horsepower	130	165	150	150	140	198	220	215	225	190	...	92	112	96	84	90	86	52	84	79	82
weight	3504	3693	3436	3433	3449	4341	4354	4312	4425	3850	...	2585	2835	2665	2370	2950	2790	2130	2295	2625	2720
acceleration	12	11.5	11	12	10.5	10	9	8.5	10	8.5	...	14.5	14.7	13.9	13	17.3	15.6	24.6	11.6	18.6	19.4
model_year	70	70	70	70	70	70	70	70	70	70	...	82	82	82	82	82	82	82	82	82	82
origin	usa	usa	usa	usa	usa	usa	usa	usa	usa	usa	...	usa	usa	japan	usa	usa	usa	europe	usa	usa	usa
name	chevrolet chevelle malibu	buick skylark 320	plymouth satellite	amc rebel sst	ford torino	ford galaxie 500	chevrolet impala	plymouth fury iii	pontiac catalina	amc ambassador dpl	...	chrysler lebaron medallion	ford granada l	toyota celica gt	dodge charger 2.2	chevrolet camaro	ford mustang gl	vw pickup	dodge rampage	ford ranger	chevy s-10

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year
count	398.000000	398.000000	398.000000	392.000000	398.000000	398.000000	398.000000
mean	23.514573	5.454774	193.425879	104.469388	2970.424623	15.568090	76.010050
std	7.815984	1.701004	104.269838	38.491160	846.841774	2.757689	3.697627
min	9.000000	3.000000	68.000000	46.000000	1613.000000	8.000000	70.000000
25%	17.500000	4.000000	104.250000	75.000000	2223.750000	13.825000	73.000000
50%	23.000000	4.000000	148.500000	93.500000	2803.500000	15.500000	76.000000
75%	29.000000	8.000000	262.000000	126.000000	3608.000000	17.175000	79.000000
max	46.600000	8.000000	455.000000	230.000000	5140.000000	24.800000	82.000000

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
397	31.0	4	119.0	82.0	2720	19.4	82	usa	chevy s-10
396	28.0	4	120.0	79.0	2625	18.6	82	usa	ford ranger
395	32.0	4	135.0	84.0	2295	11.6	82	usa	dodge rampage
394	44.0	4	97.0	52.0	2130	24.6	82	europe	vw pickup
393	27.0	4	140.0	86.0	2790	15.6	82	usa	ford mustang gl
...	...	...	...	...	...	...	...	...	...
4	17.0	8	302.0	140.0	3449	10.5	70	usa	ford torino
3	16.0	8	304.0	150.0	3433	12.0	70	usa	amc rebel sst
2	18.0	8	318.0	150.0	3436	11.0	70	usa	plymouth satellite
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
96	13.0	8	360.0	175.0	3821	11.0	73	usa	amc ambassador brougham
9	15.0	8	390.0	190.0	3850	8.5	70	usa	amc ambassador dpl
66	17.0	8	304.0	150.0	3672	11.5	72	usa	amc ambassador sst
315	24.3	4	151.0	90.0	3003	20.1	80	usa	amc concord
257	19.4	6	232.0	90.0	3210	17.2	78	usa	amc concord
...	...	...	...	...	...	...	...	...	...
394	44.0	4	97.0	52.0	2130	24.6	82	europe	vw pickup
309	41.5	4	98.0	76.0	2144	14.7	80	europe	vw rabbit
197	29.0	4	90.0	70.0	1937	14.2	76	europe	vw rabbit
325	44.3	4	90.0	48.0	2085	21.7	80	europe	vw rabbit c (diesel)
293	31.9	4	89.0	71.0	1925	14.0	79	europe	vw rabbit custom
