1607人加入学习
(21人评价)
Python数据分析 - Pandas玩转Excel

Python数据分析轻松学

价格 $99美元

在执行books['Date'].at[i] = start + timedelta(days=i)时会报错

unsupported type for timedelta days component: numpy.int64

最终修改为books['Date'].at[i] = start + timedelta(days=int(i))可以正常运行

[展开全文]

# 蔓藤教育,pandas操作excel的行和列等

import pandas as pd

# 创建一个序列,在pandas中,数据帧DataFram和序列Series是最基本的数据结构

# pandas中的序列的第一种方法,由python中的字典改变而来

d = {'x': 100, 'y': 200, 'z': 300}

print(d)

s1 = pd.Series(d)

 

# 创建序列的第二种方法

L1 = [100, 200, 300]

L2 = ['x', 'y', 'z']

s2 = pd.Series(L1, index=L2)

#s2.to_excel('output3.xlsx', header=None)

# 序列在excel中可能是行,也可能是列,根据传递进DataFrame中序列的方法不同,在excel中呈现行或列不同

 

s3 = pd.Series([1, 2, 3], index=[1, 2, 3], name='A')

s4 = pd.Series([10, 20, 30], index=[1, 2, 3], name='B')

s5 = pd.Series([100, 200, 300], index=[1, 2, 3], name='C')

 

# 当以字典形式将序列传递进数据帧时,在excel中以列呈现

 

# 创建ExcelWriter对象实现向excel中不同sheet中追加写入内容,否则会被覆盖

writer = pd.ExcelWriter('output.xlsx')

 

df = pd.DataFrame({s3.name: s3, s4.name: s4, s5.name: s5})

#df.to_excel('output3.xlsx', sheet_name='sheet1')

df.to_excel(writer, sheet_name='字典传入为列')

 

# 当以列表List形式传入时,以行的形式存在

df1 = pd.DataFrame([s3, s4, s5])

#df1.to_excel('output3.xlsx', sheet_name='sheet2')

df1.to_excel(writer, sheet_name='列表传入为行')

 

writer.save()

 

[展开全文]

笔记,不错。

不能直接贴屏?!

图片要用文件上传模式,一般。

 

[展开全文]
itbj00 · 2019-03-02 · 读取文件 0

一、涉及知识点

1、pandas读取已经存在的Excel文件

2、涉及的BIF操作:

df=pd.read_excel('F:/sdf/sdf/output.xlsx',header=None)

pd.to_excel('F:/sdf/sdf/output1.xlsx')

df.shape

df.columns

df.header(5)

df.tail(5)

二、问题点

1、读取xls文件异常(xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Test Tim'

原因:源文件损坏,或者格式有问题(虽然是以.xls结尾,但实际上内容格式有问题的)

[展开全文]

一、涉及知识点

1、Excel和pandas创建excel格式的数据源

2、数据格式有很多,常见的如下:CSV文件、数据库表格、文件格式(Excel、PDF、Word等)、HTML文件、JSON文件、文本文件、XML文件

 

二、遇到的问题及原因

1、pandas库找不到(AttributeError: module 'pandas' has no attribute 'DataFrame'

原因:1)未安装pandas库

          2)pycharm中解释器配置错误(项目多时需要配置虚拟环境,每个虚拟环境中挂载的库都不一样,pycharm的setting中会有默认的配置,需检查是否错误)

 

[展开全文]

import pandas as pd

people=pd.dread_excle('C:/Temp/people.xlsx')

print(people.shape)

print(people.columns)

print('==================')

print(people.tail(3))

[展开全文]
kuokung · 2019-01-18 · 读取文件 0

import pandas as pd

df=pd.DataFrame[{'ID':[1,2,3],'Name':['Tim','Victor','N}]

df.to_excel('C:/Temp/output.xlsx')

print('Done!')

[展开全文]
kuokung · 2019-01-18 · 创建文件 0

饿,TIM老师的这个方程其实是可以化简的,也不是很复杂

import pandas as pd
import numpy as np

# def get_circumcircle_area(l,h):
#     r = np.sqrt(l**2+h**2)/2
#     return r**2*np.pi

# def get_circumcircle_area(l,h):
#     return np.pi*(l**2+h**2)/4
#
# def wrapper(row):
#      return get_circumcircle_area(row['Length'],row['Height'])

rects = pd.read_excel('D:/Temp/Rectangles.xlsx',index_col='ID')
rects['CA'] = rects.apply(lambda row:np.pi*(row['Length']**2+row['Height']**2)/4,axis=1)

print(rects)

 

这样的话一行就可以了

[展开全文]

https://www.jianshu.com/p/2d49cb87626b

对于group详细的解析

[展开全文]

字符串前加f是用于格式化字符串的,

print(f'#{row.ID}student{row.Name}has an invalid score{row.Score}.')

就相当于

print('# %d student{row.Name}has an invalid score %s.' % (row.ID,row.Score))

 

具体上网搜一下python 字符串格式化的三种方法就好了

[展开全文]
students['2017'].sort_values(ascending=True).plot.pie(fontsize=6,startangle=-270)

可以用在乱序的情况

[展开全文]
老师资料里matplotlib里面的代码应该用的不是同一个文件了,要改一下

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

students = pd.read_excel('D:/Temp/Students10.xlsx')
students.sort_values(by='2017', inplace=True, ascending=False)
students.index = range(0, len(students))
print(students)

bar_width = 0.7
x_pos = np.arange(len(students) * 2, step=2)
plt.bar(x_pos, students['2016'], color='orange', width=bar_width)
plt.bar(x_pos + bar_width, students['2017'], color='red', width=bar_width)

plt.xticks(x_pos + bar_width / 2, students['Field'], rotation='90')
plt.title('International Student by Field', fontsize=16)
plt.xlabel('Field')
plt.ylabel('Number')
plt.tight_layout()
plt.show()
[展开全文]

matplotlib感觉和matlab很像

[展开全文]

def age_18_to_30(a):

      return  18<=a<=30

def  level_a(s):
     return  85<=s<=100

students=pd.read_excel('C:/temp/students.xlsx',index_col='ID')

students.loc[students['Age'].apply(age_18_to_30)].loc[students['Score'].apply(level_a)]

[展开全文]

任务13:绘制折线趋势图、叠加区域图

!!!!本课下载下来的excel附件名字是Order.xlsx. 编辑代码时注意修改!!!

本课代码:

import pandas as pd
import matplotlib.pyplot as plt

weeks = pd.read_excel('C:/Temp/Orders.xlsx', index_col='Week')
print(weeks)
print(weeks.columns)

#weeks.plot.area(y=['Accessories', 'Bikes', 'Clothing', 'Components', 'Grand Total'])
weeks.plot.bar(y=['Accessories', 'Bikes', 'Clothing', 'Components', 'Grand Total'], stacked=True)
plt.title('Sales Weekly Trend', fontsize=16, fontweight='bold')
plt.ylabel('Total', fontsize=12, fontweight='bold')
plt.xticks(weeks.index, fontsize=8)
plt.show()

打印结果:

       Accessories         Bikes      ...         Components   Grand Total
Week                                  ...                                 
1      9939.465500  2.258337e+06      ...       7.872110e+04  2.356639e+06
2     12626.660000  6.005350e+05      ...       0.000000e+00  6.204234e+05
3     14414.950000  5.547708e+05      ...       0.000000e+00  5.759616e+05
4     12924.580000  5.892557e+05      ...       0.000000e+00  6.083717e+05
5     40443.498516  5.749222e+06      ...       4.709014e+05  6.360041e+06
6     13735.460000  5.539423e+05      ...       0.000000e+00  5.753385e+05
7     13588.800000  6.053847e+05      ...       0.000000e+00  6.247596e+05
8     13997.810000  5.320056e+05      ...       0.000000e+00  5.526938e+05
9     52392.263204  4.701389e+06      ...       6.852023e+05  5.567191e+06
10    14276.640000  5.815496e+05      ...       0.000000e+00  6.037474e+05
11    13584.320000  6.169319e+05      ...       0.000000e+00  6.372021e+05
12    14128.770000  5.985606e+05      ...       0.000000e+00  6.204505e+05
13    34372.148628  5.154563e+06      ...       6.173474e+05  5.899177e+06
14    58097.712659  3.361458e+06      ...       5.296900e+05  4.041540e+06
15    16287.020000  6.655744e+05      ...       0.000000e+00  6.894546e+05
16    15990.960000  6.749113e+05      ...       0.000000e+00  6.991723e+05
17    15120.710000  6.581411e+05      ...       0.000000e+00  6.795171e+05
18    67753.596201  5.739679e+06      ...       1.091392e+06  7.059584e+06
19    15416.040000  7.537814e+05      ...       0.000000e+00  7.770648e+05
20    16113.010000  7.323520e+05      ...       0.000000e+00  7.571492e+05
21    15903.150000  7.380932e+05      ...       0.000000e+00  7.620668e+05
22    57090.139277  4.388471e+06      ...       1.137457e+06  5.746683e+06
23    11900.146000  8.310402e+05      ...       3.152596e+04  8.824418e+05
24    11336.820000  4.095025e+05      ...       0.000000e+00  4.251703e+05
25    10573.210000  4.065446e+05      ...       0.000000e+00  4.232483e+05
26    29376.532664  3.101888e+06      ...       7.994845e+05  4.039316e+06
27    72003.211677  4.932875e+06      ...       1.043760e+06  6.191153e+06
28    11621.900000  4.086479e+05      ...       0.000000e+00  4.258483e+05
29    11640.460000  4.056193e+05      ...       0.000000e+00  4.225788e+05
30    12359.920000  4.156482e+05      ...       0.000000e+00  4.325679e+05
31    81276.364307  5.475372e+06      ...       1.593068e+06  7.363333e+06
32    15208.996500  1.494933e+06      ...       1.025128e+05  1.623856e+06
33    13187.100000  3.938290e+05      ...       0.000000e+00  4.135150e+05
34    13046.980000  4.383911e+05      ...       0.000000e+00  4.571066e+05
35    50187.449516  3.732927e+06      ...       5.842441e+05  4.495606e+06
36    15063.164000  1.173016e+06      ...       6.180437e+04  1.260751e+06
37    11505.920000  4.814773e+05      ...       0.000000e+00  4.985188e+05
38    13170.670000  4.883039e+05      ...       0.000000e+00  5.066807e+05
39    11278.600000  3.675468e+05      ...       0.000000e+00  3.845985e+05
40    70170.520320  7.878476e+06      ...       1.177747e+06  9.303972e+06
41    12441.460000  4.658615e+05      ...       0.000000e+00  4.834701e+05
42    12924.950000  4.715758e+05      ...       0.000000e+00  4.907546e+05
43    13314.310000  4.931541e+05      ...       0.000000e+00  5.127108e+05
44    62745.172581  5.156642e+06      ...       9.100149e+05  6.286165e+06
45    19630.003108  2.199453e+06      ...       1.478740e+05  2.381990e+06
46    14822.180000  7.092621e+05      ...       0.000000e+00  7.304781e+05
47    13728.300000  6.623922e+05      ...       0.000000e+00  6.830832e+05
48    36075.469500  3.240381e+06      ...       2.592590e+05  3.616344e+06
49    13642.418500  1.227307e+06      ...       2.369806e+04  1.273325e+06
50    12388.980000  5.217178e+05      ...       0.000000e+00  5.419306e+05
51    12852.400000  5.264518e+05      ...       0.000000e+00  5.455141e+05
52    13085.680000  5.412245e+05      ...       0.000000e+00  5.604452e+05
53    31315.891268  4.790804e+06      ...       4.568886e+05  5.375680e+06

[53 rows x 5 columns]
Index(['Accessories', 'Bikes', 'Clothing', 'Components', 'Grand Total'], dtype='object')

[展开全文]

 Series序列类似于dictionary

生成一个序列

d={'x':100,'y':200,'z':300}#字典/键值对

print(d.keys())#所有键

print(d.values())#所有值

print(d['x'])#具体一个值

s1=pd.Series(d)#字典转为序列

-----------------------------------

第二种创建序列方法

s1=pd.Series([100,200,300],index=['x','y','z'])

s1=pd.Series(data,index,name)#序列既不是行也不是列

s2=pd.Series(data,index,name)#序列只有加入DataFrame才能有行和列的区别,而这种区别是建立在加入时的方法不同上:两种加入法:dictonary 或者list

 

pd.DataFrame({s1.name:s1,s2.name:s2})#以dictonary加入当作列         

pd.DataFrame([s1,s2])   

pd.DataFrame(list)#以list加入当作行

index的作用:

对齐原则

有共同值则对齐,没有共同值的位置给一个NaN空

--------------------------------------

下节学习:使用Series进行快速填充

[展开全文]

授课教师

Tim老师

课程特色

视频(30)
下载资料(25)

学员动态

zoehuang 开始学习 Code for 002
zoehuang 开始学习 读取文件
zoehuang 开始学习 创建文件