How to remove blank lines from a string in Python; how do Python and VBA differ in data processing?
What is the difference between Python and VBA when processing data?
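There is a CSV file with two columns, CNUM and COMPANY. The data contains blank rows, as well as rows whose content is duplicated.
Requirements:
1) remove the blank rows;
2) for duplicated rows, keep only one valid row;
3) rename the COMPANY column to Company_New;
4) append six columns after it: C_col, D_col, E_col, F_col, G_col, H_col.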
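Requirement 1 depends on how pandas represents blank rows. As a minimal sketch on hypothetical in-memory data (the `SAMPLE` string is invented for illustration), `pd.read_csv` already skips fully empty lines because `skip_blank_lines=True` is its default, but a line consisting only of separators, such as `,`, comes through as a row of NaN values, which `dropna(how="all")` then removes:

```python
import io

import pandas as pd

# Hypothetical input: a header, a fully empty line, a line of bare separators,
# and a duplicated row.
SAMPLE = "CNUM,COMPANY\n1001,Alpha\n\n,\n1001,Alpha\n1002,Beta\n"

df = pd.read_csv(io.StringIO(SAMPLE))
# The empty line is skipped by default; the "," line becomes an all-NaN row.
print(len(df))                           # 4
print(int(df.isna().all(axis=1).sum()))  # 1

cleaned = df.dropna(how="all")           # drop rows where every value is NaN
print(len(cleaned))                      # 3
```

The duplicated `1001,Alpha` row is still present after this step; removing it is a separate step (requirement 2).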
1. Processing with Python pandas:
import pandas as pd


def deal_with_data(filepath, newpath):
    # open() without an encoding uses the system default (e.g. GBK on Chinese Windows)
    with open(filepath) as file_obj:
        df = pd.read_csv(file_obj)  # read the CSV file into a DataFrame
    # set the column index; the six new columns are filled with NaN
    df = df.reindex(columns=['CNUM', 'COMPANY', 'C_col', 'D_col', 'E_col', 'F_col', 'G_col', 'H_col'])
    df = df.rename(columns={'COMPANY': 'Company_New'})  # rename the column
    df = df.dropna(axis=0, how='all')  # drop all-NaN rows, i.e. the blank lines in the file
    df = df.astype({'CNUM': 'int32'})  # store the CNUM column as int32
    df = df.drop_duplicates(subset=['CNUM', 'Company_New'], keep='first')  # drop duplicate rows
    df.to_csv(newpath, index=False, encoding='GBK')


if __name__ == '__main__':
    file_path = r'C:\users\12078\Desktop\python\CNUM_COMPANY.csv'
    file_save_path = r'C:\users\12078\Desktop\python\CNUM_COMPANY_OUTPUT.csv'
    deal_with_data(file_path, file_save_path)
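To check the pandas steps without the Windows file paths, here is a minimal sketch that applies the same calls to a small hypothetical in-memory sample. The `clean` helper, the `NEW_COLUMNS` constant, and the sample rows are illustrative names, not part of the original script:

```python
import io

import pandas as pd

NEW_COLUMNS = ["C_col", "D_col", "E_col", "F_col", "G_col", "H_col"]


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four requirements to an already-loaded DataFrame (hypothetical helper)."""
    df = df.reindex(columns=["CNUM", "COMPANY"] + NEW_COLUMNS)  # new columns are NaN
    df = df.rename(columns={"COMPANY": "Company_New"})
    df = df.dropna(axis=0, how="all")  # blank rows are all-NaN, even with the new columns
    df = df.astype({"CNUM": "int32"})  # CNUM was float64 because of the NaN row
    return df.drop_duplicates(subset=["CNUM", "Company_New"], keep="first")


# Hypothetical sample: one blank (",") row and one duplicated row.
raw = pd.read_csv(io.StringIO("CNUM,COMPANY\n1001,Alpha\n,\n1001,Alpha\n1002,Beta\n"))
result = clean(raw)
print(result.columns.tolist())
print(result["CNUM"].tolist(), result["Company_New"].tolist())
```

Because the six new columns are entirely NaN, `dropna(how="all")` still only removes rows that were blank in the input, so the order of `reindex` and `dropna` used in the script does not change the result.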
2. Processing with VBA:
Option Base 1
Option Explicit

Sub main()
    On Error GoTo error_handling
    Dim wb As Workbook
    Dim wb_out As Workbook
    Dim sht As Worksheet
    Dim sht_out As Worksheet
    Dim rng As Range
    Dim usedrows As Long        ' Long rather than Byte: a Byte overflows past 255 rows
    Dim dict_cnum_company As Object
    Dim str_file_path As String
    Dim str_new_file_path As String

    ' assign values to variables:
    str_file_path = "C:\users\12078\Desktop\Python\CNUM_COMPANY.csv"
    str_new_file_path = "C:\users\12078\Desktop\Python\CNUM_COMPANY_OUTPUT.csv"
    Set wb = checkAndAttachWorkbook(str_file_path)
    Set sht = wb.Worksheets("CNUM_COMPANY")
    Set wb_out = Workbooks.Add
    wb_out.SaveAs str_new_file_path, xlCSV    ' create a csv file
    Set sht_out = wb_out.Worksheets(1)        ' the only sheet in the new workbook
    Set dict_cnum_company = CreateObject("Scripting.Dictionary")
    usedrows = WorksheetFunction.Max(getLastValidRow(sht, "A"), getLastValidRow(sht, "B"))

    ' rename the header COMPANY to Company_New, remove blank and duplicate rows.
    Dim cnum_company As String
    For Each rng In sht.Range("A1", "A" & usedrows)
        If Trim(rng.Offset(0, 1).Value) = "COMPANY" Then
            rng.Offset(0, 1).Value = "Company_New"
        End If
        cnum_company = rng.Value & "-" & rng.Offset(0, 1).Value
        ' a key of just "-" means both cells are empty, i.e. a blank row
        If cnum_company <> "-" And Not dict_cnum_company.Exists(cnum_company) Then
            dict_cnum_company.Add cnum_company, ""
        End If
    Next rng

    ' loop over the dict keys and split each key at "-" into a cnum array and a company array.
    Dim index_dict As Long
    Dim arr_keys As Variant
    Dim arr_cnum() As Variant
    Dim arr_Company() As Variant
    arr_keys = dict_cnum_company.keys    ' always 0-based, regardless of Option Base
    ReDim arr_cnum(1 To dict_cnum_company.Count)
    ReDim arr_Company(1 To dict_cnum_company.Count)
    For index_dict = 0 To UBound(arr_keys)
        arr_cnum(index_dict + 1) = Split(arr_keys(index_dict), "-")(0)
        arr_Company(index_dict + 1) = Split(arr_keys(index_dict), "-")(1)
    Next index_dict

    ' assign the values of the arrays to the cells.
    sht_out.Range("A1", "A" & UBound(arr_cnum)).Value = WorksheetFunction.Transpose(arr_cnum)
    sht_out.Range("B1", "B" & UBound(arr_Company)).Value = WorksheetFunction.Transpose(arr_Company)

    ' add 6 columns to the output csv file:
    Dim arr_columns() As Variant
    arr_columns = Array("C_col", "D_col", "E_col", "F_col", "G_col", "H_col")
    sht_out.Range("C1:H1").Value = arr_columns
    Call checkAndCloseWorkbook(str_file_path, False)
    Call checkAndCloseWorkbook(str_new_file_path, True)
    Exit Sub
error_handling:
    Call checkAndCloseWorkbook(str_file_path, False)
    Call checkAndCloseWorkbook(str_new_file_path, False)
End Sub
Helper functions:
' get the last used row of column in_col in a worksheet
Function getLastValidRow(in_ws As Worksheet, in_col As String) As Long
    getLastValidRow = in_ws.Cells(in_ws.Rows.Count, in_col).End(xlUp).Row
End Function

' return the workbook at in_wb_path, opening it only if it is not already open
Function checkAndAttachWorkbook(in_wb_path As String) As Workbook
    Dim wb As Workbook
    For Each wb In Workbooks
        If LCase(wb.FullName) = LCase(in_wb_path) Then
            Set checkAndAttachWorkbook = wb
            Exit Function
        End If
    Next wb
    Set wb = Workbooks.Open(in_wb_path, UpdateLinks:=0)
    Set checkAndAttachWorkbook = wb
End Function

' close the workbook at in_wb_path if it is open, saving changes only when in_saved is True
Function checkAndCloseWorkbook(in_wb_path As String, in_saved As Boolean)
    Dim wb As Workbook
    For Each wb In Workbooks
        If LCase(wb.FullName) = LCase(in_wb_path) Then
            wb.Close SaveChanges:=in_saved
            Exit Function
        End If
    Next wb
End Function
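The VBA version deduplicates by using dictionary keys. As a rough Python analogue of that idea (a sketch, not a translation of the macro; `dedupe_with_dict` and the sample rows are hypothetical), a plain `dict` works the same way, because its keys are unique and, since Python 3.7, keep insertion order:

```python
def dedupe_with_dict(rows):
    """Keep the first occurrence of each (cnum, company) pair and skip blank rows."""
    seen = {}
    for cnum, company in rows:
        if company == "COMPANY":
            company = "Company_New"  # rename the header, as the VBA loop does
        if not cnum and not company:
            continue  # blank row (the VBA version detects it as a bare "-" key)
        # A tuple key stays unambiguous even if a company name contains "-".
        seen.setdefault((cnum, company), None)
    return list(seen)  # keys come back in first-seen order


rows = [("CNUM", "COMPANY"), ("1001", "Alpha"), ("", ""), ("1001", "Alpha"), ("1002", "Beta-Co")]
print(dedupe_with_dict(rows))
# [('CNUM', 'Company_New'), ('1001', 'Alpha'), ('1002', 'Beta-Co')]
```

The tuple key is a deliberate difference. In the macro, `Split("1002-Beta-Co", "-")(1)` would return only `Beta`, so the `"cnum-company"` string key assumes that company names never contain `-`.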
3. Output:
Both methods produce the same output.
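4. Comparison and summary:
Python pandas has a large number of built-in data-processing methods, so there is no need to reinvent the wheel. It is convenient to use, and the code is much more concise.
Handling this requirement in Excel VBA relies on data structures such as arrays and dictionaries (in real-world tasks the data volume is usually large, so in some places the code avoids working cell by cell). It needs many string, array, and dictionary operations, and file handling is also cumbersome. When an error occurs, debugging is harder than in Python. The code has already been optimized as far as possible, yet it is still much longer than the Python version.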
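How do I insert a blank line between two lines in Python?
Add the newline character "\n": one gives a line break, and two give a line break followed by one blank line.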
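The title also asks how to remove blank lines from a string. A minimal sketch covering both directions follows. The function names are hypothetical, and treating whitespace-only lines as blank is an assumption:

```python
def insert_blank_line(first: str, second: str) -> str:
    """Join two lines with two newline characters, leaving one blank line between them."""
    return first + "\n\n" + second


def remove_blank_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())


s = insert_blank_line("line 1", "line 2")
print(repr(s))                      # 'line 1\n\nline 2'
print(repr(remove_blank_lines(s)))  # 'line 1\nline 2'
```

`str.splitlines()` also handles `\r\n` line endings, which is convenient for text that came from Windows files.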