Solved: pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

2013-01-27 14:18:05 苏内容
  Tags: pandas

Problem description:

Traceback (most recent call last):
  File "C:/Users/Lenovo/Desktop/水泥数据/dataprocess1.py", line 8, in <module>
    data1 = pd.read_csv("doubledata.xlsx")
  File "D:\Users\Lenovo\miniconda3\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "D:\Users\Lenovo\miniconda3\lib\site-packages\pandas\io\parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "D:\Users\Lenovo\miniconda3\lib\site-packages\pandas\io\parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "D:\Users\Lenovo\miniconda3\lib\site-packages\pandas\io\parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

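The error is:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

The line of source code that triggers it:

data1 = pd.read_csv("doubledata.xlsx")

The problem: the data format is wrong. The file has an .xlsx (Excel workbook) extension, while pd.read_csv expects delimited plain text.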
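Before picking a fix, it can help to confirm what kind of file is actually on disk. An .xlsx workbook is a ZIP container, so a rough check is whether the file opens as a ZIP archive. The sketch below uses only the standard library; the helper name `looks_like_xlsx` and the temporary file names are hypothetical, and the check only shows that the file is a ZIP container, not that it is a valid workbook.

```python
import os
import tempfile
import zipfile


def looks_like_xlsx(path):
    """Return True if the file is a ZIP container, as .xlsx workbooks are."""
    return zipfile.is_zipfile(path)


with tempfile.TemporaryDirectory() as tmp:
    # A genuine comma-separated text file
    csv_path = os.path.join(tmp, "plain.csv")
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write("a,b\n1,2\n")

    # A stand-in for a workbook: any ZIP archive is enough for this check
    xlsx_path = os.path.join(tmp, "fake_workbook.xlsx")
    with zipfile.ZipFile(xlsx_path, "w") as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")

    csv_is_zip = looks_like_xlsx(csv_path)
    xlsx_is_zip = looks_like_xlsx(xlsx_path)

print(csv_is_zip, xlsx_is_zip)
```

If the check returns True for the real file, it is a workbook rather than text, and pandas provides `pd.read_excel` for reading workbooks directly (it needs an Excel engine such as openpyxl installed).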

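Solutions:

  1. Convert the data to .csv: open the workbook and use "Save As", choosing CSV as the file type, so that the contents are actually rewritten as comma-separated text.

  2. Add a parameter that skips malformed rows (error_bad_lines was deprecated in pandas 1.3 in favor of on_bad_lines="skip" and removed in pandas 2.0):

data1 = pd.read_csv("doubledata.xlsx", error_bad_lines=False)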
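On pandas 1.3 or later, the same row-skipping behavior is requested with `on_bad_lines="skip"`. The following is a minimal sketch on an in-memory sample rather than the original file; the sample text is made up for illustration.

```python
import io

import pandas as pd

# Made-up sample: the header has 2 fields, the third line has 3
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# Rows with more fields than expected are silently dropped
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

Only the rows `1,2` and `6,7` survive, so this option trades data loss for a successful parse.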




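Judging from the error message, pandas takes the number of columns from the first row (row 0) and raises an error when a later row has more fields than that. In the case I ran into, 210 columns were expected but line 4 had 281, which exceeded the limit. Most of the fixes I found through Baidu add error_bad_lines=False to read_csv(), but that simply drops every row with more than 210 columns.

I wanted to keep all the data, though, so after a long search I pieced together the following approach: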
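To see which rows would trip the parser, one option is to count the fields on each line and compare them with the first line. This is a diagnostic sketch using the standard `csv` module, not what pandas does internally; the helper name `find_long_rows` and the sample text are hypothetical.

```python
import csv
import io


def find_long_rows(text, delimiter=","):
    """Return the first row's field count and (line_number, field_count)
    for every later row that has more fields than the first row."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = len(next(rows))
    offenders = [
        (line_no, len(row))
        for line_no, row in enumerate(rows, start=2)
        if len(row) > expected
    ]
    return expected, offenders


# Made-up sample that mirrors "Expected 1 fields in line 4, saw 2"
sample = "id\n1\n2\n3,extra\n"
expected, offenders = find_long_rows(sample)
print(expected, offenders)
```

The line numbers count physical lines only as long as no quoted field contains a line break.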

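import pandas as pd

data = []
# utf-8-sig strips a leading byte-order mark if the file has one
with open('false8.csv', 'r', encoding='utf-8-sig') as f_input:
    for line in f_input:
        # split each line on commas; rows may end up with different lengths
        data.append(line.strip().split(','))

# pandas pads the shorter rows with None, so every row is kept
dataset = pd.DataFrame(data)
print(dataset)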

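This reads the CSV line by line, splits each line on "," to build a two-dimensional list, and then converts that list to a DataFrame. Rows with fewer fields are padded with None, so no data is dropped.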
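A different way to keep every row, sketched here as an alternative and not as the method used above, is to let `read_csv` do the parsing but give it enough column names for the widest row. The sample text below is made up; `max_cols` is computed in a first pass with the standard `csv` module, which also handles commas inside quoted fields, something a plain `split(',')` does not.

```python
import csv
import io

import pandas as pd

# Made-up sample standing in for a ragged CSV file
raw = "a,b\n1,2,3\n4\n"

# First pass: find the widest row
max_cols = max(len(row) for row in csv.reader(io.StringIO(raw)))

# Second pass: explicit column names tell the parser how many fields to allow
dataset = pd.read_csv(io.StringIO(raw), header=None, names=list(range(max_cols)))
print(dataset)
```

Unlike the line-splitting approach, this lets pandas infer column types, and short rows are filled with NaN.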

