Trying to batch-retrieve genome information from NCBI: running my Python script gives "AttributeError: 'NoneType' object has no attribute 'group'". How do I fix it?
Problem description:
I have a large number of gene IDs and want to batch-search NCBI (https://www.ncbi.nlm.nih.gov/assem) for information about each of them and save the results to files.
Part of the data is as follows:
Gene IDs (GCF_ RefSeq assembly accessions): GCF_002583515.1, GCF_002560475.1, GCF_002566665.1, GCF_002552335.1, GCF_000293525.1, GCF_002567745.1
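I adapted some code that other people posted online.
When I run it, I get the error AttributeError: 'NoneType' object has no attribute 'group'.
What is wrong with it? I won't be able to graduate! I hope someone can help!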
```python
# -*- coding: utf-8 -*-
import re
import urllib.request

id_list = ["GCF_002583515.1", "GCF_002560475.1", "GCF_002566665.1",
           "GCF_002552335.1", "GCF_000293525.1", "GCF_002567745.1"]

for search_id in id_list:
    # Replace spaces with '+' so the ID can be used in the query string
    real_search_id = re.sub(' ', '+', search_id)
    url = r'https://www.ncbi.nlm.nih.gov/assembly/?term=' + real_search_id + '&report=full&format=text'
    response = urllib.request.urlopen(url).read().decode("utf-8")

    # Each field is the line that starts with its label; re.M makes ^ match at the start of every line.
    # If a label is missing from the page, re.search returns None and .group() raises the AttributeError.
    Organism_name = re.search("^Organ.*", response, re.M).group()
    Taxonomy_check = re.search("^Taxonomy.*", response, re.M).group()
    Infraspecific_name = re.search("^Infraspecific.*", response, re.M).group()
    BioSample = re.search("^BioSample.*", response, re.M).group()
    BioProject = re.search("^BioProject.*", response, re.M).group()
    Submitter = re.search("^Submitter.*", response, re.M).group()
    Date = re.search("^Date.*", response, re.M).group()
    Assembly_type = re.search("^Assembly_type.*", response, re.M).group()
    Assembly_level = re.search("^Assembly_level.*", response, re.M).group()
    Genome_representation = re.search("^Genome_representation.*", response, re.M).group()
    Global_statistics = re.search("^Global_statistics.*", response, re.M).group()
    # First run of consecutive lines that begin with a space (the indented statistics block)
    Global_statistics_context = re.search("(^ .*\n)+", response, re.M).group()

    write_context = "\n".join([Organism_name, Taxonomy_check, Infraspecific_name, BioSample,
                               BioProject, Submitter, Date, Assembly_type, Assembly_level,
                               Genome_representation, Global_statistics, Global_statistics_context])
    filename = r'project/liuyao/123/NCBI' + search_id + '.txt'
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(write_context)
```
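The loop above builds the query URL by hand, only replacing spaces with `+`. A minimal sketch of the same step using the standard library's `urllib.parse.urlencode`, which also escapes other special characters. The helper name `build_assembly_url` is hypothetical, and the base URL and query parameters are simply copied from the script above, not checked against NCBI's current interface.

```python
from urllib.parse import urlencode

# Copied from the script above; not verified against NCBI's current site
BASE_URL = "https://www.ncbi.nlm.nih.gov/assembly/"


def build_assembly_url(search_id: str) -> str:
    """Hypothetical helper: build the same query URL as the loop above."""
    # urlencode turns spaces into '+' and percent-escapes other special characters
    query = urlencode({"term": search_id, "report": "full", "format": "text"})
    return BASE_URL + "?" + query


for search_id in ["GCF_002583515.1", "GCF_002560475.1"]:
    print(build_assembly_url(search_id))
```

For plain accessions such as `GCF_002583515.1` this produces exactly the URL the original code builds; the difference only shows up for IDs containing characters like `&` or `#`.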
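Answer:
AttributeError: 'NoneType' object has no attribute 'group' means that the regular expression on one of your lines found no match, so re.search returned None and calling .group() on it failed. This is probably because the site you are scraping has been updated or is blocking your scraper, so the page content has changed and the regex no longer matches.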
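To find out which pattern stopped matching, one can run every pattern against the downloaded text and list the ones for which `re.search` returns `None`. A minimal sketch: the helper name `missing_fields` and the sample text are made up for illustration and are not real NCBI output.

```python
import re

# A few of the patterns copied from the script in the question
PATTERNS = {
    "Organism_name": "^Organ.*",
    "Taxonomy_check": "^Taxonomy.*",
    "BioSample": "^BioSample.*",
    "Assembly_type": "^Assembly_type.*",
}


def missing_fields(text: str, patterns: dict) -> list:
    """Hypothetical helper: return the names whose pattern finds no match in text."""
    # re.search returns None when nothing matches; calling .group() on that None
    # is exactly what raises the AttributeError
    return [name for name, pattern in patterns.items()
            if re.search(pattern, text, re.M) is None]


# Made-up sample text, NOT a real NCBI report
sample = "Organism name: Example bacterium\nTaxonomy ID: 12345\n"
print(missing_fields(sample, PATTERNS))  # ['BioSample', 'Assembly_type']
```

If every field shows up as missing, the page is probably not returning the text report at all (for example HTML or an error page), and printing `response[:500]` shows what was actually downloaded.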
Answer:
You can try filtering with an if...else expression or handling the exception with try/except. For example, this line:
```python
Organism_name = re.search("^Organ.*", response, re.M).group()
```
can be rewritten as:
```python
Organism_name = re.search("^Organ.*", response, re.M).group() if re.search("^Organ.*", response, re.M) else ""
```
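The if...else version runs each regex twice. A minimal sketch that matches once and falls back to a default, plus the try/except variant the answer mentions. The helper names `search_field` and `search_field_try` and the default value `""` are assumptions, and the sample `response` is made-up text, not real NCBI output.

```python
import re


def search_field(pattern: str, text: str, default: str = "") -> str:
    """Return the first matching line, or default if nothing matches."""
    match = re.search(pattern, text, re.M)
    return match.group() if match else default


def search_field_try(pattern: str, text: str, default: str = "") -> str:
    """Same behaviour, written with try/except as suggested in the answer."""
    try:
        return re.search(pattern, text, re.M).group()
    except AttributeError:
        # re.search returned None, so .group() raised AttributeError
        return default


response = "Organism name: Example bacterium\n"  # made-up text, not real NCBI output
Organism_name = search_field("^Organ.*", response)
BioSample = search_field("^BioSample.*", response, default="NA")
print(Organism_name)  # Organism name: Example bacterium
print(BioSample)      # NA
```

In the original loop, each `re.search(...).group()` line can then become e.g. `BioSample = search_field("^BioSample.*", response)`, so a missing field yields an empty string instead of stopping the whole batch.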