I want to fetch genome information from NCBI in bulk, but my Python script fails with "AttributeError: 'NoneType' object has no attribute 'group'". How do I fix it?

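Problem description:

I have a large number of gene IDs and want to search NCBI
https://www.ncbi.nlm.nih.gov/assem
for information on each of them in bulk and save the results to files.
Part of the information I want to retrieve looks like this:

[image]

Gene IDs: GCF_002583515.1, GCF_002560475.1, GCF_002566665.1, GCF_002552335.1, GCF_000293525.1, GCF_002567745.1

I found some code online that someone else had written.
When I run it, I get this:

[image]


What is wrong? I won't be able to graduate at this rate! I hope someone can reply!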

# -*- coding: utf-8 -*-
import urllib.request
import re
id_list = ["GCF_002583515.1", "GCF_002560475.1", "GCF_002566665.1", "GCF_002552335.1", "GCF_000293525.1", "GCF_002567745.1"]
for search_id in id_list:
    real_search_id = re.sub(' ', '+', search_id)
    url = r'https://www.ncbi.nlm.nih.gov/assembly/?term=' + real_search_id + '&report=full&format=text'
    response = urllib.request.urlopen(url).read().decode("utf-8")
    Organism_name = re.search("^Organ.*",response,re.M).group()
    Taxonomy_check = re.search("^Taxonomy.*",response,re.M).group()
    Infraspecific_name = re.search("^Infraspecific.*",response,re.M).group()
    BioSample = re.search("^BioSample.*",response,re.M).group()
    BioProject = re.search("^BioProject.*",response,re.M).group()
    Submitter = re.search("^Submitter.*",response,re.M).group()
    Date = re.search("^Date.*",response,re.M).group()
    Assembly_type = re.search("^Assembly_type.*",response,re.M).group()
    Assembly_level = re.search("^Assembly_level.*",response,re.M).group()
    Genome_representation = re.search("^Genome_representation.*",response,re.M).group()
    Global_statistics = re.search("^Global_statistics.*",response,re.M).group()
    Global_statistics_context = re.search("(^    .*\n)+",response,re.M).group()
    write_context = Organism_name + "\n" + Taxonomy_check + "\n" + Infraspecific_name + "\n" + BioSample + "\n" + BioProject + "\n" + Submitter + "\n" + Date + "\n" + Assembly_type + "\n" + Assembly_level + "\n" + Genome_representation + "\n" + Global_statistics + "\n" + Global_statistics_context
    filename = r'project/liuyao/123/NCBI' + search_id + '.txt'
    with open(filename, 'w') as f:
        f.write(write_context)

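AttributeError: 'NoneType' object has no attribute 'group' means that the regex on one of those lines matched nothing: re.search returns None when there is no match, and calling .group() on None raises this error. The likely cause is that the site has been updated or is blocking scrapers, so the returned text has changed and the patterns no longer match.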
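To see which line is failing, one option is to test every pattern against the response before calling .group(). A minimal sketch, assuming the response is already available as a string; the sample text and the find_missing helper are made up for illustration and are not part of the original script:

```python
import re

# Hypothetical stand-in for the downloaded text; the real response may look different.
sample = "Organism name: Example organism\nBioSample: SAMN00000000\n"

# A few of the patterns used in the question.
patterns = ["^Organ.*", "^Taxonomy.*", "^BioSample.*"]


def find_missing(text, patterns):
    """Return the patterns for which re.search finds no match in text."""
    return [p for p in patterns if re.search(p, text, re.M) is None]


missing = find_missing(sample, patterns)
print(missing)  # every pattern listed here would raise AttributeError on .group()
```

Running the same check on the real response, and printing the first few hundred characters of it, should show whether the page still contains the expected fields at all.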

You can guard against this with an if...else expression or with try/except exception handling. For example, this line:
Organism_name = re.search("^Organ.*",response,re.M).group()
can be written as:
Organism_name = re.search("^Organ.*",response,re.M).group() if re.search("^Organ.*",response,re.M) else ""