urllib2.HTTPError Python

Problem description:

I have a file with GI numbers and would like to fetch the corresponding FASTA sequences from NCBI.

from Bio import Entrez
import time
Entrez.email ="eigtw59tyjrt403@gmail.com"
f = open("C:\\bioinformatics\\gilist.txt")
for line in iter(f):
    handle = Entrez.efetch(db="nucleotide", id=line, retmode="xml")
    records = Entrez.read(handle)
    print ">GI "+line.rstrip()+" "+records[0]["GBSeq_primary-accession"]+" "+records[0]["GBSeq_definition"]+"\n"+records[0]["GBSeq_sequence"]
    time.sleep(1) # to make sure not many requests go per second to ncbi
f.close()

This script runs fine, but after a few sequences I suddenly get this error message:

Traceback (most recent call last):
  File "C:/Users/Ankur/PycharmProjects/ncbiseq/getncbiSeq.py", line 7, in <module>
    handle = Entrez.efetch(db="nucleotide", id=line, retmode="xml")
  File "C:\Python27\lib\site-packages\Bio\Entrez\__init__.py", line 139, in efetch
    return _open(cgi, variables)
  File "C:\Python27\lib\site-packages\Bio\Entrez\__init__.py", line 455, in _open
    raise exception
urllib2.HTTPError: HTTP Error 500: Internal Server Error

Of course I could use http://www.ncbi.nlm.nih.gov/sites/batchentrez, but I am trying to build a pipeline and would like something automated.

How can I prevent NCBI from "kicking me out"?

Solution:

This is a "normal" temporary failure of the Entrez API, and it can occur even if you follow all of the Entrez API usage rules. The Biopython documentation explains one way to handle it:

Sometimes you will get intermittent errors from Entrez (HTTPError 5XX). We use a try/except/pause/retry block to address this. For example:

# This assumes you have already run Entrez.esearch(..., usehistory="y")
# and set the variables count, webenv and query_key from its result
import time

from Bio import Entrez

try:
    from urllib.error import HTTPError  # for Python 3
except ImportError:
    from urllib2 import HTTPError  # for Python 2

Entrez.email = "you@example.com"  # placeholder: use your own address

batch_size = 3
out_handle = open("orchid_rpl16.fasta", "w")
for start in range(0, count, batch_size):
    end = min(count, start+batch_size)
    print("Going to download record %i to %i" % (start+1, end))
    attempt = 0
    while attempt < 3:
        attempt += 1
        try:
            fetch_handle = Entrez.efetch(db="nucleotide",
                                         rettype="fasta", retmode="text",
                                         retstart=start, retmax=batch_size,
                                         webenv=webenv, query_key=query_key,
                                         idtype="acc")
            break  # success: stop retrying this batch
        except HTTPError as err:
            if 500 <= err.code <= 599:
                # 5XX errors are usually transient: wait and try again
                print("Received error from server %s" % err)
                print("Attempt %i of 3" % attempt)
                time.sleep(15)
            else:
                raise  # other HTTP errors are not transient
    else:
        # all three attempts failed; never reuse a handle from a previous batch
        raise RuntimeError("Giving up on records %i to %i" % (start+1, end))
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()

So you don't need to feel guilty about this error; you just have to catch it.
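The tutorial loop above is written for batched downloads through the Entrez history. For the per-GI loop in the question, a minimal sketch of the same try/except/pause/retry idea, factored into a reusable function, could look like this. `call_with_retry` and `fetch_gi_record` are hypothetical names, not part of Biopython; doubling the wait after each failure (exponential backoff) is a variation on the tutorial's fixed 15-second pause, and the `sleep` parameter exists only so the waiting can be stubbed out in tests. It assumes `Entrez.email` has been set as in the question.

```python
import time

try:
    from urllib.error import HTTPError  # Python 3
except ImportError:
    from urllib2 import HTTPError  # Python 2


def call_with_retry(func, attempts=3, delay=15, sleep=time.sleep):
    """Call func() and return its result, retrying on HTTP 5XX errors.

    Waits `delay` seconds after the first failure and doubles the wait
    after each further failure. Non-5XX errors are re-raised at once;
    after `attempts` failed tries the last HTTPError is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except HTTPError as err:
            # Give up on non-transient errors, or when out of attempts
            if not 500 <= err.code <= 599 or attempt == attempts:
                raise
            print("Received error from server %s" % err)
            print("Attempt %i of %i" % (attempt, attempts))
            sleep(delay * 2 ** (attempt - 1))


def fetch_gi_record(gi):
    """Hypothetical helper: fetch one GenBank record (parsed XML) for a GI."""
    # Imported here so call_with_retry can be used without Biopython
    from Bio import Entrez
    handle = call_with_retry(
        lambda: Entrez.efetch(db="nucleotide", id=gi, retmode="xml"))
    try:
        return Entrez.read(handle)[0]
    finally:
        handle.close()
```

In the question's script, the `Entrez.efetch(...)` and `Entrez.read(handle)` lines would then become `record = fetch_gi_record(line.strip())`, with the fields read from `record` instead of `records[0]`; the `time.sleep(1)` between requests can stay as it is.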