Removing line breaks from tweets with Tweepy

Question:

I'm looking to pull data from the Twitter API and create a pipe-separated file that I can do further processing on. My code currently looks like this:

import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)

out_file = "tweets.txt"

tweets = api.search(q='foo')
o = open(out_file, 'a')

for tweet in tweets:
    id = str(tweet.id)
    user = tweet.user.screen_name
    post = tweet.text
    post = post.encode('ascii', 'ignore')
    post = post.strip('|') # so pipes in tweets don't create unwanted separators
    post = post.strip('\r\n')
    record = id + "|" + user + "|" + post
    print>>o, record

o.close()

I run into a problem when a user's tweet contains line breaks, which makes the output data look like this:

473565810326601730|usera|this is a tweet 
473565810325865901|userb|some other example 
406478015419876422|userc|line 
separated 
tweet
431658790543289758|userd|one more tweet

I want to strip out the line breaks in the third tweet. In addition to the above, I've tried post.strip('\n') and post.strip('0x0D 0x0A'), but neither seems to work. Any ideas?

Answer:

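That is because strip returns "a copy of the string with leading and trailing characters removed". It only trims characters at the start and end of the string, so line breaks in the middle of a tweet are left untouched. Also note that the argument to strip is a set of characters, not a substring: strip('0x0D 0x0A') removes any of '0', 'x', 'D', ' ' and 'A' from the ends, not CR/LF.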
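A small illustration of how strip behaves (Python 3 syntax assumed; the sample strings are made up): it trims only the ends of the string, and its argument is treated as a set of characters, which is also why strip('0x0D 0x0A') from the question does not target CR/LF.

```python
tweet = "line\nseparated\ntweet\n"

# strip() only trims characters at the two ends of the string,
# so the trailing newline is removed but the internal ones stay.
stripped = tweet.strip('\r\n')
print(repr(stripped))  # 'line\nseparated\ntweet'

# The argument to strip() is a set of characters, not a substring:
# '0x0D 0x0A' means "any of 0, x, D, space, A" at either end, not CR/LF.
weird = "0xA line\nbreak D".strip('0x0D 0x0A')
print(repr(weird))  # 'line\nbreak'
```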

Use replace instead, for both the pipe and the line breaks:

post = post.replace('|', ' ')
post = post.replace('\r', ' ')  # tweets may also use \r\n line endings
post = post.replace('\n', ' ')
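If every field needs the same cleaning, the replacements can be wrapped in a small helper. This is a sketch in Python 3 syntax; clean_field and make_record are hypothetical names (not part of Tweepy), and replacing the separator and line-break characters with a space is an assumption about what the downstream processing tolerates.

```python
def clean_field(text):
    """Make a value safe to place on a single pipe-separated line.

    Replaces the separator and both line-break characters with spaces.
    Assumption: a space is an acceptable stand-in for all three.
    """
    for ch in ('|', '\r', '\n'):
        text = text.replace(ch, ' ')
    return text


def make_record(tweet_id, user, text):
    # Clean every field, then join them with the pipe separator.
    return "|".join(clean_field(str(f)) for f in (tweet_id, user, text))


record = make_record(406478015419876422, "userc",
                     "line\r\nseparated\ntweet | with a pipe")
print(record)  # 406478015419876422|userc|line  separated tweet   with a pipe
```

Note that a \r\n pair becomes two spaces. If that matters, " ".join(text.split()) collapses every run of whitespace (including \r and \n) into a single space, but the pipe still has to be replaced separately.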