Finding and handling duplicate words in a txt file: solutions

I have a text file with the following content:
======== separator ========

你好，很好
23 26003 测试你好
24 26666 视频网络的性能等等
25 26003 测试你好
26 10023 这个测试句子
很好：是的
27 55210 没有了

======== separator ========

I need to find the duplicate entries in the file, for example:

23 26003 测试你好
25 26003 测试你好

and append "(重复)" ("duplicate") to the later copy, i.e.:
23 26003 测试你好
25 26003 测试你好(重复)

Note: the two lines differ only in the leading sequence number; everything after it is identical.
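Every approach to this problem depends on that observation: two lines count as duplicates when the text after the sequence number matches. Here is a minimal sketch of extracting that comparison key. It assumes the number is separated from the rest by ASCII spaces, and `lineKey` is a hypothetical helper name, not something from the original post.

```cpp
#include <string>

// Hypothetical helper: returns everything after the leading sequence number
// and the spaces that follow it, e.g. "23 26003 测试你好" -> "26003 测试你好".
// Lines without an ASCII space (e.g. "你好，很好") return an empty key,
// which the caller should treat as "never a duplicate".
std::string lineKey(const std::string& line)
{
    std::string::size_type sp = line.find(' ');
    if (sp == std::string::npos)
        return "";
    std::string::size_type start = line.find_first_not_of(' ', sp);
    if (start == std::string::npos)
        return "";
    return line.substr(start);
}
```

With this key, "23 26003 测试你好" and "25 26003 测试你好" compare equal, while lines with different content after the number do not.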

The resulting file should look like this:
======== separator ========

你好，很好
23 26003 测试你好
24 26666 视频网络的性能等等
25 26003 测试你好(重复)
26 10023 这个测试句子
很好：是的
27 55210 没有了

======== separator ========

Please help me write code that does this. I've been thinking about it for a long time and can't figure it out.

------Solution--------------------

C/C++ code

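#include <cstdlib>   // system()
#include <fstream>
#include <set>
#include <string>

using namespace std;

// Copies pSrcFile to pDestFile, appending "(重复)" to every line whose text
// after the leading sequence number already appeared on an earlier line.
// The "(重复)" literal must use the same encoding as the text file
// (both GBK or both UTF-8).
bool checkFile(const char* pSrcFile, const char* pDestFile)
{
    ifstream inf(pSrcFile);
    ofstream outf(pDestFile);
    set<string> setLine;   // keys (text after the number) seen so far
    if(inf && outf)
    {
        string strLine;
        while(getline(inf, strLine))
        {
            // Drop a trailing '\r' left by Windows line endings on non-Windows systems
            if(!strLine.empty() && strLine[strLine.size() - 1] == '\r')
                strLine.erase(strLine.size() - 1);

            bool bDuplicate(false);
            // string::size_type (not int) avoids a signed/unsigned mismatch with npos
            string::size_type nStart = strLine.find(' ');   // first space
            if(nStart != string::npos)
            {
                string::size_type nEnd = strLine.find_first_not_of(' ', nStart);   // end of the spaces
                if(nEnd != string::npos)
                {
                    // Key: everything after the sequence number
                    string strSub = strLine.substr(nEnd);
                    // insert().second is false when the key was already in the set
                    bDuplicate = !setLine.insert(strSub).second;
                }
            }

            if(bDuplicate)
            {
                strLine += "(重复)";
            }
            outf << strLine << '\n';
        }

        return true;
    }

    return false;
}

int main(int argc, char *argv[])
{
    // Paths may be given on the command line; the defaults are the original ones
    const char* pSrc  = argc > 1 ? argv[1] : "c:\\test1.txt";
    const char* pDest = argc > 2 ? argv[2] : "c:\\test2.txt";
    bool ok = checkFile(pSrc, pDest);

    system("pause");   // Windows-only: keeps the console window open

    return ok ? 0 : 1;
}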
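The same set-based logic can be tried without touching the disk by reading from and writing to streams. This is a sketch only: `markDuplicates` is a hypothetical name, and the function takes `std::istream`/`std::ostream` instead of file names so it can be fed a `std::istringstream`.

```cpp
#include <istream>
#include <ostream>
#include <set>
#include <sstream>   // for the std::istringstream usage shown below
#include <string>

// Hypothetical stream-based variant of checkFile above: the text after the
// first run of spaces is the key, and std::set::insert reports (via .second)
// whether that key had already been seen.
void markDuplicates(std::istream& in, std::ostream& out)
{
    std::set<std::string> seen;
    std::string line;
    while (std::getline(in, line))
    {
        std::string::size_type sp = line.find(' ');
        if (sp != std::string::npos)
        {
            std::string::size_type start = line.find_first_not_of(' ', sp);
            if (start != std::string::npos &&
                !seen.insert(line.substr(start)).second)
            {
                line += "(重复)";   // key already present: mark this copy
            }
        }
        out << line << '\n';
    }
}

// Usage:
//   std::istringstream in("23 26003 测试你好\n25 26003 测试你好\n");
//   std::ostringstream out;
//   markDuplicates(in, out);   // out holds "...\n25 26003 测试你好(重复)\n"
```

A `std::set` gives logarithmic lookups, so this stays fast even for large files. The file-based `checkFile` is just this loop with `ifstream`/`ofstream` plugged in.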

------Solution--------------------

C/C++ code


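#include <iostream>
#include <string>
#include <fstream>
#include <vector>
using namespace std;
int main()
{
    // v holds pairs: v[2k] is the sequence number (first token of line k),
    // v[2k+1] is the rest of that line, including its leading space.
    vector<string> v;
    ifstream file;
    string filename;
    cout<<"Input file's name:";
    cin>>filename;
    string str;
    file.open(filename.c_str());
    if(!file.is_open())
    {
        cout<<"fail to open this file!"<<endl;
        return 1;
    }
    while(file>>str)
    {
        v.push_back(str);
        getline(file,str);
        // Always push the rest, even when empty (lines such as "你好，很好"),
        // so that the pairs stay aligned.
        v.push_back(str);
    }
    // Compare only the "rest" entries (odd indices), skipping empty ones.
    // Stopping at the first earlier match appends "(重复)" only once,
    // even when a line occurs three or more times.
    for(vector<string>::size_type i=1;i<v.size();i+=2)
    {
        if(v[i].empty())
            continue;
        for(vector<string>::size_type j=1;j<i;j+=2)
        {
            if(v[j]==v[i])
            {
                v[i]+="(重复)";
                break;
            }
        }
    }
    // Print each line back in its original form: number + rest
    for(vector<string>::size_type i=0;i+1<v.size();i+=2)
    {
        cout<<v[i]<<v[i+1]<<endl;
    }
    return 0;
}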

------Solution--------------------
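This text isn't complicated, so you can parse it yourself:

Read a line;
extract the string that follows the sequence number;
compare it with the strings already in a vector (look it up with find); if it is a duplicate, append "(重复)"; push_back() it into the vector;
continue with the next line ...

For extracting substrings and using find and related methods, see: www.cppreference.com/cppstring/index.html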
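The steps above can be sketched as follows. This is one possible reading, not the poster's code. The function name `markDuplicateLines` is hypothetical, the input is assumed to be already split into lines, and the vector stores the keys (text after the number) rather than whole lines.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch of the steps above: keep the key (text after the
// sequence number) of every line seen so far in a vector, and look each new
// key up with std::find. std::find is a linear search, so this is O(n^2)
// overall; fine for small files, while a std::set scales better.
std::vector<std::string> markDuplicateLines(const std::vector<std::string>& lines)
{
    std::vector<std::string> seen;    // keys of earlier lines
    std::vector<std::string> result;  // lines to write back out
    for (std::vector<std::string>::size_type i = 0; i < lines.size(); ++i)
    {
        std::string line = lines[i];
        std::string::size_type sp = line.find(' ');
        std::string::size_type start =
            (sp == std::string::npos) ? sp : line.find_first_not_of(' ', sp);
        if (start != std::string::npos)
        {
            std::string key = line.substr(start);
            if (std::find(seen.begin(), seen.end(), key) != seen.end())
                line += "(重复)";      // seen before: mark the duplicate
            else
                seen.push_back(key);   // first occurrence: remember the key
        }
        result.push_back(line);
    }
    return result;
}
```

Because only first occurrences are stored in `seen`, a line that appears three times gets "(重复)" appended exactly once on each later copy.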
