Scala:如何遍历流/迭代器将结果收集到几个不同的集合中

问题描述:

我正在浏览的日志文件太大而无法容纳到内存中并收集2种类型的表达式,下面的迭代代码片段有哪些更好的功能替代?

I'm going through log file that is too big to fit into memory and collecting 2 type of expressions, what is better functional alternative to my iterative snippet below?

def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
  val lines : Iterator[String] = io.Source.fromFile(file).getLines()

  val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
  val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty

  for (line <- lines){
    line match {
      case errorPat(date,ip)=> errors.append((ip,date))
      case loginPat(date,user,ip,id) =>logins.put(ip, id)
      case _ => ""
    }
  }

  errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}


这是一个可能的解决方案:

Here is a possible solution:

def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
  val lines = Source.fromFile(file).getLines
  val (err, log) = lines.collect {
        case errorPat(inf, ip) => (Some((ip, inf)), None)
        case loginPat(_, _, ip, id) => (None, Some((ip, id)))
      }.toList.unzip
  val ip2id = log.flatten.toMap
  err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + "" + ip, inf) }
}