What character can be used to parse paragraphs in Java?

Question:

I'm sure folks will get a good laugh out of this one, but for the life of me I cannot find a separator that indicates when a new paragraph has begun in a string of text. Word and line? Easy peasy, but paragraph seems to be much harder to find. I've tried two line breaks in a row, and the Unicode representations of paragraph break and line break, with no luck.

EDIT: I apologize for the vagueness of my original question. To answer some of the questions, it is a basic text file originally created on Windows. I'm testing some code for opening and analyzing its contents with the Blackberry JDE 4.5 using the RIM eclipse plugin. While the source of the files will be Windows (at least for the foreseeable future) and they will be basic text, I have no control over how they are created (they come from a third-party source, and I don't have access to the way they are created).

There is no such paragraph break character in common usage.

You might be able to get away with assuming that two or more line breaks in a row (with optional horizontal whitespace between them) indicate a paragraph break. But there are numerous exceptions to this "rule". For example, when a paragraph


  • is interrupted by a floating figure, or

  • contains bullet points

and then continues on ... like this one. For that kind of thing, there is probably no solution.
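The blank-line heuristic described above can be sketched as follows. This is only a minimal illustration, not a definitive implementation: the class name `ParagraphSplitter` and its `split` method are hypothetical, and it assumes standard Java SE, where `java.util.regex` is available. As far as I know, the Java ME environment targeted by the Blackberry JDE 4.5 does not ship `java.util.regex`, so there the same rule would have to be applied with a hand-written scan over the characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: splits text into paragraphs at runs of blank lines.
final class ParagraphSplitter {

    // One line break followed by one or more further line breaks,
    // each optionally preceded by spaces or tabs.
    private static final Pattern BLANK_LINES = Pattern.compile("\\n(?:[ \\t]*\\n)+");

    static List<String> split(String text) {
        // Normalize Windows (\r\n) and bare \r line endings to \n first,
        // so that a single "\r\n" is never mistaken for two line breaks.
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');

        List<String> paragraphs = new ArrayList<>();
        for (String chunk : BLANK_LINES.split(normalized)) {
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) {   // skip empty pieces from leading blank lines
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }
}
```

For example, `ParagraphSplitter.split("First line\r\nsame paragraph.\r\n\r\nSecond.")` yields two paragraphs, with the single line break inside the first one preserved. The exceptions listed above still apply: a paragraph interrupted by a figure or a bullet list would come back as several pieces.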

EDIT per @Aiden's comment below. (It is now clear that this is not relevant to the OP, but it may be relevant to others who find the question via Google, etc.)

Instead of trying to reverse-engineer paragraphs from text, perhaps you should consider specifying that your input should be in (for example) Markdown syntax, i.e. as supported by Stack Overflow. The Markdown Wiki includes links to Markdown parser implementations in many languages, including Java.

(This assumes that you have some control over the input format of the text you are trying to parse into paragraphs.)