Some questions about curve segmentation
Last night, I finished implementing the curve segmentation algorithm in Java. I now have some questions about applying this algorithm to topic segmentation.
1. We currently use MDS to extract the most important feature of the input text stream, and some information is lost in this extraction. How much is lost depends on the complexity of the data. In particular, a lot of information can be lost when MDS cannot find a single MOST important feature: for example, when no dominant feature exists at all, or when more than one feature is important.
Since the subsequent curve segmentation is based on the dominant feature extracted by MDS, it is not clear that we can get accurate segments of the information stream from a feature that does not reflect the whole graph.
Thus, the question is how to keep the representative features accurate. One solution is to use a method other than MDS to analyze the dominant features of the input information stream; the other is to use more complex features to represent the data.
MDS is a kind of dimension reduction method: it tries to capture the common features in the data and compress the representation without losing much accuracy. In theory, switching to another dimension reduction method is unlikely to help much if MDS is already doing a pretty good job. Maybe we had better try the second solution: adapt the curve segmentation to the high-dimensional space.
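One rough way to tell whether a dominant feature exists at all is to look at how much of the total variance the first MDS dimension carries. Below is a minimal sketch, assuming classical (metric) MDS on a Euclidean distance matrix; the class and method names are made up for illustration, and the power iteration is just a simple stand-in for a proper eigensolver, not what my implementation uses.

```java
class MdsDominance {
    // Double-center the squared distances: B = -1/2 * J * D^2 * J (classical MDS).
    static double[][] gram(double[][] dist) {
        int n = dist.length;
        double[][] sq = new double[n][n];
        double[] rowMean = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sq[i][j] = dist[i][j] * dist[i][j];
                rowMean[i] += sq[i][j] / n;
                total += sq[i][j];
            }
        }
        double grandMean = total / ((double) n * n);
        double[][] b = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // The matrix is symmetric, so the column mean equals the row mean.
                b[i][j] = -0.5 * (sq[i][j] - rowMean[i] - rowMean[j] + grandMean);
            }
        }
        return b;
    }

    // Power iteration: approximates the largest eigenvalue of a symmetric PSD matrix.
    // Assumption: the start vector is not orthogonal to the top eigenvector.
    static double topEigenvalue(double[][] b, int iterations) {
        int n = b.length;
        double[] v = new double[n];
        double startNorm = 0.0;
        for (int i = 0; i < n; i++) {
            v[i] = 1.0 + i; // arbitrary non-constant start vector
            startNorm += v[i] * v[i];
        }
        startNorm = Math.sqrt(startNorm);
        for (int i = 0; i < n; i++) v[i] /= startNorm;

        double lambda = 0.0;
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) w[i] += b[i][j] * v[j];
                norm += w[i] * w[i];
            }
            norm = Math.sqrt(norm);
            if (norm == 0.0) return 0.0;
            for (int i = 0; i < n; i++) v[i] = w[i] / norm;
            lambda = norm; // |B v| for unit v converges to the top eigenvalue
        }
        return lambda;
    }

    // Share of total variance captured by the first MDS dimension.
    // Assumption: distances are Euclidean, so trace(B) is the sum of all eigenvalues.
    static double dominance(double[][] dist) {
        double[][] b = gram(dist);
        double trace = 0.0;
        for (int i = 0; i < b.length; i++) trace += b[i][i];
        return trace == 0.0 ? 0.0 : topEigenvalue(b, 200) / trace;
    }
}
```

For points lying on a straight line the ratio is 1, since one dimension explains everything. For the four corners of a square it drops to 0.5, because two dimensions are equally important. A low ratio would be a hint that the one-dimensional curve is throwing away a lot of the structure.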
2. Our goal in the project is to analyze the information stream, which often changes dynamically over time. However, the method we use for curve segmentation requires the whole graph. That is, whenever something changes, we have to analyze the data from scratch in order to use MDS to reduce the dimensions of the feature space. One consequence is that the analysis result can change as more events occur in the stream. That does not sound good, although it is reasonable. People easily get lost in a changing world! Therefore, the question is how to segment the curve incrementally while keeping the result as consistent as possible. I don't know the answer yet. One idea is to construct different segmentations based on windows of different sizes over the information in the stream along the timeline.
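The window idea could look roughly like the sketch below. It assumes the stream has already been reduced to a one-dimensional curve, and it uses a deliberately naive stand-in segmenter (cut wherever two neighbouring values differ by more than a threshold), not the actual curve segmentation algorithm. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class WindowedSegmenter {
    // Stand-in segmenter (NOT the real algorithm): put a boundary at index i
    // whenever |curve[i] - curve[i - 1]| exceeds the threshold, for from < i < to.
    static List<Integer> boundaries(double[] curve, int from, int to, double threshold) {
        List<Integer> cuts = new ArrayList<>();
        for (int i = from + 1; i < to; i++) {
            if (Math.abs(curve[i] - curve[i - 1]) > threshold) {
                cuts.add(i);
            }
        }
        return cuts;
    }

    // Segment only the most recent `window` points of the stream.
    // Boundaries are reported as absolute stream positions, so the results
    // of different window sizes can be compared directly.
    static List<Integer> segmentLatest(double[] stream, int window, double threshold) {
        int to = stream.length;
        int from = Math.max(0, to - window);
        return boundaries(stream, from, to, threshold);
    }
}
```

Running `segmentLatest` with several window sizes on the same stream gives one segmentation per window. Boundaries that show up for most window sizes could be treated as stable, while those that appear only for one size are candidates to change later. With this local stand-in the boundaries agree trivially; with a global method such as MDS over the whole window that is not guaranteed, which is exactly the consistency problem described above.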