- Step 1: Apply a curve segmentation algorithm to the plot of all relevant entries.
- Step 2: Combine consecutive lines with similar behavior.
- Step 3: Identify topic segments.
- Step 4: Classify each topic segment by the development pattern within it.
If necessary, steps 2 to 4 can be applied to the result iteratively until no further changes occur.
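Repeating steps 2 to 4 until nothing changes is a fixed-point loop. The sketch below is a minimal illustration, assuming a hypothetical callable `one_pass` that performs steps 2 to 4 once and returns a list that can be compared with `==`; the `max_iter` cap is an added safeguard, not part of the method described above.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def iterate_until_stable(
    segments: List[T],
    one_pass: Callable[[List[T]], List[T]],
    max_iter: int = 100,
) -> List[T]:
    """Re-apply one pass of steps 2-4 until the result stops changing.

    `one_pass` is hypothetical; `max_iter` only guards against a pass
    that never settles.
    """
    for _ in range(max_iter):
        updated = one_pass(segments)
        if updated == segments:  # no change: fixed point reached
            return updated
        segments = updated
    return segments
```

For example, a toy pass that drops the last element while more than one remains settles on a one-element list.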
In the first step, we use Lowe's curve segmentation algorithm to obtain a series of lines that approximate the original points.
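To make step 1 concrete, here is a simplified sketch in the spirit of Lowe's significance-based recursive subdivision, not a reproduction of his original code. A span of points is split at the point of maximum perpendicular deviation from its chord, each line is scored by its length divided by that deviation, and a parent line is kept unless one of its children scores higher. The function names and the `min_points` and `eps` parameters are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _max_deviation(points: List[Point], i: int, j: int) -> Tuple[float, int]:
    """Largest perpendicular distance from points[i+1..j-1] to the chord i-j."""
    (x1, y1), (x2, y2) = points[i], points[j]
    length = math.hypot(x2 - x1, y2 - y1)
    best_d, best_k = 0.0, i
    for k in range(i + 1, j):
        x0, y0 = points[k]
        d = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1) / length
        if d > best_d:
            best_d, best_k = d, k
    return best_d, best_k


def _segment(points, i, j, min_points, eps):
    """Return (significance, breakpoint indices) for points[i..j]."""
    (x1, y1), (x2, y2) = points[i], points[j]
    length = math.hypot(x2 - x1, y2 - y1)
    dev, k = _max_deviation(points, i, j)
    own_sig = length / max(dev, eps)  # significance = length / max deviation
    # Stop on short spans, or when the chord already fits every point.
    if j - i + 1 <= min_points or dev <= eps:
        return own_sig, [i, j]
    left_sig, left = _segment(points, i, k, min_points, eps)
    right_sig, right = _segment(points, k, j, min_points, eps)
    best_child = max(left_sig, right_sig)
    if own_sig >= best_child:
        return own_sig, [i, j]  # the single line is the more significant description
    return best_child, left + right[1:]  # breakpoint k appears only once


def lowe_segment(points: List[Point], min_points: int = 3, eps: float = 1e-9) -> List[int]:
    """Indices of the breakpoints of a polyline approximating `points`.

    Assumes x values strictly increase (e.g. entries ordered by time).
    """
    if len(points) < 2:
        return list(range(len(points)))
    return _segment(points, 0, len(points) - 1, min_points, eps)[1]
```

On a V-shaped input such as `[(0, 0), (1, 1), (2, 2), (3, 3), (4, 2), (5, 1), (6, 0)]`, the sketch returns the breakpoints `[0, 3, 6]`, i.e. one rising and one falling line.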
In the second step, we decide whether to combine two consecutive curve segments (i.e., lines) by comparing their slopes and average variances. The key requirement is that combined lines have similar slopes and similar average variances. How should 'similarity' be defined? We use the rate of increase, i.e. the relative change from one line to the next, to measure the difference between two consecutive lines. We expect a relative rate to work better than an absolute difference because it is independent of the specific application.
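The merging rule of step 2 can be sketched as follows. Two assumptions are made here that the text does not specify: the 'average variance' of a line is taken to be the mean squared residual of a least-squares fit to its points, and the threshold `max_rate` is a hypothetical value. The rate of increase is computed as |b − a| / |a|, with a small guard when a is zero.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Line = List[Point]


def fit(line: Line) -> Tuple[float, float]:
    """Least-squares slope and mean squared residual of one line's points.

    Treating the mean squared residual as the 'average variance' is an assumption.
    """
    n = len(line)
    mx = sum(x for x, _ in line) / n
    my = sum(y for _, y in line) / n
    sxx = sum((x - mx) ** 2 for x, _ in line)
    sxy = sum((x - mx) * (y - my) for x, y in line)
    slope = sxy / sxx  # needs at least two distinct x values
    intercept = my - slope * mx
    avg_var = sum((y - (intercept + slope * x)) ** 2 for x, y in line) / n
    return slope, avg_var


def rate(a: float, b: float, eps: float = 1e-12) -> float:
    """Relative (scale-free) change from a to b, guarded against a == 0."""
    return abs(b - a) / max(abs(a), eps)


def merge_similar(lines: List[Line], max_rate: float = 0.2) -> List[Line]:
    """Merge consecutive lines whose slope and average variance both change
    by at most `max_rate` (a hypothetical threshold)."""
    if not lines:
        return []
    merged = [list(lines[0])]
    for line in lines[1:]:
        prev = merged[-1]
        s1, v1 = fit(prev)
        s2, v2 = fit(line)
        if rate(s1, s2) <= max_rate and rate(v1, v2) <= max_rate:
            # Drop the shared breakpoint so it is not counted twice.
            tail = line[1:] if line[0] == prev[-1] else line
            merged[-1] = prev + list(tail)
        else:
            merged.append(list(line))
    return merged
```

In this sketch each new line is compared with the already merged line rather than with its original neighbour, so a run of similar lines grows greedily. Because a relative rate against a zero variance is ill-defined, the guard makes the sketch refuse to merge a perfectly straight line with a noisy one.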
Because I have just redefined topic segments (a segment may be a single point, a line, or a series of lines), I need to describe the new definition in more detail and plan a new experiment to validate it.
In fact, each topic segment can already be classified in step 3. A topic segment consisting of a single point belongs to the 'dominated/dominant' pattern. A topic segment composed of one line follows either the 'drifting' or the 'dominated/dominant' pattern. A topic segment made up of more than one line probably follows the 'interrupted' pattern. For topic segments with the 'interrupted' pattern, we replace the original lines with a single line spanning all of their components. Such a replacement may change the development patterns of the neighboring segments, which triggers a new iteration of steps 2 to 4.
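The classification rules and the replacement of 'interrupted' segments can be sketched as below. The representation is an assumption: a topic segment is a list of lines, and a lone point is stored as one line holding one point. Since the text does not say how to tell 'drifting' from 'dominated/dominant' for a one-line segment, the sketch returns both as candidates. It also reads "one line over its components" as a single line containing all component points in order; how that line is then fitted is left open.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]
Line = List[Point]
Segment = List[Line]  # hypothetical: a lone point is stored as [[(x, y)]]


def candidate_patterns(segment: Segment) -> Set[str]:
    """Candidate development patterns, following the rules for step 3."""
    if len(segment) == 1 and len(segment[0]) == 1:
        return {"dominated/dominant"}  # a single point
    if len(segment) == 1:
        return {"drifting", "dominated/dominant"}  # one line: still ambiguous
    return {"interrupted"}  # several lines: probably interrupted


def replace_interrupted(segment: Segment) -> Segment:
    """Replace the lines of an 'interrupted' segment by one line over all their points."""
    if "interrupted" not in candidate_patterns(segment):
        return segment
    points: Line = []
    for line in segment:
        for p in line:
            if not points or p != points[-1]:  # skip shared breakpoints
                points.append(p)
    return [points]


def replace_all(segments: List[Segment]) -> Tuple[List[Segment], bool]:
    """Apply the replacement everywhere; True means another pass of steps 2-4 is needed."""
    updated = [replace_interrupted(s) for s in segments]
    return updated, updated != segments
```

The boolean returned by `replace_all` is what would decide whether another iteration of steps 2 to 4 is needed.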