Saturday, October 08, 2005

Choosing a good dataset for the weblog topic segmentation experiment

I spent the whole day checking and improving the MATLAB code for my paper (topic segmentation). After some modifications, especially to the definition of the 'interrupted' pattern, I found that such a pattern is hard to identify in the dataset I am currently using. So I decided to look for a better dataset, one that can actually produce interrupted patterns. I tried the TalkingPoint entries from 2004, and our algorithm generated at least 2 interrupted patterns. I still need to be careful about the result, but it's good news anyway.
:)
