CFS Goals

The original proposal (which I think we should put up on the project website) said we'd look at these methods for comparing texts:

  1. Measurement of the Shannon Entropy and Jensen-Shannon Divergence of texts
  2. Creation of Word Adjacency Networks (WAN) using Markov chains to store the proximity values of 100+ function words found within a text
  3. Nearest Shrunken Centroid (NSC)
  4. Random Forests
  5. Burrows & Craig's Delta, Zeta, and Iota
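
As a rough illustration of method (1), here is a minimal Python sketch of Shannon entropy and Jensen-Shannon divergence over word-frequency distributions. The proposal does not say which units the distributions are built from, so the whitespace tokenisation and the function names (word_distribution, shannon_entropy, jensen_shannon_divergence) are assumptions made only for this example.

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative frequency of each lower-cased, whitespace-separated token.

    Assumption: plain whitespace tokenisation; a real study would likely
    use a proper tokeniser and perhaps a restricted vocabulary.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits: H(P) = -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits: H(M) - (H(P) + H(Q)) / 2,
    where M is the equal-weight mixture of P and Q."""
    vocab = set(p) | set(q)
    mixture = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return shannon_entropy(mixture) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


if __name__ == "__main__":
    a = word_distribution("the cat and the dog")
    b = word_distribution("the king and the queen")
    print(shannon_entropy(a), jensen_shannon_divergence(a, b))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared vocabulary), which makes scores comparable across pairs of texts.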

We now know that Hugh Craig is pursuing (4) and (5), so we should pursue (1), (2), and (3). Of course you've already pursued (5) by getting used to Intelligent Archive. The only one of the above I have no experience of is (3), and if you decide it's of no value to us we can drop it. But we should be doing something with the other two, (1) and (2), to diversify our approach.
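
To make (2) concrete, here is a minimal Python sketch of one way a Word Adjacency Network could be built: each function word is linked to the function words that follow it within a fixed window, nearer words get more weight, and each row is normalised so the result can be read as a Markov chain. Everything specific here is an assumption for illustration only: the six-word FUNCTION_WORDS set (standing in for the 100+ words in the proposal), the window of 5, the decay of 0.75, the whitespace tokenisation, and the name build_wan.

```python
from collections import defaultdict

# Hypothetical stand-in for the 100+ function words mentioned in the proposal.
FUNCTION_WORDS = {"the", "and", "of", "to", "a", "in"}


def build_wan(text, function_words=FUNCTION_WORDS, window=5, decay=0.75):
    """Return a row-normalised word adjacency network as nested dicts.

    chain[source][target] is the share of weighted co-occurrences in which
    `target` follows `source` within `window` tokens. A target at distance
    d contributes decay ** (d - 1), so closer words count for more.
    """
    tokens = text.lower().split()  # assumption: whitespace tokenisation
    weights = defaultdict(lambda: defaultdict(float))

    for i, source in enumerate(tokens):
        if source not in function_words:
            continue
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break
            target = tokens[i + d]
            if target in function_words:
                weights[source][target] += decay ** (d - 1)

    # Normalise each row so its outgoing weights sum to 1 (a Markov chain).
    chain = {}
    for source, row in weights.items():
        total = sum(row.values())
        chain[source] = {target: w / total for target, w in row.items()}
    return chain


if __name__ == "__main__":
    wan = build_wan("the cat and the dog of the house")
    for source, row in sorted(wan.items()):
        print(source, row)
```

Two texts could then be compared by measuring how far apart their chains are, for example with a relative-entropy measure over the rows, which would tie (2) back to (1).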

