Non-XML Text


XML Checking

../GABRIEL'S-SET/Latest//3 Henry 6_F.xml
../GABRIEL'S-SET/Latest//King Lear_F.xml
  File "src/lxml/parser.pxi", line 635, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:105671)
  File "../GABRIEL'S-SET/Latest//King Lear_F.xml", line 2847
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: div2 line 2813 and sp, line 2847, column 39
../GABRIEL'S-SET/Latest//King Lear_Q1.xml
../GABRIEL'S-SET/Latest//Romeo and Juliet_Q1.xml
../GABRIEL'S-SET/Latest//Romeo and Juliet_Q2.xml
../GABRIEL'S-SET/Latest//Taming of the Shrew_Taming of a Shrew (1594).xml
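
A check like the one that produced the output above could be sketched as follows. This is a minimal sketch, not the script actually used: it swaps in the standard library's xml.etree.ElementTree for lxml so that it runs without extra installs, and the function names check_well_formed and check_directory are hypothetical. It reports the line and column of the first well-formedness error in each file, which is the information needed to track down a mismatch such as the div2/sp one in King Lear_F.xml.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a (line, column, message) tuple.

    Only well-formedness is tested; nothing is validated against a schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # position of the first error found
        return (line, column, str(err))
    return None


def check_directory(directory):
    """Map each .xml file name in directory to its first parse error, or None."""
    results = {}
    for path in sorted(Path(directory).glob("*.xml")):
        # Read bytes so the parser honours the file's own encoding declaration.
        results[path.name] = check_well_formed(path.read_bytes())
    return results
```

The parser stops at the first error in a file, so a file needs re-checking after each fix until it comes back clean.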

Parsing Speeches

CFS Goals

The original proposal (which I think we should put up on the project website) said we'd look at these methods for comparing texts:

  1. Measurement of the Shannon Entropy and Jensen-Shannon Divergence of texts
  2. Creation of Word Adjacency Networks (WAN) using Markov chains to store the proximity values of 100+ function words found within a text
  3. Nearest Shrunken Centroid (NSC)
  4. Random Forests
  5. Burrows & Craig's Delta, Zeta, and Iota

We now know that Hugh Craig is pursuing (4) and (5), so we should pursue (1), (2), and (3). Of course, you've already pursued (5) by getting used to Intelligent Archive. The only one of the above that I have no experience of is (3), and if you decide it's of no value to us we can drop it. But we should be doing something with (1) and (2) to diversify our approach.
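
As a starting point for method (1), the two measures can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a settled design: it assumes each text is reduced to relative word frequencies over lower-cased, whitespace-separated tokens, and the function names are hypothetical. A real run would presumably tokenise the speech text extracted from the XML rather than splitting on whitespace.

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative frequency of each lower-cased, whitespace-separated token."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits: H(P) = -sum of p * log2(p)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def jensen_shannon_divergence(p, q):
    """JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2, where M is the average of P and Q.

    With base-2 logarithms the result lies between 0 (identical
    distributions) and 1 (no words in common).
    """
    vocabulary = set(p) | set(q)
    mixture = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocabulary}
    return shannon_entropy(mixture) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))
```

Comparing two versions of a play would then mean building one distribution per text with word_distribution and passing the pair to jensen_shannon_divergence.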

DTW of Speeches

Stylo tools in R

DHSI 2016, Victoria, Canada