Project Meeting 25.5.17

Misc

Appraisal Dev Plan
Workshop Week in November

Non-XML Text

GE's Q/F King Lear differences 12.5.17

XML

SEE XML Texts: GE's Who's Got What List

XML Checking

XML encodings of Q1 LR and some other play editions.

Haskell HXT -- dependency errors
Python LXML

test_xml.py 25.5.17
checkAll.sh 25.5.17
../GABRIEL'S-SET/Latest//3 Henry 6_F.xml
../GABRIEL'S-SET/Latest//Hamlet_F.xml
../GABRIEL'S-SET/Latest//Hamlet_Q2.xml
../GABRIEL'S-SET/Latest//King Lear_F.xml
  File "src/lxml/parser.pxi", line 635, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:105671)
  File "../GABRIEL'S-SET/Latest//King Lear_F.xml", line 2847
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: div2 line 2813 and sp, line 2847, column 39
../GABRIEL'S-SET/Latest//King Lear_Q1.xml
../GABRIEL'S-SET/Latest//Othello_Q1.xml
../GABRIEL'S-SET/Latest//Richard_2_F.xml
../GABRIEL'S-SET/Latest//Richard_3_F.xml
../GABRIEL'S-SET/Latest//Romeo and Juliet_Q1.xml
../GABRIEL'S-SET/Latest//Romeo and Juliet_Q2.xml
../GABRIEL'S-SET/Latest//Taming of the Shrew_Taming of a Shrew (1594).xml
Fix -- who?
Publish -- Blog?

Parsing Speeches

parse_xml.py 25.5.17
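
A rough sketch of pulling speeches out of one of the encoded plays. It is not the actual parse_xml.py. The sp and div2 element names come from the parser error above; the speaker and l child elements, the absence of an XML namespace, and the sample text are assumptions. The standard-library parser is used here so the sketch runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Invented sample; the real files may nest or name elements differently.
SAMPLE = """<div2>
<sp><speaker>Lear</speaker><l>first line</l><l>second line</l></sp>
<sp><speaker>Kent</speaker><l>a reply</l></sp>
</div2>"""


def speeches(root):
    """Yield (speaker, text) for every <sp> element, in document order."""
    for sp in root.iter("sp"):
        speaker_el = sp.find("speaker")
        speaker = (speaker_el.text or "").strip() if speaker_el is not None else ""
        # Assumes the spoken text sits inside child elements of <sp>;
        # the speaker label itself is skipped.
        parts = []
        for child in sp:
            if child.tag != "speaker":
                parts.append(" ".join("".join(child.itertext()).split()))
        yield speaker, " ".join(p for p in parts if p)


for who, text in speeches(ET.fromstring(SAMPLE)):
    print(who, "|", text)
```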

CFS Goals

The original proposal (which I think we should put up on the project website) said we'd look at these methods for comparing texts:

(1) Measurement of the Shannon Entropy and Jensen-Shannon Divergence of texts
(2) Creation of Word Adjacency Networks (WAN) using Markov chains to store the proximity values of 100+ function words found within a text
(3) Nearest Shrunken Centroid (NSC)
(4) Random Forests
(5) Burrows & Craig's Delta, Zeta, and Iota

We now know that Hugh Craig is pursuing (4) and (5), so we should pursue (1), (2), and (3). Of course, you've already pursued (5) by getting used to Intelligent Archive. The only one of the above I have no experience of is (3), and if you decide it's of no value to us we can drop it. But we should be doing something with the other two, (1) and (2), to diversify our approach.
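
For the first method, a small sketch of Shannon entropy and Jensen-Shannon divergence over word-frequency distributions. The choice of whole words as the unit, base-2 logarithms, and the two toy word lists are assumptions; the proposal does not fix any of these.

```python
import math
from collections import Counter


def distribution(words):
    """Relative frequency of each word: {word: probability}."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def entropy(p):
    """Shannon entropy in bits of a distribution given as {item: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: H(M) - (H(P) + H(Q)) / 2, with M = (P + Q) / 2."""
    m = {w: 0.5 * p.get(w, 0.0) + 0.5 * q.get(w, 0.0) for w in set(p) | set(q)}
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))


# Toy word lists, invented for illustration only.
p = distribution("the king and the fool".split())
q = distribution("the fool and the knave".split())
print(entropy(p), entropy(q), js_divergence(p, q))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared words), which makes scores comparable across pairs of texts.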

HC: Delta, Random Forest, etc.

DTW of Speeches

Same code for arbitrary length strings

Char, word, phrase, line, sentence, paragraph, speech, ...

Encoding of the strings ?? -- enumeration
Problem -- "enumerated string" == "position" (as strings become longer/unique) --> DTW on straight lines?
Maybe DTW of words/phrases only
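
A sketch of how one DTW routine could serve every granularity: it takes any two sequences, so characters, words or lines only differ in how the text is split beforehand. The enumeration scheme (integer codes in order of first appearance) and the 0/1 mismatch cost are assumptions, not something already decided. The enumeration also shows the problem noted above: when every token is unique the codes are just 0, 1, 2, ... and the sequence is a straight line.

```python
def enumerate_tokens(*sequences):
    """Encode tokens as integers, numbered in order of first appearance."""
    codes = {}
    return [[codes.setdefault(tok, len(codes)) for tok in seq] for seq in sequences]


def dtw(a, b, cost=lambda x, y: abs(x - y)):
    """Classic dynamic time warping distance, O(len(a) * len(b))."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, advance in both.
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def mismatch(x, y):
    """0/1 cost: avoids reading meaning into the size of arbitrary codes."""
    return 0 if x == y else 1


# Word level and character level use the same function.
words_a = "to be or not to be".split()
words_b = "to be or to be".split()
print(dtw(words_a, words_b, mismatch))
print(dtw(list("speech"), list("speeech"), mismatch))

# All-unique tokens enumerate to their own positions: a straight line.
print(enumerate_tokens("abc"))
```

With the default numeric cost the distance between two codes depends on the order in which tokens were first seen, which is arbitrary; the mismatch cost sidesteps that but discards any notion of "near" tokens.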

Stylo tools in R

See https://dh101.ch/2013/10/16/stylometry-tools-and-examples-for-authorship-attribution/
Also: https://cls.ru.nl/~ihendrickx/Posters_ehum/4_Eder_Kestemont_Rybicki_Poster.pdf

DHSI 2016 Victoria Canada

https://dhsi.org/content/2016Curriculum/22.%20Introduction%20to%20Computation%20for%20Literary%20Criticism%20(2016).pdf

Other

To Do SEE 19.5.17