Misc
- Appraisal Dev Plan
- Workshop Week in November
Non-XML Text
XML
XML Checking
- XML encodings of Q1 LR and some other play editions.
- Haskell HXT -- dependency errors
- Python LXML
- lxml output (one parse error, in King Lear_F.xml):
  ../GABRIEL'S-SET/Latest//3 Henry 6_F.xml
  ../GABRIEL'S-SET/Latest//Hamlet_F.xml
  ../GABRIEL'S-SET/Latest//Hamlet_Q2.xml
  ../GABRIEL'S-SET/Latest//King Lear_F.xml
    File "src/lxml/parser.pxi", line 635, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:105671)
    File "../GABRIEL'S-SET/Latest//King Lear_F.xml", line 2847
    lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: div2 line 2813 and sp, line 2847, column 39
  ../GABRIEL'S-SET/Latest//King Lear_Q1.xml
  ../GABRIEL'S-SET/Latest//Othello_Q1.xml
  ../GABRIEL'S-SET/Latest//Richard_2_F.xml
  ../GABRIEL'S-SET/Latest//Richard_3_F.xml
  ../GABRIEL'S-SET/Latest//Romeo and Juliet_Q1.xml
  ../GABRIEL'S-SET/Latest//Romeo and Juliet_Q2.xml
  ../GABRIEL'S-SET/Latest//Taming of the Shrew_Taming of a Shrew (1594).xml
- Fix -- who?
- Publish -- Blog?
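A minimal sketch of the kind of batch well-formedness check run above. It uses Python's standard-library `xml.etree.ElementTree` instead of lxml so it runs without extra dependencies; `lxml.etree.parse` could be swapped in, in which case the exception to catch is `lxml.etree.XMLSyntaxError`. The function name `check_well_formed` is hypothetical, and the check covers well-formedness only, not validity against a schema.

```python
import os
import xml.etree.ElementTree as ET


def check_well_formed(directory):
    """Parse every .xml file in `directory` and return {path: message}
    for the files that are not well-formed; clean files are omitted."""
    failures = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".xml"):
            continue  # skip non-XML files in the same folder
        path = os.path.join(directory, name)
        try:
            ET.parse(path)  # raises ET.ParseError on malformed XML
        except ET.ParseError as err:
            line, column = err.position  # location reported by the parser
            failures[path] = f"line {line}, column {column}: {err}"
    return failures
```

Calling `check_well_formed("../GABRIEL'S-SET/Latest")` would return one entry per file that fails to parse, which gives a list to hand over for fixing.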
Parsing Speeches
CFS Goals
The original proposal (which I think we should put up on the project website) said we'd look at these methods for comparing texts:
- (1) Measurement of the Shannon entropy and Jensen-Shannon divergence of texts
- (2) Creation of Word Adjacency Networks (WANs) using Markov chains to store the proximity values of 100+ function words found within a text
- (3) Nearest Shrunken Centroid (NSC)
- (4) Random Forests
- (5) Burrows & Craig's Delta, Zeta, and Iota
We now know that Hugh Craig is pursuing (4) and (5), so we should pursue (1), (2), and (3). Of course, you've already pursued (5) by getting used to Intelligent Archive. The only one of the above I have no experience with is (3), and if you decide it's of no value to us we can drop it. But we should be doing something with the other two, (1) and (2), to diversify our approach.
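For goal (1), a minimal sketch of Shannon entropy and Jensen-Shannon divergence over word-frequency distributions. It assumes each text has already been tokenized into a list of words; the tokenization and the helper names are assumptions, not part of the proposal. With log base 2, entropy is in bits and the divergence lies between 0 (identical distributions) and 1 (no words in common).

```python
import math
from collections import Counter


def distribution(tokens):
    """Relative frequency of each token in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def shannon_entropy(p):
    """H(P) = -sum p * log2(p), in bits."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def jensen_shannon_divergence(p, q):
    """JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2, where M = (P + Q) / 2.
    Symmetric, and finite even when the two vocabularies differ."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))
```

Typical use would be `jensen_shannon_divergence(distribution(text_a.lower().split()), distribution(text_b.lower().split()))`. Unlike the Kullback-Leibler divergence, this does not blow up when a word occurs in only one of the texts.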
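For goal (2), a rough sketch of a function-word adjacency network stored as a Markov chain. Each occurrence of a function word adds a distance-discounted weight to every function word that follows it within a small window, and each row is then normalized into transition probabilities. The window size, the decay factor `alpha`, the row weighting, and the `eps` smoothing in the comparison are all assumptions chosen for illustration. A fuller implementation would more likely weight rows by the chain's stationary distribution; this sketch uses the simpler weighting of each row's share of the total edge weight.

```python
import math
from collections import defaultdict


def build_wan(tokens, function_words, window=5, alpha=0.75):
    """Word adjacency network as a Markov chain over function words.

    For each function word, every function word found d tokens later
    (d <= window) adds alpha ** (d - 1) to the edge weight. Rows are then
    normalized into transition probabilities.
    Returns (transitions, row_weights)."""
    fw = set(function_words)
    weights = defaultdict(lambda: defaultdict(float))
    for i, tok in enumerate(tokens):
        if tok not in fw:
            continue
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break
            nxt = tokens[i + d]
            if nxt in fw:
                weights[tok][nxt] += alpha ** (d - 1)
    total = sum(sum(row.values()) for row in weights.values())
    transitions, row_weights = {}, {}
    for src, row in weights.items():
        row_sum = sum(row.values())
        transitions[src] = {dst: w / row_sum for dst, w in row.items()}
        row_weights[src] = row_sum / total  # assumed weighting, see lead-in
    return transitions, row_weights


def wan_relative_entropy(p, q, row_weights, eps=1e-6):
    """Row-weighted relative entropy sum_i w_i sum_j P_ij * log(P_ij / Q_ij).
    Transitions missing from q get probability eps (an assumed smoothing)."""
    h = 0.0
    for src, row in p.items():
        q_row = q.get(src, {})
        for dst, p_ij in row.items():
            h += row_weights[src] * p_ij * math.log(p_ij / q_row.get(dst, eps))
    return h
```

To attribute a disputed text, one would build a WAN per candidate author from known texts and pick the author whose chain gives the smallest relative entropy from the disputed text's chain.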
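For goal (3), since it is the unfamiliar one, a plain-Python sketch of a nearest shrunken centroid classifier in one common formulation. Each class (author) centroid is pulled toward the overall centroid by soft-thresholding its standardized distance by `delta`, so features that do not separate the authors drop out, and a new text is assigned to the nearest shrunken centroid. The feature vectors (for example, function-word rates per text), the `delta` value, and the function names are assumptions. The sketch needs at least two classes and more samples than classes.

```python
import math
from statistics import median


def fit_nsc(samples, labels, delta):
    """Fit a nearest shrunken centroid classifier.
    samples: equal-length feature vectors; labels: class of each sample;
    delta: shrinkage threshold (larger = more features zeroed out)."""
    n, p = len(samples), len(samples[0])
    classes = sorted(set(labels))
    n_classes = len(classes)
    overall = [sum(x[i] for x in samples) / n for i in range(p)]
    members = {k: [x for x, y in zip(samples, labels) if y == k] for k in classes}
    centroids = {k: [sum(x[i] for x in xs) / len(xs) for i in range(p)]
                 for k, xs in members.items()}
    # Pooled within-class standard deviation of each feature.
    s = [math.sqrt(sum((x[i] - centroids[k][i]) ** 2
                       for k, xs in members.items() for x in xs) / (n - n_classes))
         for i in range(p)]
    s0 = median(s)  # guards against features with tiny variance
    shrunk = {}
    for k, xs in members.items():
        m_k = math.sqrt(1 / len(xs) - 1 / n)
        shrunk[k] = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (centroids[k][i] - overall[i]) / scale
            d = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            shrunk[k].append(overall[i] + scale * d)
    priors = {k: len(xs) / n for k, xs in members.items()}
    return {"centroids": shrunk, "scale": [si + s0 for si in s], "priors": priors}


def predict_nsc(model, x):
    """Assign x to the class with the smallest discriminant score."""
    def score(k):
        c = model["centroids"][k]
        dist = sum(((xi - ci) / si) ** 2
                   for xi, ci, si in zip(x, c, model["scale"]))
        return dist - 2 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)
```

Raising `delta` removes more features; the features that survive are the ones the method treats as distinguishing the authors, which is part of its appeal for stylometry.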
DTW of Speeches
- Same code for strings of arbitrary length
- Units: char, word, phrase, line, sentence, paragraph, speech, ...
- Encoding of the strings -- enumeration?
- Problem: as the units get longer they become unique, so a unit's enumeration value just tracks its position --> DTW on straight lines?
- Maybe DTW of words/phrases only
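A minimal sketch of dynamic time warping that works on any sequence of units (characters, words, lines, ...), as one way to get the same code for strings of arbitrary length. Instead of enumerating units as numbers, it uses a 0/1 mismatch cost between units. That cost is an assumption: it avoids the enumeration-equals-position problem but ignores how similar two non-identical units are, and `cost` can be replaced by any other local distance. The function and parameter names are hypothetical.

```python
def mismatch(x, y):
    """Assumed local cost: 0 if the two units are identical, else 1."""
    return 0.0 if x == y else 1.0


def dtw_distance(a, b, cost=mismatch):
    """Classic O(len(a) * len(b)) dynamic time warping.
    a, b: sequences of comparable units (a string gives characters,
    a list of words gives words, a list of lines gives lines, ...)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cheapest alignment of a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(a[i - 1], b[j - 1]) + min(
                D[i - 1][j],      # a[i-1] repeats against b
                D[i][j - 1],      # b[j-1] repeats against a
                D[i - 1][j - 1],  # advance both
            )
    return D[n][m]
```

The same function accepts `"abc"` (characters) or `"to be or not".split()` (words); with the default cost, the result counts mismatched units along the best alignment.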
Stylo tools in R
- See https://dh101.ch/2013/10/16/stylometry-tools-and-examples-for-authorship-attribution/
- Also: https://cls.ru.nl/~ihendrickx/Posters_ehum/4_Eder_Kestemont_Rybicki_Poster.pdf
DHSI 2016, Victoria, Canada
Other