Experiments in the Q/F King Lear differences

Now that we have TEI-XML encoded texts of Q1 (1608) and Folio (1623) King Lear, we can undertake some experiments. I propose that the first step is for you to put tags into the XML encodings to mark off:

* In Q1, the passages that are unique to Q1
* In F, the passages that are unique to F

The crib for finding and marking up the passages is the document "LR-Q1-F-mapping.htm". Below I provide some notes on adding the XML tags needed to mark up the texts in the above way. Once we have the XML documents so marked up, it should be easy to use XPath to pull out:

* The Q1-only passages (hereafter Q1-unique)
* The F-only passages (hereafter F-unique)
* Q1 minus the Q1-only passages (hereafter Q1-common-with-F)
* F minus the F-only passages (hereafter F-common-with-Q1)

I was thinking we'd apply the various quantification methods, including such things as Shannon Entropy, Jensen-Shannon Divergence, Edit Distance, and Dynamic Time Warping (DTW), to do a series of comparisons. Some tests will be more sensible than others in different cases. For example, DTW on texts that are not supposed to have any runs of words in common (such as Q1-unique and F-unique) makes no sense, but DTW does make sense in comparing Q1-common-with-F with F-common-with-Q1, since these are meant to be essentially the same writing. Conversely, Shannon Entropy and Jensen-Shannon Divergence make sense for comparing Q1-unique with F-unique. I leave it to you to consider what tests make best sense with what texts, but the kinds of questions I was hoping we could start to answer are:

Q1) Some basic stats. How long are the Q1-unique and F-unique passages? How do their contributions break down by scene or act-scene (that is, can we graph where in the play, whether beginning, middle, or end, they fall)?

Q2) What differences are there between Q1-common-with-F and F-common-with-Q1?
Editorial judgement says that these bodies of writing are essentially the same lines with small verbal variants between them but no whole lines unique to either. Do our tests substantiate that?

Q3) What differences if any are there between Q1-unique and Q1-common-with-F? I'm thinking not only of differences in choices of words within speeches, but also in kinds of speech (prose versus verse) and who speaks. Can we say that certain characters are over- or under-represented in the Q1-unique material? We are looking to see if Q1-unique seems to be 'of a piece' with Q1-common-with-F or seems different in some way.

Q4) What differences if any are there between F-unique and F-common-with-Q1? I'm thinking of the same kind of questions as in (Q3) above.

Q5) What differences are there between Q1-unique and F-unique? In the one-text theory there was only ever one play, and Q1 and F are both imperfect records of it, each deviating from the correct script only by errors of transmission. If that is true, then Q1-unique and F-unique should be alike, despite being printed 15 years apart, since both are parts of the same lost original play. That play comprises the union of Q1-unique, F-unique, and the lines common to both editions (Q1-common-with-F and F-common-with-Q1, which in this view are essentially the same thing).

Q6) Any further question(s) suggested to us by our Advisory Board, which I am going to poll now.

Q7) Any further question(s) and permutations of the above that occur to us.

How to Add the XML Tagging for these Experiments

In order to conform to the TEIlite encoding standard, we will use the <add> (meaning "addition") element for this, because that element is already defined in the TEIlite DTD.
For shorthand here, we will call a piece of text so marked an "addition", but that is just a convenience: we do not in fact know whether the edition we are marking up was made by adding something to another text or by taking something away from one. We are calling the Q1-unique and F-unique materials "additions" solely because it is convenient to use the TEI element <add> to capture them.

This <add> element is not allowed to contain any sub-elements other than empty milestone elements such as <lb/> line breaks. So, we need to encode each run of text that is unique to one edition at the lowest possible level in the document tree, amongst the parsed character data.

The @resp attribute that we apply to the <add> element will point to one of two IDs that we add to the start of the document, after the list of IDs used to encode character names, to indicate what kind of material it is:

    ...
    <name id="AM">Knight</name>
    <name id="Q1-only">Material present in Q1 and not in F</name>
    <name id="F-only">Material present in F and not in Q1</name>
    ...

When marking up a single line that is Q1-only or F-only, we need to wrap the speech prefix and the spoken words in separate <add> elements:

IN KING LEAR FOLIO:

    ...
    <sp who="D"><speaker>Cord.</speaker><l>Nothing my Lord.</l></sp>
    <sp who="A"><speaker><add resp="F-only">Lear.</add></speaker><l><add resp="F-only">Nothing?</add></l></sp>
    <sp who="D"><speaker><add resp="F-only">Cord.</add></speaker><l><add resp="F-only">Nothing.</add></l></sp>
    <sp who="A"><speaker>Lear.</speaker><l>Nothing will come of nothing, speake againe.</l></sp>
    ...
When the addition is a run of verse lines, the content of each line is treated as a separate addition, and the addition may end (as it does here) before the end of the verse line:

IN KING LEAR QUARTO:

    <sp who="D"><speaker>Cord.</speaker>
    <l>Had you not bene their father these white flakes,</l>
    <l>Had challengd pitie of them, was this a face</l>
    <l>To be exposd against the warring winds,</l>
    <l><add resp="Q1-only">To stand against the deepe dread bolted thunder,</add></l>
    <l><add resp="Q1-only">In the most terrible and nimble stroke</add></l>
    <l><add resp="Q1-only">Of quick crosse lightning to watch poore Per du,</add></l>
    <l><add resp="Q1-only">With this thin helme</add> mine iniurious dogge,</l>
    <l>Though he had bit me, should haue stood that night</l>

When the addition is a run of prose lines, the fact that all that separates each prose line is an empty milestone <lb/> element (which is allowed within the <add> element) means that we can wrap the whole addition in a single pair of tags:

IN KING LEAR FOLIO:

    <sp who="A"><speaker>Lear.</speaker><p>And the Creature . . <lb/>
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <lb/>
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <lb/>
    . . . . . . . . . . . . . . . . . . . . . . . do appeare: Robes,<lb/>
    and Furr'd gownes hide all. <add resp="F-only">Place sinnes with Gold, and<lb/>
    the strong Lance of Iustice, hurtlesse breakes: Arme it in<lb/>
    ragges, a Pigmies straw do's pierce it. None do's offend,<lb/>
    none, I say none, Ile able 'em; take that of me my Friend,<lb/>
    who haue the power to seale th'accusers lips.</add> Get thee<lb/>
    glasse-eyes, and like a scuruy Politician, seeme to see the<lb/>
    . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . <lb/>
    Bootes: harder, harder, so.</p></sp>
    ...
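Once both files are tagged as in the examples above, the four bodies of text can be separated mechanically. What follows is only a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree module, whose findall() accepts a limited subset of XPath. The function names (unique_runs, common_text), the wrapping <div>, and the closing </sp> supplied to the abridged Quarto excerpt are my own assumptions for illustration and are not part of the project files. The sketch also assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Excerpts taken from the markup examples above (the Quarto one abridged).
# ASSUMPTION: the <div> wrapper and the closing </sp> on the Quarto excerpt
# are added here only so that each fragment parses on its own.
FOLIO_SAMPLE = """<div>
<sp who="D"><speaker>Cord.</speaker><l>Nothing my Lord.</l></sp>
<sp who="A"><speaker><add resp="F-only">Lear.</add></speaker><l><add resp="F-only">Nothing?</add></l></sp>
<sp who="D"><speaker><add resp="F-only">Cord.</add></speaker><l><add resp="F-only">Nothing.</add></l></sp>
<sp who="A"><speaker>Lear.</speaker><l>Nothing will come of nothing, speake againe.</l></sp>
</div>"""

QUARTO_SAMPLE = """<div><sp who="D"><speaker>Cord.</speaker>
<l>To be exposd against the warring winds,</l>
<l><add resp="Q1-only">Of quick crosse lightning to watch poore Per du,</add></l>
<l><add resp="Q1-only">With this thin helme</add> mine iniurious dogge,</l>
<l>Though he had bit me, should haue stood that night</l>
</sp></div>"""


def _squeeze(pieces):
    """Join text fragments with spaces and collapse runs of whitespace."""
    return " ".join(" ".join(pieces).split())


def unique_runs(root, resp):
    """One string per <add resp=...> run; resp='F-only' gives F-unique."""
    # itertext() also picks up text that follows <lb/> milestones inside <add>.
    return [_squeeze(add.itertext())
            for add in root.findall(f".//add[@resp='{resp}']")]


def _common_pieces(elem, resp):
    """Yield every text fragment under elem that is not inside <add resp=...>."""
    if elem.tag == "add" and elem.get("resp") == resp:
        return  # skip the addition itself; its tail is yielded by the caller
    if elem.text:
        yield elem.text
    for child in elem:
        yield from _common_pieces(child, resp)
        if child.tail:
            yield child.tail  # text after a child, including a skipped <add>


def common_text(root, resp):
    """The edition minus its unique material, e.g. F-common-with-Q1."""
    return _squeeze(_common_pieces(root, resp))


folio = ET.fromstring(FOLIO_SAMPLE)
quarto = ET.fromstring(QUARTO_SAMPLE)
f_unique = unique_runs(folio, "F-only")
f_common = common_text(folio, "F-only")
q1_unique = unique_runs(quarto, "Q1-only")
q1_common = common_text(quarto, "Q1-only")
```

Two choices here would need revisiting for real counts. Speech prefixes are kept as ordinary text, so they would have to be filtered out (for example by skipping <speaker>) if only spoken words are wanted. Fragments are also joined with a space, which would split any word that an element boundary happens to fall inside.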
* I think the labour of adding tags like this gives one a better appreciation of just how the XML is working, and this experience prevents one from jumping to false assumptions about exactly what is being pulled out of the XML by XPath and other tools.

... it is important that... the resulting files must validate against the "teilite.dtd".
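As a rough illustration of the measures named at the start of this note, here is a minimal sketch in Python of Shannon Entropy, Jensen-Shannon Divergence, Edit Distance, and DTW applied to word sequences. It is not a settled method for the project. The function names are hypothetical, and several choices are assumptions of mine: tokens are lower-cased and split on whitespace, the logarithm is base 2 (so results are in bits), and DTW uses a cost of 0 for identical tokens and 1 otherwise.

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative frequency of each lower-cased, whitespace-separated token."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def shannon_entropy(dist):
    """H(P) = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def jensen_shannon_divergence(p, q):
    """JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2 with M = (P + Q) / 2.

    With base-2 logs the result lies between 0 (identical distributions)
    and 1 (no vocabulary in common).
    """
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


def edit_distance(a, b):
    """Levenshtein distance between two token sequences (lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def dtw_cost(a, b):
    """Dynamic Time Warping cost between two token sequences.

    ASSUMPTION: local cost is 0 for identical tokens and 1 otherwise.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, 1):
            cost = 0.0 if x == y else 1.0
            curr.append(cost + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]
```

The entropy and divergence functions take the dictionaries returned by word_distribution, so they ignore word order and suit comparisons such as Q1-unique against F-unique. The edit-distance and DTW functions take lists of tokens in order and so suit Q1-common-with-F against F-common-with-Q1. Under the cost assumed here, DTW lets one token align with several, so a repeated word costs nothing, whereas edit distance counts it as an insertion.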