Code Fixes
- Parsing pretty-printed XML
- reg/seg words running into one another
- Curtizan speech verse/prose word/line counts
- Regularizing "Ill" to "I will" etc -- effect on total word counts
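A minimal sketch of why regularizing changes the totals, assuming tokens are split on whitespace. The `EXPANSIONS` table, the `regularize` helper and the sample phrase are hypothetical and only illustrate the "Ill" to "I will" case named above; they are not taken from the project code.

```python
# Hypothetical expansion table; only the pair named in the notes is listed.
EXPANSIONS = {"Ill": "I will"}


def regularize(tokens):
    """Replace each token by its expansion, which may be several words."""
    out = []
    for tok in tokens:
        out.extend(EXPANSIONS.get(tok, tok).split())
    return out


# Made-up sample phrase, not a quotation from the text.
tokens = "Ill go with thee".split()
regular = regularize(tokens)

# One original token becomes two, so the total rises by one.
print(len(tokens), len(regular))  # 4 5
```

Each expanded form adds one word to the total, so counts taken before and after regularizing are not directly comparable.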
Speeches split by stage directions
- Detecting XML problems -- new codes added to analyseXML.py
- Logging occurrences to file -- e.g. 'F_splitSpeeches.txt'
- EG: Lr F act 5 scene 3, speeches 33 and 34 are both by the Herald but are split by the stage direction "Trumpet"
    F,5,3
    33,AI,Herald,Her., % If any man of quality or degree, within the lists of the Army, will maintain upon Edmund, supposed Earl of Gloster, # that he is a manifold Traitor, let him appear by the third # sound of the Trumpet: he is bold in his defence.
    34,AI,Herald,Her., @~ Again .

    F,5,3
    34,AI,Herald,Her., @~ Again .
    35,AI,Herald,Her., @~ Again .
- DECISION Manually edit the XML to fix these split speeches
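The detection step could be sketched roughly as follows. This is not the code in analyseXML.py: it assumes TEI-style markup in which each speech is an `<sp>` element with `who` and `n` attributes and each stage direction is a sibling `<stage>` element, and the `find_split_speeches` name and the sample scene are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative scene only; element and attribute names are assumptions.
SAMPLE = """<div type="scene" n="3">
  <sp who="Herald" n="33"><l>If any man of quality or degree</l></sp>
  <stage>Trumpet</stage>
  <sp who="Herald" n="34"><l>Again.</l></sp>
  <stage>Trumpet</stage>
  <sp who="Herald" n="35"><l>Again.</l></sp>
  <sp who="Other" n="36"><l>Placeholder line</l></sp>
</div>"""


def find_split_speeches(scene):
    """Return (first_n, second_n, speaker, stage_text) for each pair of
    consecutive speeches by one speaker with a stage direction between."""
    hits = []
    prev_sp = None      # most recent <sp> element seen
    stage_between = []  # stage directions seen since prev_sp
    for child in scene:
        if child.tag == "stage":
            stage_between.append((child.text or "").strip())
        elif child.tag == "sp":
            same_speaker = (prev_sp is not None
                            and prev_sp.get("who") == child.get("who"))
            if same_speaker and stage_between:
                hits.append((prev_sp.get("n"), child.get("n"),
                             child.get("who"), "; ".join(stage_between)))
            prev_sp = child
            stage_between = []
        # Any other element is ignored and does not break the pairing.
    return hits


for first_n, second_n, speaker, stage in find_split_speeches(ET.fromstring(SAMPLE)):
    print(f"{speaker}: speeches {first_n} and {second_n} split by '{stage}'")
```

Each hit could then be written to a log such as 'F_splitSpeeches.txt' and used as a checklist for the manual XML edits.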