- Principal component analysis (PCA) of 200 function words in 900-word segments of each play (cf. Transforming King Lear by Craig and Kinney).
- XML files without VARD spelling normalisation or POS tagging (non-VARDed, non-POSed).
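The method in the first bullet can be sketched as below. This is a minimal illustration, not the code in xml2pandas.py or exp_1ii.py: the tokenizer, dropping a trailing partial segment, using relative frequencies, and standardising each word column before the SVD (a correlation-matrix PCA) are all assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter

import numpy as np


def segment_words(text, size=900):
    """Split a text into consecutive, non-overlapping segments of `size` words."""
    # Lower-case tokens made of letters and apostrophes (assumed tokenizer).
    words = re.findall(r"[a-z']+", text.lower())
    # A trailing remainder shorter than `size` is dropped (an assumption).
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]


def function_word_matrix(segments, function_words):
    """Rows = segments, columns = relative frequency of each function word."""
    rows = []
    for seg in segments:
        counts = Counter(seg)
        rows.append([counts[w] / len(seg) for w in function_words])
    return np.array(rows, dtype=float)


def pca(matrix, n_components=2):
    """PCA via SVD of the standardised matrix; returns the component scores."""
    centred = matrix - matrix.mean(axis=0)
    std = centred.std(axis=0)
    std[std == 0] = 1.0  # words whose frequency never varies stay at zero
    z = centred / std
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    # Scores are U * S; columns are ordered by decreasing explained variance.
    return u[:, :n_components] * s[:n_components]
```

For a play's text `play_text` and the word list `words` read from CK_200_function_words.csv (both hypothetical variable names), `pca(function_word_matrix(segment_words(play_text), words))` gives one pair of component scores per 900-word segment. In practice the segments of all plays would be stacked into one matrix before the PCA so that every play is projected onto the same axes.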
[Figure: PCA plots coloured by Genre, Decade, Likelihood S. and Quality of Edition]
Notes
- In general, pairs of plays are stylistically similar (i.e. located close together in the PCA plot) under this analysis of 200 function words in 900-word segments.
- When coloured by genre, histories (blue) generally lie to the left, comedies (red) to the lower right and tragedies (green) in the upper centre.
- When coloured by edition quality, good (green) or bad (red), MMWQ1 lies clearly apart from the other plays.
- When coloured by provenance, plays with less than a 100% probability of being by Shakespeare generally lie above the origin.
- When coloured chronologically, earlier plays (green: pre-1590) such as Tit lie at the top and later plays (red: post-1600) lie in the middle to bottom.
Files and Code Used
- List of 200 function words: CK_200_function_words.csv
- XML preprocessing: xml2pandas.py
- Data file generated by xml2pandas.py: df_xml.csv
- Lookup table used to assign categories to the plays: lut_1.csv
- Python script to generate the PCA plots: exp_1ii.py
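The lookup table lut_1.csv is what lets the same PCA scores be recoloured by genre, decade, provenance or edition quality. A minimal standard-library sketch of that step follows; the column names (`play`, `genre`) and the helper names are hypothetical, since the actual layout of lut_1.csv and the code in exp_1ii.py are not shown here. The palette mirrors the genre colours given in the notes.

```python
import csv
import io


def load_lookup(csv_text, key="play"):
    """Map each play code to its row of category values (hypothetical columns)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def colours_for(plays, lookup, category, palette, default="grey"):
    """Return one plot colour per point, chosen by the play's category value.

    Plays missing from the lookup table, or with a value not in the palette,
    get the default colour.
    """
    colours = []
    for play in plays:
        value = lookup.get(play, {}).get(category)
        colours.append(palette.get(value, default))
    return colours


# Genre colours as described in the notes: histories blue, comedies red,
# tragedies green. The category labels themselves are assumptions.
GENRE_PALETTE = {"History": "blue", "Comedy": "red", "Tragedy": "green"}
```

The resulting list can be passed as the `c` argument of matplotlib's `scatter` alongside the first two component scores; swapping the `category` and palette (e.g. a decade or edition-quality column) reproduces the other colourings without rerunning the PCA.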
XML Files Used