Exp1 28.4.17

Bash Scripts

Kinney reports 70 occurrences of "which" as a relative in LrF.
We assume that (for the bar plots) he uses "div" to separate the acts...

But then he says that 55 are common to both Q and F.
Then he talks about sections and passages that differ.
- How does he find these counts within segments in the lilac texts?
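One way to get counts per segment, assuming (as above) that each act opens with a "<div" tag, is sketched below in bash/awk. It starts a new segment at every "<div" token and counts one tagged keyword in each. The function name count_per_div is hypothetical, and the token format (which$6#1$ etc., produced by splitting on "(", ",", ")" and spaces as in the processing script) is an assumption about the input files.

```shell
#!/usr/bin/env bash
# Sketch only: count one tagged keyword in each <div ...> segment of a play.
# Assumptions (hypothetical): every act starts with a tag beginning "<div",
# and tokens look like which$6#1$ once the text is split on "(", ",", ")"
# and spaces, as in the processing script in these notes.
count_per_div() {
  local keyword=$1                        # e.g. 'which$6#1$' (lower case)
  tr '(,) ' '\n\n\n\n' |                  # one token per line
    tr '[:upper:]' '[:lower:]' |          # fold Which into which
    awk -v kw="$keyword" '
      /^<div/  { seg++ }                  # each "<div" token opens a new segment
      $0 == kw { n[seg]++ }               # exact token match
      END { for (s = 1; s <= seg; s++) print s "," (n[s] + 0) }'
}
```

Usage: count_per_div 'which$6#1$' < KingLear1608.txt prints one "segment,count" line per div, including segments where the count is 0.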


Fri, Apr 28, 2017 10:26:02 AM hist:13473  jobs:0
Admin@ARMBP:/cygdrive/c/Users/Admin/Desktop/DMU/SEE Texts/SEE Experiments/Experiment 1 Kinney Chapter> cat KingLear1623.txt | tr "(, " "\n\n\n" | grep -i which | sort | uniq -c
     40 which$6#1$
     30 Which$6#1$
      2 which$6#2$
      3 Which$6#2$

Fri, Apr 28, 2017 10:26:02 AM hist:13474  jobs:0
Admin@ARMBP:/cygdrive/c/Users/Admin/Desktop/DMU/SEE Texts/SEE Experiments/Experiment 1 Kinney Chapter> cat KingLear1608.txt | tr "(, " "\n\n\n" | grep -i which | sort | uniq -c
     45 which$6#1$
     19 Which$6#1$
      2 which$6#2$
      3 Which$6#2$
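The two runs above split each tag by capitalisation. Folding case and summing gives 40 + 30 = 70 which$6#1$ for KingLear1623.txt (F) and 45 + 19 = 64 for KingLear1608.txt (Q), with which$6#2$ at 2 + 3 = 5 in both. A minimal sketch that does this from the uniq -c output (sum_case_variants is a hypothetical name):

```shell
#!/usr/bin/env bash
# Sketch only: merge capitalised and lower-case variants in "uniq -c" output,
# e.g. "40 which$6#1$" + "30 Which$6#1$" -> "70 which$6#1$".
sum_case_variants() {
  awk '{ n[tolower($2)] += $1 }           # field 1 = count, field 2 = token
       END { for (k in n) print n[k], k }' | sort -k2
}
```

Usage: append | sum_case_variants to either of the pipelines above.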

Haskell Script:

Used getVariants.sh to find variants of function words.

{-
# -*- coding: utf-8 -*-

# --------------------------------------------------
# File Name: exp1.hs
# Location: 
# Purpose:
# Creation Date: 26-04-2017
# Last Modified: Wed, Apr 26, 2017  5:34:30 PM
# Author(s): Mike Stout 
# Copyright 2017 The Author(s) All Rights Reserved
# Credits: 
# --------------------------------------------------
-}
{-# OPTIONS_GHC -fno-warn-tabs #-}

import Utilities	-- local helper module (not shown in these notes): provides prs and joinAll
import Data.List
import Data.Char

main = interact $ doStuff

sh (divId, result) = 
	unlines [ show divId ++","++ sh1 res | res <- result] 

sh1 (kw, ps, count) = joinAll "," [ getGroup kw, show kw, show count] -- , show ps]

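-- Lower-case the whole text, split it into "<div" ... "</div" segments
-- (presumably what prs from Utilities does), number the segments (acts)
-- 1, 2, ... and report the keyword counts for each one.
doStuff s = unlines 
	$ map sh 
	$ map analyse 
	$ zip [1..] 
	$ prs "<div" "</div" 
	-- $ takes 10000 
	$ map toLower s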

thats = words "that that$3$ that$6#1$ that$6#2$"
whichs = words "which which$6#1$ which$6#2$"
whos = words "who who$6#1$ who$6#2$"

group1 = whichs ++ thats ++ whos

group2 = words "does doth doest do"
group3 = words "these this those"
group4 = words "thy thine"

keywords = concat [ group1, group2, group3, group4 ] 

getGroup w 
	| elem w group1 = "1"
	| elem w group2 = "2"
	| elem w group3 = "3"
	| elem w group4 = "4"
	| otherwise = "0"

isKeyword (pos, w) = elem w keywords

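-- For one segment: number its words, keep only the keywords, then sort and
-- group them by keyword; collate turns each group into (keyword, positions, count).
analyse (divId, s) = (divId, result)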
	where	result = map collate $ groupBy grp $ sortBy srt
				$ filter isKeyword 
				$ zip [1..] $ words s
		srt (_,w) (_,w') = compare w w'

		grp (_,w) (_,w') = w == w'
		

collate (x:xs) = (snd x, map fst (x:xs), length (x:xs))

Results:

 head Lr?.csv 
==> LrF.csv <==
1,2,"do",22
1,2,"does",4
1,2,"doth",4
1,1,"that$3$",32
1,1,"that$6#1$",39
1,1,"that$6#2$",26
1,3,"these",9
1,4,"thine",8
1,3,"this",52
1,3,"those",3

==> LrQ.csv <==
1,2,"do",27
1,2,"does",1
1,2,"doth",7
1,1,"that$3$",33
1,1,"that$6#1$",48
1,1,"that$6#2$",30
1,3,"these",8
1,4,"thine",5
1,3,"this",59
1,3,"those",4

Processing Script:

source ~/.bashrc

cat KingLear1608.txt | tr "(,) " "\n\n\n\n" | rh exp1.hs | grep "," | tee LrQ.csv
cat KingLear1623.txt | tr "(,) " "\n\n\n\n" | rh exp1.hs | grep "," | tee LrF.csv

grep "," Lr*.csv | sed -e 's/.csv//' | tr ":" "," | sed -e 's/Lr/Lr,/' > kinney.csv
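Traced on one line, the pipeline above turns grep's file-prefixed output LrF.csv:1,2,"do",22 into Lr,F,1,2,"do",22, i.e. the six columns Play, Edition, Act, WordGroup, Keyword, Count that the R code names. A small sketch wrapping the same sed/tr steps as a function, plus a hypothetical check_six_columns guard that flags rows of any other width (for example a keyword containing a comma, or a colon that tr would turn into a comma):

```shell
#!/usr/bin/env bash
# Sketch only: the kinney.csv line transformation used above, as a function.
label_rows() {
  sed -e 's/\.csv//' | tr ':' ',' | sed -e 's/Lr/Lr,/'
}

# Hypothetical guard: report any row that is not exactly 6 fields wide
# (Play, Edition, Act, WordGroup, Keyword, Count) and exit non-zero if so.
check_six_columns() {
  awk -F',' 'NF != 6 { print "bad row " NR ": " $0; bad = 1 } END { exit bad }'
}
```

Usage: grep "," Lr*.csv | label_rows > kinney.csv && check_six_columns < kinney.csv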

Analysis in R:

a <- read.table('kinney.csv', sep=',', header=F)
colnames(a) <- c('Play', 'Edition', 'Act', "WordGroup", 'Keyword', 'Count')
a$WordGroup <- factor(a$WordGroup)
a$Act <- factor(a$Act)
attach(a)
summary(a)

if (0) {  # disabled: overall WordGroup x Keyword totals
x <- addmargins(xtabs(Count~WordGroup+Keyword, data=a), 2)
x
f <- ftable(x, row.vars="WordGroup")
}

for (i in levels(a$WordGroup)){
	b <- droplevels(a[ a$WordGroup==i, ])
	#summary(b)
	#b$Keyword <- factor(b$Keyword)	
	x <- xtabs(Count~WordGroup+Keyword+Edition+Act, data=b)
	x <- addmargins(x,4)
#print(x)
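	# Act as the column variable. The original call misspelled col.vars as
	# "col.vsrs", which ftable appears to ignore, so it fell back to its
	# default (last variable, Act): the layout shown in the results below.
	f <- ftable(x, col.vars="Act")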
	print(f)
}

Results:

                             Act   1   2   3   4   5 Sum
WordGroup Keyword    Edition                            
1         that$3$    F            32  18  14  13  12  89
                     Q            33  19  13  19  12  96
          that$6#1$  F            39  30  35  26  19 149
                     Q            48  31  35  26  25 165
          that$6#2$  F            26   8  10  23   9  76
                     Q            30   7  12  24  12  85
          which$6#1$ F            23  18  10  12   7  70
                     Q            18  16   9  14   7  64
          which$6#2$ F             1   0   0   2   2   5
                     Q             1   0   0   2   2   5
          who$6#1$   F             8   5  10   6   7  36
                     Q             7   4   8  11   8  38
          who$6#2$   F             5   3   7   4   6  25
                     Q             5   3   6   6   7  27
                          Act   1   2   3   4   5 Sum
WordGroup Keyword Edition                            
2         do      F            22  21  12  32  13 100
                  Q            27  19  13  37  13 109
          does    F             4   2   0   8   2  16
                  Q             1   1   0   5   2   9
          doth    F             4   4   3   1   1  13
                  Q             7   4   3   2   1  17
                          Act   1   2   3   4   5 Sum
WordGroup Keyword Edition                            
3         these   F             9   5   6   8   6  34
                  Q             8   5   5   9   5  32
          this    F            52  43  37  33  37 202
                  Q            59  40  47  38  39 223
          those   F             3   4   1   2   0  10
                  Q             4   4   2   4   2  16
                          Act   1   2   3   4   5 Sum
WordGroup Keyword Edition                            
4         thine   F             8   1   3   6   4  22
                  Q             5   1   2   3   5  16
          thy     F            41  18  28  26  34 147
                  Q            46  16  37  32  33 164

Kinney reports 70 vs 64 occurrences of "which" as a relative in F1 vs Q1, and we find the same counts here.
But he also says that 55 are common to both. How does he arrive at this figure?
Here we split the plays into acts (as Kinney does for some of his analysis). If "which" is common to both editions in a given act, the shared count for that act should be the lower of the two counts; however, summed across all acts this does not give the 55 that Kinney reports. So what other division of the text could he be using?
- Perhaps he breaks the play into scenes rather than just 5 acts?
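The per-act minimum rule can be checked directly from kinney.csv. Using the Act columns of the R table for which$6#1$, min(F, Q) per act is 18, 16, 9, 12, 7, which sums to 62 (67 if which$6#2$ is added), not 55. A minimal awk sketch, assuming the kinney.csv column order used in the R code, a single play in the file, and treating an act missing from one edition as 0 (the Haskell only writes keywords that occur); shared_by_min is a hypothetical name. Given a kinney.csv built from scene-level divs, the same function would test the scene hypothesis.

```shell
#!/usr/bin/env bash
# Sketch only: "shared" occurrences under the per-segment minimum rule.
# Reads kinney.csv rows (Play,Edition,Act,WordGroup,Keyword,Count) on stdin,
# assumes one play, and sums min(F count, Q count) over segments for one
# keyword. A segment missing from one edition counts as 0 there.
shared_by_min() {
  local keyword=$1                        # e.g. 'which$6#1$'
  awk -F',' -v kw="\"$keyword\"" '
    $5 == kw { c[$2, $3] = $6; segs[$3] = 1 }   # Keyword field is quoted
    END {
      total = 0
      for (s in segs) {
        f = c["F", s] + 0; q = c["Q", s] + 0
        total += (f < q ? f : q)
      }
      print total
    }'
}
```

Usage: shared_by_min 'which$6#1$' < kinney.csv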

Kinney's analysis of alternative words ("which" for "that", etc.)

Clearly, knowing where "which" has been replaced by "that" requires some kind of alignment of the two texts.
Hence, in Exp2 we will try to combine our earlier work using DTW to align texts with this function-word alignment, to reproduce Kinney's results automatically.
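As a stand-in for the DTW alignment planned for Exp2 (this is token-level edit-distance, i.e. Levenshtein, alignment, not DTW), a minimal awk sketch: it aligns two passages given one token per line, and any aligned pair where "which" in one edition sits opposite "that" in the other is a candidate substitution. The function names and the one-token-per-line input files are hypothetical; the table is quadratic in time and memory, so this suits scene-sized passages rather than whole plays.

```shell
#!/usr/bin/env bash
# Sketch only: Levenshtein (edit-distance) alignment of two token lists,
# swapped in here for the DTW alignment planned for Exp2.
# Inputs (assumed): two non-empty files, one lower-cased token per line.
# Output: one aligned pair per line, "tokenA<TAB>tokenB"; "-" marks a gap.
align_tokens() {
  awk '
    NR == FNR { a[++n] = $0; next }   # first file -> a[1..n]
              { b[++m] = $0 }         # second file -> b[1..m]
    END {
      # d[i, j] = edit distance between a[1..i] and b[1..j]
      for (i = 0; i <= n; i++) d[i, 0] = i
      for (j = 0; j <= m; j++) d[0, j] = j
      for (i = 1; i <= n; i++)
        for (j = 1; j <= m; j++) {
          diag = d[i-1, j-1] + (a[i] != b[j])   # match (0) or substitution (1)
          del  = d[i-1, j] + 1                  # token only in the first text
          ins  = d[i, j-1] + 1                  # token only in the second text
          best = diag
          if (del < best) best = del
          if (ins < best) best = ins
          d[i, j] = best
        }
      # Trace back from (n, m), preferring the diagonal step on ties.
      i = n; j = m; k = 0
      while (i > 0 || j > 0) {
        if (i > 0 && j > 0 && d[i, j] == d[i-1, j-1] + (a[i] != b[j])) {
          out[++k] = a[i] "\t" b[j]; i--; j--
        } else if (i > 0 && d[i, j] == d[i-1, j] + 1) {
          out[++k] = a[i] "\t-"; i--
        } else {
          out[++k] = "-\t" b[j]; j--
        }
      }
      for (; k > 0; k--) print out[k]   # print in text order
    }' "$1" "$2"
}

# Keep only aligned pairs where "which" in one text faces "that" in the other
# (tags such as which$6#1$ are matched by prefix).
which_that_swaps() {
  awk -F'\t' '($1 ~ /^which/ && $2 ~ /^that/) || ($1 ~ /^that/ && $2 ~ /^which/)'
}
```

For example, with Q and F passages tokenised as in the processing script (tr "(,) " "\n\n\n\n", then grep -v '^$' to drop blank lines), align_tokens q.txt f.txt | which_that_swaps would list the candidate swaps.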