Glossary of terms used in the Shakespeare's Early Editions project

Bayesian Probability A form of probability which expresses the likelihood of an occurrence as the degree of confidence a person has in that occurrence coming true, rather than as the frequency with which it happens across multiple instances of the same situation, e.g. one occurrence in ten. The advantage of Bayesian probability is that it allows a person's initial beliefs to be adjusted should new evidence appear which alters their perceived likelihood of an occurrence.
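
The adjustment of an initial belief in the light of new evidence can be sketched in Python using Bayes' rule. This is only an illustration: the function name bayes_update and all of the numbers are assumptions made for the example, not figures from the project.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the revised (posterior) belief in a hypothesis after seeing evidence."""
    # Probability of seeing the evidence and the hypothesis being true
    numerator = p_evidence_if_true * prior
    # Total probability of seeing the evidence at all
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


# Hypothetical figures: an initial belief of 0.5, and evidence that is
# four times more likely to appear if the hypothesis is true than if it is false.
posterior = bayes_update(prior=0.5, p_evidence_if_true=0.8, p_evidence_if_false=0.2)
print(round(posterior, 3))
```

With these invented figures the belief rises from 0.5 to about 0.8. Evidence that is equally likely either way would leave the belief unchanged.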

Bad Quartos A term devised by the bibliographer A. W. Pollard for certain Shakespeare publications which, Pollard asserted, were not printed from authoritative manuscripts and are thus inaccurate or rife with corruption. These include the first quartos of Romeo and Juliet, Henry V, and The Merry Wives of Windsor.

Bitmap Generally used to display images on a computer or other screen. A bitmap is a rectangular grid of pixels, each holding an RGB (see RGB) value, which together produce an image.

Command Line Interface A type of user interface that allows the user to operate an electronic device by issuing successive lines of text-based commands.

CTS Centre for Textual Studies. Established at De Montfort University, the CTS focuses on scholarly research in the field of textual studies (see Textual Studies).

diff A tool used in computing to compare pieces of data. Diff calculates and displays the differences between two files, and is typically used with two versions of the same file, focusing on lines of data rather than individual characters. Diff utilities will generally then present any insertions and/or deletions that would have to be made to one file to make it identical to the other.
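
As a rough illustration of line-based comparison, the sketch below uses Python's standard difflib module rather than the diff command itself. The two short versions are made-up lists of lines, and the labels "a" and "b" are arbitrary.

```python
import difflib

# Two made-up versions of the same short text, one list item per line
version_a = ["To be, or not to be,", "that is the question."]
version_b = ["To be, or not to be,", "I, there's the point."]

# unified_diff marks lines to delete with '-' and lines to insert with '+'
diff_lines = list(
    difflib.unified_diff(version_a, version_b, fromfile="a", tofile="b", lineterm="")
)
for line in diff_lines:
    print(line)
```

Lines common to both versions are printed with a leading space, so the output shows exactly which deletions and insertions would turn the first version into the second.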

Dynamic Time Warping An algorithm which aims to find an optimal alignment, or match, between two sequences of data. It is useful as it can make allowances for insertions and/or deletions that represent the differences between the sequences, and still offer the best match possible. It also records the 'cost', or number of changes required, to align the two data sequences.
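
A minimal sketch of the algorithm in Python is given below. It assumes the sequences are lists of numbers and takes the cost of matching two items to be their absolute difference; both choices, and the function name dtw_cost, are assumptions made for illustration rather than the project's own settings.

```python
def dtw_cost(seq_a, seq_b):
    """Return the minimal accumulated cost of aligning two numeric sequences."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    # cost[i][j] holds the best cost of aligning the first i items of seq_a
    # with the first j items of seq_b
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three possible previous alignments:
            # repeat an item of seq_b, repeat an item of seq_a, or advance both
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


# The second sequence repeats one item, which the alignment can absorb
print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))
```

Because one item may be matched against several items in the other sequence, an inserted repetition like the extra 2 above adds nothing to the cost, whereas a genuinely different value does.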

Folio A format of large book, most commonly contrasted with the smaller quarto format. A folio is made by folding a sheet of paper once along its widest axis to give two leaves and four pages. The most famous example of this format is Shakespeare's First Folio, a posthumous collection of 36 of his plays, printed in 1623.

Function word A word whose purpose is to contribute towards a sentence's structure, rather than to its meaning. These include prepositions (on, in, after), pronouns (I, you, he, she, it), auxiliary verbs (be, do, have), and conjunctions (and, if, but).

Graphical User Interface (GUI) A type of user interface that allows the user to operate an electronic device via graphical icons visually represented on the device's screen. This form of user interface is found in the vast majority of commonly used electronic devices.

Haskell A purely functional programming language which allows the user to carry out complex computational work with a relatively small amount of code.

Heuristic Functions Often consisting of repeated operations that 'home in' on the desired result without ever reaching it, heuristic functions are used to produce solutions that are 'good enough' for a particular problem. This is of use when dealing with an incomplete data set, or when speed is more important than accuracy.

Homograph Two or more words with the same spelling, but different meanings. They are not necessarily pronounced the same way as each other, e.g. row (boat), row (argue).

Homophone Two or more words with the same pronunciation, but different meanings, e.g. there, their, they're.

HTML Hypertext Markup Language, a common markup language (see Markup Language) used in the creation of web pages and applications.

Markov Chain A mathematical model which maps out potential states, and presents the probability of one state transitioning to another. For instance, a Markov chain of weather may use two states, sun and rain, and map out the probability of one state transitioning to the other, or remaining the same.
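
The weather example can be sketched in Python as follows. The transition probabilities are invented for illustration, and the names transitions and step are hypothetical.

```python
# Hypothetical transition probabilities; each row sums to 1.
# transitions["sun"]["rain"] is the chance that a sunny day is followed by rain.
transitions = {
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.4, "rain": 0.6},
}


def step(distribution):
    """Advance a probability distribution over the states by one transition."""
    return {
        target: sum(
            distribution[source] * transitions[source][target]
            for source in transitions
        )
        for target in transitions
    }


today = {"sun": 1.0, "rain": 0.0}  # it is certainly sunny today
tomorrow = step(today)
day_after = step(tomorrow)
print(tomorrow)
print(day_after)
```

Each call to step uses only the current distribution, which reflects the defining property of a Markov chain: the next state depends on the present state and not on the history before it.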

Markup Language A language which allows the addition of annotations, or tags, to a particular document. These tags may describe the semantic value of the content they enclose and/or instruct the software presenting the document to carry out some form of action. For instance, by adding 'speaker' tags to a play, software can then be used to pick out the tagged speakers, which can allow a user to manipulate and examine the play in ways that would not be practical if they had to do it by hand.

miniDom A lightweight implementation of the DOM (Document Object Model), provided by the xml.dom.minidom module in Python (see Python). It presents XML documents, and HTML documents that are well-formed XML, in a logical manner so that the coder can use their particular programming language (in this case Python) to read and edit the structure of the document. It represents the document as a tree consisting of multiple nodes (see Nodes).
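
A short sketch of how tagged speakers might be picked out of a document tree with Python's xml.dom.minidom module is given below. The fragment and its tag names (scene, sp, speaker, l) are assumptions made for the example, not necessarily the markup used by the project.

```python
from xml.dom import minidom

# An illustrative fragment; the tag names are assumed for this example
play = """<scene>
  <sp><speaker>Barnardo</speaker><l>Who's there?</l></sp>
  <sp><speaker>Francisco</speaker><l>Nay, answer me.</l></sp>
  <sp><speaker>Barnardo</speaker><l>Long live the King!</l></sp>
</scene>"""

document = minidom.parseString(play)

# Each <speaker> element is a node whose first child is a text node
speakers = [
    node.firstChild.data for node in document.getElementsByTagName("speaker")
]
print(speakers)
```

Once the speakers are available as an ordinary Python list, they can be counted, filtered or compared without reading through the play by hand.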

Nodes The individual units of a DOM tree, such as elements, attributes and text, each describing one part of a particular document.

Orthography Encompasses the rules of a particular written language. This includes spelling, rules of capitalization, and punctuation that affect the presentation of language but not the syntax and grammar that embody its fundamental organization.

Python A general-purpose programming language. It can be used for a wide variety of purposes, from data analysis to website development.

Quarto A format of small book, most commonly contrasted with the larger folio format. A quarto is a book or pamphlet in which each sheet of paper is folded twice to give four leaves, and hence eight pages. During the Elizabethan period (mid-16th to early 17th century), plays were often published in this format. Amongst these were eighteen of the 36 plays we have of William Shakespeare's, which were republished, along with eighteen others, in his First Folio.

RGB A method of digitally encoding colours from the three primary colours red, green, and blue (hence RGB), for which the human eye has receptors in the retina. Each primary colour can take an integer value from 0 to 255 (each uses 8 bits), and by varying the three values a very wide range of colours (over 16 million combinations) can be produced. These three values determine the output colour of each pixel on a computer screen, and the pixels, when arranged together in a bitmap (see Bitmap), produce the images we see on the screen.
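
The 8-bit channels can be illustrated with a small Python sketch. Packing the three values into one 24-bit number with red in the highest byte is a common convention but is an assumption here, as are the name pack_rgb and the tiny 2 by 2 bitmap.

```python
def pack_rgb(red, green, blue):
    """Pack three 8-bit channel values (0 to 255) into one 24-bit integer."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    # Red occupies the highest 8 bits, green the middle 8, blue the lowest 8
    return (red << 16) | (green << 8) | blue


# A tiny 2x2 bitmap, stored as rows of (red, green, blue) pixels
bitmap = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

packed = [[pack_rgb(*pixel) for pixel in row] for row in bitmap]
# Show each pixel as six hexadecimal digits, two per channel
print([[f"{value:06X}" for value in row] for row in packed])
```

The four pixels here are pure red, pure green, pure blue and white, the last being all three channels at their maximum value.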

TEI Text Encoding Initiative. An international voluntary organization that aims to establish a standard for the way in which text is represented digitally.

Text Alignment A technique for preparing texts in such a way that their differences and similarities can be detected and mapped.

Textual Studies Scholarly study of the ways in which texts are created, edited, transcribed or reproduced. Generally focused on tracing the history of text production and manipulation.

Typography Concerns the style and presentation of printed letters in a particular language. This includes typeface, font size, line spacing, etc. It does not concern itself with what the letters represent within the language.

Vim A text editor that can be used via either a command line interface (see Command Line Interface) or a graphical user interface (see GUI).

Vimdiff A form of diff utility (see diff) packaged with Vim.

XML An acronym for eXtensible Markup Language, a markup language (see Markup Language). It provides rules for encoding documents that allow them to be read by both machines and humans. It is useful as it allows users to create original tags, whose purpose they define themselves, rather than just offering a set collection of pre-designed tags.

XPath XML Path Language. A language most commonly used for selecting and navigating nodes in an XML document.
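
As a brief illustration, Python's standard xml.etree.ElementTree module supports a limited subset of XPath. The fragment below, with its sp and l tags and its who attribute, is invented for the example.

```python
import xml.etree.ElementTree as ET

# An invented fragment; the tag and attribute names are assumptions
play = ET.fromstring(
    "<scene>"
    "<sp who='Barnardo'><l>Who's there?</l></sp>"
    "<sp who='Francisco'><l>Nay, answer me.</l></sp>"
    "</scene>"
)

# Select every <l> inside an <sp> whose 'who' attribute is Francisco
path = ".//sp[@who='Francisco']/l"
lines = [line_element.text for line_element in play.findall(path)]
print(lines)
```

The path expression describes where the wanted nodes sit in the tree, so the same query keeps working however many speeches the document contains.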