
Gender and writing stylometry

A note on writing style and vocabulary

Gender appears to be reified through writing style as well as handwriting, according to preliminary statistical analysis.

Recent attempts to build algorithms that determine a person’s gender from their writing style have produced some fairly accurate systems (Koppel 2003, Argamon 2003). Run over a large sample of texts, their algorithm guessed the author’s gender with 83% accuracy. Generally speaking, the algorithm’s results suggest that men write more about objects and women more about relationships: women tend to use more pronouns (I, you, she, their, myself), while men prefer words that identify or determine nouns (a, the, that) and words that quantify them (one, two, more). See the link in the reference section for the methodology involved.
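
To make the feature description concrete, here is a minimal Python sketch that counts how often the example words above appear per 1,000 words of text. It is not the Koppel/Argamon system. The word lists are only the examples quoted in the paragraph, the function names are hypothetical, and the published classifier weighed many features together rather than reading off three raw rates.

```python
import re
from collections import Counter

# Example words quoted in the paragraph above; the published studies used
# far larger feature sets.
PRONOUNS = {"i", "you", "she", "their", "myself"}
DETERMINERS = {"a", "the", "that"}
QUANTIFIERS = {"one", "two", "more"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def feature_rates(text: str) -> dict[str, float]:
    """Return occurrences per 1,000 words for each word group."""
    tokens = tokenize(text)
    groups = {"pronouns": PRONOUNS, "determiners": DETERMINERS, "quantifiers": QUANTIFIERS}
    if not tokens:
        return {name: 0.0 for name in groups}
    counts = Counter(tokens)
    return {
        name: sum(counts[w] for w in words) * 1000 / len(tokens)
        for name, words in groups.items()
    }


print(feature_rates("I told you that she left."))
```

Normalizing per 1,000 words keeps long and short texts comparable; a real classifier would feed rates like these, for many more words, into a statistical model.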

David Lodge, whose early novel The Picture Goers was among the roughly one in five texts the original algorithm misgendered, noted:

“Novels are very problematic texts because they are written in a medley of styles. And more often than not the author is trying to imitate some kind of imagined consciousness, male or female. Indeed, writers have always tried to imitate the distinctive characteristics of male and female discourse and we are in the habit of thinking that they have often succeeded. But perhaps these scientists believe they can prove this is an illusion. Still, I’m very surprised that this program is able to discern the gender of the real author. If you were to take ordinary first-person texts, letters or diaries, then you might, of course, expect a fairly high degree of accuracy. But that it can be done on literary novels intrigues me. This will have fascinating literary, critical and general sociological implications. That said, I’d like to see them apply it to a novelist’s attempt to imitate the opposite sex in a particular passage.” (McGrath 2003)

Some resourceful nerds at bookblog.net built a cruder version of the algorithm that had been applied to novels, called it the Gender Genie, and made it available online for text analysis.

Elf Sternberg writes:

http://www.drizzle.com/~elf/

The Gender Genie algorithm, which first appeared in the NY Times’ “science” section, is a poor popularization of the algorithm as it appeared in the original academic literature. I have the original paper and that algorithm is meant to be applied to fiction; applied to non-fiction, the authors admit, the algorithm is no better than random chance at detecting an author’s gender. A much better algorithm, the one that has an “80%” chance of detecting an author’s gender correctly, needs to be taught on a large sample to generate a massive statistical measure of male vs. female characteristics in text. Even applied to fiction, the popular algorithm is not much better. It seems to think I’m a woman, at least 97% of the time.
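
Sternberg’s point is that the better-performing approach learns its statistics from labeled examples instead of relying on fixed word lists. As an illustration of that idea only, the sketch below trains a multinomial Naive Bayes classifier, a standard text-classification technique swapped in here for simplicity; it is not the method used in the original papers. The class name, the style labels, and the two training sentences are invented placeholders, and a real system would need the large training sample he describes.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for priors
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = self.classes[0], -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    p = (self.word_counts[c][word] + 1) / (self.totals[c] + v)
                    score += math.log(p)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented toy data, only to exercise the code.
train_texts = [
    "she told me that you and i should call her",
    "the engine has two valves and one more gear",
]
train_labels = ["pronoun-heavy", "determiner-heavy"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("you and she called me"))
print(model.predict("the two gears and one engine"))
```

Every weight here comes from counting words in the training texts, so the model is only as good as the size and representativeness of that sample.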

Bookblog used to host the Gender Genie, a simplified (and less scientific) version of Koppel’s algorithm:

http://www.bookblog.net/gender/genie.html
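
The cruder approach attributed to the Gender Genie can be pictured as a fixed lookup table: each keyword carries a weight toward one gender, the weights of every occurrence are summed, and the larger total wins. The Python sketch below assumes that general design. The keyword lists and weights are hypothetical values chosen from the example words in this section, not the Genie’s actual tables.

```python
import re

# Hypothetical keyword weights, picked from the example words in this
# section. They are NOT the Gender Genie's real lists or values.
FEMALE_WEIGHTS = {"i": 2, "you": 2, "she": 3, "their": 2, "myself": 3}
MALE_WEIGHTS = {"a": 1, "the": 1, "that": 2, "one": 2, "two": 2, "more": 2}


def keyword_scores(text: str) -> tuple[int, int]:
    """Sum the weight of every keyword occurrence for each list."""
    female = male = 0
    for word in re.findall(r"[a-z']+", text.lower()):
        female += FEMALE_WEIGHTS.get(word, 0)
        male += MALE_WEIGHTS.get(word, 0)
    return female, male


def guess(text: str) -> str:
    """Return whichever side has the larger total, or 'undecided' on a tie."""
    female, male = keyword_scores(text)
    if female == male:
        return "undecided"
    return "female" if female > male else "male"


print(guess("I asked myself if she knew."))
```

Nothing in this design is learned from data, which is the key difference from the trained approach Sternberg describes.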

The Gender Genie statistics page indicates it gets only about 3 in 5 right, whereas Koppel’s original got 4 in 5 right using multivariate analysis of a large sample of texts. Like other “gender tests,” this one is not scientifically rigorous and should not be taken very seriously.
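
Hit rates like “3 in 5” only mean something relative to chance, which is 50% for a two-way guess, and to how many texts were tested. A minimal Python sketch of a one-sided binomial test makes that concrete; the function name is hypothetical and the counts below are illustrative, not figures from the Genie’s statistics page.

```python
from math import comb


def p_value_vs_chance(correct: int, total: int, chance: float = 0.5) -> float:
    """Probability of getting at least `correct` of `total` right by pure guessing."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


for correct, total in [(3, 5), (4, 5), (60, 100), (80, 100)]:
    print(f"{correct}/{total}: p = {p_value_vs_chance(correct, total):.4f}")
```

Under this test, 3 correct out of 5 is what guessing produces half the time (p = 0.5), whereas the same 60% rate sustained over 100 texts would fall below the conventional 0.05 threshold. The sample size matters as much as the headline percentage.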
