When people wanted to study language, they used to have two options. They could use a computer to analyze a large body of formal writing, like newspapers. Or they could go out and interview a bunch of people.
Now Twitter offers something fresh for researchers interested in the evolution of language: massive amounts of informal written communication.
Scientists at Carnegie Mellon University are demonstrating the potential of that data in a study, which has found that regional dialects are thriving on Twitter. In fact, local slang seems to be evolving within the social-media site.
Some of the findings confirm what you’d expect: People in Northern California tend to say “hella,” for example, to mean “very.” But there are oddities, too, such as the spelling of “something.” If you live in New York City, you’re likely to write “suttin,” while people in many other cities tend to write “sumthin.” Northern Californians say “koo” for “cool” in their tweets, while Southern Californians favor “coo.” And people in cities seem more likely than people in rural areas to abbreviate “you” as “yu.” New Yorkers have a coinage all their own: “uu.”
Profanity inspires a whole subgenre of regional Twitterisms. Take the many ways of expressing amusement. “LMAO” means “laughing my a** off”—that’s national, of course. But people around Washington, D.C., where this blog is based, seem to favor “LLS,” for “laughing like sh*t.”
And folks in Philadelphia, Pittsburgh, and Cleveland have a penchant for yet another abbreviation: “CTFU,” meaning “cracking the f**k up.”
“It’s not really used anywhere else,” explains Jacob Eisenstein, a postdoctoral fellow in Carnegie Mellon’s Machine Learning Department.
“There are big regional differences in social media,” he says. “And some of these correspond to things that we know about from spoken language. But other things seem to be completely organic to social media.”
People who use Twitter on their smartphones can geotag their tweets with GPS coordinates. To conduct their study, Mr. Eisenstein and his co-authors culled a week’s worth of Twitter messages published last March. They narrowed those down to geotagged messages from users who posted at least 20 tweets. The result was a database of 330,000 tweets and 9,500 users.
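For readers curious about what that narrowing step might look like in practice, here is a minimal Python sketch. It is not the researchers' code. The record layout (dictionaries with `user`, `lat`, and `lon` keys) and the function name `filter_tweets` are assumptions made for illustration, and the sketch assumes the 20-tweet threshold is counted over all of a user's tweets in the sample, which the study description does not spell out.

```python
from collections import Counter

def filter_tweets(tweets, min_tweets=20):
    """Keep geotagged tweets from users with at least `min_tweets` tweets.

    Assumption: each tweet is a dict with a "user" key and optional
    "lat"/"lon" keys; this layout is hypothetical, not Twitter's format.
    """
    # Count every tweet per user, geotagged or not (an assumption).
    per_user = Counter(t["user"] for t in tweets)
    return [
        t for t in tweets
        if per_user[t["user"]] >= min_tweets
        and t.get("lat") is not None
        and t.get("lon") is not None
    ]
```

Counting only geotagged tweets toward the threshold would be an equally plausible reading; that would mean building the counter from the geotagged subset instead.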
The researchers wrote a computer program to find patterns in all that text. The technique they developed could predict the whereabouts of a Twitter user in the United States with a median error of roughly 300 miles.
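To make the "median error" figure concrete, the Python sketch below shows one common way such a number could be computed: measure the great-circle distance between each predicted location and the user's actual location, then take the median. The haversine formula and the function names here are assumptions chosen for illustration; the article does not say how the researchers measured distance.

```python
import math
from statistics import median

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def median_error_miles(predicted, actual):
    """Median distance between paired (lat, lon) predictions and true locations."""
    return median(
        haversine_miles(p[0], p[1], a[0], a[1])
        for p, a in zip(predicted, actual)
    )
```

A median of 300 miles means half of the predictions landed within that distance of the user's true location and half landed farther away.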
Asked for a reaction to the study, Geoffrey Nunberg, an expert on linguistic technologies, sent an e-mail to Wired Campus saying that the research “seems technically and methodologically impressive.”
“But the findings are less than epochal,” says Mr. Nunberg, an adjunct professor at the School of Information at the University of California at Berkeley. “‘Slang may depend on geography more than standard English does’—well, we’ve sort of known that for a while. But I do think that the availability of these huge corpora of tweets and text messages could produce some interesting results in the future.”
Co-authors of the Carnegie Mellon report are Eric P. Xing, an associate professor of machine learning; Noah A. Smith, an assistant professor in the Language Technologies Institute; and Brendan O’Connor, a machine-learning graduate student.
Wired Campus would love to learn about other ways that technology is changing the study of language. If you know of any interesting new research, drop us a note in the comments below.





11 Responses to Suttin Hella Koo, Y’all: Regional Dialects Thrive on Twitter
22228715 - January 14, 2011 at 7:25 am
What’s interesting about slang depending upon geography is not new, but it’s different. Thirty years ago, slang use was more dependent upon both sender and receiver being physically close (even by phone, there was an extra charge for long-distance calls, so you made them less frequently and kept them short). It’s really interesting here that either communication is still very, very regionalized, or regional differences are so hardy that they don’t spread much.
And what’s up with Cleveland, Pittsburgh and PHILLY? The first two are geographically close, and although people from each are hyper-aware of small dialect differences (soda/pop, wash) they are similar enough that pairing them together is not surprising. But how did Philadelphia get in that mix?
drjeff - January 14, 2011 at 8:34 am
I find it very interesting that 300 miles is pretty much the same precision that obtains with spoken regional dialects: a person knowledgeable and skilled can usually place a speaker who has lived in the same area all their life within roughly that same distance.
Fascinating, because the clues are completely different: here, it’s word choice, word placement, and spelling. Spoken, it’s relying primarily on phoneme production, and secondarily on word choice.
But, the findings here are not really so surprising: except for celebrities, the vast majority of people on Twitter follow people whom they have spent time with personally, so it’s predictable that there should be some significant regional bias, no? And even with celebrities, many (newspaper, radio, politicians) have only regional fame.
cb_10 - January 14, 2011 at 9:26 am
While the information about dialects is interesting, the most dismaying thing about Twitter is the over-reliance on slang, net-speak, and abbreviations.
Slang certainly has a place and I use plenty of it in life, but many of these differences are manufactured ones. Everyone likes to think they’re the cool kids (or “koo kids” or “coo kids” if you’re in California).
Twitter can be a challenging and beneficial writing experience, forcing Tweeters to learn how to use economical language in creative ways. However, all too many rely on these kinds of shortcuts that tend to divide people and obscure communication, rather than enhance it.
11179102 - January 14, 2011 at 9:45 am
I once heard Mario Cuomo (when he was serving as Governor of New York) speak at a university in North Carolina. In the Q&A, a student asked him a question. Cuomo paused and before answering the question said, “Buffalo, no doubt.” The student smiled and said that she was indeed from Buffalo.
With a smile, Cuomo replied “I know my people.”
awegweiser - January 16, 2011 at 11:40 am
I do not use Twitter and never plan to. It seems as absurd as its name and bastardizes the language.
People are becoming more illiterate by avoiding character-consuming real words, and even words of more than one syllable.
And just why is there a limit to 140 (or whatever it is) characters?
I don’t text, either. Voice phone or eMail will do me just fine.
filidh11 - January 18, 2011 at 2:45 pm
I’m a high school history teacher. When I saw the phrase “Suttin Hella Koo,” my first thought was that it must have SOMETHING to do with Sutton Hoo, a 6th century Anglo-Saxon burial site. I was overjoyed that people were Tweeting about important archaeological discoveries, even if the names of those discoveries were misspelled. Sadly, it turned out to be something much more prosaic. My small flame of hope has been extinguished by the butler of mediocrity holding the snuffer of disappointment. I’ll go back to reading my books now.
ardent1 - January 20, 2011 at 5:00 pm
I love New Yorkers: uu = youse?
arrive2__net - January 22, 2011 at 10:06 pm
I think Twitter’s 140-character messages are close to being the text equivalent of a ‘sound-byte’. Telegrams also invited extreme abbreviation and brevity, yet English survived that. Like any other form of human communication, Tweets are diverse in content and message: some are profound and deep like quotes from Buddha or Churchill, others humorous, trivial, silly, or incomprehensible. Tweets are often invitations to some other longer discussion or presentation of an idea. I’m surprised there is room in a Tweet’s 140 characters for such distinct regional variations, but there you are…
Bernard Schuster
Arrive2.net
Twitter.com/arrive2_net