
Exposing quirks of English usage attracts worldwide following for BYU linguist

Wicked. Evil. Foul. Bad. Those words mean essentially the same thing, but we don't talk about "wicked weather," "foul witches" or the "forces of bad." Understanding such subtle differences in usage comes naturally because our brains remember the millions of words we have processed over our lifetimes and which ones go together. But people learning English don't have that repository. So, Mark Davies is volunteering to be their English brains.

The Brigham Young University linguist loves words so much that he sorts through them for days – tens of millions at a time. He has built a searchable Web site that can spit out exactly how often any word appears in English usage, along with the words that most often accompany it and many other factors. Want to know which word is most commonly associated with "slippery"? Davies can tell you in less than a second. (You guessed it – "slope.") He can also break down usage by genre – the most commonly used adjective in British tabloids turns out to be "boozy."
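For readers curious how a "which word goes with which" lookup might work, here is a minimal sketch in Python. It is not Davies' implementation: the function name `top_collocates`, the one-word window and the tiny sample text are all assumptions made for illustration, and a real corpus search runs over an indexed database rather than a raw string.

```python
import re
from collections import Counter


def top_collocates(text, target, window=1, n=3):
    """Return the n words seen most often within `window` tokens of `target`."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        # Count every neighbor inside the window, skipping the target itself.
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts.most_common(n)


# Hypothetical sample text, invented for this example.
SAMPLE = (
    "That argument is a slippery slope. The road was slippery after rain. "
    "Critics called it a slippery slope, and the slippery slope worried them."
)

print(top_collocates(SAMPLE, "slippery", window=1, n=1))
```

On this sample, "slope" comes out on top because it sits next to "slippery" three times, more than any other neighbor.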

With no fanfare, his site, view.byu.edu, has spread to thousands of users in 83 countries, many of them teaching or learning English. Other users include a sitcom writer looking for new puns, a psychiatrist at Columbia University's medical school who is developing cognitive tests for Alzheimer's patients, and a ("boozy"?) regular at a British pub seeking material for homemade trivia contests.

"The site is sufficiently sophisticated for us egghead academics, but also easy enough for language learners and others who just think language is fun," Davies explained, counting faculty at Stanford, Michigan and Swarthmore and dozens of top international universities among regular visitors.

The "VIEW" in view.byu.edu stands for Variation In English Words and phrases, and the site uses as its database the 100-million-word British National Corpus. Davies is among a rare breed who loves to gather millions of words of written and spoken communication and catalog them into a collection called a "corpus." In addition to building an interface for the material provided by the University of Oxford authors of the British corpus, Davies has built his own corpus for Spanish and is putting the finishing touches on his Portuguese version. Those two projects were funded by grants from the National Endowment for the Humanities totaling more than $300,000.

It's not simply a matter of dumping words into a database. "You want the corpus to represent the range of types of usage, so you need to first determine that you want a certain percentage from newspapers, a certain portion from books, another portion from speeches, and so on," Davies said. "And then within books, you balance that between fiction and nonfiction, and then within those, between westerns and romance and engineering and history, for example."
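The balancing Davies describes amounts to splitting a word budget by genre and then splitting each genre again. A rough sketch of that arithmetic follows; the percentages, the one-million-word budget and the helper name `allocate_quotas` are hypothetical, chosen only to show the nesting, and are not the proportions used in any real corpus.

```python
def allocate_quotas(total_words, percentages):
    """Split a word budget across categories given whole-number percentages."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must add up to 100")
    return {name: total_words * pct // 100 for name, pct in percentages.items()}


# Hypothetical proportions, for illustration only.
TOP_LEVEL = {"newspapers": 30, "books": 50, "speeches": 20}
BOOKS = {"fiction": 50, "nonfiction": 50}

# First divide the whole budget by source type...
top = allocate_quotas(1_000_000, TOP_LEVEL)
# ...then divide the books share between fiction and nonfiction.
books = allocate_quotas(top["books"], BOOKS)

print(top)
print(books)
```

The same function could be applied a third time to divide fiction among westerns, romance and so on.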

The entries then must be tagged as particular parts of speech and organized in an architecture and interface that allow them to be accessed easily. That's Davies' specialty and the reason that he was given access to the British National Corpus. He's already on tap to build the interface for the first American National Corpus, currently under construction. And he's building himself the largest historical corpus of English (the British entries are all post-1970), which will include a quarter-billion words produced from 1500 to 1900. That project will enable study of how the usage and meaning of words have changed over time.

"Imagine a word like 'market,'" Davies said. "At one point it would be most commonly associated with words like 'pig' or 'corn.' Now it would be more commonly found with 'stock' or 'international.'"

Although the self-effacing linguist believes most folks would rather use his tool to play around with searches on their names and favorite words, he does use it for rigorous linguistic research, making three to four scholarly presentations a year and publishing a similar number of scholarly papers. Recently he teamed with departmental colleague Dee Gardner to publish a study of phrasal verbs – those combinations like "burned down" and "put up" that come naturally to native speakers but drive learners up a wall.

Davies also recently completed a dictionary containing the 5,000 most commonly used Spanish words, in order of frequency. "If you're learning Spanish you don't want to start just picking up words willy-nilly; you want to start with the most frequent ones," he said. The same publisher has him working on a similar "frequency dictionary" for Portuguese.

Understanding frequency turns out to be helpful for other reasons. A company that develops predictive text interfaces for cell phones and devices for the disabled came to Davies for his Spanish corpus, because knowing which words are most commonly used helps its software more accurately "guess" which word is being entered. Attorneys from a Fortune 500 company used Davies' tool to prove their client's product was being confused with another, more commonly used term at issue in a lawsuit.
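To see why frequency data helps predictive text, consider a simple sketch that ranks possible completions of a typed prefix by how common each word is. The function name `suggest` and the counts below are made up for illustration; they are not drawn from Davies' Spanish corpus or from any vendor's software.

```python
def suggest(prefix, frequencies, n=3):
    """Return up to n words starting with `prefix`, most frequent first."""
    matches = [word for word in frequencies if word.startswith(prefix)]
    # Sort by descending count; break ties alphabetically for stable output.
    matches.sort(key=lambda word: (-frequencies[word], word))
    return matches[:n]


# Hypothetical word counts, invented for this example.
FREQUENCIES = {"que": 900, "quien": 200, "querer": 150, "queso": 12, "casa": 300}

print(suggest("que", FREQUENCIES, n=2))
```

With these invented counts, typing "que" offers "que" and "querer" ahead of the much rarer "queso."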

Davies isn't surprised by the growth in popularity and utility of his site.

"There's a real need for non-native English speakers who want to know how English is authentically used," he said. "And for native speakers, it's just fun to get on there and immerse ourselves in this wonderful data."
