Making sense of senses: filtering data
This post has not been updated for v2; please see the migration guide.
The Oxford Dictionaries API makes it possible to extract a list of words that you can use to verify an input into your app. Is it a ‘real’ word? Is it too long or too short? Does it break any rules you may have about, for example, offensiveness?
The Wordlist endpoint is here to help and this guide will show you how to best make use of its powerful sense filters to get the data you need.
Let’s take the offensiveness case as an example. The Oxford Dictionaries team gets a lot of requests from developers asking: ‘Can I get a list of all the words in English for my word game minus the offensive ones?’
Easy, right? Not quite.
Offensiveness is an odd beast. As well as being hugely subjective, many words in the English language live multiple lives. Take the mildly offensive (and so still suitable for a blog post) example ass. Calling the Entries endpoint gives you both of its main senses plus its sub-senses, and whilst the first is unlikely to cause offence, the second might raise an eyebrow or two.
Most developers would likely permit the term ass in their word game on the strength of its first sense, regardless of the second. To let developers define the list appropriately, the Wordlist endpoint has three ways of applying filters to senses:
- Exclude: Removes headwords if any of their senses match the filter parameter.
- Exclude_senses: Removes only the senses that match the filter parameter.
- Exclude_prime_senses: Removes headwords if their primary sense matches the filter parameter.
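To make the difference between the three filters concrete, here is a minimal local sketch in Python. It does not call the API. The Sense and Headword classes, the function names and the toy word data (including the two placeholder headwords) are all hypothetical and exist only for illustration. The sketch also assumes that the first sense in the list is the primary one, and that a headword left with no senses after exclude_senses is dropped from the list.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    definition: str
    registers: frozenset = frozenset()  # e.g. frozenset({"vulgar slang"})


@dataclass
class Headword:
    text: str
    senses: list  # ordered; senses[0] is treated as the primary sense (assumption)


def matches(sense, register):
    """True if the sense carries the given register tag."""
    return register in sense.registers


def apply_exclude(words, register):
    """Exclude: drop a headword if ANY of its senses match."""
    return [w for w in words if not any(matches(s, register) for s in w.senses)]


def apply_exclude_senses(words, register):
    """Exclude_senses: drop only the matching senses.

    Assumption: a headword with no senses left is omitted entirely.
    """
    kept = []
    for w in words:
        remaining = [s for s in w.senses if not matches(s, register)]
        if remaining:
            kept.append(Headword(w.text, remaining))
    return kept


def apply_exclude_prime_senses(words, register):
    """Exclude_prime_senses: drop a headword if its PRIMARY sense matches."""
    return [w for w in words if not (w.senses and matches(w.senses[0], register))]


def texts(words):
    return [w.text for w in words]


VULGAR = frozenset({"vulgar slang"})

# Toy data, not taken from the real dataset.
words = [
    Headword("ass", [Sense("the animal"), Sense("the second sense", VULGAR)]),
    Headword("aspen", [Sense("a tree")]),
    Headword("placeholder_a", [Sense("only sense", VULGAR)]),
    Headword("placeholder_b", [Sense("primary sense", VULGAR), Sense("neutral sense")]),
]

print(texts(apply_exclude(words, "vulgar slang")))
print(texts(apply_exclude_senses(words, "vulgar slang")))
print(texts(apply_exclude_prime_senses(words, "vulgar slang")))
```

In this toy model, ass survives both exclude_senses and exclude_prime_senses because its first sense is untagged, but is removed by exclude because one of its senses carries the tag.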
Using the example of ass, let’s create a call to the API that looks for English nouns beginning with ‘as’ with the exclude parameter set to registers=vulgar slang. The call looks like this:
Looking through the list of words in alphabetical order you will see that ass doesn’t appear (between ‘asquith, herbert henry’ and ‘assad, hafiz-al’). This is because the API spotted the vulgar slang sense and omitted the headword completely from the list. However, if we change the filter to exclude_senses, ass is included. This is because at least one of its senses is not tagged as vulgar slang. The call looks like this:
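As a rough sketch of how the two calls might be assembled, the Python snippet below builds a Wordlist URL for each filter. Only the filter names (exclude, exclude_senses, exclude_prime_senses) and the registers=vulgar slang value come from this post. The base URL, the path layout, the prefix parameter name and the way the filter value is encoded are assumptions based on the v1 API, so check them against the current documentation before relying on them. The helper wordlist_url is hypothetical, and authentication headers are left out.

```python
from urllib.parse import quote, urlencode

# Assumed v1 base URL; verify against the current API documentation.
BASE_URL = "https://od-api.oxforddictionaries.com/api/v1"

ALLOWED_FILTERS = {"exclude", "exclude_senses", "exclude_prime_senses"}


def wordlist_url(language, lexical_category, prefix, filter_mode, filter_value):
    """Build a Wordlist URL (hypothetical helper; no request is sent)."""
    if filter_mode not in ALLOWED_FILTERS:
        raise ValueError(f"unknown filter mode: {filter_mode!r}")
    # Assumed path layout: /wordlist/{language}/lexicalCategory={category}
    path = (
        f"{BASE_URL}/wordlist/{quote(language)}"
        f"/lexicalCategory={quote(lexical_category)}"
    )
    # urlencode percent-encodes '=' and turns the space into '+'.
    query = urlencode({"prefix": prefix, filter_mode: filter_value})
    return f"{path}?{query}"


# Headword-level filter: ass is expected to be missing from the result.
exclude_call = wordlist_url("en", "noun", "as", "exclude", "registers=vulgar slang")

# Sense-level filter: ass is expected to be present in the result.
exclude_senses_call = wordlist_url(
    "en", "noun", "as", "exclude_senses", "registers=vulgar slang"
)

print(exclude_call)
print(exclude_senses_call)
```

The two URLs differ only in the name of the filter parameter, which mirrors the point above: the same tag is applied either to whole headwords or to individual senses.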
Exclude_senses is therefore a powerful filter option in the Wordlist endpoint: it lets you decide how words are omitted from your list depending on how their different senses are tagged. This is possible not just for registers. You can also exclude senses by lexical category (noun, verb, adjective, etc.), domain (rugby, psychology, zoology, etc.) and grammatical feature (masculine, feminine, collective, etc.). The Utility endpoints let you explore the datasets and find out which tags are available in each of these categories, in all of the languages we have available.
As always, we’re here to help if you have a question about how to get what you need from the API. Just submit a comment in the section below, or send us an email at [email protected].