Fooling around with German declension

The first posts will be about acquiring some data sets to play around with the German language in general. I'm currently learning about German declension, i.e. how articles, nouns, and adjectives change form according to grammatical gender, case, and number (singular or plural). I want to dig a bit deeper into this topic and look at it from different perspectives using statistics, NLP, and more.

Green blackboard with German nouns and various articles written all over.
Die deutschen Artikel by Jens-Olaf Walter under BY-NC 2.0.
Cropped from original and color effects applied.

Case and gender

German has four grammatical cases: nominative, genitive, dative, and accusative. They mark the role each argument plays, so that complex sentences can be built without losing information about the arguments. What do I mean by arguments? In the English sentence ‘I give you the laptop’, we have the verb ‘give’ (or, more precisely, the predicate). The predicate has three arguments, namely ‘I’ (subject), ‘you’ (indirect object), and ‘the laptop’ (direct object). In English, we rely on word order to signal who gives what to whom. In German, the three arguments are instead marked by grammatical case. The sentence translates as ‘Ich gebe dir den Laptop’: the predicate ‘gebe’ has the three arguments ‘Ich’ (subject), ‘dir’ (indirect object), and ‘den Laptop’ (direct object), which are in the nominative, dative, and accusative, respectively. In complex sentences, recognising the case therefore helps to identify the subject, the direct object, and the indirect object. The genitive case is used to mark possession, e.g. ‘Die Bäume des Gartens’ (the trees of the garden). Beyond these typical usages, the cases are applied in many other situations, far too many to cover in one blog post. The typical usages are outlined in the following table.

Table 1: The four grammatical cases in German with their typical usage and examples.
| Case | Typical usage | Example |
|---|---|---|
| Nominative | Subject | Das Auto ist gelb |
| Genitive | Possession | Die Räder des Autos |
| Dative | Indirect object | Das Auto gehört mir |
| Accusative | Direct object | Ich fahre das Auto |
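To see why case makes word order less important, here is a toy sketch. It assumes a tiny, hypothetical hand-made lexicon (`CASE_OF_FORM`) that only covers the example sentence; real German forms are often ambiguous (for instance, ‘den’ is also the dative plural article), so this is an illustration rather than a parser. The article ‘den’ stands in for the whole phrase ‘den Laptop’, since it is the word that carries the case.

```python
# Hypothetical toy lexicon: each case-marked form maps to its case.
# Only covers 'Ich gebe dir den Laptop'; real forms are often ambiguous.
CASE_OF_FORM = {
    "ich": "nominative",
    "dir": "dative",
    "den": "accusative",
}

# Typical role signalled by each case, following Table 1.
ROLE_OF_CASE = {
    "nominative": "subject",
    "genitive": "possession",
    "dative": "indirect object",
    "accusative": "direct object",
}


def roles(sentence: str) -> dict[str, str]:
    """Map each case-marked word to the role its case signals."""
    result = {}
    for word in sentence.rstrip(".").split():
        case = CASE_OF_FORM.get(word.lower())
        if case is not None:
            result[word.lower()] = ROLE_OF_CASE[case]
    return result


# Both word orders yield the same roles, because the case carries the information.
print(roles("Ich gebe dir den Laptop"))
print(roles("Den Laptop gebe ich dir"))
```

Moving ‘den Laptop’ to the front changes the emphasis, but the accusative still marks it as the direct object.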

The grammatical cases are central to building sentences, but you also have to take grammatical gender into account. To complicate matters, there is not just one gender but three: masculine (der), feminine (die), and neuter (das). Apart from a few rules of thumb, you have to learn the gender of every noun by heart. However, knowing which gender goes with which noun can help you parse and understand more complex sentences. The genders are listed in the following table.

Table 2: The three grammatical genders in German and their definite articles in the nominative case.
| Gender | Article | Example |
|---|---|---|
| Masculine | der | der Laptop |
| Feminine | die | die Suche |
| Neuter | das | das Bild |
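The rules of thumb mentioned above can be sketched as a suffix lookup. The suffix list below (`SUFFIX_GENDER`) is an assumption: a handful of well-known patterns chosen for illustration, far from exhaustive, and each has exceptions. The function names are hypothetical. Most nouns, like ‘Laptop’, match no rule and still have to be learned by heart.

```python
from typing import Optional

# A few well-known suffix rules of thumb (illustrative, not exhaustive;
# exceptions exist, and most nouns match none of them).
SUFFIX_GENDER = [
    ("chen", "neuter"),      # das Mädchen
    ("lein", "neuter"),      # das Fräulein
    ("ung", "feminine"),     # die Zeitung
    ("heit", "feminine"),    # die Freiheit
    ("keit", "feminine"),    # die Möglichkeit
    ("schaft", "feminine"),  # die Freundschaft
    ("ling", "masculine"),   # der Lehrling
]

# Nominative definite article per gender, as in Table 2.
ARTICLE = {"masculine": "der", "feminine": "die", "neuter": "das"}


def guess_gender(noun: str) -> Optional[str]:
    """Return the gender suggested by the noun's suffix, or None if no rule applies."""
    lowered = noun.lower()
    for suffix, gender in SUFFIX_GENDER:
        if lowered.endswith(suffix):
            return gender
    return None


def with_article(noun: str) -> Optional[str]:
    """Prefix the nominative definite article when a suffix rule applies."""
    gender = guess_gender(noun)
    return None if gender is None else f"{ARTICLE[gender]} {noun}"


print(with_article("Mädchen"))  # das Mädchen
print(guess_gender("Laptop"))   # None
```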

Combined, these rules form the principles of declension. For example, we can build the following table of definite articles for all cases and genders, plus the plural. It helps when translating ‘the «noun»’ into German, whether the noun is singular or plural and whether it acts as subject, indirect or direct object, or possessor. Beware, however, that the noun itself may need an ending as well, as in ‘des Autos’ (genitive) or ‘den Kindern’ (dative plural).

Table 3: Definite articles for all grammatical cases and genders, as well as the plural.
| Case | Masculine | Feminine | Neuter | Plural |
|---|---|---|---|---|
| Nominative | der | die | das | die |
| Genitive | des | der | des | der |
| Dative | dem | der | dem | den |
| Accusative | den | die | das | die |
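Table 3 can be turned into a lookup in both directions. This is a minimal sketch; the names `definite_article` and `readings` are hypothetical helpers, and noun endings (such as the genitive ‘-s’ or the dative plural ‘-n’) are deliberately not handled. The reverse lookup shows how ambiguous a bare article is on its own.

```python
# Columns of Table 3, in order.
GENDERS = ("masculine", "feminine", "neuter", "plural")

# Rows of Table 3: one row of definite articles per case.
ROWS = {
    "nominative": ("der", "die", "das", "die"),
    "genitive": ("des", "der", "des", "der"),
    "dative": ("dem", "der", "dem", "den"),
    "accusative": ("den", "die", "das", "die"),
}

# Flatten the table into a (case, gender) -> article lookup.
DEFINITE_ARTICLE = {
    (case, gender): article
    for case, row in ROWS.items()
    for gender, article in zip(GENDERS, row)
}


def definite_article(case: str, gender: str) -> str:
    """Look up the definite article for a case and a gender (or 'plural')."""
    return DEFINITE_ARTICLE[(case, gender)]


def readings(article: str) -> list[tuple[str, str]]:
    """List every (case, gender) cell of Table 3 that contains the article."""
    return [key for key, value in DEFINITE_ARTICLE.items() if value == article]


print(definite_article("dative", "plural"))  # den
print(readings("der"))
# [('nominative', 'masculine'), ('genitive', 'feminine'),
#  ('genitive', 'plural'), ('dative', 'feminine')]
```

‘der’ alone has four readings, but if the noun is known to be masculine, only the nominative remains. This is one concrete way in which knowing a noun's gender helps when parsing a sentence.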

Data sets

The theory is all well and good, but how does it look in practice? To research declension we need data: a list of words with their word types (parts of speech) and, ideally, their root forms or information on how they are declined.
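As a sketch of what one entry in such a list might look like, here is a small record type. The field names, the part-of-speech tags (PRON, VERB, DET, NOUN), and the `Key=Value|Key=Value` feature string are assumptions modelled loosely on Universal Dependencies conventions. The entries are hand-written for the example sentence and are not taken from any data set.

```python
from dataclasses import dataclass, field


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a 'Key=Value|Key=Value' feature string; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


@dataclass
class WordEntry:
    form: str   # the word as it appears in text, e.g. "den"
    lemma: str  # root or dictionary form, e.g. "der"
    pos: str    # word type, e.g. "DET" (determiner)
    feats: dict[str, str] = field(default_factory=dict)  # e.g. Case, Gender, Number


# Hand-written entries for 'Ich gebe dir den Laptop' (illustrative only).
entries = [
    WordEntry("Ich", "ich", "PRON", parse_feats("Case=Nom|Number=Sing|Person=1|PronType=Prs")),
    WordEntry("gebe", "geben", "VERB", parse_feats("Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin")),
    WordEntry("dir", "du", "PRON", parse_feats("Case=Dat|Number=Sing|Person=2|PronType=Prs")),
    WordEntry("den", "der", "DET", parse_feats("Case=Acc|Definite=Def|Gender=Masc|Number=Sing|PronType=Art")),
    WordEntry("Laptop", "Laptop", "NOUN", parse_feats("Case=Acc|Gender=Masc|Number=Sing")),
]

# Which words are in the accusative?
print([e.form for e in entries if e.feats.get("Case") == "Acc"])  # ['den', 'Laptop']
```

Storing the lemma next to the inflected form is what makes it possible to group ‘den’, ‘dem’, and ‘des’ under the same word ‘der’ later on.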

In the next posts, we will look into some data sets for the German language. I will focus on open data sets that I already know; two that come to mind are Wiktionary and Universal Dependencies.

The next article will look at Universal Dependencies, and what a word really is.