There are many ways to organize your code when you implement your own language
data. This guide focuses on how it's done within spaCy. For full language
support, you'll need to create a `Language` subclass, define custom
**language data**, such as a stop list and tokenizer exceptions, and test the
new tokenizer. Once the language is set up, you can **build the vocabulary**,
including word frequencies, Brown clusters and word vectors. Finally, you can
**train the tagger and parser**, and save the model to a directory.

For some languages, you may also want to develop a solution for lemmatization
and morphological analysis.
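
To make the first step more concrete, here is a minimal sketch of a `Language` subclass with a stop list and one tokenizer exception. The language code `"zz"`, the class names `Custom` and `CustomDefaults`, and the example words are hypothetical placeholders, not part of spaCy. Depending on your spaCy version, the defaults may need further attributes (for instance, a lexical attribute getter for the language ID), so treat this as an illustration rather than a complete setup.

```python
from spacy.lang.tokenizer_exceptions import BASE_EXCEPTIONS
from spacy.language import Language
from spacy.symbols import ORTH
from spacy.util import update_exc

# Hypothetical language data, normally kept in separate modules
STOP_WORDS = {"a", "the"}
TOKENIZER_EXCEPTIONS = {
    # The ORTH values must concatenate to the exact key
    "don't": [{ORTH: "do"}, {ORTH: "n't"}],
}


class CustomDefaults(Language.Defaults):
    stop_words = STOP_WORDS
    # Merge the custom exceptions into the language-independent base set
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)


class Custom(Language):
    lang = "zz"  # placeholder code, assumed not to clash with a real language
    Defaults = CustomDefaults


nlp = Custom()
doc = nlp("don't go")
print([token.text for token in doc])
print(nlp.vocab["the"].is_stop)
```

Keeping the data on a `Defaults` class, rather than on the `Language` subclass itself, means the tokenizer and vocabulary are built from the same shared settings when the pipeline object is created.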
- [Language data 101](#language-data)
- [The Language subclass](#language-subclass)
- [Stop words](#stop-words)
- [Tokenizer exceptions](#tokenizer-exceptions)
- [Norm exceptions](#norm-exceptions)
- [Lexical attributes](#lex-attrs)
- [Syntax iterators](#syntax-iterators)
- [Lemmatizer](#lemmatizer)
- [Tag map](#tag-map)
- [Morph rules](#morph-rules)
- [Testing the language](#testing)
- [Training](#training)
## Language data {#language-data}
import LanguageData101 from 'usage/101/_language-data.md'