From a2525c76ee3fd0500aa34593103dc70b2068daab Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Mon, 19 Dec 2016 17:18:38 +0100
Subject: [PATCH] Reformat word frequencies section in "adding languages"
 workflow

---
 website/docs/usage/adding-languages.jade | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/website/docs/usage/adding-languages.jade b/website/docs/usage/adding-languages.jade
index 1d7570844..605395c3b 100644
--- a/website/docs/usage/adding-languages.jade
+++ b/website/docs/usage/adding-languages.jade
@@ -424,16 +424,22 @@ p
 
 +h(3, "word-frequencies") Word frequencies
 
 p
-    | The #[code init.py] script expects a tab-separated word frequencies file
-    | with three columns: the number of times the word occurred in your language
-    | sample, the number of distinct documents the word occurred in, and the
-    | word itself. You should make sure you use the spaCy tokenizer for your
+    | The #[+src(gh("spacy-dev-resources", "training/init.py")) init.py]
+    | script expects a tab-separated word frequencies file with three columns:
+
++list("numbers")
+    +item The number of times the word occurred in your language sample.
+    +item The number of distinct documents the word occurred in.
+    +item The word itself.
+
+p
+    | You should make sure you use the spaCy tokenizer for your
     | language to segment the text for your word frequencies. This will ensure
     | that the frequencies refer to the same segmentation standards you'll be
-    | using at run-time. For instance, spaCy's English tokenizer segments "can't"
-    | into two tokens. If we segmented the text by whitespace to produce the
-    | frequency counts, we'll have incorrect frequency counts for the tokens
-    | "ca" and "n't".
+    | using at run-time. For instance, spaCy's English tokenizer segments
+    | "can't" into two tokens. If we segmented the text by whitespace to
+    | produce the frequency counts, we'll have incorrect frequency counts for
+    | the tokens "ca" and "n't".
 
 +h(3, "brown-clusters") Training the Brown clusters
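
Note: for context, the tab-separated file format this patch documents (total count, document count, word) can be produced with a short script. Below is a minimal sketch, not part of the patch or of spacy-dev-resources; the file names, the "en" language code, and the one-document-per-line corpus layout are illustrative assumptions.

    # Hypothetical sketch: build the word frequencies file that init.py
    # expects, using spaCy's own tokenizer so the counts match the
    # segmentation used at run-time.
    from collections import Counter

    import spacy

    nlp = spacy.blank("en")  # tokenizer for your language; "en" is an example

    word_counts = Counter()  # times each token occurred in the sample
    doc_counts = Counter()   # number of distinct documents containing the token

    with open("corpus.txt", encoding="utf8") as f:  # assumed: one document per line
        for line in f:
            tokens = [t.text for t in nlp.tokenizer(line.strip())]
            word_counts.update(tokens)
            doc_counts.update(set(tokens))  # count each token once per document

    with open("word_freqs.tsv", "w", encoding="utf8") as out:
        for word, freq in word_counts.most_common():
            # Column order per the docs: total count, document count, word
            out.write(f"{freq}\t{doc_counts[word]}\t{word}\n")

Run over English text, this counts the tokens "ca" and "n't" separately rather than the whitespace token "can't", which is exactly the segmentation-matching point the rewritten docs section makes.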