	Merge branch 'master' of https://github.com/explosion/spaCy
This commit is contained in:
		
						commit
						51639214a1
					
				
							
								
								
									
										106
									
								
								.github/contributors/danielhers.md
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| 
						 | 
					@ -0,0 +1,106 @@
 | 
				
			||||||
 | 
					# spaCy contributor agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This spaCy Contributor Agreement (**"SCA"**) is based on the
 | 
				
			||||||
 | 
					[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 | 
				
			||||||
 | 
					The SCA applies to any contribution that you make to any product or project
 | 
				
			||||||
 | 
					managed by us (the **"project"**), and sets out the intellectual property rights
 | 
				
			||||||
 | 
					you grant to us in the contributed materials. The term **"us"** shall mean
 | 
				
			||||||
 | 
					[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 | 
				
			||||||
 | 
					**"you"** shall mean the person or entity identified below.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you agree to be bound by these terms, fill in the information requested
 | 
				
			||||||
 | 
					below and include the filled-in version with your first pull request, under the
 | 
				
			||||||
 | 
					folder [`.github/contributors/`](/.github/contributors/). The name of the file
 | 
				
			||||||
 | 
					should be your GitHub username, with the extension `.md`. For example, the user
 | 
				
			||||||
 | 
					example_user would create the file `.github/contributors/example_user.md`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Read this agreement carefully before signing. These terms and conditions
 | 
				
			||||||
 | 
					constitute a binding legal agreement.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					1. The term "contribution" or "contributed materials" means any source code,
 | 
				
			||||||
 | 
					object code, patch, tool, sample, graphic, specification, manual,
 | 
				
			||||||
 | 
					documentation, or any other material posted or submitted by you to the project.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2. With respect to any worldwide copyrights, or copyright applications and
 | 
				
			||||||
 | 
					registrations, in your contribution:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you hereby assign to us joint ownership, and to the extent that such
 | 
				
			||||||
 | 
					    assignment is or becomes invalid, ineffective or unenforceable, you hereby
 | 
				
			||||||
 | 
					    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
 | 
				
			||||||
 | 
					    royalty-free, unrestricted license to exercise all rights under those
 | 
				
			||||||
 | 
					    copyrights. This includes, at our option, the right to sublicense these same
 | 
				
			||||||
 | 
					    rights to third parties through multiple levels of sublicensees or other
 | 
				
			||||||
 | 
					    licensing arrangements;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that each of us can do all things in relation to your
 | 
				
			||||||
 | 
					    contribution as if each of us were the sole owners, and if one of us makes
 | 
				
			||||||
 | 
					    a derivative work of your contribution, the one who makes the derivative
 | 
				
			||||||
 | 
					    work (or has it made) will be the sole owner of that derivative work;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that you will not assert any moral rights in your contribution
 | 
				
			||||||
 | 
					    against us, our licensees or transferees;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that we may register a copyright in your contribution and
 | 
				
			||||||
 | 
					    exercise all ownership rights associated with it; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that neither of us has any duty to consult with, obtain the
 | 
				
			||||||
 | 
					    consent of, pay or render an accounting to the other for any use or
 | 
				
			||||||
 | 
					    distribution of your contribution.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					3. With respect to any patents you own, or that you can license without payment
 | 
				
			||||||
 | 
					to any third party, you hereby grant to us a perpetual, irrevocable,
 | 
				
			||||||
 | 
					non-exclusive, worldwide, no-charge, royalty-free license to:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * make, have made, use, sell, offer to sell, import, and otherwise transfer
 | 
				
			||||||
 | 
					    your contribution in whole or in part, alone or in combination with or
 | 
				
			||||||
 | 
					    included in any product, work or materials arising out of the project to
 | 
				
			||||||
 | 
					    which your contribution was submitted, and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * at our option, to sublicense these same rights to third parties through
 | 
				
			||||||
 | 
					    multiple levels of sublicensees or other licensing arrangements.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					4. Except as set out above, you keep all right, title, and interest in your
 | 
				
			||||||
 | 
					contribution. The rights that you grant to us under these terms are effective
 | 
				
			||||||
 | 
					on the date you first submitted a contribution to us, even if your submission
 | 
				
			||||||
 | 
					took place before the date you sign these terms.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					5. You covenant, represent, warrant and agree that:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * Each contribution that you submit is and shall be an original work of
 | 
				
			||||||
 | 
					    authorship and you can legally grant the rights set out in this SCA;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * to the best of your knowledge, each contribution will not violate any
 | 
				
			||||||
 | 
					    third party's copyrights, trademarks, patents, or other intellectual
 | 
				
			||||||
 | 
					    property rights; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * each contribution shall be in compliance with U.S. export control laws and
 | 
				
			||||||
 | 
					    other applicable export and import laws. You agree to notify us if you
 | 
				
			||||||
 | 
					    become aware of any circumstance which would make any of the foregoing
 | 
				
			||||||
 | 
					    representations inaccurate in any respect. We may publicly disclose your 
 | 
				
			||||||
 | 
					    participation in the project, including the fact that you have signed the SCA.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					6. This SCA is governed by the laws of the State of California and applicable
 | 
				
			||||||
 | 
					U.S. Federal law. Any choice of law rules will not apply.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					7. Please place an “x” on one of the applicable statements below. Please do NOT
 | 
				
			||||||
 | 
					mark both statements:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [x] I am signing on behalf of myself as an individual and no other person
 | 
				
			||||||
 | 
					    or entity, including my employer, has or will have rights with respect to my
 | 
				
			||||||
 | 
					    contributions.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [ ] I am signing on behalf of my employer or a legal entity and I have the
 | 
				
			||||||
 | 
					    actual authority to contractually bind that entity.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Details
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					| Field                          | Entry                              |
 | 
				
			||||||
 | 
					|------------------------------- | --------------------               |
 | 
				
			||||||
 | 
					| Name                           | Daniel Hershcovich                 |
 | 
				
			||||||
 | 
					| Company name (if applicable)   |                                    |
 | 
				
			||||||
 | 
					| Title or role (if applicable)  |                                    |
 | 
				
			||||||
 | 
					| Date                           | 8 November 2017                    |
 | 
				
			||||||
 | 
					| GitHub username                | danielhers                         |
 | 
				
			||||||
 | 
					| Website (optional)             | www.cs.huji.ac.il/~danielh         |
 | 
				
			||||||
| 
						 | 
					@ -48,7 +48,8 @@ def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
 | 
				
			||||||
    # load the IMDB dataset
 | 
					    # load the IMDB dataset
 | 
				
			||||||
    print("Loading IMDB data...")
 | 
					    print("Loading IMDB data...")
 | 
				
			||||||
    (train_texts, train_cats), (dev_texts, dev_cats) = load_data(limit=n_texts)
 | 
					    (train_texts, train_cats), (dev_texts, dev_cats) = load_data(limit=n_texts)
 | 
				
			||||||
    print("Using %d training examples" % n_texts)
 | 
					    print("Using {} examples ({} training, {} evaluation)"
 | 
				
			||||||
 | 
					          .format(n_texts, len(train_texts), len(dev_texts)))
 | 
				
			||||||
    train_data = list(zip(train_texts,
 | 
					    train_data = list(zip(train_texts,
 | 
				
			||||||
                          [{'cats': cats} for cats in train_cats]))
 | 
					                          [{'cats': cats} for cats in train_cats]))
 | 
				
			||||||
 | 
					
 | 
				
			||||||
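The reworded message in this hunk reports `n_texts` as the total number of examples and then breaks it into training and evaluation counts, rather than describing the total as training examples. The sketch below reproduces only the message formatting. The helper `split_counts` and the 80/20 split are assumptions for illustration; the split used by `load_data` is not shown in this diff.

```python
def split_counts(n_texts, split_percent=80):
    """Hypothetical helper: divide n_texts into (training, evaluation) counts.

    The 80/20 ratio is an assumption made for this example only.
    """
    n_train = n_texts * split_percent // 100
    return n_train, n_texts - n_train


n_texts = 2000
n_train, n_dev = split_counts(n_texts)

# Same format string as the added line in the hunk above.
message = ("Using {} examples ({} training, {} evaluation)"
           .format(n_texts, n_train, n_dev))
print(message)
```

With these assumed numbers, the message makes it explicit that only part of `n_texts` is used for training.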
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -63,7 +63,7 @@ cdef class Tokenizer:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    cpdef Doc tokens_from_list(self, list strings):
 | 
					    cpdef Doc tokens_from_list(self, list strings):
 | 
				
			||||||
        util.deprecated(
 | 
					        util.deprecated(
 | 
				
			||||||
            "Tokenizer.from_from list is now deprecated. Create a new Doc "
 | 
					            "Tokenizer.from_list is now deprecated. Create a new Doc "
 | 
				
			||||||
            "object instead and pass in the strings as the `words` keyword "
 | 
					            "object instead and pass in the strings as the `words` keyword "
 | 
				
			||||||
            "argument, for example:\nfrom spacy.tokens import Doc\n"
 | 
					            "argument, for example:\nfrom spacy.tokens import Doc\n"
 | 
				
			||||||
            "doc = Doc(nlp.vocab, words=[...])")
 | 
					            "doc = Doc(nlp.vocab, words=[...])")
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -188,6 +188,38 @@ p
 | 
				
			||||||
        +cell int
 | 
					        +cell int
 | 
				
			||||||
        +cell The row the vector was added to.
 | 
					        +cell The row the vector was added to.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					+h(2, "resize") Vectors.resize
 | 
				
			||||||
 | 
					    +tag method
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					p
 | 
				
			||||||
 | 
					    |  Resize the underlying vectors array. If #[code inplace=True], the memory
 | 
				
			||||||
 | 
					    |  is reallocated. This may cause other references to the data to become
 | 
				
			||||||
 | 
					    |  invalid, so only use #[code inplace=True] if you're sure that's what you
 | 
				
			||||||
 | 
					    |  want. If the number of vectors is reduced, keys mapped to rows that have
 | 
				
			||||||
 | 
					    |  been deleted are removed. These removed items are returned as a list of
 | 
				
			||||||
 | 
					    |  #[code (key, row)] tuples.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					+aside-code("Example").
 | 
				
			||||||
 | 
					    removed = nlp.vocab.vectors.resize((10000, 300))
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					+table(["Name", "Type", "Description"])
 | 
				
			||||||
 | 
					    +row
 | 
				
			||||||
 | 
					        +cell #[code shape]
 | 
				
			||||||
 | 
					        +cell tuple
 | 
				
			||||||
 | 
					        +cell
 | 
				
			||||||
 | 
					            |  A #[code (rows, dims)] tuple describing the number of rows and
 | 
				
			||||||
 | 
					            |  dimensions.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    +row
 | 
				
			||||||
 | 
					        +cell #[code inplace]
 | 
				
			||||||
 | 
					        +cell bool
 | 
				
			||||||
 | 
					        +cell Reallocate the memory.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    +row("foot")
 | 
				
			||||||
 | 
					        +cell returns
 | 
				
			||||||
 | 
					        +cell list
 | 
				
			||||||
 | 
					        +cell The removed items as a list of #[code (key, row)] tuples.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
+h(2, "keys") Vectors.keys
 | 
					+h(2, "keys") Vectors.keys
 | 
				
			||||||
    +tag method
 | 
					    +tag method
 | 
				
			||||||
 | 
					
 | 
				
			||||||
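The `Vectors.resize` documentation added in this hunk says that when the number of vectors is reduced, keys mapped to deleted rows are removed and returned as a list of `(key, row)` tuples. The following is a minimal pure-Python sketch of that bookkeeping only. The function `resize_keys` and the plain dict standing in for the key-to-row table are hypothetical and are not spaCy's implementation; the sketch does not touch the vector data itself.

```python
def resize_keys(key2row, n_rows):
    """Drop keys whose row index no longer exists after shrinking to n_rows.

    key2row is a plain dict used as a stand-in for the key-to-row table.
    Returns the removed entries as a list of (key, row) tuples.
    """
    removed = [(key, row) for key, row in key2row.items() if row >= n_rows]
    for key, _ in removed:
        del key2row[key]
    return removed


key2row = {"apple": 0, "banana": 1, "cherry": 2, "date": 3}
removed = resize_keys(key2row, 2)
print(removed)   # entries that pointed at rows 2 and 3
print(key2row)   # only keys for rows 0 and 1 remain
```

Returning the removed entries lets the caller decide whether to remap those keys to surviving rows or discard them.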
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -52,6 +52,7 @@
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    "MODEL_META": {
 | 
					    "MODEL_META": {
 | 
				
			||||||
        "core": "Vocabulary, syntax, entities, vectors",
 | 
					        "core": "Vocabulary, syntax, entities, vectors",
 | 
				
			||||||
 | 
					        "core_sm": "Vocabulary, syntax, entities",
 | 
				
			||||||
        "dep": "Vocabulary, syntax",
 | 
					        "dep": "Vocabulary, syntax",
 | 
				
			||||||
        "ent": "Named entities",
 | 
					        "ent": "Named entities",
 | 
				
			||||||
        "vectors": "Word vectors",
 | 
					        "vectors": "Word vectors",
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -10,9 +10,10 @@ p
 | 
				
			||||||
    for models, lang in MODELS
 | 
					    for models, lang in MODELS
 | 
				
			||||||
        for model, i in models
 | 
					        for model, i in models
 | 
				
			||||||
            - var comps = getModelComponents(model)
 | 
					            - var comps = getModelComponents(model)
 | 
				
			||||||
 | 
					            - var type = comps.size == "sm" && comps.type == "core" ? "core_sm" : comps.type
 | 
				
			||||||
            +row
 | 
					            +row
 | 
				
			||||||
                +cell #[+a("/models/" + lang + "#" + model) #[code=model]]
 | 
					                +cell #[+a("/models/" + lang + "#" + model) #[code=model]]
 | 
				
			||||||
                    if i == 0
 | 
					                    if i == 0
 | 
				
			||||||
                        +icon("star", 16).o-icon--inline.u-color-theme
 | 
					                        +icon("star", 16).o-icon--inline.u-color-theme
 | 
				
			||||||
                +cell #{LANGUAGES[comps.lang]}
 | 
					                +cell #{LANGUAGES[comps.lang]}
 | 
				
			||||||
                +cell #{MODEL_META[comps.type]}
 | 
					                +cell #{MODEL_META[type]}
 | 
				
			||||||
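The template change in this hunk selects the `core_sm` description when a model is both of type `core` and of size `sm`, and otherwise falls back to the model's type. A rough Python equivalent of that conditional is sketched below. The function `describe` is hypothetical, and the `comps` dictionaries are written by hand because the output of `getModelComponents` is not shown in this diff.

```python
MODEL_META = {
    "core": "Vocabulary, syntax, entities, vectors",
    "core_sm": "Vocabulary, syntax, entities",
    "dep": "Vocabulary, syntax",
    "ent": "Named entities",
    "vectors": "Word vectors",
}


def describe(comps):
    """Mirror the template's ternary: small core models get their own entry."""
    if comps["size"] == "sm" and comps["type"] == "core":
        key = "core_sm"
    else:
        key = comps["type"]
    return MODEL_META[key]


# Hand-written component dicts (assumed shape: lang, type, size).
print(describe({"lang": "en", "type": "core", "size": "sm"}))
print(describe({"lang": "en", "type": "core", "size": "md"}))
```

Only the combination of `core` and `sm` is special-cased, so a small model of another type still resolves through its plain type key.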
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -10,10 +10,12 @@ p
 | 
				
			||||||
        +cell #[+api("cli#download") #[code cli.download]]
 | 
					        +cell #[+api("cli#download") #[code cli.download]]
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    +row
 | 
					    +row
 | 
				
			||||||
        +cell
 | 
					        +cell #[code spacy.en] etc.
 | 
				
			||||||
            |  #[code spacy.en] etc.
 | 
					        +cell #[code spacy.lang.en] etc.
 | 
				
			||||||
        +cell
 | 
					
 | 
				
			||||||
            |  #[code spacy.lang.en] etc.
 | 
					    +row
 | 
				
			||||||
 | 
					        +cell #[code spacy.en.word_sets]
 | 
				
			||||||
 | 
					        +cell #[code spacy.lang.en.stop_words]
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    +row
 | 
					    +row
 | 
				
			||||||
        +cell #[code spacy.orth]
 | 
					        +cell #[code spacy.orth]
 | 
				
			||||||
| 
						 | 
					@ -43,6 +45,10 @@ p
 | 
				
			||||||
        +cell #[code Language.create_make_doc]
 | 
					        +cell #[code Language.create_make_doc]
 | 
				
			||||||
        +cell #[+api("language#attributes") #[code Language.tokenizer]]
 | 
					        +cell #[+api("language#attributes") #[code Language.tokenizer]]
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    +row
 | 
				
			||||||
 | 
					        +cell #[code Vocab.resize_vectors]
 | 
				
			||||||
 | 
					        +cell #[+api("vectors#resize") #[code Vectors.resize]]
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    +row
 | 
					    +row
 | 
				
			||||||
        +cell
 | 
					        +cell
 | 
				
			||||||
            |  #[code Vocab.load]
 | 
					            |  #[code Vocab.load]
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||