Number spaCy

Enhancing Numeric Entity Recognition in Text with spaCy

Number spaCy is a custom spaCy pipeline component that identifies number entities in text and exposes their parsed numeric values through spaCy’s custom extension attributes. It uses regular expressions to find numbers written out in words, then uses the word2number library to convert those words into numeric values. Each parsed value is stored on the entity in a custom extension, ._.number. The component is lightweight and can be added to an existing spaCy pipeline or to a blank model. When adding it to an existing pipeline, insert it before the NER component (for example, with the before argument of nlp.add_pipe).

Example
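import spacy
# Imported so that the 'find_numbers' component is available to nlp.add_pipe
from number_spacy import find_numbers

# Start from a blank English pipeline and add the component by name
nlp = spacy.blank('en')
nlp.add_pipe('find_numbers')

doc = nlp('I have three apples. She gave me twenty-two more, and now I have twenty-five apples in total.')

# Each NUMBER entity carries its parsed value in the ._.number extension
for ent in doc.ents:
    if ent.label_ == 'NUMBER':
        print(f'Text: {ent.text} -> Parsed Number: {ent._.number}')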
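
The description above names two steps: a regular-expression pass that finds numbers written in words, followed by a conversion of those words to numeric values. The sketch below illustrates that idea using only the Python standard library. It is not the number-spacy implementation. The names NUMBER_RE, words_to_number and find_number_spans are hypothetical, the vocabulary is deliberately small, and the hand-written converter is only a stand-in for the word2number library.

```python
import re

# Hypothetical, deliberately small vocabulary for this sketch.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"hundred": 100, "thousand": 1_000, "million": 1_000_000}

# Longest words first so that "sixteen" is tried before "six".
WORD = "|".join(sorted({**UNITS, **TENS, **SCALES}, key=len, reverse=True))

# Step 1: a run of number words joined by spaces or hyphens,
# optionally with "and" between them ("one hundred and five").
NUMBER_RE = re.compile(
    rf"\b(?:{WORD})(?:[\s-]+(?:and\s+)?(?:{WORD}))*\b",
    re.IGNORECASE,
)


def words_to_number(phrase: str) -> int:
    """Step 2: convert a matched phrase to an integer (stand-in for word2number)."""
    total, current = 0, 0
    for word in re.split(r"[\s-]+", phrase.lower()):
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:  # thousand, million
            total += max(current, 1) * SCALES[word]
            current = 0
        # Anything else (such as "and") is ignored.
    return total + current


def find_number_spans(text: str) -> list[tuple[str, int]]:
    """Return (matched text, parsed value) pairs for every number phrase found."""
    return [(m.group(0), words_to_number(m.group(0))) for m in NUMBER_RE.finditer(text)]


text = ("I have three apples. She gave me twenty-two more, "
        "and now I have twenty-five apples in total.")
for phrase, value in find_number_spans(text):
    print(f"Text: {phrase} -> Parsed Number: {value}")
```

In a spaCy component, the matched character offsets would typically be turned into entity spans and the parsed value attached through a custom extension such as ._.number. This sketch stops at plain strings and integers to stay self-contained.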

Author info

W.J.B. Mattingly

GitHub: wjbmattingly/number-spacy

Categories: pipeline
