Vocabulary

Vocabulary()

Stores the mapping between tokens and the integer IDs that are used to reference embeddings.

Source code in /home/docs/checkouts/readthedocs.org/user_builds/rel/envs/latest/lib/python3.7/site-packages/REL/vocabulary.py
def __init__(self):
    self.word2id = {}
    self.idtoword = {}

    self.id2word = []
    self.counts = []
    self.unk_id = 0
    self.first_run = 0

add_to_vocab(token)

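Appends token to the vocabulary and assigns it the next free ID, which is the current length of id2word. The token is stored exactly as given: it is not normalised, and no check is made for duplicates.

Returns:

  • None – the vocabulary is modified in place.
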
Source code in /home/docs/checkouts/readthedocs.org/user_builds/rel/envs/latest/lib/python3.7/site-packages/REL/vocabulary.py
def add_to_vocab(self, token):
    """
    Adds token to vocabulary.

    :return:
    """
    new_id = len(self.id2word)
    self.id2word.append(token)
    self.word2id[token] = new_id
    self.idtoword[new_id] = token
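
The ID assignment above can be illustrated with a minimal sketch. MiniVocabulary is a hypothetical stand-in that copies only the fields needed here, and add_once is a hypothetical helper that is not part of REL. The sketch shows that adding the same token twice creates a second slot and repoints word2id at the newer ID.

```python
class MiniVocabulary:
    """Hypothetical stand-in mirroring the fields shown above (not the REL class)."""

    def __init__(self):
        self.word2id = {}
        self.id2word = []

    def add_to_vocab(self, token):
        # Same logic as the source above: the next ID is the current list length.
        new_id = len(self.id2word)
        self.id2word.append(token)
        self.word2id[token] = new_id

    def add_once(self, token):
        # Hypothetical helper: only add unseen tokens, then return the token's ID.
        if token not in self.word2id:
            self.add_to_vocab(token)
        return self.word2id[token]


vocab = MiniVocabulary()
vocab.add_to_vocab("#UNK#")
vocab.add_to_vocab("paris")
vocab.add_to_vocab("paris")  # duplicate: appended again, word2id now points at the new slot

paris_id = vocab.add_once("paris")    # already present, nothing is appended
berlin_id = vocab.add_once("berlin")  # unseen, appended at the end
```

If duplicate slots are undesirable, a caller could check membership in word2id before calling add_to_vocab, as add_once does in this sketch.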

get_id(token)

Normalises token and looks it up in the vocabulary.

Returns:

  • int – The ID of the normalised token, or unk_id (the ID of the #UNK# token) if the token is not in the vocabulary.

Source code in /home/docs/checkouts/readthedocs.org/user_builds/rel/envs/latest/lib/python3.7/site-packages/REL/vocabulary.py
def get_id(self, token):
    """
    Normalises token and checks if token in vocab.

    :return: Either reference ID to given token or reference ID to #UNK# token.
    """
    tok = Vocabulary.normalize(token)
    return self.word2id.get(tok, self.unk_id)
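
The fallback lookup can be sketched in isolation. This is a simplified, hypothetical function rather than the REL method: it takes the mapping as an argument, and it assumes lowercasing is the only normalisation applied (the actual default of LOWER is not shown on this page).

```python
UNK_TOKEN = "#UNK#"  # token name taken from the docstring above


def lookup_id(word2id, token, unk_id=0, lower=True):
    """Hypothetical stand-alone version of the lookup: normalise, then fall back."""
    # Assumption: lowercasing stands in for the full normalize() step.
    tok = token.lower() if lower and token != UNK_TOKEN else token
    # dict.get returns unk_id when the normalised token has no entry.
    return word2id.get(tok, unk_id)


word2id = {UNK_TOKEN: 0, "paris": 1, "berlin": 2}

known = lookup_id(word2id, "Paris")       # matches "paris" after lowercasing
unknown = lookup_id(word2id, "Atlantis")  # no entry, falls back to unk_id
```

Because unk_id is initialised to 0 in the constructor, the fallback is only meaningful if the token stored at ID 0 really is the unknown token, or if unk_id is updated after the vocabulary is filled.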

normalize(token, lower=LOWER, digit_0=DIGIT_0) staticmethod

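Normalises a token before lookup. The unknown token and the sentence markers <s> and </s> are returned unchanged. A token found in BRACKETS is replaced by its mapped value; for any other token, each digit is replaced by 0 when digit_0 is set. Except for the special tokens, the result is lowercased when lower is set.

Returns:

  • str – The normalised token.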

Source code in /home/docs/checkouts/readthedocs.org/user_builds/rel/envs/latest/lib/python3.7/site-packages/REL/vocabulary.py
@staticmethod
def normalize(token, lower=LOWER, digit_0=DIGIT_0):
    """
    Normalises token.

    :return: Normalised token
    """

    if token in [Vocabulary.unk_token, "<s>", "</s>"]:
        return token
    elif token in BRACKETS:
        token = BRACKETS[token]
    else:
        if digit_0:
            token = re.sub("[0-9]", "0", token)

    if lower:
        return token.lower()
    else:
        return token
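
The branches above can be exercised with a self-contained sketch. The BRACKETS entries, the UNK_TOKEN value, and the explicit lower and digit_0 arguments are assumptions made for illustration, since the module-level constants are not shown on this page.

```python
import re

# Assumed values for illustration; the real module-level constants are not shown here.
UNK_TOKEN = "#UNK#"
BRACKETS = {"-LRB-": "(", "-RRB-": ")"}


def normalize(token, lower, digit_0):
    """Stand-alone copy of the logic above, with the flags passed explicitly."""
    if token in [UNK_TOKEN, "<s>", "</s>"]:
        return token  # special tokens are never modified
    elif token in BRACKETS:
        token = BRACKETS[token]  # bracket tokens are mapped to their replacement
    else:
        if digit_0:
            token = re.sub("[0-9]", "0", token)  # every digit becomes "0"
    return token.lower() if lower else token


masked = normalize("Room-101", lower=True, digit_0=True)
bracket = normalize("-LRB-", lower=True, digit_0=True)
special = normalize("#UNK#", lower=True, digit_0=True)
untouched = normalize("Room-101", lower=False, digit_0=False)
```

With both flags set, "Room-101" becomes "room-000", so tokens that differ only in case or in their digits share one vocabulary entry.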

size()

Returns the number of tokens in the vocabulary.

Returns:

  • int – The number of entries in id2word.

Source code in /home/docs/checkouts/readthedocs.org/user_builds/rel/envs/latest/lib/python3.7/site-packages/REL/vocabulary.py
def size(self):
    """
    Checks size vocabulary.

    :return: size vocabulary
    """
    return len(self.id2word)