Is there anything that makes training a translation task easy?
(lemmy.dbzer0.com)
Thanks, the quickstart guide was straightforward to follow. Do you have any suggestions on how to do word splitting with code, if any? For example, on a test run, I found that the model was not able to synthesize unique constants correctly even though this test run consisted only of obvious "a to b" relationships.
If you’re working with a well-known language, you can probably use NLTK to tokenize your text into words. Word2vec is also helpful if you want a word-embedding approach. https://github.com/nltk/nltk
Thanks for the tips. After doing a bunch of searching, I found that what I needed was BPE, or byte-pair encoding. This allows the token set to contain sub-word sequences, which lets the tokenizer represent a unique constant like 0x0373 as ['__sow', '0x', '03', '73', '__eow'].
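For anyone who finds this later, here is a rough sketch of the idea in plain Python, not the exact tokenizer I used. The function names (learn_bpe, apply_merge, encode) are made up for illustration, and the word-boundary markers are simply added around the finished tokens, whereas a real BPE implementation may treat them as symbols during training. The merge rules in the demo are hand-written assumptions chosen to reproduce the split above; rules learned from real data will differ.

```python
from collections import Counter

SOW, EOW = "__sow", "__eow"  # word-boundary markers, named as in the post


def apply_merge(symbols, pair):
    """Replace each left-to-right occurrence of `pair` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from an iterable of words."""
    # Every word starts as a tuple of single characters, weighted by frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # The most frequent pair wins; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(apply_merge(symbols, best))] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split `word` into sub-word tokens by replaying merges in learned order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return [SOW] + symbols + [EOW]


# Hand-written merge rules (hypothetical, not learned from data).
demo_merges = [("0", "x"), ("0", "3"), ("7", "3")]
print(encode("0x0373", demo_merges))
# → ['__sow', '0x', '03', '73', '__eow']
```

In practice you would call learn_bpe on your training words and pass the resulting rules to encode. Since the rules only ever join adjacent pieces, concatenating the tokens between the two markers always gives back the original string, so a constant the model has never seen still gets a valid encoding.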