Thank you, but if we are to tokenize Japanese sentences, we will
probably use MeCab instead, since we already have it integrated to
auto-generate furigana. By the way, I had a quick look at the file
sentences_jp_tokenized.json you committed to that repository. It failed
to parse the first two sentences correctly ("にちょっと" and "何かし"),
so I wonder how reliable that tokenizer is. MeCab is by no means a
state-of-the-art tokenizer, but it has Python bindings if you want to
give it a try.
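
If it helps, here is a minimal sketch of word-level tokenization via the mecab-python3 bindings. The package name and the unidic-lite dictionary are my assumptions; any installed MeCab dictionary should work:

```python
# Minimal sketch, assuming mecab-python3 and a dictionary are installed:
#   pip install mecab-python3 unidic-lite
import MeCab

# "-Owakati" makes MeCab emit surface forms separated by spaces.
tagger = MeCab.Tagger("-Owakati")

sentence = "何かしたいです。"
tokens = tagger.parse(sentence).split()
print(tokens)  # e.g. ['何', 'か', 'し', 'たい', 'です', '。']
```

Dropping the -Owakati flag gives you the full morphological analysis (part of speech, readings, etc.), which is also what you would want for generating furigana.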
I’d love to get proper Japanese tokenization to enhance search
results in Tatoeba. (But even then, that wouldn’t be useful to you
unless we provided those search results over an API.) It’s just that we
have too many things to do and too few hands to help.