[CWB] Fwd: Subtitle Search interface

Ian Worthington worthy.vii at gmail.com
Fri Jun 28 04:51:16 CEST 2019


Hello everyone,

My name is Ian Worthington, and I have a dream that I think you can help me
with.

You see, I'm a student of the Japanese language, and I'm not a very good
one.

My biggest issue is that I am never sure whether the things I read are
actually said by people, or whether they are only things that usually get
written. Because of this, I end up saying strange things: phrases I may have
read but have never actually heard spoken.

My solution was to gather lots and lots of subtitles, match them up across
language pairs, and then build a nice search interface on top, so people can
easily find how something is actually said, in a natural, native way, in the
other language.

I finished it many years ago, when my coding skills were quite weak (using
CherryPy, PostgreSQL and Sphinx, with the OpenSubs corpus), back when I
didn't even know how to use Git. It took a long time to write the various
subtitle parsers, tokenizers, etc.
Here's a pic of it in action back then:
[image: image.png]
(I used it in a presentation to get a job here in Japan)

I recently rediscovered my project and did some searching online, only to
find that you had already built exactly what I wanted, long ago:
http://opus.nlpl.eu/cwb/OpenSubtitles/frames-cqp.html

What I would like to do is:
1) Build a better UI front end (I'll do this)
2) Host it on a separate URL (e.g. subsearch.com or something similar)

For this to work, I would love to get permission to use the API endpoint
for these queries, preferably with JSON responses.

I don't aim to make any profit from this at all, and I'm happy to credit and
link back to your site however you like.

What do you think?

Thanks,
Ian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 284179 bytes
Desc: not available
URL: <http://liste.sslmit.unibo.it/pipermail/cwb/attachments/20190628/794766cd/attachment-0001.png>

