Kanji Notebook Diary

Resurrecting a project I had about 10 years ago.
For my own use. May be of interest to classmates and instructors.
Extracting the kanji and vocab I find interesting from the kanjidic project.
An adjunct to Spahn and Hadamitzky's book, which is a bit useless on Kindle and not great in print as a reference book (no search capability, and it uses romaji, for starters).
However the index is available in kanjidic, so I can extract it and other information for cross reference in FileMaker, and from there to FileMaker Go on an iPad.

31 August

Have managed to get the data out of the XML file. Fortunately I had an xsl file I could edit to enable this, and didn't have to relearn the process.
I need to think at this stage about what I want in the notebook, and what cross references: kkld, Japanese school grades, old jlpt grades, possibly O'Neill, although if I have a copy of his book I can't find it. I have Henshall's descriptions. I have kakitorikun references but it's been years.
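A minimal sketch of the same extraction in Python rather than XSL, assuming the kanjidic2 XML layout (`character`, `literal`, `cp_value`, `misc`, `dic_ref`). The sample entry is hand-made for illustration and its reference numbers are placeholders, not checked against the books; the `extract` function and the output field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-made sample in the kanjidic2 layout. The real file has thousands of
# <character> entries; the dic_ref numbers below are placeholders.
SAMPLE = """<kanjidic2>
  <character>
    <literal>本</literal>
    <codepoint><cp_value cp_type="ucs">672c</cp_value></codepoint>
    <misc><grade>1</grade><jlpt>4</jlpt></misc>
    <dic_number>
      <dic_ref dr_type="sh_kk">25</dic_ref>
      <dic_ref dr_type="halpern_kkld">2183</dic_ref>
    </dic_number>
  </character>
</kanjidic2>"""


def extract(xml_text):
    """Return one dict per <character> holding the cross-reference fields."""
    rows = []
    for ch in ET.fromstring(xml_text).iter("character"):
        # Map each dictionary code (dr_type) to its index number.
        refs = {d.get("dr_type"): d.text for d in ch.iter("dic_ref")}
        rows.append({
            "kanji": ch.findtext("literal"),
            "ucs": ch.findtext("codepoint/cp_value[@cp_type='ucs']"),
            "grade": ch.findtext("misc/grade"),
            "jlpt": ch.findtext("misc/jlpt"),
            "spahn": refs.get("sh_kk"),        # Spahn and Hadamitzky index
            "kkld": refs.get("halpern_kkld"),  # kkld index
        })
    return rows


rows = extract(SAMPLE)
print(rows[0])
```

Missing fields come back as `None`, which makes it easy to see which kanji lack a given cross reference before importing a tab-separated export into FileMaker.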

  • Need to cut down the several thousand entries to just the 2000 or so in spahn and kkld.
  • Need to find a regex plugin for filemaker to prettify the okurigana. Or use native capability.
  • Need to see if I can figure out what I did 10 years ago and leverage into the new build.
  • Need to think about example vocab.
  • Wonder if I can script finding vocab that only uses previously known kanji.
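Two of the items above could be sketched outside FileMaker roughly as follows. This is only an illustration: the record layout and the names `in_reference_books` and `uses_only_known` are assumptions, and the sample data is made up.

```python
import re

# Matches a single kanji in the main CJK Unified Ideographs block.
KANJI = re.compile(r"[\u4e00-\u9fff]")


def in_reference_books(record, wanted=("spahn", "kkld")):
    """True if the record has an index number in at least one wanted book."""
    return any(record.get(key) for key in wanted)


def uses_only_known(word, known):
    """True if the word contains kanji and all of them are already known."""
    kanji = KANJI.findall(word)
    return bool(kanji) and all(ch in known for ch in kanji)


# Hypothetical records: one with a Spahn number, one with no references.
records = [
    {"kanji": "本", "spahn": "25", "kkld": None},
    {"kanji": "龠", "spahn": None, "kkld": None},
]
kept = [r["kanji"] for r in records if in_reference_books(r)]

known = set("日本人")
vocab = ["日本", "本人", "日本語", "ひと"]
candidates = [w for w in vocab if uses_only_known(w, known)]

print(kept)        # ['本']
print(candidates)  # ['日本', '本人']
```

Kana-only words are deliberately skipped here, since they illustrate no kanji reading.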

https://baseelementsplugin.zendesk.com/hc/en-us/articles/115002990887-About-the-BaseElements-Plugin

https://www.dracoventions.com/products/2empowerFM/family/regex.php#sysReq

FM has moved on a bit. I'm still on v15 and not about to buy a newer version.
I have Go 15; I think 17 should be compatible.

Idea: autolookup of online kanji resource. Embed webpage.


Partial target data. Morphemes not needed, but could make multi radical lookup using them. Wonder if indexes exist. Radical index not needed. Variants? Kanji in examples can be direct linked without index no. English headwords. Prefer kkld. Different headwords for kunyomi etc. Undecided. Stroke order – better separate from main kanji. Animated? I have the font with numbers so could make images. I believe animated files may be public domain or cc license.
Kanji structure too much like skip code to me.

Krad file gives the components for kanji search. Feels like reinventing the wheel to implement it. For now, copy and paste from another dictionary where it's implemented in the keyboard area.

https://www.nihilist.org.uk/
Stroke order font
https://jisho.org/search/本%20%23kanji
Syntax of kanji URL at jisho.org

http://kakijun.com/c/672c.html
Syntax at kakijun for 本 Using Unicode number. In Kanjidic file?
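For what it's worth, kanjidic2 does carry the Unicode value (the `cp_value` element with `cp_type="ucs"`), but it can also be derived from the character itself. A small sketch of building both lookup URLs, assuming the two patterns above hold for other kanji; the function names are made up.

```python
from urllib.parse import quote


def jisho_url(kanji):
    """Percent-encode the search term '<kanji> #kanji' for jisho.org."""
    return "https://jisho.org/search/" + quote(kanji + " #kanji")


def kakijun_url(kanji):
    """kakijun pages are named after the lowercase hex Unicode code point."""
    return "http://kakijun.com/c/{:x}.html".format(ord(kanji))


print(jisho_url("本"))    # https://jisho.org/search/%E6%9C%AC%20%23kanji
print(kakijun_url("本"))  # http://kakijun.com/c/672c.html
```

The jisho URL comes out with 本 percent-encoded, which is the same address as the unencoded form a browser shows.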

https://app.kanjialive.com/
Syntax at kanjialive

1 September

The question was whether to start anew or work with the old file.
The old file also only has the old jyouyou set and is also more focused on grade 1 to 6 kanji.
That leaves an extra 195 kanji to add. The immediate problem is importing them without regenerating some data I have for the existing kanji but not for these extra ones. Also adding the unicode for each, but that is simpler.

Cleaning up the old file.
Remove / change existing sentences feature. Remove kakitorikun etc.

Need to add search.

Kanji alive has an annoying javascripted display setup. It scales to monitor size, not portal size, so it won't display properly in anything less than a full width page. A potential problem. Also no horizontal scroll when this happens. The problem of fixed design over fluid design. Not that I can talk, as I'm working to a fixed ipad landscape design. If FM go can work split screen this might change.

I found 57 variant forms in the kanji set I'm working on. Some of them have the parent's readings and meanings. All seem to be grade 10 in the Japanese school system. None have spahn or kkld numbers. Are these the ones that were added to the jyouyou set? Oddly, or maybe not, in filemaker at least a search for the variant character returns the standard characters as well.

My tategaki solution doesn't work on ipad. need to rethink look.
FileMaker can run on a floating tab. Although not sure I want it to.
need to look at scripted searches for radical groups.

Need to generate pretty okurigana in FM10 because the plugin I used no longer works and was abandoned by the developer anyway. There are other regex plugins but I can't easily colour the strings I replace.

simplify simplify simplify

2 September

More okurigana issues.
I have settled on 3000 kanji! Most will probably never be looked at, but the set includes all of the Spahn entries, all of kkld 1st and 2nd edition and all of the extended education set.
There is a difference between kkld 1st and 2nd editions, with various kanji omitted and added. Have a print first edition and electronic 2nd edition.
Anyhow, using fm10 I've coloured the okurigana. But they are in a comma separated list and I need them in separate fields. So some scripting is called for.
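Kanjidic marks the okurigana boundary in kun readings with a full stop (for example おも.う), so the splitting step is simple once the list is broken up. A rough sketch in Python, not the FileMaker scripting actually used; `split_readings` is a made-up name and the sample string is illustrative.

```python
def split_readings(field):
    """Split a comma separated readings field into (stem, okurigana) pairs.

    A reading with no full stop has no okurigana, so its second item is ''.
    """
    pairs = []
    # Accept the Japanese comma as well as the ASCII one.
    for reading in field.replace("、", ",").split(","):
        reading = reading.strip()
        if not reading:
            continue
        stem, _, okurigana = reading.partition(".")
        pairs.append((stem, okurigana))
    return pairs


print(split_readings("おも.う, おぼ.す, もと"))
# [('おも', 'う'), ('おぼ', 'す'), ('もと', '')]
```

Each pair can then go to its own field, with only the second item given the okurigana colour.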

3 September

Kanji in main file now.
Okurigana colour solved natively, involving looping scripts, magic keys and a cumbersome replacement for regex.

hurrah.

Reduced the number of kanji by removing the variants. They were problematic. I key on kanji, believing they should be unique. But (at least to FileMaker) the variants, although having different unicode values, are seen as the same. So I managed to overwrite some when exporting, reimporting, updating.

Cleaned out some of the cruft. Commented it more. Originally it was a quickish hack. The primary file had better commenting.
Hard to go through uncommented code 10 years later.

I noticed some slight issues with the radicals when there are variants. May need some thought. Although I'm not convinced the 214 radicals are of immediate use other than as historical curiosity. Even then I doubt I'd memorise them rather than figure them out.

Next: allowing the user to add words. Originally I had kanji notebook KNB as a utility for an app I made that analysed and deconstructed texts. I was interested in the frequency of primary school kanji in texts. So the vocab was auto generated.
Now I want fewer words, entered by hand. They are to illustrate the readings. Automatic cross reference would be good.
I also want to be able to add pictures. Signs, packaging, diagrams on how to write the character correctly. Handwriting examples. Maybe an illustrative picture eg. a tree for 本. – which would be stupid as tree is 木 – 本 is book.

Need to finish up soon, as I soon won't have the spare time.

4 September

Oddly kanjidic doesn't differentiate between radical variants, so 恨 and 思 are both radical 61, which seems to have 3 forms: 忄, 心, 㣺. Now I have all the forms in the related table but only the first matches because that’s all the info kanjidic seems to provide. There are nelson dic numbers for some radicals but …
I wonder how I generated the radicals table?
I could hand reclassify radicals but that makes for fragility. I would have problems if I ever needed to reimport kanjidic data.

https://www.sanseido.biz/sp/

https://kids.yahoo.co.jp/zukan/

5 September

Got distracted by dictionaries.

https://pypi.org/project/kanjinetworks/
This looks interesting but it would take me time to sort out the python and stuff. The site was good when it existed.
Of all the ways to extract data from a website or database, pdf is probably the least useful. XML would have been the most portable and useful.
There is a plain text file. Maybe I can grep it and get it into FileMaker.

I might get Kanji Sieve functional again if I replace chuta.jp with language.tiu.ac.jp . I think they just let chuta lapse but it pointed to tiu in any case. A fragility in my hack.

But this etymology thing and kanji sieve is a distraction.

/[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[a-zA-Z0-9]+|[々〆〤]+/u
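A sketch of how a pattern like the one above could split text into runs of one script, written for Python's `re` module (str patterns are Unicode by default, so no `/u` flag is needed). The fullwidth alphanumeric branch is an assumption on my part, and `runs` is a made-up name.

```python
import re

RUNS = re.compile(
    "[一-龠]+"           # kanji
    "|[ぁ-ゔ]+"          # hiragana
    "|[ァ-ヴー]+"        # katakana, including the long vowel mark
    "|[a-zA-Z0-9]+"      # ASCII letters and digits
    "|[ａ-ｚＡ-Ｚ０-９]+"   # fullwidth letters and digits (assumed branch)
    "|[々〆〤]+"          # iteration and abbreviation marks
)


def runs(text):
    """Return the consecutive same-script runs found in text."""
    return RUNS.findall(text)


print(runs("私は本を3冊読んだ"))
# ['私', 'は', '本', 'を', '3', '冊読', 'んだ']
```

Note that this splits on script changes only, so 冊読 comes out as one run and 読んだ is cut between kanji and kana. It is not a word segmenter.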

https://moji.tekkai.com/zoom/%E6%9C%AC/page.html

10 September
16 September

Been working on this and not writing here.

to do

  • Etymology to web viewer and lookup kanji for cross reference
  • lookalikes
  • phonetic groups
  • synonyms
  • sentence box bigger.

20 September

https://github.com/mifunetoshiro/kanjium

Work in Unicode

Several higher level kanji were driving me crazy, overwriting data on input. Solved by having the fields work in unicode. Japanese was almost enough, but it must have been using an encoding system where some characters shared a codepoint. Too used to having everything work in unicode these days.
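One way to sanity check a key outside FileMaker is to spell out the code point, so that two look-alike or variant characters can never be folded together by a language-specific comparison. A small sketch; `codepoint_key` is a made-up name and the two characters are just an example of a standard and variant pair.

```python
def codepoint_key(ch):
    """Return a plain ASCII key such as 'U+672C' for a single character."""
    return "U+{:04X}".format(ord(ch))


# A standard form and its variant: different code points, so different keys.
records = {}
for kanji, note in [("亜", "standard form"), ("亞", "variant form")]:
    records[codepoint_key(kanji)] = note

print(codepoint_key("本"))  # U+672C
print(len(records))         # 2
```

Exporting this key alongside the kanji gives a match field that does not depend on how the database compares Japanese text.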

24 September
29 September

https://youtu.be/zu7OqwboJOo

Mac and Windows runtimes made.

Windows and fonts is a bit of a headache. Might try to see if Google fonts can improve matters.

30 September

Bloody hell, trying to import data on 600 kanji, only 599 matched. How to find the unmatched one in 3000 records? I managed it by exporting, importing and finding the field that wasn't duplicated.
It was 2: 二. The source file had katakana ニ. Close but no cigar.
There's a lesson in there somewhere.
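A sketch of how the stray key could be hunted down with a set difference plus the Unicode character name, assuming both key lists are available as plain text exports. The function names and sample lists are made up.

```python
import unicodedata


def unmatched(import_keys, existing_keys):
    """Keys in the import that have no exact match in the existing set."""
    return sorted(set(import_keys) - set(existing_keys))


def describe(ch):
    """Code point and Unicode name, which exposes look-alike characters."""
    return "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "UNKNOWN"))


existing = ["一", "\u4e8c", "三"]  # \u4e8c is the kanji for two
incoming = ["一", "\u30cb", "三"]  # \u30cb is the katakana look-alike

for ch in unmatched(incoming, existing):
    print(ch, describe(ch))
# ニ U+30CB KATAKANA LETTER NI
```

The name makes the problem obvious at a glance, where the two glyphs side by side do not.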