Learning Japanese the Rubyist way

Makoto Inoue

This post was originally published on the New Bamboo blog, before New Bamboo joined thoughtbot in London.


  • Introduction
  • Step 1: How to read Japanese characters
  • Step 2: Japanese and OO
  • Step 3: Japanese and functional
  • Step 4: Writing Japanese programming language in Ruby
  • Summary
  • Ruby Advent Calendar

Introduction

Have you ever thought about learning Japanese, only to find it looks too difficult? Surprisingly, Japanese and Ruby share some common features and concepts. This is a condensed version of my presentation “Japanese and Ruby”, which I gave at LRUG.

By the time you finish reading this post, I hope you will find the Japanese language less magical, and you may even add “Learn Japanese” to your 2011 New Year’s resolutions.

Learning a new language always involves a somewhat steep learning curve, so go and get some coffee before you start!

Step 1: How to read Japanese characters

One of the first big hurdles when learning a language is memorizing all of its characters. This is not an issue if you are learning a language written in the Latin alphabet, but many non-Western languages have their own character sets.

To make matters worse, Japanese uses three different character sets: Kanji (Chinese characters), Hiragana, and Katakana. Hiragana and Katakana each have 46 basic characters, and there are a huge number of Kanji (possibly 50,000, though ONLY 2,000 to 3,000 are in daily use).

Here is the mapping between Hiragana, Katakana, and the Latin alphabet.

[Diagram: Hiragana, Katakana, and Kanji (from Wikipedia)]

The point here is not to overwhelm you with information, but to get you thinking about WHY Japanese uses three character sets.

Originally, Japanese did not have its own writing system, so we borrowed characters (Kanji) from China. Since Chinese and Japanese grammar are completely different, it was not easy to fit all these Kanji into Japanese sentences. That’s when Hiragana and Katakana were born to supplement Kanji. Hiragana is often used as glue to combine words into a sentence where Kanji alone is not enough.

For example, “行” is a Kanji character meaning “to go”. Japanese changes the ending of a verb in many different ways (much as English turns “go” into “goes”), and we write those endings in Hiragana. Here are some examples.

Japanese | Romanization | Meaning
行く | Iku | I go
行かない | Ikanai | I do not go
行こう | Ikou | Let’s go (casual)
行きましょう | Ikimashou | Let’s go (polite)
行け | Ike | Go! (a very impolite command)

NOTE: If the above examples do not look like Japanese, you have an encoding issue. Make sure that your browser encoding is set to UTF-8.

Katakana, on the other hand, was often written alongside Kanji so that people could tell how to pronounce them. Nowadays, Katakana is often used for words borrowed from foreign languages.

e.g. 漢字 (annotated with its reading カンジ), ルビー

(Trivia: the example above is marked up with the HTML5 ruby tag.)

(More trivia: before multibyte encodings became common, Japanese computers could only handle the Latin alphabet and single-byte (half-width) Katakana, e.g. ﾙﾋﾞｰ, instead of multibyte Katakana, e.g. ルビー. Some banks’ ATM slips still use this single-byte Katakana.)
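To see the difference between single-byte and multibyte Katakana for yourself, here is a small sketch using Ruby's own M17N string support. It assumes Ruby 2.2 or later (for `String#unicode_normalize`) and a UTF-8 source file; Shift_JIS stands in here for the older Japanese encodings in which half-width Katakana took a single byte.

```ruby
# Multibyte (full-width) and single-byte (half-width) Katakana for "ruby".
full_width = "ルビー"
half_width = "ﾙﾋﾞｰ" # the voicing mark ﾞ is a separate character here

# Character counts: the half-width form needs an extra character for ﾞ.
puts full_width.length # 3
puts half_width.length # 4

# In UTF-8 each of these full-width characters takes 3 bytes...
puts full_width.bytesize # 9

# ...while in Shift_JIS full-width Katakana takes 2 bytes and half-width takes 1.
puts full_width.encode("Shift_JIS").bytesize # 6
puts half_width.encode("Shift_JIS").bytesize # 4

# Unicode NFKC normalization folds the half-width form back into the full-width one.
puts half_width.unicode_normalize(:nfkc) == full_width # true
```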

Although these are the conventions, you can use Kanji, Katakana, and Hiragana interchangeably.

The following three all mean “cherry blossoms bloom” and are all pronounced “Sakura saku”:

  • 桜咲く
  • サクラさく
  • さくらさく
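Hiragana and Katakana are so closely parallel that, in Unicode, each basic Katakana character sits exactly 0x60 code points after its Hiragana twin. A minimal sketch (the method names are hypothetical) can therefore switch between the two scripts with `String#tr`. Kanji such as 桜 cannot be converted this way, because turning Kanji into kana needs a reading dictionary.

```ruby
# Katakana ァ..ン (U+30A1..U+30F3) and Hiragana ぁ..ん (U+3041..U+3093)
# are parallel ranges, so a range-to-range tr maps one script onto the other.
def katakana_to_hiragana(text)
  text.tr("ァ-ン", "ぁ-ん")
end

def hiragana_to_katakana(text)
  text.tr("ぁ-ん", "ァ-ン")
end

puts katakana_to_hiragana("サクラさく") # さくらさく
puts hiragana_to_katakana("さくらさく") # サクラサク
puts katakana_to_hiragana("桜咲く")     # 桜咲く (Kanji are left untouched)
```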

(Trivia: the word “karaoke” combines the Kanji “kara” (空, meaning “empty”) with “oke”, short for the English word “orchestra”.)

Here is a quick recap of what you have learnt so far.

  • Kanji came first to import Chinese words
  • Hiragana was created to suit domestic use
  • Katakana is used to adopt new words

Doesn’t this “there are many ways to achieve one thing” idea sound familiar from Ruby’s philosophy?

  • Ruby came first to bring in the concepts of OO and functional programming
  • Ruby was created to suit everyday scripting
  • Ruby keeps evolving by adopting new concepts (Fiber, Multilingualization/M17N, Refinements, etc.)
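As a small Ruby illustration of “many ways to achieve one thing”, here are three spellings of the same transformation, much as 桜咲く, サクラさく, and さくらさく spell the same sentence. It is only a sketch; plenty of other variants would work too.

```ruby
words = %w[sakura saku]

# Three ways to say "upcase every word"; all produce the same array.
with_map     = words.map { |word| word.upcase }
with_collect = words.collect(&:upcase) # collect is an alias of map
with_each    = words.each_with_object([]) { |word, result| result << word.upcase }

p with_map                                          # ["SAKURA", "SAKU"]
p with_collect == with_map && with_each == with_map # true
```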

Step 2: Japanese and OO

Satoshi Nakashima is a well-known Japanese blogger who used to work at Microsoft as a member of the development teams that shipped Windows 95 and Internet Explorer.

He was once asked, “Was there anything about being Japanese that helped you create Windows 95?” At first he could not think of anything, but then he realized that Japanese grammatical structure is better suited to object-oriented programming. To explain his idea, let me walk you through some basic Japanese grammar.

English and Japanese have very different word orders.

English sentence structure is called “SVO” (Subject - Verb - Object), while the Japanese one is called “SOV” (Subject - Object - Verb).

If I put “I eat bacon” into Japanese order, it becomes “I bacon eat” (私はベーコンを食べます, Watashi ha bacon wo tabemasu).

At first glance, English order looks clearer, because “what you do” (the verb) comes right after “who does it” (the subject). It is almost like a command line (e.g. git clone url).

The problem with the command line is that there are so many commands that it is hard to figure out which one you are supposed to use.

Japanese grammatical order, on the other hand, is more like a GUI. You (right-)click the object you are interested in, and it then suggests the possible actions. This is much more user-friendly because you do not need to know all the possible actions and their options in advance.
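In Ruby, the closest thing to right-clicking an object is asking it what it can do. A minimal sketch, using a hypothetical `Bacon` class in honour of the “I eat bacon” sentence:

```ruby
# A hypothetical class standing in for the object you right-click.
class Bacon
  def eat
    "Yum"
  end

  def fry
    "Sizzle"
  end
end

bacon = Bacon.new

# The "context menu": the public methods Bacon itself defines.
p Bacon.public_instance_methods(false).sort # [:eat, :fry]

# You can also check a single action before choosing it.
p bacon.respond_to?(:eat)   # true
p bacon.respond_to?(:drive) # false
p bacon.eat                 # "Yum"
```

The object comes first and the action second, which is the ordering this analogy is about.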

As you already know, Ruby is one of the best scripting languages for expressing OO (though you can write in a procedural, or “command-oriented”, way if you wish):

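# Procedural: one global function has to know how to open every kind of thing
# (a hypothetical open, not Ruby's built-in Kernel#open)
open("box")
open("car")
open("file", "foo.txt")

# OO: each object knows how to open itself (Box and Car are illustrative classes)
Box.new.open
Car.new.open
File.open("foo.txt")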

In the example above, both styles do the same thing, but the implementations will be quite different. In the procedural version, I imagine you would have to keep adding nested “if” statements as the logic becomes more complicated. In the OO version, on the other hand, the logic is kept isolated within each class.
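To make the difference concrete, here is a runnable sketch of both styles. The names (`open_thing`, `Box`, `Car`) and their return values are hypothetical; `open_thing` is used instead of `open` so it does not clash with Ruby's built-in `Kernel#open`.

```ruby
# Procedural: one function must know every kind of thing,
# so each new kind means another branch here.
def open_thing(kind)
  case kind
  when :box then "lid lifted"
  when :car then "door unlocked"
  else raise ArgumentError, "don't know how to open #{kind}"
  end
end

# OO: each class keeps its own opening logic.
class Box
  def open
    "lid lifted"
  end
end

class Car
  def open
    "door unlocked"
  end
end

p open_thing(:box)               # "lid lifted"
p [Box.new, Car.new].map(&:open) # ["lid lifted", "door unlocked"]
```

Adding a new kind of thing means editing the `case` in the procedural version, but only adding a new class with its own `open` in the OO version.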

Step 3: Japanese and functional

I often say that Japanese is a politician’s language. What do I mean by that? My definition of politicians is:

  • they do not commit to anything unless necessary
  • they mean different things depending on context

Japanese grammar has what are called “postpositions” (particles). English uses prepositions instead, as in “for you” or “after dinner”. A postposition follows a noun and marks the role that noun plays in the sentence. This lets you reorder a sentence very flexibly, chain as many clauses as you like, and even omit the subject.

Here are some examples.

Japanese | English | Pronunciation | Structure | Word order if written in English
私はベーコンを食べます | I eat bacon | Watashi ha bacon wo tabemasu | SOV | I bacon eat
ベーコンを私は食べます | I eat bacon | Bacon wo watashi ha tabemasu | OSV | Bacon I eat
ベーコンを食べます | I eat bacon | Bacon wo tabemasu | OV | Bacon eat
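A loose Ruby analogy for postpositions is keyword arguments: the label, not the position, tells Ruby what role each value plays, so the order can change, and a value with a default can be left out, much like the omitted subject. The `eat` method below is a hypothetical sketch (it needs Ruby 2.1+ for the required keyword).

```ruby
# subject: works like は (marks the subject), object: like を (marks the object).
# The subject has a default, so it can be omitted, as in ベーコンを食べます.
def eat(subject: "I", object:)
  "#{subject} eat #{object}"
end

p eat(subject: "I", object: "bacon") # "I eat bacon" (SOV order)
p eat(object: "bacon", subject: "I") # "I eat bacon" (OSV order)
p eat(object: "bacon")               # "I eat bacon" (OV, subject omitted)
```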

And here is an example of chaining too many sentences together.

One common mistake Japanese people make when writing is chaining too many clauses together, which makes the whole sequence very hard to digest (one of my friends described this as “Don’t write a sentence that could cause a stack overflow”).

What makes English very logical and concise (in my opinion) is that the subject and verb come at the beginning. Even though you can still write verbose sentences in English, this strict ordering pushes you to write relatively concisely.

In Japanese, on the other hand, you can write a long sentence that means almost nothing, because the subject can be omitted and the verb at the very end may have only a loose relationship to how the sentence began.

Now let’s go back to how this (loosely) relates to some concepts in Ruby.

  • they do not commit to anything unless necessary => Lazy evaluation
  • they mean different things depending on context => Block

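e.g.:

# Lazy: without a block, times returns an Enumerator and runs nothing yet
10000.times # ==> #<Enumerator: 10000:times>

# Lazy: ActiveRecord (Rails 3+) builds the query and only runs the SQL
# when the records are actually needed
User.order('users.id DESC').limit(20).includes(:items)

# Block: the block decides what to do with each line (here, print it);
# File.foreach also closes the file when it is done
File.foreach("/tmp.txt") do |line|
  puts line
end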
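Here is a small runnable sketch of both ideas in plain Ruby, with no Rails required. It assumes Ruby 2.0+ for `Enumerable#lazy`; the helper `map_words` is hypothetical.

```ruby
# Lazy evaluation: an infinite range is fine, because nothing is computed
# until first(3) actually asks for values.
doubled_multiples_of_three = (1..Float::INFINITY).lazy
  .map    { |n| n * 2 }
  .select { |n| (n % 3).zero? }
p doubled_multiples_of_three.first(3) # [6, 12, 18]

# Context: the same method means different things depending on the block.
def map_words(sentence)
  sentence.split.map { |word| yield word }
end

p map_words("i eat bacon") { |word| word.upcase } # ["I", "EAT", "BACON"]
p map_words("i eat bacon") { |word| word.length } # [1, 3, 5]
```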

Ruby’s functional features let you do crazy meta-programming. They are powerful, but abusing them can make code hard to understand and cause unexpected bugs ;-P

Step 4: Writing Japanese programming language in Ruby

So, how are you doing so far? Easy peasy Japanesey?

(Trivia: “easy peasy” is apparently a common phrase in the UK, derived from a TV commercial that said “easy peasy lemon squeezy”.)

When you learn a new language, reading books and articles is not enough; you always need practice. That said, speaking to real Japanese people from day one may be a bit too difficult (or you may just not have a Japanese friend ;-( ), so here is a toy for you to play with.

One of my colleagues once asked me, “Are there any Japanese programming languages? I don’t just mean being able to write Japanese text in strings, but having all the programming syntax (such as “if” and “loop”) actually in Japanese.” Yes, there are some: Nadeshiko and Mind, for example. However, I decided to write my own in Ruby, and here is the result.

Looks amazing, doesn’t it?

Here is a bit more Japanese to help you understand what I just showed.

  • ‘に’ and ‘を’ are postpositions, which mark the words in front of them (1 and 2) as objects.
  • ‘たす’(hiragana) and ‘足す’(kanji + hiragana) are both verbs and mean “to add”
  • ‘て’ is also postpositional; it marks the end of one clause and the start of the next (equivalent to “and”)

In my program, I simply used the postpositions as delimiters to split a Japanese phrase into words.

(Trivia: Japanese words are not separated by spaces, so tokenizing Japanese is a very important part of natural language processing.)

So the phrase

1に2をたして4を掛ける

becomes

[1, 2, :+, 4, :*]
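The splitting step can be sketched in a few lines of Ruby. This is not the actual Japanize code: the `VERBS` table, covering only a few verb forms, is hypothetical, and splitting on every に, を, and て is deliberately naive (a real verb containing て would be cut in half). It assumes Ruby 2.4+; NFKC normalization first turns full-width digits such as １ into ASCII ones.

```ruby
# Hypothetical vocabulary: the verb forms left over after splitting on
# particles, mapped to Ruby operators.
VERBS = {
  "たし" => :+, "たす" => :+, "足し" => :+, "足す" => :+,
  "ひい" => :-, "ひく" => :-, "引い" => :-, "引く" => :-,
  "かけ" => :*, "かける" => :*, "掛け" => :*, "掛ける" => :*,
  "わっ" => :/, "わる" => :/, "割っ" => :/, "割る" => :/
}.freeze

# Postpositions used as delimiters: に and を mark objects, て joins clauses.
PARTICLES = /[にをて]/

def tokenize(phrase)
  words = phrase.unicode_normalize(:nfkc).split(PARTICLES).reject(&:empty?)
  words.map do |word|
    if word.match?(/\A\d+\z/)
      word.to_i
    else
      VERBS.fetch(word) { raise ArgumentError, "unknown word: #{word}" }
    end
  end
end

p tokenize("1に2をたして4を掛ける")   # [1, 2, :+, 4, :*]
p tokenize("１に２をたして４を掛ける") # [1, 2, :+, 4, :*]
```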

Japanese grammar is a bit like Reverse Polish Notation, or the stack machines that compilers and virtual machines often use to run programs. An array like the one above can therefore be evaluated step by step, as the following test shows for a longer expression.

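require "minitest/autorun" # loads minitest and its spec DSL (describe/it)

# Evaluator is Japanize's RPN evaluator (its source is not shown here)
describe Evaluator do
  it "must calculate all operands" do
    Evaluator.new([1, 2, :+, 3, :*, 1, :-, 2, :/]).
      evaluate.must_equal((((1 + 2) * 3) - 1) / 2)
  end
end
# NOTE: This is minitest, which comes by default in Ruby 1.9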
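The `Evaluator` itself is not shown in this post. One minimal way to make a test like the one above pass is a classic stack machine: push numbers, and when an operator arrives, pop two operands and push the result. This is a hypothetical sketch, not the Japanize source; note that `/` is integer division here, which is also what the test expects.

```ruby
# A minimal Reverse Polish Notation evaluator (hypothetical sketch).
class Evaluator
  def initialize(tokens)
    @tokens = tokens
  end

  def evaluate
    stack = []
    @tokens.each do |token|
      if token.is_a?(Symbol)
        # Operator: pop the two most recent operands (right one first).
        right = stack.pop
        left  = stack.pop
        raise ArgumentError, "not enough operands for #{token}" if left.nil? || right.nil?
        stack.push(left.public_send(token, right))
      else
        # Operand: just remember it.
        stack.push(token)
      end
    end
    raise ArgumentError, "malformed expression" unless stack.size == 1
    stack.first
  end
end

p Evaluator.new([1, 2, :+, 4, :*]).evaluate                   # 12
p Evaluator.new([1, 2, :+, 3, :*, 1, :-, 2, :/]).evaluate     # 4
```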

There are a few more secrets.

If you watch the video closely, you may notice that the digits I typed look slightly different from normal ASCII digits. They are Unicode digits, so Ruby does not read them as numbers, and the phrase would normally raise an “undefined local variable or method” error.

Since Japanese has no spaces between words, Ruby sees an entire sentence as a single method name.

So I simply treat the entire expression as one method call and catch it in method_missing.

This is how Japanize works.
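A minimal sketch of the trick, assuming Ruby 2.4+ and a UTF-8 source file. Because the digits are full-width, the whole sentence is a single identifier, so Ruby calls `method_missing` with it as the method name (with ASCII digits, the parser would instead start reading a number literal). The `Sandbox` class and its tiny vocabulary are hypothetical, cover only addition and multiplication, and do no error handling for malformed sentences.

```ruby
# Hypothetical sketch of the method_missing trick (not the Japanize source).
class Sandbox
  # Hypothetical vocabulary: verb forms left after splitting on particles.
  OPERATORS = { "たし" => :+, "足し" => :+, "かける" => :*, "掛ける" => :* }.freeze
  PARTICLES = /[にをて]/

  # Ruby calls this for any unknown identifier, including a whole
  # Japanese sentence written without spaces.
  def method_missing(name, *args)
    tokens = tokenize(name)
    return super if tokens.nil?

    # Evaluate the tokens like a tiny stack machine (Reverse Polish Notation).
    stack = []
    tokens.each do |token|
      if token.is_a?(Symbol)
        right = stack.pop
        left  = stack.pop
        stack.push(left.public_send(token, right))
      else
        stack.push(token)
      end
    end
    stack.last
  end

  def respond_to_missing?(name, include_private = false)
    !tokenize(name).nil? || super
  end

  private

  # Returns tokens such as [1, 2, :+, 4, :*], or nil if the name is not
  # a phrase this sketch understands.
  def tokenize(name)
    words = name.to_s.unicode_normalize(:nfkc).split(PARTICLES).reject(&:empty?)
    return nil if words.empty?

    words.map do |word|
      if word.match?(/\A\d+\z/)
        word.to_i
      elsif OPERATORS.key?(word)
        OPERATORS[word]
      else
        return nil
      end
    end
  end
end

p Sandbox.new.instance_eval("１に２をたして４を掛ける") # 12
```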

When I was researching how to implement a very simple compiler/interpreter, I learnt a lot from an article by Koichi Sasada (the creator of YARV, the Ruby 1.9 virtual machine). The article is in Japanese, but it includes a sample program that implements some basic VM functionality in Ruby.

The code handles not just arithmetic but also loops and if statements. If you are curious, you could implement something similar on top of Japanize. I will accept pull requests as long as they look like Japanese!
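To give a flavour of what loops and if statements mean for a stack machine, here is a hypothetical sketch (not the code from Sasada's article): a conditional jump and an unconditional jump are enough to build a loop. The instruction names and the variable store are assumptions made for illustration. The program below adds 3 + 2 + 1 by counting down.

```ruby
# A tiny stack VM. Each instruction is [opcode] or [opcode, argument];
# jump targets are instruction indexes.
def run_vm(program)
  stack = []
  variables = {}
  pc = 0 # program counter: index of the next instruction

  while pc < program.size
    op, arg = program[pc]
    pc += 1
    case op
    when :push  then stack.push(arg)
    when :load  then stack.push(variables.fetch(arg))
    when :store then variables[arg] = stack.pop
    when :add   then right = stack.pop; stack.push(stack.pop + right)
    when :sub   then right = stack.pop; stack.push(stack.pop - right)
    when :jump  then pc = arg
    when :jump_if_zero then pc = arg if stack.pop.zero? # the "if"
    else raise ArgumentError, "unknown instruction: #{op}"
    end
  end
  stack
end

# sum = 0; i = 3; until i == 0 do sum += i; i -= 1 end; sum
program = [
  [:push, 0], [:store, :sum],                         # 0-1
  [:push, 3], [:store, :i],                           # 2-3
  [:load, :i], [:jump_if_zero, 15],                   # 4-5: leave the loop when i == 0
  [:load, :sum], [:load, :i], [:add], [:store, :sum], # 6-9: sum += i
  [:load, :i], [:push, 1], [:sub], [:store, :i],      # 10-13: i -= 1
  [:jump, 4],                                         # 14: back to the loop test
  [:load, :sum]                                       # 15: leave sum on the stack
]

p run_vm(program) # [6]
```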

Summary

Here is a list of what you learnt in this post.

  • Japanese uses three character sets: Kanji, Hiragana, and Katakana.
  • Japanese grammar structure is Subject - Object - Verb (SOV)
  • Japanese word order can be flexible thanks to postpositions

Even though Matz did not intend to reflect the Japanese language in Ruby’s design, I think there is some influence, since everyone’s thinking is shaped by the language they use.

(NOTE: @yukihiromatz says: “Japanese and Ruby? I try not to think too much about Japanese culture. The method chain looks like Japanese, but it’s just a coincidence. Having said that, M17N support is heavily influenced by the use cases of Japanese people. Otherwise, I wouldn’t spend so much time on such a hard problem.” You can compare this with how the @matztranslated bot actually translated the sentence.)

If you are interested in more, the full slides of my talk at LRUG are here.

“Ruby and Japanese” (slides by inouemak)

The talk was videotaped and uploaded after the event.

Ruby Advent Calendar

This is Day 17 of Ruby Advent Calendar. The previous entry was written by matschaffer or gautamrege, and the next will be written by elight.