Multimodal AI


What is multimodal AI? It is AI that takes input from different modes, such as audio, video, or other sensors (senses).

It is the next hot thing in AI.

Implicit in multimodal AI is the concept of mapping, or translation: how do we translate sound to text, or text to numbers?

For 20+ years I have been involved with this mapping question. At various times I have called it mapping, translation, or transduction; these days I prefer transduction. We are in the age of transduction, which is translation between different scales of things, or different phases of things. If we imagine the past as the science of transformation (how does one thing transform into something else?), transduction asks how we map the parts of the transformation.

That may make no sense; I think I just confused myself.

In any case, for years I have played with translating (transducing) chess into other modes (sound, audio, dance). Chess has 8 rows and 8 columns, like the 8 notes that span an octave (from one C up to the next) in the Western scale. 8s show up again and again in various cultures, and it would be interesting to play a bit more with this. I think of the 8 of the ba gua (the 8 trigrams), as well as ba gua the martial art. But I digress.
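As a toy illustration of the 8-columns-to-8-notes idea, here is a minimal Python sketch. The mapping is my own assumption, not anything standard: the files a to h become the notes C4 up to C5 of the C major scale (as MIDI note numbers, where middle C is 60), and the rank shifts the note up or down by whole octaves, with rank 4 sitting on the middle-C octave. The function name `square_to_pitch` is hypothetical.

```python
# Files a..h of the board, and the 8 notes spanning one octave of C major
# (C4 up to C5), as MIDI note numbers where middle C (C4) is 60.
FILES = "abcdefgh"
OCTAVE = [60, 62, 64, 65, 67, 69, 71, 72]


def square_to_pitch(square: str) -> int:
    """Map a square such as 'e4' to a MIDI pitch (hypothetical mapping).

    The file chooses the note within the octave; the rank shifts it by whole
    octaves, with rank 4 kept at the C4..C5 octave.
    """
    file, rank = square[0], int(square[1])
    if file not in FILES or not 1 <= rank <= 8:
        raise ValueError(f"not a chess square: {square!r}")
    return OCTAVE[FILES.index(file)] + 12 * (rank - 4)


if __name__ == "__main__":
    for sq in ["a4", "e4", "h4", "e2", "e7"]:
        print(sq, square_to_pitch(sq))
```

With this choice every square lands inside the MIDI range 0 to 127 (a1 is 24, h8 is 120), but octaves are only one of many ways to use the ranks.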

So today, after a number of false starts, I had ChatGPT generate a Python program that creates a musical composition based on a chess game.

ChatGPT used the music program LilyPond, which I had never used before. LilyPond generates musical notation and MIDI. This is cool.
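For anyone who hasn't seen LilyPond either, here is a minimal sketch of the kind of thing such a program can do: turn a list of MIDI pitches into a LilyPond source file. It assumes LilyPond's default (Dutch) note names, absolute octave marks (`c'` is middle C, a bare `c` is the octave below), and a `\version "2.24.0"` header; the function names are hypothetical. Putting both `\layout { }` and `\midi { }` inside one `\score` asks LilyPond to engrave the notation and write a MIDI file from the same music, so the two should agree.

```python
# Default (Dutch) LilyPond pitch names, using sharps for the black keys.
PITCH_NAMES = ["c", "cis", "d", "dis", "e", "f", "fis", "g", "gis", "a", "ais", "b"]


def midi_to_lily(midi: int) -> str:
    """Convert a MIDI note number to an absolute LilyPond pitch (60 -> c')."""
    name = PITCH_NAMES[midi % 12]
    octave = midi // 12 - 1  # MIDI 60 is C4
    marks = octave - 3       # an unmarked 'c' in LilyPond is C3
    return name + ("'" * marks if marks > 0 else "," * -marks)


def to_lilypond(pitches: list, duration: str = "4") -> str:
    """Build one LilyPond file that both engraves the notes and writes MIDI."""
    notes = " ".join(midi_to_lily(p) + duration for p in pitches)
    return (
        '\\version "2.24.0"\n'   # assumed LilyPond version
        "\\score {\n"
        f"  {{ {notes} }}\n"
        "  \\layout { }\n"       # produce printed notation
        "  \\midi { }\n"         # produce a MIDI file from the same music
        "}\n"
    )


if __name__ == "__main__":
    print(to_lilypond([60, 64, 67, 72]))
```

Saving the returned string as, say, `game.ly` and running `lilypond game.ly` should produce a PDF of the notation plus a MIDI file.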

I don't necessarily agree with the mappings. Why were they selected? I asked ChatGPT, which said:
“The mappings I provided between chess positions and musical notes in the earlier examples were selected for simplicity and as a starting point for the demonstration.” OpenAI. (2023). ChatGPT (September 25 Version) [Large language model].

I then asked “could you regenerate this by considering each piece as representing a different rhythm and each square as representing a different note – the horizontal being notes within an octave and the vertical as different timbres (frequencies)”
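One way to read that request as code is sketched below: the piece that moves picks a rhythm, the file of its destination square picks a note within an octave, and the rank picks a timbre (a General MIDI instrument). Every table here is a hypothetical choice of mine, not what ChatGPT produced. The parser is deliberately simple: it takes the last square named in a move written in standard algebraic notation (so "Nbd7" lands on d7) and skips castling, which names no square.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical rhythm per piece, as LilyPond duration strings (P = pawn).
PIECE_DURATION = {"K": "1", "Q": "2", "R": "4", "B": "8", "N": "8.", "P": "16"}

# File a..h -> the 8 notes of one C major octave, C4..C5, as MIDI numbers.
FILE_PITCH = dict(zip("abcdefgh", [60, 62, 64, 65, 67, 69, 71, 72]))

# Rank 1..8 -> hypothetical General MIDI program (0-based): piano, marimba,
# church organ, violin, cello, trumpet, clarinet, flute.
RANK_PROGRAM = dict(zip("12345678", [0, 12, 19, 40, 42, 56, 71, 73]))

SQUARE = re.compile(r"([a-h])([1-8])")


class Event(NamedTuple):
    pitch: int     # MIDI note number
    duration: str  # LilyPond duration
    program: int   # General MIDI program number (0-based)


def move_to_event(san: str) -> Optional[Event]:
    """Turn one algebraic-notation move into a note event, or None."""
    squares = SQUARE.findall(san)
    if not squares:
        return None  # castling ("O-O") names no square; skipped here
    file, rank = squares[-1]  # the destination is the last square mentioned
    piece = san[0] if san[0] in "KQRBN" else "P"
    return Event(FILE_PITCH[file], PIECE_DURATION[piece], RANK_PROGRAM[rank])


if __name__ == "__main__":
    for move in ["e4", "e5", "Nf3", "Nc6", "Bb5", "O-O"]:
        print(move, move_to_event(move))
```

Because pitch comes from the file and rhythm from the piece, moves to different files should yield different notes, which is a quick thing to check in any generated version.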

This only seemed to generate differences in rhythm; the notes were all the same. Also, the musical notation did not seem to match the MIDI.

In any case, I found this interesting.
