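Right now, we have a purely reference-counted string. Not bad, but I have since seen better-performing implementations that also save quite a lot of memory by using the small string optimization (short strings are stored inline in the object instead of in a separate heap buffer).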
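
To make the idea concrete, here is a rough sketch of how small string optimization could sit on top of reference counting. It is only an illustration, not our actual string class: `SmallString`, `kInlineCapacity` and the 15-byte threshold are made-up names and values, and it assumes C++17. For readability it keeps the inline buffer and the shared pointer side by side; a real implementation would overlay them (e.g. in a union) to actually get the memory savings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string_view>

// Hypothetical sketch: short strings live inside the object,
// long strings live in a shared, reference-counted heap buffer.
class SmallString {
public:
    // Assumed threshold; real implementations pick this from the object layout.
    static constexpr std::size_t kInlineCapacity = 15;

    explicit SmallString(std::string_view s) : size_(s.size()) {
        if (size_ <= kInlineCapacity) {
            // Small case: copy into the inline buffer, no allocation.
            std::copy_n(s.data(), size_, inline_);
            inline_[size_] = '\0';
        } else {
            // Large case: one heap buffer, shared between copies.
            heap_ = std::shared_ptr<char[]>(new char[size_ + 1]);
            std::copy_n(s.data(), size_, heap_.get());
            heap_.get()[size_] = '\0';
        }
    }

    bool is_inline() const { return size_ <= kInlineCapacity; }
    std::size_t size() const { return size_; }

    std::string_view view() const {
        return {is_inline() ? inline_ : heap_.get(), size_};
    }

    // Number of SmallString objects sharing the heap buffer (0 when inline).
    long use_count() const { return heap_.use_count(); }

private:
    std::size_t size_;
    char inline_[kInlineCapacity + 1] = {};
    std::shared_ptr<char[]> heap_;  // empty for small strings
};
```

Copying a short `SmallString` just copies a few bytes and touches no reference count, while copying a long one only bumps the counter of the shared buffer, same as the current pure reference-counted approach.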
BTW, it would be interesting to combine substring search with a character filter...