In lexical analysis, tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining. Tokenization is useful both in linguistics (where it is a form of text segmentation), and in computer science, where it forms part of lexical analysis.
What I’m trying to say is that IRC has a standard and IRCv3 has a standard, and that if you parse incoming messages according to those standards, iterating over the string and producing “tokens” from it, you’re more likely to end up with a forwards-compatible parser than if you simply reach for a regex.
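As a rough sketch of what that looks like (this is not Twisted’s code, and the function name `tokenize` is just mine for illustration), here is a token-by-token walk over a raw IRC line that peels off the IRCv3 tag block, the optional prefix, the command, and the parameters in order:

```python
def tokenize(line):
    """Split a raw IRC line into (tags, prefix, command, params).

    A minimal sketch based on RFC 1459 framing plus IRCv3 message tags,
    shown only to illustrate token-by-token parsing rather than a regex.
    """
    tags = {}
    prefix = None

    # IRCv3 message tags: "@key=value;key2 ..." come before everything else.
    if line.startswith('@'):
        raw_tags, line = line[1:].split(' ', 1)
        for tag in raw_tags.split(';'):
            key, _, value = tag.partition('=')
            tags[key] = value

    # Optional prefix: ":nick!user@host ..." or ":server.name ..."
    if line.startswith(':'):
        prefix, line = line[1:].split(' ', 1)

    # Everything after " :" is a single trailing parameter that may
    # contain spaces; everything before it splits on whitespace.
    if ' :' in line:
        line, trailing = line.split(' :', 1)
        params = line.split()
        params.append(trailing)
    else:
        params = line.split()

    command = params.pop(0)
    return tags, prefix, command, params
```

For example, `tokenize(':nick!user@host PRIVMSG #channel :hello world')` gives `({}, 'nick!user@host', 'PRIVMSG', ['#channel', 'hello world'])`. Because each piece is consumed in order, a message with extra or unfamiliar tags still parses cleanly instead of failing to match a pattern.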
There are many examples of IRC tokenizers; I personally like Twisted’s tokenizer for its simplicity.