How to parse tweet text from Twitter using Ruby to pull out the ‘@’ replies and ‘#’ hashtags
Posted by arjunghosh on March 5, 2009
A lot of us love @twitter and also Ruby, and sometimes work with both.
I often had to do the following with a tweet:
Take out the ‘@’ (i.e. @replies) and ‘#’ (i.e. hashtags) from a tweet and separate them from the text part.
For example, we have a tweet:
@myfriend1 @myfriend2 this is a sample text #link #text
Now I want this tweet to be separated into the following arrays:
['myfriend1','myfriend2']
['link','text']
and the text only – ["this is a sample text "]
So I first had to build a RegEx and then, using the ever-useful .gsub method of Ruby, created the following:
tags, replies = [], []
parsed_text = tweet.text.gsub(/ ?(@\w+)| ?(#\w+)/) { |a| ((a.include?('#')) ? tags : replies) << a.strip.gsub(/#|@/, ''); '' }
So parsed_text holds the remaining text only (for the example tweet a leading space is left over, which .strip removes). tags is an Array which will contain the hashtags and replies is an Array which will contain the @replies.
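To try this outside a Twitter client, here is a minimal sketch of the same idea written out over several lines. It assumes the tweet text is already available as a plain String (tweet_text is a stand-in name for tweet.text, not part of any Twitter library).

```ruby
# Plain string standing in for tweet.text (an assumption for this sketch).
tweet_text = "@myfriend1 @myfriend2 this is a sample text #link #text"

tags    = []  # collects hashtags without the leading '#'
replies = []  # collects @replies without the leading '@'

parsed_text = tweet_text.gsub(/ ?(@\w+)| ?(#\w+)/) do |match|
  # Pick the array based on which symbol the matched token carries.
  target = match.include?('#') ? tags : replies
  # Drop the optional leading space, then the '@' or '#' symbol itself.
  target << match.strip.gsub(/#|@/, '')
  ''  # the block's value replaces the match, so the token is removed
end

puts replies.inspect
puts tags.inspect
puts parsed_text.strip
```

The final .strip only tidies the whitespace left behind where the tokens were removed.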
The RegEx / ?(@\w+)| ?(#\w+)/ matches the hashtags and the @replies, so that the block can place them in two separate arrays.
The inner .gsub(/#|@/, '') only removes the ‘@’ and ‘#’ symbols from the extracted array elements.
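To see what the two capture groups pick up, the pattern can also be run through String#scan. This is only an illustrative sketch with a made-up sample string, not the approach used in the one-liner above.

```ruby
pattern = / ?(@\w+)| ?(#\w+)/
sample  = "@myfriend1 this is #text"

# With capture groups, scan returns one [group1, group2] pair per match;
# the group that did not take part in the match is nil.
pairs = sample.scan(pattern)

# First group holds @replies, second group holds hashtags.
# The optional leading space sits outside the groups, so no strip is needed.
replies = pairs.map { |mention, _| mention }.compact.map { |m| m.delete('@') }
tags    = pairs.map { |_, tag| tag }.compact.map { |t| t.delete('#') }

puts pairs.inspect
puts replies.inspect
puts tags.inspect
```

Because each alternative has its own group, the position of the non-nil entry tells you whether a match was an @reply or a hashtag.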
You can also download it from Gist here: http://gist.github.com/78498
While working on the regular expressions above, I found an interesting RegEx testing site, www.rubular.com, which helps you write and test regular expressions very easily.


Yogesh Goel said
Hi,
Have added to my blog
http://www.ygoel.com/
the write-up & all the photographs of the Kolkata Bloggers Meet 2009.
Do take some time to visit and do not forget to put down your inputs for the same.
Regards & Love,
Yours ever in blogging,
Yogesh Goel
ygoel.com
arjunghosh said
Will do