The Hitchhiker’s Guide to Ruby On Rails Galaxy

Records of my voyage through RoR Galaxy

How to parse a tweet text from Twitter using Ruby to parse-out ‘@’ and ‘#’

Posted by arjunghosh on March 5, 2009

Well, a lot of us love @twitter and also Ruby, and sometimes work with both :)

I often had to do the following with a tweet:

Take out the ‘@’ (i.e. @replies) and ‘#’ (i.e. hashtags) from a tweet and separate them from the text part.

For example, we have a tweet:

@myfriend1 @myfriend2 this is a sample text #link #text

Now I want this tweet to be separated into the following arrays:

['myfriend1','myfriend2']

['link','text']

and the text only – ["this is a sample text "]

So first I had to build a RegEx, and then, using the ever-useful .gsub method of Ruby, I created the following:

parsed_text = tweet.text.gsub(/ ?(@\w+)| ?(#\w+)/) { |a| ((a.include?('#')) ? tags : replies) << a.strip.gsub(/#|@/, ''); '' }

So parsed_text holds the final text only. tags is an Array that will contain the hashtags, and replies is an Array that will contain the @replies (both need to exist as empty arrays before the call).

The RegEx / ?(@\w+)| ?(#\w+)/ matches the hashtags and the @replies, and the block then places them in two separate arrays.

The inner call .gsub(/#|@/, '') only strips the ‘@’ and ‘#’ symbols from the extracted array elements.
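
To see the pieces working together, here is a minimal, self-contained sketch of the same approach. It assumes the tweet text is already available as a plain string (tweet_text below is a stand-in for tweet.text) and that tags and replies are created as empty arrays before the call:

```ruby
# Stand-in for tweet.text; in a real app this would come from the Twitter API.
tweet_text = "@myfriend1 @myfriend2 this is a sample text #link #text"

# Both arrays must exist before gsub runs, because the block appends to them.
tags    = []
replies = []

parsed_text = tweet_text.gsub(/ ?(@\w+)| ?(#\w+)/) do |match|
  # Pick the target array by looking at the symbol inside the match.
  target = match.include?('#') ? tags : replies
  # Drop the optional leading space and the '@' / '#' symbol.
  target << match.strip.gsub(/#|@/, '')
  # The block's return value replaces the match; '' removes it from the text.
  ''
end

p replies      # ["myfriend1", "myfriend2"]
p tags         # ["link", "text"]
p parsed_text  # " this is a sample text"
```

The remaining text keeps the space that followed the last @reply, so calling .strip on parsed_text is a reasonable final step if surrounding whitespace is unwanted.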

You can download it from Gist here: http://gist.github.com/78498

Also, while working on the above regular expressions, I found an interesting RegEx testing site called www.rubular.com, which helps you write and test regular expressions easily.


8 Responses to “How to parse a tweet text from Twitter using Ruby to parse-out ‘@’ and ‘#’”

  1. Hi,

    Have added to my blog

    http://www.ygoel.com/

    the write-up & all the photographs of the Kolkata Bloggers Meet 2009.

    Do take some time to visit and do not forget to put down your inputs for the same.

    Regards & Love,
    Yours ever in blogging,

    Yogesh Goel
    ygoel.com

  2. raf said

    with your current regexp you’d catch part of the domain on email addresses as well, look at this:

    " @myfriend1 @myfriend2 someone@domain.com this is a sample text #link #text"

    result:

    tags # => ["link", "text"]
    replies # => ["myfriend1", "myfriend2", "domain"]

    what if we are tweeting about Ruby code? sometimes it happens

    " @myfriend1 @myfriend2 someone@domain.com this is a sample text #link #text ActiveRecord#find"

    tags # => ["link", "text", "find"]
    replies # => ["myfriend1", "myfriend2", "domain"]

    you can fix that by getting rid of the question marks in the regex => / (@\w+)| (#\w+)/

    tags # => ["link", "text"]
    replies # => ["myfriend1", "myfriend2"]

    besides I’d add a .strip after the last } => << a.strip.gsub(/#|@/, ''); '' }.strip
    and I'd use do/end instead of {/}

    Thanks for sharing.

  3. Howdy,

    As a new RoR developer, I am so thankful I found your post! Additionally, I’d like to make a remark for any new developers who might be using the twitter-auth gem.

    gsub will not work on an array!! You’ll need a @tweets.each {|twitstring| twitstring.gsub(RegEx)… }

    Here’s my code that tallies my jock & nerd tweets.

    @tweets.each{|x| x.gsub(/(\+|-)\d\s(geek|nerd|jock)/){|a| ((a.include?('jock')) ? jock : nerd) <>jock=[]
    =>[]

    -voodoologic

  4. Devesh said

    It’s awesome… really, it’s very nice…

  5. Hi. Great solution. Is there a way to store those @ and # arrays as objects in the database?
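
Following up on raf’s comment about email addresses and things like ActiveRecord#find: dropping the question marks works when every @reply or hashtag has a space in front of it, but a tweet that starts directly with "@myfriend1" (no leading space) would then lose its first reply. One possible alternative, sketched below, is a negative lookbehind that only rejects a symbol when it directly follows a word character. This is just an illustration, not Twitter’s official matching rule; split_tweet is a hypothetical helper name, and the lookbehind assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper (not from the original post): splits a tweet into
# its plain text, its @replies and its hashtags.
def split_tweet(text)
  tags    = []
  replies = []

  # (?<!\w) rejects '@' or '#' that directly follows a letter, digit or
  # underscore, which skips "someone@domain.com" and "ActiveRecord#find".
  stripped = text.gsub(/(?<!\w)([@#])(\w+)/) do
    # $1 is the symbol, $2 is the name without the symbol.
    ($1 == '#' ? tags : replies) << $2
    ''
  end

  # Collapse the runs of spaces left behind by the removed tokens.
  [stripped.squeeze(' ').strip, replies, tags]
end

text, replies, tags = split_tweet(
  "@myfriend1 @myfriend2 someone@domain.com this is a sample text #link #text ActiveRecord#find"
)

p replies  # ["myfriend1", "myfriend2"]
p tags     # ["link", "text"]
p text     # "someone@domain.com this is a sample text ActiveRecord#find"
```

Because the helper takes a single string, a list of tweets can be handled with something like tweets.map { |t| split_tweet(t) }, which also covers the point above that gsub cannot be called on an Array directly. Note that squeeze(' ') also collapses any double spaces that were in the original tweet.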
