yield_self in Ruby 2.5

2017-09-28 Follow @mlomnicki

Ruby 2.5 adds a very interesting method Object#yield_self.

This is its definition, slightly simplified:

# object.yield_self {|x| block } → an_object
# Yields self to the block and returns the result of the block.

class Object
  def yield_self
    yield(self)
  end
end

At first, it doesn’t look like a notable feature. It just returns what the block returns. However, it turns out to be akin to the pipe operator known from F# and Elixir. Let’s see what opportunities it opens up.
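To make the behaviour concrete, here is a minimal sketch that compares yield_self with the older Object#tap. The values are arbitrary and chosen only for illustration.

```ruby
# yield_self returns whatever the block returns.
doubled = 21.yield_self { |n| n * 2 }

# tap, by contrast, ignores the block's result and returns the receiver.
tapped = 21.tap { |n| n * 2 }

puts doubled
puts tapped
```

The difference in return value is what makes yield_self chainable as a pipeline step.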

yield_self in action

This is fairly typical Ruby code that reads data.csv, parses it and sums one of the columns.

CSV.parse(File.read(File.expand_path("data.csv", __dir__)))
   .map { |row| row[1].to_i }
   .sum

Code like this usually takes a few seconds to understand, mainly because we read it left to right but the nested calls execute from the inside out.

Let’s rewrite it with yield_self:

"data.csv"
  .yield_self { |name| File.expand_path(name, __dir__) }
  .yield_self { |path| File.read(path) }
  .yield_self { |body| CSV.parse(body) }
  .map        { |row|  row[1].to_i }
  .sum

Better? Worse? I don’t think there’s a clear answer. We can name some benefits:

There’s a clear flow, from top to bottom
The code is open for additions. Adding more steps to the flow shouldn’t hurt the readability.
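As a rough illustration of the second point, the sketch below adds one more step to the pipeline: dropping a header row. The sample file, its contents and the header row are assumptions made up for this example. A temporary file stands in for data.csv so the snippet runs on its own.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data standing in for data.csv (header row assumed).
file = Tempfile.new(["data", ".csv"])
file.write("fruit,quantity\napples,3\npears,4\nplums,5\n")
file.close

total = file.path
  .yield_self { |path| File.read(path) }
  .yield_self { |body| CSV.parse(body) }
  .yield_self { |rows| rows.drop(1) }   # the added step: skip the header row
  .map        { |row|  row[1].to_i }
  .sum

puts total
```

The new step slots in as a single line and the surrounding steps stay untouched.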

There are also some drawbacks

It’s more verbose than the original version and uses unnecessary blocks
It’s not idiomatic Ruby. Obviously it can’t be, because it’s a brand new feature

Let’s see how this pattern works in other typical scenarios.

  events = Event.upcoming
  events = events.limit(params[:limit])          if params[:limit]
  events = events.where(status: params[:status]) if params[:status]
  events

With yield_self

Event.upcoming
  .yield_self { |events| params[:limit]  ? events.limit(params[:limit]) : events }
  .yield_self { |events| params[:status] ? events.where(status: params[:status]) : events }

Again, with yield_self the code is more verbose. On the other hand, we don’t have to reassign the events variable and explicitly return it in the last line.
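The same conditional-step pattern can be tried outside Rails. In the sketch below a plain array stands in for Event.upcoming and an ordinary hash stands in for params. Both are hypothetical substitutes, and select and first replace the ActiveRecord where and limit calls.

```ruby
# Hypothetical stand-ins: an array instead of an ActiveRecord relation,
# and a plain hash instead of Rails request parameters.
events = [
  { name: "Meetup",     status: "open" },
  { name: "Workshop",   status: "closed" },
  { name: "Conference", status: "open" },
  { name: "Hackathon",  status: "open" }
]
params = { status: "open", limit: 2 }

# Each step applies its filter only when the matching param is present,
# otherwise it passes the list through unchanged.
result = events
  .yield_self { |list| params[:status] ? list.select { |e| e[:status] == params[:status] } : list }
  .yield_self { |list| params[:limit]  ? list.first(params[:limit]) : list }

puts result.map { |e| e[:name] }.inspect
```

With an empty params hash both steps simply return the list they were given.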

The next example shows how yield_self can be used to print the number of Rails stargazers.

"https://api.github.com/repos/rails/rails"
  .yield_self { |url| URI.parse(url) }
  .yield_self { |uri| Net::HTTP.get(uri) }
  .yield_self { |response| JSON.parse(response) }
  .yield_self { |repo| repo.fetch("stargazers_count") }
  .yield_self { |stargazers| "Rails has #{stargazers} stargazers" }
  .yield_self { |string| puts string }

OK, this doesn’t look good. It seems there’s more noise than the actual code.

Naming is hard

Let’s not give up yet though. Do we really need to name block arguments? What if we avoid the names?

"https://api.github.com/repos/rails/rails"
  .yield_self { |_| URI.parse(_) }
  .yield_self { |_| Net::HTTP.get(_) }
  .yield_self { |_| JSON.parse(_) }
  .yield_self { |_| _.fetch("stargazers_count") }
  .yield_self { |_| "Rails has #{_} stargazers" }
  .yield_self { |_| puts _ }

Much better, isn’t it? Now let’s compare to the traditional syntax.

uri      = URI.parse("https://api.github.com/repos/rails/rails")
response = Net::HTTP.get(uri)
repo     = JSON.parse(response)
puts "Rails has #{repo.fetch("stargazers_count")} stargazers"

It’s nice that with yield_self we don’t have to name temporary variables. We can just use an underscore as the unnamed variable and avoid useless names such as uri or response.

Ditch blocks

Also, it would be nice to get rid of blocks. This is already possible but looks cryptic.

"https://api.github.com/repos/rails/rails"
  .yield_self(&URI.method(:parse))
  .yield_self(&Net::HTTP.method(:get))
  .yield_self(&JSON.method(:parse))
  .yield_self { |_| _.fetch("stargazers_count") }
  .yield_self { |_| "Rails has #{_} stargazers" }
  .yield_self(&method(:puts))

Nope, wrong direction. The previous version, despite being more verbose, is definitely more pleasant to read.
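For anyone who wants to try the method-reference form without a network call, here is a small sketch. The JSON string is a made-up response, not real GitHub data, and the star count in it is an arbitrary number.

```ruby
require "json"

# Hypothetical canned response standing in for the GitHub API call.
response = '{"name": "rails", "stargazers_count": 12345}'

message = response
  .yield_self(&JSON.method(:parse))                        # Method object converted to a block
  .yield_self { |repo| repo.fetch("stargazers_count") }
  .yield_self { |count| "Rails has #{count} stargazers" }

puts message
```

The ampersand converts the Method object into a block, which is why it can be passed where yield_self expects one.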

Hopefully there will be a nicer syntax in Ruby in the future. There’s an interesting proposal to introduce a shorthand operator for Object#method. There’s no agreement on how the operator should look though.

I’m going to use the syntax proposed by the author of this feature request. This is just for fun; this code is not going to work.

"https://api.github.com/repos/rails/rails"
  .yield_self(URI->parse)
  .yield_self(Net::HTTP->get)
  .yield_self(JSON->parse)
  .yield_self { |_| _.fetch("stargazers_count") }
  .yield_self { |_| "Rails has #{_} stargazers" }
  .yield_self(Kernel->puts)

Conclusion

Object#yield_self might be a useful little tool to build a pipeline that passes data from one block to another. I regret that it doesn’t have a shorter name, say pipe or apply. Would I recommend yield_self over the traditional syntax? I believe there are some use cases where it would be useful, but you have to examine them yourself. Do it carefully and think twice about whether a non-idiomatic approach will benefit you and your team.
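If the long name is the main obstacle, a shorter one can be sketched with a plain alias. The name pipe below is hypothetical and not part of Ruby 2.5. Reopening Object changes every object in the program, so treat this as an experiment rather than a recommendation.

```ruby
# Hypothetical alias: `pipe` is an invented name for illustration.
# Reopening Object is global, so this affects the whole program.
class Object
  alias_method :pipe, :yield_self
end

greeting = "world"
  .pipe { |name| "Hello, #{name}" }
  .pipe { |text| text.upcase }

puts greeting
```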

I hope you enjoyed this article. What do you think about yield_self? Let me know in the comments!

UPDATE 29.09.2017

As pointed out in the comments, the underscore is a special name for unused variables and it’s best to reserve it for that purpose. Please do not use it as presented in this article. Instead, you can pick a short variable name, such as it.

"https://api.github.com/repos/rails/rails"
  .yield_self { |it| URI.parse(it) }
  .yield_self { |it| Net::HTTP.get(it) }
  .yield_self { |it| JSON.parse(it) }
  .yield_self { |it| it.fetch("stargazers_count") }
  .yield_self { |it| "Rails has #{it} stargazers" }
  .yield_self { |it| puts it }
