Thursday, February 28, 2013

Drinking from the Streaming API

Today we’re open-sourcing the Hosebird Client (hbc), a Java HTTP library for consuming Twitter’s Streaming API. The client is full featured: it offers support for GZip, OAuth, and partitioning; automatic reconnections with appropriate backfill counts; access to the raw byte payload; proper retry schemes; and relevant statistics. Even better, it’s been battle-tested in production by our internal teams. We highly recommend you take advantage of the Hosebird Client if you plan on working with the Streaming API.

Using Hosebird

The Hosebird Client is broken into two main modules: hbc-core and hbc-twitter4j. The hbc-core module uses a simple message queue that a consumer can poll for messages. The hbc-twitter4j module lets you use the superb Twitter4J project and its data model on top of the message queue to provide a parsing layer.

The first step to use Hosebird is to set up the client using the ClientBuilder API:

// Create an appropriately sized blocking queue
BlockingQueue<String> queue = new LinkedBlockingQueue<String>(10000);
// Optional: a separate queue for connection events
BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<Event>(1000);
// Authenticate via OAuth
Authentication auth = new OAuth1(consumerKey, consumerSecret, token, secret);
// Build a hosebird client
ClientBuilder builder = new ClientBuilder()
  .hosts(Constants.STREAM_HOST)
  .authentication(auth)
  .endpoint(new StatusesSampleEndpoint())
  .processor(new StringDelimitedProcessor(queue))
  .eventMessageQueue(eventQueue);
Client hosebirdClient = builder.build();


After we have created a Client, we can connect and process messages:

hosebirdClient.connect();
while (!hosebirdClient.isDone()) {
  String message = queue.take();
  System.out.println(message); // print the message
}
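
So far we have only consumed raw JSON strings. If you would rather work with parsed objects, the hbc-twitter4j module layers Twitter4J on top of the same message queue. Here is a minimal sketch (in Scala, to match the other JVM examples in this collection), based on the Twitter4jStatusClient wrapper described in the hbc README; treat the exact names and signatures as assumptions rather than the definitive API:

import java.util.concurrent.Executors
import com.twitter.hbc.twitter4j.Twitter4jStatusClient
import twitter4j.{Status, StatusAdapter, StatusListener}

// A listener that receives parsed Twitter4J Status objects
val listener: StatusListener = new StatusAdapter {
  override def onStatus(status: Status): Unit = println(status.getText)
}
val executor = Executors.newSingleThreadExecutor()

// Wrap the hosebird client and the raw-message queue from the example above
val t4jClient = new Twitter4jStatusClient(
  hosebirdClient, queue, java.util.Arrays.asList(listener), executor)
t4jClient.connect()
t4jClient.process() // call once per thread that should process messages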


Hosebird Examples

We recommend you learn from the examples on GitHub or contribute your own.

If you want a quick example, set these properties in hbc-example/pom.xml:
<consumer.key>SECRET</consumer.key>
<consumer.secret>SECRET</consumer.secret>
<access.token>SECRET</access.token>
<access.token.secret>SECRET</access.token.secret>


Then you can run this command on the command line:
mvn exec:java -pl hbc-example

This will connect to the sample stream and print 1000 JSON messages from the API.

Acknowledgements
The Hosebird Client was primarily authored by Steven Liu. We would also like to thank the @TwitterAPI team for their thoughtful suggestions and help.

On behalf of the Hosebird team,
- Chris Aniszczyk, Manager of Open Source (@cra)

Tuesday, February 19, 2013

Twitter Typeahead.js: You Autocomplete Me

Today we’re open-sourcing typeahead.js, our fast, fully-featured autocomplete library, under the MIT license. By sharing a piece of our infrastructure with the open source community, we hope to evolve typeahead.js further with community input.



If your web application needs a fully-featured queryable search box, typeahead.js can help. Some of its capabilities and features include:
  • Search data on the client, server, or both
  • Handle multiple inputs on a single page with shared data and caching
  • Suggest multiple types of data (e.g. searches and accounts) in a single input
  • Support for international languages, including right-to-left (RTL) and input method editors (IME)
  • Define custom matching and ranking functions
  • Grey text hints that help explain what hitting tab will do

It’s also optimized for large local datasets, so suggestions stay fast even on high-latency networks.

Examples

We recommend you take a look at our examples page. There are three ways to get data:

Using local, hard-coded data passed on page render:

$('#input').typeahead([
  {
    name: 'planets',
    local: ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"]
  }
]);

Using a prefetch URL that will be hit to grab data on pageload and then stored in localStorage:

$('#input').typeahead([
  {
    name: 'countries',
    prefetch: '/countries.json'
  }
]);

Or using a queryable API that returns results as-you-type (with the query being passed in the ?q= parameter):

$('#input').typeahead([
  {
    name: 'countries',
    remote: '/countries.json'
  }
]);

You can also combine local or prefetch with a remote fallback, giving you the performance of local data combined with the coverage of a remote query API (e.g. quickly search your friends but still be able to find anyone on your site). There are lots of options for configuring ranking, matching, rendering, templating engines, and more; check out the README for the details.

If you want to use this with a project like Bootstrap, all you have to do is include the JavaScript file for typeahead.js after Bootstrap’s JavaScript file and use our configuration options.

We initially built typeahead.js to support our needs; now we look forward to improvements and suggestions from the community. To learn more about how typeahead.js works, check out our detailed documentation. Interested in joining the flock?

Acknowledgements
Typeahead.js was primarily authored by Tim Trueman, Veljko Skarich, and Jake Harding (@jakeharding).

On behalf of the typeahead.js team,
- Chris Aniszczyk, Manager of Open Source (@cra)

Wednesday, February 6, 2013

New Twitter search results

We just shipped a new version of the Twitter app with a brand new search experience that blends the most relevant content (Tweets, user accounts, images, news, related searches, and more) into a single stream of results. This is a major shift from how we previously partitioned results by type (for instance, Tweet search vs. people search). We think this simplified experience makes it easier to find great content on Twitter using your mobile device.

A typical search scores items of the same type and picks the top-scoring results. In a blended search experience, this is not straightforward. The scores of different content types are computed by different services, and thus not directly comparable for blending. Another challenge is to decide which type of content to mix, as not all content types are always desirable to display. This post discusses our approach to solving these challenges.

Ranking

When a user searches, different types of content are searched separately, returning a sequence of candidate results for each content type, with a type-specific score for each result. For content types that are displayed as a single group or gallery unit, such as users or images, we assign the maximum score among the results as the representative score for that content type. The result sequences for some content types may be trimmed or discarded entirely at this point.

Once results of different content types are prepared, each type-specific score is converted into a universally comparable score, called a “uniscore”. Uniscores of the different modules are then used to blend content types, as in a merge-sort, except that transitions between content types are penalized. This avoids over-diversification of content types in the blended results.

Fig. 1: The search ranker has chosen News1 followed by Tweet1 so far, and is presented with three candidates (Tweet2, User Group, and News2) for the slot after Tweet1. News2 has the highest uniscore, but the ranker picks Tweet2 instead, because the penalty for changing content type between consecutive items lowers the score of News2 from, for instance, 0.65 to 0.55.
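
To make the merge-sort analogy concrete, here is a hedged sketch of the blending step; the candidate representation, the flat penalty value, and all names are illustrative assumptions, not Twitter’s production ranker:

case class Candidate(contentType: String, uniscore: Double)

// Each per-type list is assumed sorted by descending uniscore; blending
// repeatedly picks the best head candidate, discounting type changes.
def blend(perType: Map[String, List[Candidate]],
          penalty: Double = 0.1): List[Candidate] = {
  def pick(heads: Map[String, List[Candidate]],
           prevType: Option[String],
           acc: List[Candidate]): List[Candidate] = {
    val candidates = heads.collect { case (_, c :: _) => c }
    if (candidates.isEmpty) acc.reverse
    else {
      // A candidate whose type differs from the previous pick is penalized,
      // like News2 dropping from 0.65 to 0.55 in Fig. 1
      val best = candidates.maxBy { c =>
        if (prevType.exists(_ != c.contentType)) c.uniscore - penalty
        else c.uniscore
      }
      pick(heads.updated(best.contentType, heads(best.contentType).tail),
           Some(best.contentType), best :: acc)
    }
  }
  pick(perType, None, Nil)
}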


Score unification

Each individual piece of content is assigned a type-specific score, called a “raw” score, by its corresponding service. To facilitate blending and ranking content of different types as described above, raw scores are converted into uniscores using type-specific log-linear score conversion functions, chosen so that a converted score falls in [0, 1] with probability at least 95%, as estimated from observed data.
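
One plausible form of such a conversion for content type t, where the coefficients a_t and b_t are fit per type on observed score data (this exact parameterization is an assumption, not taken from the post):

$$u_t(s) = a_t \log s + b_t, \qquad \Pr\big( u_t(s) \in [0, 1] \big) \ge 0.95$$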

Content selection and boosting

Certain types of content may not have many relevant items to show for a particular input query, in which case we may choose not to include this type of content in search results. In other cases, for instance if query volume or matched item counts have an unusual spike (what we call a “burst”), we show this type and may also boost it to appear at a higher place in the results. To facilitate this, we represent trends in searches or matching result counts as a single number that is proportional to the level of “burstiness”.

For example, consider measuring “burstiness” for the number of images and news content matching the query “photos”. We first obtain three sequences of term frequencies, e.g.:

Fig. 2: Three sequences of Tweet counts over eight 15-minute buckets, from bucket 1 (2 hours ago) to bucket 8 (most recent).
  • Tweet: counts of Tweets that match the query “photos”.
  • Image: counts of Tweets that match the query “photos” and contain image links.
  • News: counts of Tweets that match the query “photos” and contain news links.
The query “photos” not only matches Tweets with image links more than those with news links; its volume is also increasing over time.


Our approach to computing the burstiness of image and news facets is an extension of the original work by Jon Kleinberg on bursty structure detection, which in essence matches the current level of burst to one of a predefined set of bursty states, while penalizing overly frequent changes in the matched state to keep the estimate smooth [1].

In our extension, the burstiness of mixable content types, including images, users, news, and Tweets, is computed simultaneously to reflect relative differences in bursty levels between types, and we use the distance of the observed rate from each state’s bursty level as the state cost. We do this because accurately estimating probabilities of occurrence is infeasible in real time, both due to the computational cost and because numerical approximation can introduce zero-probability intervals between states. Optimal state sequences for images and news are estimated as shown in Fig. 3.

Fig. 3: Normalized image and news counts are matched to one of n = 5 states: one average, two above, and two below. The matched-state curves show a more stable quantization of the original sequences, which has the effect of removing small noisy peaks.


Finally, the burstiness of each content type is computed as an exponential moving average of the state IDs in the optimal state sequence. As shown in Fig. 3, jointly optimizing the sum of state cost and transition cost yields a smooth quantization of the original sequence, which automatically filters out small noisy peaks in the original counts. It also maps both trending (bursty) and steadily high sequences to a high burstiness value.
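
The optimization is Viterbi-like; below is a hedged Scala sketch of the idea, with made-up state levels, cost function, and smoothing weight (the production parameters are not described in the post):

// Match each observed rate to one of the predefined bursty states,
// minimizing total state cost plus a cost for changing state.
def matchStates(rates: Seq[Double],
                levels: Seq[Double],          // bursty level of each state
                transitionCost: Double = 0.5): Seq[Int] = {
  def stateCost(rate: Double, j: Int): Double = math.abs(rate - levels(j))
  // DP column: best (cost, reversed state path) ending in each state
  val init = levels.indices.map(j => (stateCost(rates.head, j), List(j)))
  val lastCol = rates.tail.foldLeft(init) { (prev, rate) =>
    levels.indices.map { j =>
      val (cost, path) = levels.indices.map { i =>
        (prev(i)._1 + (if (i == j) 0.0 else transitionCost), prev(i)._2)
      }.minBy(_._1)
      (cost + stateCost(rate, j), j :: path)
    }
  }
  lastCol.minBy(_._1)._2.reverse
}

// Burstiness: exponential moving average of the matched state IDs
def burstiness(states: Seq[Int], alpha: Double = 0.3): Double =
  states.map(_.toDouble).reduceLeft((ema, s) => alpha * s + (1 - alpha) * ema)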

Burstiness computed this way is used to filter out content types with low or no bursts. It’s also used to boost the score of corresponding content types, as a feature for a multi-class classifier that predicts the most likely content type for a query, and in additional components of the ranking system.

References

[1] J. Kleinberg, Bursty and Hierarchical Structure in Streams, Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.

Posted by Youngin Shin
Search-Quality Team

Thursday, January 31, 2013

Introducing Flight: a web application framework

Last year we rolled out a new version of the Twitter website; today we’re open-sourcing Flight, the client-side framework behind it, under the MIT license as a framework for structuring web applications.

Whether you use Flight as the JavaScript framework for your next web project, or just as source for new ideas, we look forward to learning from diverse perspectives via community feedback and contributions on GitHub.

Why Flight?

Flight is distinct from existing frameworks in that it doesn't prescribe or provide any particular approach to rendering or providing data to a web application. It's agnostic about how requests are routed, which templating language you use, and even whether you render your HTML on the client or the server. While some web frameworks encourage developers to arrange their code around a prescribed model layer, Flight is organized around the existing DOM model, with functionality mapped directly to DOM nodes.

Not only does this obviate the need for additional data structures that will inevitably influence the broader architecture, but by mapping our functionality directly onto the native web we get to take advantage of native features. For example, we get custom event propagation for free by piggybacking off DOM event bubbling, and our event handling infrastructure works equally well with both native and custom events.

How does it work?

Flight enforces strict separation of concerns. When you create a component you don't get a handle to it. Consequently, components cannot be referenced by other components and cannot become properties of the global object tree. This is by design. Components do not engage each other directly; instead, they broadcast their actions as events which are subscribed to by other components.

Why events?

Events are open-ended. When a component triggers an event, it has no knowledge of how its request will be satisfied or by whom. This enforced decoupling of functionality allows the engineer to consider each component in isolation rather than having to reason about the growing complexity of the application as a whole.

By making DOM node events proxies for component events, we let the web work for us:

  • we get event propagation for free
  • a component can subscribe to a given event type at the document level, or it can choose to listen only to those events originating from within a specified DOM node
  • subscribing components do not distinguish between custom events from other components (e.g. 'dataMailItemsServed') and native DOM node events (e.g. 'click'), and process both types of event in an identical fashion.

Mobility and testing

Each component is a module that, aside from a minimal set of standard dependencies (relevant Flight utilities and mixins), has no reference to the outside world. Thus a given component will respond to a given event in the same way, regardless of environment. This makes testing simple and reliable — events are essentially the only variable, and a production event is easy to replicate in testing. You can even debug a component by triggering events in the console.

Mixins

A mixin defines a set of functionality that is useful to more than one object. Flight comes with built-in support for functional mixins, including protection against unintentional overrides and duplicate mixins. While classical JavaScript patterns support only single inheritance, a component prototype (or other object) can have multiple mixins applied to it. Moreover, mixins require a fraction of the boilerplate needed to form traditional class hierarchies out of constructor-prototype hybrids, and don't suffer the leaky abstractions of the latter ('super', 'static', 'const', etc.)

Documentation and demo

Our GitHub page includes extensive documentation and a demo email client:


Future work

Flight is an ongoing and evolving project. We’re planning to add a full testing framework and make available more of the utilities that we use for the Twitter website frontend. We also look forward to your contributions and comments. We know we haven’t thought of everything, and with your help we can continue to improve Flight for the benefit of everyone.

Acknowledgments

Flight was a group effort, with contributions from Angus Croll (@angustweets) and others. We'd also like to thank the folks in the web community who took the time to review the code.

On behalf of the Web Core team,
— Angus Croll, Engineer (@angustweets)

Monday, January 28, 2013

Braindump

Cross-posted from @skr's blog



The Twitter stack


For various reasons, including performance and cost, Twitter has poured significant engineering effort into breaking down the site backend into smaller JVM-based services. As a nice side effect, we’ve been able to open source several of the libraries and other useful tools that came out of this effort.

While there is a fair amount of information about these projects available as docs or slides, I found no simple, high-level introduction to what we can unofficially call the Twitter stack. So here it is. It’s worth noting that all this information is about open source projects, that it is public already, and that I am not writing this as part of my job at Twitter or on their behalf.

Now, granted, these were not all conceived at Twitter, and plenty of other companies have similar solutions. However, I think the software mentioned below is quite powerful, and with most of it released as open source, it is a fairly compelling platform to base new services on.

I will describe the projects from a Scala perspective, but quite a few are useful in Java programs as well. See the Twitter Scala school for an intro to the language, although that is not required to understand this post.

Finagle


Finagle is Twitter’s asynchronous RPC library for the JVM. At the heart of a typical Finagle service lies the Thrift protocol, but Finagle supports other protocols too, such as Protocol Buffers and HTTP.

Setting up a service using Finagle
A quick dive into how you would set up a Thrift service using Finagle.
  1. Write a Thrift file defining your API. It should contain the structs, exceptions and methods needed to describe the service functionality. See Thrift Interface Description Language (IDL) docs, in particular the examples at the end for more info.
  2. Use the Thrift file as input for a code generator that spits out code in your language. For Scala and Finagle based projects I would recommend Scrooge.
  3. Implement the Scala trait generated from your Thrift IDL. This is where the actual functionality of your service goes.
  4. Provide the Finagle server builder an instance of the implementation above, a port to bind to and any other settings you might need and start it up.
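
Putting steps 3 and 4 together, a minimal sketch might look like the following, assuming Scrooge generated an Echo.FutureIface trait from a one-method Thrift IDL (the Echo names are made up for illustration, and the exact builder API varies between Finagle versions):

import com.twitter.finagle.Thrift
import com.twitter.util.{Await, Future}

// Step 3: implement the Scala trait generated from the Thrift IDL
class EchoImpl extends Echo.FutureIface {
  def echo(msg: String): Future[String] = Future.value(msg)
}

// Step 4: hand an instance and a port to Finagle and start serving
object EchoServer extends App {
  val server = Thrift.serveIface("localhost:8080", new EchoImpl)
  Await.ready(server)
}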

That looks pretty similar to just using plain Thrift without Finagle. However, there are quite a few improvements, such as excellent monitoring support and tracing, and Finagle makes it easy to write your service in an asynchronous fashion. More about these features later.

You can also use Finagle as a client. It takes care of all the boring stuff such as timeouts, retries and load balancing for you.
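
Again assuming the generated Echo interface and imports from the sketch above, a Finagle Thrift client can be this short:

// Hypothetical client for the Echo service; timeouts, retries and load
// balancing can be layered on through the client configuration.
val client = Thrift.newIface[Echo.FutureIface]("localhost:8080")
val reply: Future[String] = client.echo("hello")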

Ostrich


So let’s say we have a Finagle Thrift service running. It’s doing very important work. Obviously you want to make sure it keeps doing that work and that it performs well. This is where Ostrich comes in.

Metrics
Ostrich makes it easy to expose various metrics from your service. Let’s say you want to count how many times a particular piece of code is run. In your service you’d write a line of code that looks something like this:

Stats.incr("some_important_counter")

As simple as that. The counter named some_important_counter will be incremented by 1. 

In addition to just straight up counters you can get gauges that report on the value of a variable:

Stats.addGauge("current_temperature") { myThermometer.temperature }

or you can time a snippet of code to track its performance:

Stats.time("translation") {
 document.translate("de", "en")
}


Those and other examples can be found in the Ostrich readme.

Export metrics
Ostrich runs a small http admin interface to expose these metrics and other functionality. To fetch them you would simply hit http://hostname:port/stats.json to get the current snapshot of the metrics as JSON. At Twitter the stats from each service will be ingested from Ostrich by our internal observability stack, providing us with fancy graphs, alerting and so on.

To tie this back to our previous section: if you provide a Finagle client or server builder with an Ostrich-backed StatsReceiver, it’ll happily spit out tons of metrics about how the service is performing: the latencies of RPC calls and the number of calls to each method, to name a few.
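
The wiring might look like the sketch below, using the OstrichStatsReceiver from Finagle's Ostrich integration (builder details elided; treat this as an assumption about the API of the era rather than a complete server definition):

import com.twitter.finagle.builder.ServerBuilder
import com.twitter.finagle.stats.OstrichStatsReceiver

// Report Finagle's request metrics through Ostrich so they show up
// alongside your own counters in /stats.json
val builder = ServerBuilder()
  .reportTo(new OstrichStatsReceiver)
  // ... codec, bindTo, name and the other required settings go here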

Ostrich can also deal with configuring your service, shutting down all the components gracefully and more.


This is an example of what a dashboard could look like with stats gathered from Ostrich by our observability stack. Screenshot from @raffi’s presentation deck.

Zipkin


Ostrich and Finagle combined give us good service-level metrics. However, one downside of a more service-oriented architecture is that it’s hard to get a high-level performance overview of a single request throughout the stack.

Perhaps you are a developer tasked with improving performance of a particular external API endpoint. With Zipkin you can get a visual representation of where most of the time to fulfill the request was spent. Think Firebug or Chrome developer tools for the back end. Zipkin is an implementation of a distributed tracing system based on the Google Dapper paper.

Finagle-Zipkin
So how does it work? There’s a finagle-zipkin module that hooks into the transmission logic of Finagle and times each operation performed by the service. It also passes request identifiers down to any services it relies on; this is how we can tie all the tracing data together. The tracing data is logged to the Zipkin backend, and finally we can display and visualize that data in the Zipkin UI.

Let’s say we use Zipkin to inspect a request and we see that it spent most of its time waiting for a query to a MySQL database. We could then also see the actual SQL query sent and draw some conclusions from it. Other times perhaps a GC pause in a Scala service was at fault. Either way, the hope is that a glance at the trace view will reveal where the developer should spend effort improving performance.

Enabling tracing for Finagle services is often as simple as adding
.tracerFactory(ZipkinTracer())
to your ClientBuilder or ServerBuilder. Setting up the whole Zipkin stack is a bit more work though, check out the docs for further assistance.


Trace view, taken from my Strange loop talk about Zipkin.

Mesos


Mesos describes itself as “a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks”. I’ll try to go through this section without using buzzwords such as “private cloud”, although technically I just did.

The core Mesos project is an open source Apache incubator project. On top of it you can run schedulers that deal with more specific technologies, for example Storm and Hadoop. The idea being that the same hardware can be used for multiple purposes, reducing wasted resources.

In addition to using Storm on top of Mesos we deploy some of our JVM-based services to internal Mesos clusters. With the proper configuration it takes care of concerns such as rack diversity, rescheduling if a machine goes down and so on. 

The constraints imposed by Mesos have the positive side effect of enforcing adherence to various good distributed systems practices. For example:
  • Service owners shouldn’t make any assumptions about jobs’ lifetimes, as the Mesos scheduler can move jobs to new hosts at any time.
  • Jobs shouldn’t write to local disk, since persistence is not guaranteed.
  • Deploy tooling and configs shouldn’t use static server lists, since Mesos implies deployment to a dynamic environment.

Iago


Before putting your new service into production you might want to check how it performs under load. That’s where Iago (formerly Parrot) comes in handy. It’s a load testing framework that is pretty easy to use.

The process might look something like this:
  1. Collect relevant traffic logs that you want to use as the basis for your load test.
  2. Write a configuration file for the test. It contains the hostnames to send load to, the number of requests per second, the load pattern and so on.
  3. Write the actual load test: it receives a log line, and you transform that into a request sent via a client.
  4. Run the load test. At Twitter this will start up a few tasks in a Mesos cluster, send the traffic and log metrics.

Example
A load test class could be as simple as this:

class LoadTest(parrotService: ParrotService[ParrotRequest, Array[Byte]]) extends
  ThriftRecordProcessor(parrotService) {

  // "service" is a member provided by ThriftRecordProcessor
  val client = new YourService.FinagledClient(service, new TBinaryProtocol.Factory())

  def processLines(job: ParrotJob, lines: Seq[String]) {
    lines foreach { line => client.doSomething(line) }
  }
}


This class will feed each log line to your service’s doSomething method, according to the parameters defined in the configuration of parrotService.

ZooKeeper


ZooKeeper is an Apache project that is handy for all kinds of distributed systems coordination. 

One use case for ZooKeeper within Twitter is service discovery. Finagle services register themselves in ZooKeeper using our ServerSet library, see finagle-serversets. This allows clients to simply say they’d like to communicate with “the production cluster for service a in data centre b” and the ServerSet implementation will ensure an up-to-date host list is available. Whenever new capacity is added the client will automatically be aware and will start load balancing across all servers.
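
With Finagle's ZooKeeper-backed resolver, that lookup can be expressed as a destination string. A hedged sketch, reusing the illustrative Echo interface from the Finagle section (the zk! syntax follows finagle-serversets, while the ensemble host and server-set path below are made up):

import com.twitter.finagle.Thrift

// Resolve the host list for "service a in data centre b" from ZooKeeper;
// Finagle keeps it current and load balances across all registered servers.
val client = Thrift.newIface[Echo.FutureIface](
  "zk!zk.dcb.example.com:2181!/twitter/service/a/prod")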

Scalding


From the Scalding github page: “Scalding is a Scala library that makes it easy to write MapReduce jobs in Hadoop. Instead of forcing you to write raw map and reduce functions, Scalding allows you to write code that looks like natural Scala”.

As it turns out, services that receive a lot of traffic generate tons of log entries. These can provide useful insights into user behavior, or you may need to transform them to be suitable as Iago load test input.

I have to admit I was a bit sceptical about Scalding at first. It seemed there were already plenty of ways to write Hadoop jobs: Pig, Hive, plain MapReduce, Cascading and so on. However, when the rest of your project is in Scala it is very handy to be able to write Hadoop jobs in the same language. The syntax is often very close to that of Scala’s collection library, so you feel right at home; the difference is that with Scalding you might process terabytes of data with the same lines of code.

A simple word count example from their tutorial:
  TextLine(args("input"))
   .read
   .flatMap('line -> 'word){ line : String => line.split("\\s")}
   .groupBy('word){group => group.size}
   .write(Tsv(args("output")))

jvmgcprof


One of the well-known downsides of relying on the JVM for time-sensitive requests is that garbage collection pauses can ruin your day. If you’re unlucky, a GC pause might hit at the wrong time, causing some requests to perform poorly or even time out. In the worst case, that might have knock-on effects that lead to downtime.

As a first line of defence against GC issues you should of course tweak your JVM startup parameters to suit the kind of work the service is undertaking. I’ve found these slides from Twitter alumnus Attila Szegedi extremely helpful.

Of course, you can also minimize GC issues by reducing the amount of garbage your service generates. Start your service with jvmgcprof and it’ll help you reach that goal. If you already use Ostrich to track metrics in your service, you can tell jvmgcprof which metric represents the work completed. For example, you might want to know how many kilobytes of garbage are generated per incoming Thrift request. The jvmgcprof output for that could look something like this:

2797MB w=101223 (231MB/s 28kB/w)
50.00%  8   297
90.00%  14  542
95.00%  15  572
99.00%  61  2237
99.90%  2620    94821
99.99%  2652    95974

On the first line you can see that the number of requests (the work unit) was 101223 for the period monitored, with 231MB/s of garbage generated, or 28kB per unit of work. The garbage per request can easily be compared after changes have been made, to see whether they had a positive or negative impact on garbage generation. See the jvmgcprof readme for more information.

Summary


It’s no surprise, but it turns out that having a common stack is very beneficial. Improvements and bug fixes made by one team will benefit others. There is of course another side to that coin: sometimes bugs are introduced that might only be triggered by your service. Still, as an example, when developing Zipkin it was immensely helpful to be able to assume that everyone used Finagle: that way everyone would get tracing for free once we were done.

I have left out some of the benefits of the Twitter stack and how we use Scala, such as the very convenient way Futures allow you to deal with results from asynchronous requests. I hope to write a more in depth post on how to set up a Twitter style service that would deal with the details omitted in this article. In the meantime you can check out the Scala school for more information.

Thanks to everyone who worked on the projects mentioned in this article, too many to name but you know who you are.

Posted by Johan Oskarsson
