My tech industry predictions for 2010
As both an author and a technical consultant, I am fairly opinionated about what I expect for next year (tomorrow!). Here are my predictions: There will be pressure to reduce IT expenditures. Increasing trend to favor outsourcing deployment platform (e.g., Google AppEngine and Heroku), infrastructure (e.g., Amazon AWS, RackSpace, both relational and NoSQL data stores), and software as a service (e.g., CRM, etc.). Cloud computing gets more real. Research will be concentrated on shorter term profits rather than long term strategy. China might be an exception to this: a friend reports that he has seen willingness in China to fund very long term Artificial Intelligence research. More people will spend more time using web based information and recreational resources using portable devices. There will be a shortage of wireless bandwidth in some areas. Use of the Java platform will stay strong, but with more emphasis on alternative languages like JRuby, Scala, and Clojure. Skills and educ...
$1.5 trillion a year for "defense" spending, little money left for local governments; living locally; Happy Holidays
If you factor in the cost of the debt to pay for our military spending then I think that a reasonable estimate of our yearly defense spending is about $1.5 trillion. The amount we spend on defense swamps other US government spending, including social programs. This is just my opinion, but I believe that we could keep our country relatively safe (compared to other countries) and spend far less money. The problem is basically banana republic style corruption: too much money is made by special interests for there to be any meaningful reform of military spending. The same comment is also true of other corporate interests like Wall Street, industrialized food production, pharmaceutical companies, insurance, etc. This is all enabled by total corporate control of the news media and shameful corruption in the lobbying industry and our federal government. Fortunately, for most of us, life is still very good despite corruption of the world's "elite." Again, this is just my personal...
Building the EtherPad system and perusing the source code
The EtherPad collaboration-enabled online system is now open source. Cool. Google bought the company, the developers are joining the Wave team, and their product is now released under the Apache 2 license. Be sure to follow the instructions (failing to set the environment variable for the path to a MySQL client JAR file produces a strange "cp -f" error that has hung up a few people trying to build the system, as reported on Hacker News). It only took about 15 minutes to download the source and build the system - simple enough. After running the system and trying it, I used IntelliJ 9 to set up a project (choose project from existing source, main directory etherpad/trunk) and I am spending some time perusing the Scala and JavaScript code. Really nice looking code base, and reading through the Scala code will be an education. My JavaScript skills are a little weak, but I might still take a careful look at the JavaScript code to understand how they used Comet.
Amazon Elastic Load Balancing (ELB) is pretty cool
Using this service costs $0.025/hour so it may make sense to just run HAProxy yourself on an EC2 instance, but then you have to worry about fault tolerance/recovery if that instance fails. The ELB cost is small in comparison to running a cluster of EC2 instances, and "outsourcing" as much of your system as possible to AWS (e.g., SimpleDB, Elastic Load Balancing, Relational Database Service, EBS, etc.) can certainly reduce both the complexity of your application architecture and your implementation costs. Here are my notes for a simple ELB setup for an AMI that contains a Rails web application:

    export EC2_PRIVATE_KEY=pk-....pem # different on your system
    export EC2_CERT=/Users/markw/.ec2/cert-...pem # different on your system
    ec2run -k gsg-keypair ami-e767ds71
    ec2run -k gsg-keypair ami-e767ds71

Note: specifying "gsg-keypair" matches later doing ssh -i ~/.ssh/id_rsa-gsg-keypair ...

    elb-create-lb MarkTest123 --headers --listener "lb-port=80,instance-port=42...
The cost of commoditization of computing: infrastructure and software
Discipline, a new view of system architecture, and rigorous automation procedures are required to take advantage of Amazon EC2, Google AppEngine, and other infrastructure as a service providers. Last week a customer commented on the rapid pace of Amazon's innovation. Yesterday they announced a new way to generate revenue from unused server instances by letting users bid on a spot market for unused EC2 instances.

Discipline

When you own your own server farm, even if it is just a few back room servers, you can spread out applications over your servers in a haphazard way and usually get away with some sloppiness in your process architecture. When you are dealing with someone else's infrastructure a more disciplined approach is just about mandatory.

New view of system architecture

Both Google and Amazon have published papers on dealing with very large scale geographically dispersed systems composed of many components, some of which are guaranteed to fail. These companies have al...
IntelliJ version 9.0
This week JetBrains gave me an upgrade license for version 9 of IntelliJ. I don't do too much Java development anymore - mostly maintenance on some of my old projects and new AppEngine and Google Wave development. Overall, version 9 is a nice upgrade: nicer source code repository integration, built in task management, the IDE seems faster, etc. I use the JetBrains RubyMine product almost every day. Long term, my use of IntelliJ 9 will depend on how good the AppEngine support is. As a test, I generated a new AppEngine project, added some home page material, edited the appengine-web.xml file to specify a registered app name, set the version, then deployed to Google's servers with no problems. I did have a problem importing an AppEngine project from Eclipse but eventually realized that I needed to go to Module Settings -> Artifacts, and drag "Available elements" from the right window pane to the <output root> tree display in the left pane. I tried writing a simpl...
Balkanisation of Ruby?
When I first started using Ruby, Matz's C-Ruby was mostly the only game in town. I am also an enthusiastic user of Ruby 1.9.x, JRuby, and MacRuby. Seeing the Ruby spec being developed in Japan under government funding, and that it is for Ruby 1.8.7, I get a feeling of déjà vu as a long time Lisp user. The balkanisation of Lisp has been more than a small nuisance for me in the last 25 years. I would prefer that all Ruby implementations eventually implement a common specification, but I would rather it be for something that looks like 1.9.x.
Privacy and Security in the Internet Age
Just some advice that I give friends and family: Delete all cookies in your browser every week - it is easy enough to sign in again to web sites that require authentication. People who do not delete their cookies never see what sites are tracking them. It is easiest to do a 'delete all cookies' operation and not to try to save the 5 or 10 cookies out of thousands that are stored in your local browser data. Keep a text file with all passwords in encrypted form - and, do not use the same password for different purposes. Every time you use your super market's discount card (or possibly pay with a credit card), your purchases are permanently associated with you - do you care? maybe or maybe not. I do use a lot of web services that track what I do (GMail, for example) but I make the decision to give up privacy vs. benefits on a service by service basis.
Coolness! good instructions for trying Rails 3.0pre
New Amazon Web Services feature: boot from EBS
Awesome - in the past I have had to write my own code/scripts to manage attaching an EBS volume to a new EC2 instance. This new feature will make it a lot easier to manage your own EC2 based services. If you would prefer a PDF with the complete documentation, then use this link. This new feature will make using EC2 even easier for some applications - a welcome change. That said, I have my own scheme for automatically mounting EBS volumes, assigning ElasticIP addresses, etc. and for some deployments I will continue to use temporary boot volumes and pull AMIs from S3.
Playing with Chrome OS
I have to say that even in its alpha (or beta?) version, Chrome OS looks good. I like the home "apps page" and except for not having an application that is a terminal emulator, it is fairly complete: my calendar, google docs, etc., are all available, and web browsing is fast. Again, given a good terminal program for use cases where I need to quickly SSH to a server, I can see a light weight and low power netbook nicely augmenting my laptop + external monitor setup.
I am watching the live Chrome OS Webcast
For end users, Chrome OS is a great idea - I would argue that most of my friends and relatives would be better off not running Windows, OS X, or (full) Linux. What about software developers: still useful, but not a replacement for a laptop. In a pinch, assuming a terminal window to run remote bash shells, Emacs, etc., I could still get work done while travelling. Still, for me, a MacBook with Ubuntu and OS X with RubyMine, IntelliJ, Eclipse, OmniGraffle, etc. is just about perfect for my workflow. That said, I will buy a Chrome OS netbook when they are available.
Hosted MongoDB and CouchDB
After I finish up some client work this morning, I am planning on finishing a DevX article on using Heroku as a deployment platform. Since deploying to Heroku is so simple and so well documented, you might think that I would have a difficult time writing new material :-) After a short tutorial on getting started, I am writing mostly about using both CouchDB and MongoDB as data stores, either hosted yourself on EC2 (or another server external to Heroku, which is itself hosted on EC2) or using commercial managed solutions like Cloudant for CouchDB and MongoHQ for a managed MongoDB service. I like to manage my own and customer deployments on EC2 - frankly, it is fun :-) That said, I think that there are sometimes business reasons for using hosted solutions like Heroku, Cloudant, and MongoHQ. It is a balance between development and admin costs and paying for managed platform as a service offerings.
nice: Rubymine 2.0 released
I use Rubymine for most of my Ruby/Rails/Sinatra development on Ubuntu, and use it in conjunction with TextMate on OS X. I find it convenient enough to alternate between TextMate when I don't need IDE features, and Rubymine when I do. One of the biggest improvements is that indexing now occurs in the background and auto-complete and other features become available that depend on knowledge of an application and the gems that it uses. This is subjective, but once Rubymine 2.0 loads up and is done with any background indexing then the CPU use is minimal, and I think improved from earlier versions (nice to not have the fan kick in on my laptop when the CPU cores heat up). For the Rails application that I am coding on right now, Rubymine is using about 360MB of resident memory - this is OK with me.
MongoDB has good support for indexing and search, including prefix matching for AJAX completion lists
I have been spoiled by great support for indexing and search in relational databases (e.g., Sphinx, native search in PostgreSQL and MySQL, etc.) I was pleased to discover, after a little bit of hacking this morning, how easy it is to do indexing and search using the MongoDB document-centered database. I have two common use cases for search, and MongoDB seems to handle both of them fairly well:

Search for words inside of text fields
Efficient word prefix search to support AJAX "suggest" style lists

My approach does require combining search results for multiple search terms in application code, but that is OK. Assuming the use of MongoRecord, here is a code snippet:

    class Recipe
      collection_name :recipes
      fields :name, :directions, :words
      def to_s
        "recipe: #{name} directions: #{directions[0..20]}..."
      end
      def Recipe.make collection, name, directions
        collection.insert({:_id => Mongo::ObjectID.new, :name => name, :directions =...
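The two use cases above (word search and prefix search, with results for multiple terms combined in application code) can be sketched without a database at all. The following is only an in-memory stand-in written for illustration: the class name TinyWordIndex and its methods are hypothetical, and none of it is the MongoDB or MongoRecord API.

```ruby
require 'set'

# In-memory stand-in for an indexed "words" field (hypothetical, not MongoDB).
class TinyWordIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = Set.new } # word => set of document ids
    @sorted_words = nil                             # sorted word list, rebuilt lazily
  end

  # Split text into lowercase words and record which document contains each one.
  def add(doc_id, text)
    text.downcase.scan(/[a-z0-9]+/).each { |w| @postings[w] << doc_id }
    @sorted_words = nil
  end

  # All indexed words that start with prefix, located by binary search.
  def words_with_prefix(prefix)
    prefix = prefix.downcase
    @sorted_words ||= @postings.keys.sort
    start = @sorted_words.bsearch_index { |w| w >= prefix }
    return [] if start.nil?
    @sorted_words[start..-1].take_while { |w| w.start_with?(prefix) }
  end

  # Ids of documents that match every term; each term is treated as a prefix.
  # The per-term result sets are intersected here, in application code.
  def search(*terms)
    sets = terms.map do |term|
      words_with_prefix(term).reduce(Set.new) { |acc, w| acc | @postings[w] }
    end
    sets.reduce { |a, b| a & b }.to_a.sort
  end
end

# Made-up sample data.
index = TinyWordIndex.new
index.add(1, "Spicy chicken curry")
index.add(2, "Chicken soup with rice")
index.add(3, "Rice pudding")
p index.words_with_prefix("s")   # candidate words for a "suggest" list
p index.search("chi", "ri")      # documents matching both prefixes
```

A sorted word list plus binary search is one simple way to get prefix lookups; a database index on a words field would play the same role in a real deployment.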
How to install CouchDB + nginx + basic authentication on EC2, including a Ruby client
Please note that if you want a more secure installation, SSL should also be installed following these instructions (I used these instructions and another web blog to create the following abbreviated instructions). For my purposes, basic HTTP authentication is good enough. I assume that you are used to using nginx and CouchDB and either installed them from source or using apt-get. I am using Ubuntu, so you might have to modify these instructions slightly. On my laptop, I created a simple crypt program because OS X does not include one:

    #!/usr/bin/perl
    print crypt($ARGV[0],$ARGV[0])."\n";

After giving this script execute permissions, I created an encrypted password:

    crypt my12398pass61

You should save the output because on your EC2 instance you need to, as root or sudo, edit the file /etc/nginx/htpasswd adding a line:

    couchclient:myEKNgP2ivVVo

where myEKNgP2ivVVo was the output from crypt for the plain text password my12398pass61. Then edit nginx.conf file adding something lik...
"always on" MongoDB installation on my laptop
I spend a lot of time experimenting with infrastructure software, sometimes for customer jobs and sometimes just because it is fun to learn new things. For non-SQL data stores, I have spent a lot of time in the last year experimenting with and using CouchDB, AppEngine datastore, Tokyo Cabinet, MongoDB, Cassandra, and SimpleDB. Tokyo Cabinet and SimpleDB store hash values as strings, and don't have the great client APIs that the others have because of the limitations of string-only hash values. That said, for an Amazon hosted application SimpleDB can be a good choice and Tokyo Cabinet is light weight and easy to install and use. Cassandra looks great, and as I have written about here before, Cassandra is easy to use from Ruby and has great features. MongoDB has great performance and capabilities similar to Cassandra's. Chris Kampmeier has a great writeup that covers installing MongoDB on OS X, including setting it up as a system service. I followed Chris's directions. A pleasant surprise...
Using nailgun for faster JRuby startup
I finally got around to trying nailgun tonight. On OS X with JRuby 1.4.0RC2, I built nailgun using:

    cd JRUBY_HOME/tool/nailgun
    ./configure
    make
    # I ignored the warning "no debug symbols in executable (-arch x86_64)"

In one terminal window just leave a nailgun server running:

    $ jruby --ng-server
    NGServer started on all interfaces, port 2113.

When you want to run JRuby as a nailgun client, try something like:

    jruby --ng text-resource.rb

On my MacBook, this cuts about 5 seconds of JRuby startup time off of running this test program. Sweet. For small programs, using ruby is still faster than jruby but this makes developing with JRuby faster.
I just tried Amazon's new Relational Database Service (RDS)
Amazon just released a beta of their Relational Database Service (RDS). You pay by the EC2 instance hour, about the same cost as a plain EC2 instance plus about $0.01/hour more for a small instance, plus some storage costs, and bandwidth costs if you access the database outside of an Amazon availability zone. RDS is MySQL compatible (version 5.1) and is automatically monitored, restarted, and backed up. Currently, there is no master slave replication, but this is being worked on (RDS beta just started today). Here are my notes on my first use of RDS:

Install the RDS command line tools

    rds-create-db-instance --db-instance-identifier marktesting123 --allocated-storage 5 --db-instance-class db.m1.small --engine MySQL5.1 --master-username marktesting123 --master-user-password markpasstesting123

Wait a few minutes and see if the RDS instance is ready:

    rds-describe-db-instances

Open up ports for external access, if required (note, here I am opening up for world wide access just for this test):

    rds-autho...
Securing your Mac laptop
Laptops get lost and stolen a lot. I am extra careful with my laptop because I keep so much of my own and my customers' private data on it. I take a few steps to protect this information that I want to share with you (Mac OS X specific): I keep a small encrypted disk image that contains all my passwords and other sensitive information. It also contains my .ec2, .s3cfg, .profile, .ssh, .gnupg, and .heroku files. Then in my home directory I make soft links ln -s ... to these files. I do not keep the password for this disk image in my OS X keychain! It is a very small hassle: each time I boot up, I mount this image so my .ssh, etc. files are available. This adds 10 seconds of "overhead" to each time I boot my laptop. Whenever I start working for a new customer, I ask them if they would like me to also keep their working materials encrypted (some overhead involved, so I like to ask them if I should spend the time doing this). Update: a reader pointed out that this is ...
More getting stuff done by doing what I most want to do experiments
I read an interesting article a few weeks ago (sorry, no attribution - can't find the article again) about trying to always do what you want to be doing. I used to do "round robin" style scheduling of my time: keeping a single to-do list and cycling through it (and sometimes just finishing small tasks outright). I have always thought that I needed to apply some meta-level discipline to get tasks that I don't enjoy as much done in a timely way. Scheduling work is not so difficult because I usually have just 3 or 4 active customers, and I enjoy most of my work. Other things like yard work (I prefer new projects over maintenance) got the round-robin treatment, and even recreation (I like to hike, cook/eat, read, and watch movies) activities used to be scheduled round-robin style to a (very) small degree. Lately, I have been experimenting with not doing any meta-level scheduling. Now when I finish an activity I start the new activity that is what I most want to do. The r...
RDF datastores are noSQL also - always keep an RDF data store service running
We tend not to use things that are not "ready at hand." RDF datastores are noSQL also :-) I always keep Sesame running as a service just as I run PostgreSQL and MySQL services. Some things are better stored, queried, and maintained in a graph database. If you always have something like Sesame (or the free edition of AllegroGraph) running as a service, and if you have client libraries installed for your favorite programming languages then it is easier to quickly choose the best data store for any given task. BTW, I also always keep a CouchDB service running.
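To make "better stored and queried in a graph database" a little more concrete, here is a toy sketch of the subject/predicate/object model that RDF stores are built around. It is an in-memory illustration only: TinyTripleStore and its methods are made-up names, not the Sesame or AllegroGraph client API, and the sample facts are invented.

```ruby
# A tiny in-memory triple store (hypothetical; real RDF stores add
# persistence, indexing, and a query language such as SPARQL).
class TinyTripleStore
  def initialize
    @triples = []
  end

  # Record one fact and return self so calls can be chained.
  def add(subject, predicate, object)
    @triples << [subject, predicate, object]
    self
  end

  # Pattern match: nil acts as a wildcard in any position.
  def match(subject = nil, predicate = nil, object = nil)
    pattern = [subject, predicate, object]
    @triples.select do |triple|
      pattern.zip(triple).all? { |want, have| want.nil? || want == have }
    end
  end

  # Two-step traversal: follow predicate from start, then follow it again.
  def two_hops(start, predicate)
    match(start, predicate).flat_map do |(_, _, middle)|
      match(middle, predicate).map(&:last)
    end
  end
end

# Made-up sample facts.
store = TinyTripleStore.new
store.add(:mark, :knows, :alice)
     .add(:alice, :knows, :bob)
     .add(:mark, :uses, :sesame)

p store.match(:mark)             # every fact about :mark
p store.two_hops(:mark, :knows)  # people known by the people :mark knows
```

Traversals like two_hops are the kind of query that is awkward as relational joins and natural as graph pattern matching.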
Cloud computing options and portability
I listened to Paul Miller's podcast with Rackspace's president of their Cloud Division Lew Moorman this morning. I mostly agree with his comments on easy portability between Rackspace cloud services and Amazon's EC2. I have not yet used Rackspace's cloud offerings, so my comments here are based on their documentation and a conversation I had with one of their support engineers (for one of my steady customers: I declined some work tasks to move to Rackspace because I don't like to spread myself too thin: I spend a lot of effort staying up to speed on Amazon and AppEngine, so I prefer to specialize on those two deployment platforms). The advantage of Rackspace is the binding of a persistent disk volume with their virtualized server instances (really, they offer a standard sort of VPS hosting service) where with Amazon it takes a little extra work to manage EBS volumes separately. For me, I like the benefit of Amazon's SQS, S3, and Elastic MapReduce - that said, I...
Switching an AppEngine project from JRuby+Sinatra to Java+JSP
It is a bit of a pain to take several hours to convert a working codebase in one language/platform to another. I kept having small problems with JRuby and Sinatra that were just AppEngine specific (Ruby (or JRuby) and Sinatra are awesome). I am only about 20% into development, and I decided that I wanted really solid tools/platform. Also, converting working code in one language to another is simple. What convinced me to make the switch is that Java + Eclipse plugins support is just so good for AppEngine development, that for now the change seems like a good decision. For my next AppEngine project, I'll probably go back to JRuby + Sinatra since the support is getting better.
I built the open source IDEA 9.0 git snapshot - works fine
Something to do while watching TV :-) With the Apache 2.0 license, it will be interesting to see how it is used. It is a large git clone, but built easily using ant. I get a free commercial license for IntelliJ IDEA (as I used to get free Enterprise JBuilder licenses from Borland) but I still plan on following the open source IDEA project - hopefully interesting things will happen! I use Eclipse a lot just because the Java AppEngine support is so very good, but for plain old Java coding, I like IDEA. The open source edition of IDEA is great for plain old Java coding, BTW, but is missing JSP + Tomcat development support (but NetBeans does a good job for J2EE-- development, and who does J2EE development anymore :-) It takes a while to do a git clone and build the IDE (builds versions for OS X, Windows, and Linux as the default ant build target) and since the build process is so easy, it was not much fun, so you might as well just download a built version for your OS platform if you don...
Nice tool for writing and maintaining documentation: YUML web service and yumlcmd Ruby gem
Although some customers request using a Word Processor for producing documentation, if it is my choice I like LaTeX and OmniGraffle for producing diagrams. LaTeX is the fastest tool (that I use) for producing great looking print or PDF documents. I am experimenting with something else this morning: the YUML web app for creating UML diagrams and the yumlcmd Ruby gem (add http://gemcutter.org to your gem source and then gem install yumlcmd). Thanks to Under the Hat for pointing these tools out - check their blog for directions. Although YUML hardly replaces OmniGraffle, it is cool to have documentation text based (LaTeX files and YUML files): faster, and less work.
Some frustration with JRuby + Rails on Google AppEngine
A few engineers at Google and other developers are doing some good work towards getting Rails running on AppEngine both robustly and in a way that provides a good local development environment. One problem is simply that if your web app is not active, initializing JRuby + Rails + and all required gems can time out (30 second window for handling requests). The Java and Python support for AppEngine is fantastic, but for two projects I want to do (my own projects, but may be revenue generating :-) I want a more agile programming language than Java and while my Python skills are sort-of OK, my knowledge of Django is very light. I should probably just bite the bullet and spin up on Django, but I would strongly prefer working in Ruby. I have been experimenting with the JRuby + Sinatra + ERB + datamapper combination and at least an inactive web application spins up well within the 30 second request timeout window. I very much like datamapper (object identity issues) and it should not be too d...
Designing for scalability and platform portability
Once an application is designed and at least partially implemented, options for scalability and portability are reduced. If a system's usage profile cannot be predicted, then deploying to physical servers is a real problem because you have to pay for support for peak usage periods - however, relying on cloud infrastructure can very much limit platform portability. It helps to consider scalability up front! Relying on scalable data store infrastructure like Google's AppEngine datastore or Amazon's SimpleDB can make life easier. For server side Java, coding to JPA makes it possible with some work to be portable between AppEngine datastore, SimpleDB, or using a traditional database on your own server. Some care needs to be taken to code to a subset of JPA (e.g., no cross domain queries in SimpleDB) if portability is important. In the Rails world, using Datamapper provides similar flexibility for portability between AppEngine datastore, SimpleDB, or a conventional database. And, ta...
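The idea of coding to a portable subset can be sketched in a few lines of Ruby. Everything here is hypothetical and written only for illustration: MemoryStore and RecipeCatalog are made-up names, and this is neither the JPA nor the DataMapper API. The point is that application code depends only on a small interface that deliberately omits joins and cross-domain queries.

```ruby
# One possible backend: a plain in-memory store. A SimpleDB-backed or
# AppEngine-backed class would need to offer the same three methods.
class MemoryStore
  def initialize
    @rows = {} # kind => { id => attributes }
  end

  def put(kind, id, attributes)
    (@rows[kind] ||= {})[id] = attributes.dup
  end

  def get(kind, id)
    @rows.fetch(kind, {})[id]
  end

  # Single-kind equality filter only: this is the portable subset.
  def find(kind, conditions)
    @rows.fetch(kind, {}).values.select do |row|
      conditions.all? { |key, value| row[key] == value }
    end
  end
end

# Application code receives the store as an argument, so swapping the
# backend does not require changing this class.
class RecipeCatalog
  def initialize(store)
    @store = store
  end

  def add(id, name, cuisine)
    @store.put(:recipes, id, { name: name, cuisine: cuisine })
  end

  def names_for(cuisine)
    @store.find(:recipes, { cuisine: cuisine }).map { |row| row[:name] }.sort
  end
end

# Made-up sample data.
catalog = RecipeCatalog.new(MemoryStore.new)
catalog.add(1, "Green curry", "thai")
catalog.add(2, "Pad thai", "thai")
catalog.add(3, "Risotto", "italian")
p catalog.names_for("thai")
```

Keeping the interface this narrow costs some convenience up front, which is the trade-off the paragraph above describes.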
Storing Lucene indices in Cassandra; cloud versus running your own server farm
The Lucandra project looks very interesting, but is incomplete at this time (see the "to be dones" at the bottom of the linked page). Cassandra is a great project. I almost incorporated it into the design of a customer project recently, but we decided to host on Amazon so using their EC2, S3, SQS, and Elastic MapReduce services won out over rolling a custom stack. I think that this must be a start up dilemma: long term, it is probably least expensive running one's own small server farm, but when you are just getting started a "pay as you go" cloud approach using very solid infrastructure tools like EC2, S3, SQS, SimpleDB, etc. makes sense. I can't say this from personal experience, but my gut feeling is that if you can live within the constraints of Google's AppEngine, then it is probably less expensive using AppEngine than running your own server farm - even long term. BTW, if you have not read my DevX article on implementing search on the Java ve...
Interesting new book: "Networks, Crowds, and Markets: Reasoning about a Highly Connected World"
This book will be published in 2010 but a complete pre-publication draft is available here. There is a PDF download link for the entire book near the top of the page. If you enjoyed reading Albert-László Barabási's classic book "Linked: The New Science of Networks" then this new book looks like a great followup. I have not got too far into David Easley's and Jon Kleinberg's new book yet, but the range of topics in this 800 page book looks like it will make a good read.
Using Facebook Connect just got a lot easier
Nice: RubyMine 2.0 will be a free upgrade
In a world filled with great free IDEs, JetBrains keeps being competitive with commercial IDE offerings. I think that their RubyMine product is hands down the best Ruby development environment (although I do sometimes use GEdit on Linux and TextMate on OS X). Offering the 2.0 upgrade for free (to be released in a few weeks) is a nice way to say thanks to their customers. A beta 2.0 download is available here. I'm running the beta right now, and it looks like a good upgrade.
My DevX article "Using Gambit-C Scheme to Create Small, Efficient Native Applications" is now online
My article is a quick introduction to Scheme, followed by some examples of building small compiled applications in Scheme. Gambit-C Scheme compiles to C, and the generated C code is then compiled and linked. When I need to use Lisp, I tend to use Common Lisp for large applications and Gambit-C Scheme for small utilities. For me, being able to use a high level and expressive language like Scheme to build efficient and compact applications is a big win :-) I find the development environment of Gambit-C Scheme with Gambit-C's Emacs support to be very productive. Marc Feeley, the developer of Gambit-C, mentioned to me that several companies are doing product development in Gambit-C Scheme. I have a NLP toolkit that I have ported to Gambit-C and I hope to get the time to finish and "ship" it sometime this year.
Adventures of living in the mountains: heavy monsoon rains and flooding
We live in the mountains of Central Arizona (Sedona) - great area with mountains, trees, water for kayaking, etc. We do have our problems though. I was hiking with friends this morning, beautiful day. Fast forward to this afternoon: very heavy monsoon rains with lots of flooding: the wash below our house overflowed into yards below us. We had water flowing through our yard, but it did not make it into our house. We also ended up with several inches of accumulated hail on our deck and parts of our yard: the white ice looks like snow if you don't look too carefully.
I just read the text for Obama's speech to school kids: it is non-political and strong on American values
This is the text of the speech he is scheduled to give in a few days. Well worth reading since some crazy right wingers have been telling lies and sowing so much disinformation. I have very much appreciated some conservative pundits who have publicly called this disinformation "stupid." Not all conservatives put the well being of their political party above the well being of our country. Good for them for speaking up! When I was in grade school, a friend's father, who was conservative, helped arrange for our whole class to get to see President John F. Kennedy speak. I think that my friend's Dad disagreed with President Kennedy on things political, but he wanted his son and his son's classmates to hear a President speak. There is a lot of anti-American rhetoric coming from some conservatives - it is up to the rest of us, the majority I think, to speak up and point out stupidity when we hear it.
Very much liking Amazon EC2
I remain very enthusiastic about Google's AppEngine (and also I am very much enjoying my developer's Wave account). That said, Amazon's AWS services are having a much larger effect on my work for customers and my own work and research. AppEngine is great for some types of projects, but EC2 can be used for anything. I have a text mining experiment that I have been planning for a while, and today I have some free time to start setting it up. I have 3 old desktop computers (with a reasonable amount of memory and disk) that I usually haul out of my closet, run "headless," and set up for text mining and machine learning projects. Although I own these boxes, there is a drawback to leaving them running for several weeks in my home office: noise, heat generation, messing up my work environment, etc. I did a quick calculation and estimated that if I instead use one EC2 instance, a reasonably large EBS disk volume, and Elastic MapReduce when I need it to make Hadoop Map Reduc...
Giving something back
Kai-Fu Lee is leaving Google (he managed their China operations) to form an angel investing fund for young Chinese entrepreneurs. I have had some interest in Kai-Fu Lee's career since purchasing a copy of his doctoral thesis in the late 1980s on the Sphinx real time speech recognition system. I was looking at using time delayed neural networks for speech recognition, and Kai-Fu Lee's thesis was both interesting and inspiring.
great video talk: "Innovation in Search and Artificial Intelligence"
Peter Norvig's recent talk at UC Berkeley discussed how large data sets and increasing computer resources make it possible to achieve increasingly better modeling and predictive results. Well worth an hour to listen to. There were a lot of gems in this talk, but one that I may put to immediate use is using non-text data in map reduce, specifically using the protocol buffer tools. I have been using Hadoop more frequently and it is worth looking at the effects of binary data for intermediate results. His comment that using map reduce is not necessarily incompatible with indexing data was also interesting. There is an overhead for creating indices, but it seems like there are opportunities to use indices for access to global information in a data set while making a complete sweep through the input data set during the map phase.
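As a rough illustration of that last point, here is a small single-process sketch in which an index built ahead of time (document frequency per word) supplies global information to each record during the map sweep. It is my own toy example with made-up function names and sample data, not something taken from the talk and not Hadoop code.

```ruby
# Index pass: how many documents contain each word (global information).
def build_doc_freq(docs)
  freq = Hash.new(0)
  docs.each_value { |text| text.split.uniq.each { |w| freq[w] += 1 } }
  freq
end

# Map phase: handles one record at a time, consulting the index to keep
# only the words that appear in no other document.
def map_doc(doc_id, text, doc_freq)
  text.split.uniq.select { |w| doc_freq[w] == 1 }.map { |w| [doc_id, w] }
end

# Reduce phase: group the emitted [doc_id, word] pairs by document.
def reduce_pairs(pairs)
  pairs.group_by(&:first).transform_values { |kv| kv.map(&:last).sort }
end

# Full sweep over the input: index, then map every record, then reduce.
def words_unique_to_each_doc(docs)
  doc_freq = build_doc_freq(docs)
  pairs = docs.flat_map { |id, text| map_doc(id, text, doc_freq) }
  reduce_pairs(pairs)
end

# Made-up sample data.
SAMPLE_DOCS = {
  "d1" => "hadoop makes map reduce jobs easy",
  "d2" => "map reduce with protocol buffers",
  "d3" => "indexing speeds up search"
}.freeze

p words_unique_to_each_doc(SAMPLE_DOCS)
```

The index costs one extra pass over the data, but after that each map call can answer a global question (is this word unique?) with a constant-time lookup.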
Easy installation is a form of elegance
Often my work tasks are relatively easy: read requirements, use my previous experience and perhaps some new research to identify the best tools/frameworks to use, do a quick design, and get the job done. In order for this to be a quick and efficient process, it is important to be able to install software tools and keep them up to date - while taking a minimum amount of my own time. I am currently running Ubuntu on all of my servers and customer servers, and it makes it faster to just remember how to do things for one distro. Using apt-get is fine for stable software installs, but when evaluating new tools it is usually best to get the most recent stable releases. I am evaluating both Tokyo Cabinet and Cassandra for a task right now, and needed to install Cassandra. Evan Weaver, who works for Twitter, has written a Ruby gem that downloads the Java code for Cassandra the first time you try to start Cassandra:

    gem install cassandra --no-ri --no-rdoc
    cassandra_helper cassandra

Love it - a ...