Data Blogs I Read

SpaceCurve raises $10M to make sense of our streams of location data / Derrick Harris / June 18 / SpaceCurve has raised another $10 million for its database technology designed to make sense of massive amounts of data from sensors, social media, mobile devices and other streaming sources.
Cloudera names new CEO; Mike Olson now chairman and chief strategy officer / Derrick Harris / June 18 / Cloudera CEO Mike Olson is now chief strategy officer and chairman of the board, while former Arcsight CEO Tom Reilly will take over the Hadoop pioneer's leadership role.
Disconnect 2 for Safari and Opera / Disconnect / June 18 / Disconnect 2 has gotten more than a quarter of a million new users since launching two months ago and is now available for – count them – four browsers. We’ve just released Disconnect 2 for Safari and Opera, which have …
Meet the heavyweight team behind Heavybit, a community for developer-focused startups / Derrick Harris / June 18 / Heroku co-founder James Lindenbaum is launching a new effort focused on giving developer-focused startups the tools they need to scale. He has recruited some significant peers and investors as advisers to teach member companies the ropes.
Scaling Mailbox - From 0 to One Million Users in 6 Weeks and 100 Million Messages Per Day / High Scalability / June 18 / You know your product is doing well when most of your early blog posts deal with the status of the waiting list of hundreds of thousands of users eagerly waiting to download your product. That's the enviable position that Mailbox, a free mobile email management app, found itself in early in its release cycle. Hasn't email been done already? Apparently not. Mailbox scaled to one million users in a paltry six weeks with a team o [...]
Accel Partners putting another $100M toward big data apps / Derrick Harris / June 17 / Accel has launched its Big Data Fund 2, a followup on the equally large fund the venture capital firm started in November 2011. Rather than seeking products that target data scientists, it wants those targeting business users.
GE wants to use artificial intelligence to predict the future of hospitals / Derrick Harris / June 17 / GE Healthcare is pushing a system called Corvix for doing agent-based simulations on complex problems. In India, the technology simulated a population of 80 million people in order to determine the best places to build medical facilities.
How to Measure Organizational Accountability: Who Left the (Cloud) Lights On? / Cloudyn / June 17 / When it comes to your electric bill, there are clear indicators of who is racking up costs. Running the HVAC overnight when no one is around or leaving the lights on over the weekend will clearly flag ‘electricity violators’. But …
A real-time bonanza: Facebook’s Wormhole and Yahoo’s streaming Hadoop / Derrick Harris / June 14 / This week, both Facebook and Yahoo detailed new efforts to manage real-time data flows within their myriad systems. Yahoo's work is an open source implementation of Storm designed to run on the same cluster as Hadoop and even share resources.
Stuff The Internet Says On Scalability For June 14, 2013 / High Scalability / June 14 / (Steve Gibson on Security Now with a plausible analysis of the tech behind PRISM) 27 billion: WhatsApp messages per day Quotable Quotes: Richard Feinman: If Bill Gates walks into a bar, on average, everybody in the bar is a millionaire. @giltene: Financial Programmers get paid by the CPU cycle. Web developers get paid by the developer cycle. @johndmitchell: “It’s the I/O, stupid.” @PatrickMcFadin: More people registering at #cassandra13 No worries. Adding more nodes at the reg desk. Google does it with science. Here's a list of Excellent Papers for 2012 from Googlers and friends. Most relevant for HS readers is a wildly inspiring Spanner: Google's Globally-Distributed Database. But you'll also see the influence of extracting knowledge from data to do subtle and interesting things. On that theme is Improving Photo Search: A Step Across the Sema [...]
Busting 4 Modern Hardware Myths - Are Memory, HDDs, and SSDs Really Random Access? / High Scalability / June 13 / "It’s all a numbers game – the dirty little secret of scalable systems" Martin Thompson is a High Performance Computing Specialist with a real mission to teach programmers how to understand the innards of modern computing systems. He has many talks and classes (listed below) on caches, buffers, memory controllers, processor architectures, cache lines, etc. His thought is programmers do not put a proper value on understanding how the underpinnings of our systems work. We gravitate to the [...]
Ex-Yahoo CTO launches Altiscale, hardcore Hadoop as a service / Derrick Harris / June 13 / Raymie Stata spent seven years working on the guts of Hadoop as a VP, chief architect and CTO at Yahoo. His new Hadoop startup, called Altiscale, has raised $12 million from some prominent investors.
DjangoCon 2013 call for papers open / Xaprb / June 13 / Are you a Django user? There’s an upcoming Django conference in Chicago in a few months, and I know they’re looking for speakers with MySQL experience in particular. One suggestion the organizers have floated is a talk on MySQL: I’m looking for someone to give at least one MySQL talk there. In particular, I would [...]
Why Google is the big data company that matters most / Derrick Harris / June 12 / Google Image Search just got a whole lot better, and the company's purpose-built machine learning system infrastructure is a big reason why. No surprise, Jeff Dean helped build it.
Sponsored Post: Apple, Two Sigma, Cendea, RAMP, Blurocket, Incapsula, Dow Jones, Surge, Rackspace, aiCache, Aerospike, Percona, ScaleOut, New Relic, LogicMonitor, AppDynamics, ManageEngine, Site24x7 / High Scalability / June 12 / Who's Hiring? An exciting opportunity for a Software Engineer to join Apple's Messaging Services team. We build the cloud systems that power some of the busiest applications in the world. You'll have the opportunity to explore a wide range of technologies, developing the server software that is driving the future of messaging and mobile services. To apply please visit this URL. Two Sigma is building our next generation research environment, and we're looking for a functional programmer with a passion for distributed computing. We're scaling machine learning and operations research to tens of thousands of CPUs. Please send qualifications to buildstuff@twosigma.com. Have strong LAMP skills and the ability to make systems really scale?
In a cloud computing economy, the NSA is bad for business / Derrick Harris / June 11 / A lot of Americans might say they support NSA surveillance of their online activities, but many other people -- including folks overseas -- aren't so thrilled. Can these laws withstand pressure from a tech lobby concerned about lost profits from fleeing users?
SageCloud gets $10M to build Facebook cold storage for the rest of us / Derrick Harris / June 11 / A storage startup called SageCloud is looking to deliver low-cost backup storage to the masses who want Facebook-like cold storage without resorting to tape, cloud services or building their own gear.
Stealth-mode 28msec wants to build a Tower of Babel for databases / Derrick Harris / June 11 / 28msec is about to exit stealth mode and take the covers off its database platform that lets users query data from any source in real time.
WalmartLabs keeps getting smarter with Inkiru acquisition / Derrick Harris / June 10 / WalmartLabs has acquired a predictive analytics startup called Inkiru to bolster its ability to create better customer experiences through data. The division of Walmart was created in 2011 on a foundation of big data.
The 10 Deadly Sins Against Scalability / High Scalability / June 10 / In the moral realm there may be 7 deadly sins, but scalability maven Sean Hull has come up with Five More Things Deadly to Scalability that, when added to his earlier 5 Things That are Toxic to Scalability, make for a numerologically satisfying 10 sins against scalability: Slow Disk I/O – RAID 5 – Multi-tenant EBS. Use RAID 10, it provides good protection along with good read and write performance. The design of RAID 5 means poor performance and long repair times on failure. On AWS consider Provisioned IOPS as a way around IO bottlenecks. Using the database for Queuing. The database may seem like the perfect place to keep work queues, but under load locking and scanning overhead kills performance. Use specialized products like RabbitMQ and SQS to remove this bottleneck. Using Database for full-text searching. Search seems like another perfect database feature. At scale search doesn [...]
Under the covers of the NSA’s big data effort / Derrick Harris / June 7 / There's much debate still to be had over the NSA's recently uncovered data-collection practices, but some of the technologies underlying them are out in the open. Here's what we know already.
Stuff The Internet Says On Scalability For June 7, 2013 / High Scalability / June 7 / Hey, it's HighScalability time: (Ever feel like everyone has already climbed your Everest?) Trillion Particles, 120,000 cores, and 350 TBs: Lessons Learned From a Hero I/O Run on Hopper Quotable Quotes: @PenLlawen: @spolsky In my time as a scalability engineer, I’ve seen plenty of cases where optimisation was left too late. Even harder to fix. @davidlubar: Whoever said you can't fold a piece of paper in half more than 7 times probably forget to unfold it each time. I'm up to 6,000. deno: A quick comparison of App Engine vs. Compute Engine prices shows that App Engine is at best 10x more expensive per unit of RAM. Fred Wilson: strategy is figuring out what part of the market the company wants to play in, how it goes to market, and how it differentiates itself in the market it is about what you are going to do and importantly what you [...]
Here’s how the NSA analyzes all that call data / Derrick Harris / June 6 / How does the NSA analyze all the data it's collecting from cell phone users? With a massive database system built with just such scale and workloads in mind.
Paper: Memory Barriers: a Hardware View for Software Hackers / High Scalability / June 6 / It's not often you get so enthusiastic a recommendation for a paper as Sergio Bossa gives Memory Barriers: a Hardware View for Software Hackers: If you only want to read one piece about CPUs architecture, cache coherency and memory barriers, make it this one. It is a clear and well written article. It even has a quiz. What's it about? So what possessed CPU designers to cause them to inflict memory barriers on poor unsuspecting SMP software designers? In short, because reordering memory references allows much better performance, and so memory barriers are needed to force ordering in things like synchronization primitives whose correct operation depends on ordered memory references. Getting a more deta [...]
What cities, data and Yahoo have in common: Interaction matters / Derrick Harris / June 6 / A recent study from MIT suggests the likelihood of face-to-face interactions within a city means more productivity. It seems to apply equally to companies and even data, which suggests engineers and architects of all types should take notice.
The moment I first held my newborn daughter in my arms / Xaprb / June 5 / This is a personal post, not a technical one. We tell ourselves a lot of lies that are not okay. I want to out one of them. It is important to be real, to be true to oneself. This matters. The lie starts something like this: the moment I held my newborn child in my [...]
Heroku targets MongoDB with new Postgres V8 feature / Derrick Harris / June 5 / Heroku has rolled out a new feature to its Postgres database service that lets users write JavaScript functions within the database. The company says this makes it comparable to MongoDB.
A Simple 6 Step Transition Guide for Moving Away from X to AWS / High Scalability / June 5 / If you just want to visit Rome and not go full on Cloud Native like Netflix, then Soundslice's Adrian Holovaty in Why I left Heroku, and notes on my new AWS setup provides a simple guide for helping make your first trip a good one. First, let's dispose of why Soundslice left Heroku. The essence is because of various issues "Heroku lost my trust." YMMV, but once a fact, what do you do? After a consultation with Scott VanDenPlas, former director of dev ops for the Obama reelection tech team, they came up with a simple transition guide that I think is quite good and generally useful (full details in the original post):
IBM throws its weight behind MongoDB for mobile apps / Derrick Harris / June 4 / IBM and 10gen are collaborating on a standard that would make it easier to write applications that can access data from both MongoDB and relational systems such as IBM DB2.
Cloudera adds search to Hadoop distro and says it’s just getting started / Derrick Harris / June 4 / Cloudera's new search feature, based on the Apache Solr project, is the latest move by the company to expand the utility of its Hadoop distribution. It's also far from the last.
Google takes on Parse with new service for mobile-app backends / Derrick Harris / June 3 / Google has announced a new service called Mobile Backend Starter that lets Android developers create and launch mobile apps on Google's cloud with just a few clicks.
GOV.UK - Not Your Father's Stack / High Scalability / June 3 / I'm not sure what I was expecting the stack GOV.UK used at launch to look like. Maybe some messenger owls and lots of cobwebs? But not so at all. So much not so I thought any organization looking at their own stack for ideas could learn something from the considered choices of others. The diversity of technologies used was surprising. They use "at least five different programming languages, three separate database types, two versions of an operating system." Some may think of this as a weakness, but they think it a strength: The reason we operate such a diverse ecosystem is that we are focused on solving real problems. Our first task is to understand the problem or need we are solving and then to choose the best tool for the job. If we restrict ourselves to moulding the need to the tools we already have, then we risk not solving the initial problem in the best way possible for the user. By restricting software diversity or enforcing rigid organisational standards on a project, there is a possibility of d [...]
First, they gave us targeted ads. Now, data scientists think they can change the world / Derrick Harris / June 1 / Sure, a lot of data scientists spend their days trying to optimize ads or movie recommendations, but a growing number are spending their free time tackling bigger causes.
Stuff The Internet Says On Scalability For May 31, 2013 / High Scalability / May 31 / Hey, it's HighScalability time: (resource consumption scales sub-linearly with population; economic output scales super-linearly) 4PB: Ancestry.com's House of Us Quotable Quotes: @kellabyte: XBOX Live scaling up from 15,000 servers to 300,000 servers. That's some scale right there. @giuseppegurgone: [...] that could work for small projects. "why do we worry about scalability on day 1?" because tomorrow it could be too late :) Max Boot: American troops found their tactics and technology, still designed to defeat an opponent like the now defunct Red Army, woefully inadequate to deal with these new threats. In this sort of war, there were no flanks to turn, few bastions to storm, no capitals to seize. Steve Jobs on the powe [...]
It’s a beautiful thing when free data meets free analytics / Derrick Harris / May 31 / Machine learning service BigML and open data service Quandl are collaborating to make it easy to build predictive models around economic data. More importantly, though, is how easy Quandl makes it to find and use data.
Google Finds NUMA Up to 20% Slower for Gmail and Websearch / High Scalability / May 30 / When you have a large population of servers you have both the opportunity and the incentive to perform interesting studies. Google in Optimizing Google’s Warehouse Scale Computers: The NUMA Experience conducted such a study taking a look at how jobs run on clusters of machines using a NUMA architecture. Since NUMA is common on server class machines it's a topic of general interest for those looking to maximize machine utilization across clusters. Some of the results are surprising: