SpaceCurve raises $10M to make sense of our streams of location data / Derrick Harris / June 18 / SpaceCurve has raised another $10 million for its database technology designed to make sense of massive amounts of data from sensors, social media, mobile devices and other streaming sources. More »
Cloudera names new CEO; Mike Olson now chairman and chief strategy officer / Derrick Harris / June 18 / Cloudera CEO Mike Olson is now chief strategy officer and chairman of the board, while former ArcSight CEO Tom Reilly will take over the Hadoop pioneer's leadership role. More »
Disconnect 2 for Safari and Opera / Disconnect / June 18 / Disconnect 2 has gotten more than a quarter of a million new users since launching two months ago and is now available for – count them – four browsers. We’ve just released Disconnect 2 for Safari and Opera, which have … Continue reading → More »
Meet the heavyweight team behind Heavybit, a community for developer-focused startups / Derrick Harris / June 18 / Heroku co-founder James Lindenbaum is launching a new effort focused on giving developer-focused startups the tools they need to scale. He has recruited some significant peers and investors as advisers to teach member companies the ropes. More »
Scaling Mailbox - From 0 to One Million Users in 6 Weeks and 100 Million Messages Per Day / High Scalability / June 18 /
You know your product is doing well when most of your early blog posts deal with the status of the waiting list of hundreds of thousands of users eagerly waiting to download your product. That's the enviable position Mailbox, a free mobile email management app, found itself in early in its release cycle.
Hasn't email been done already? Apparently not. Mailbox scaled to one million users in a paltry six weeks with a team o More »
Accel Partners putting another $100M toward big data apps / Derrick Harris / June 17 / Accel has launched its Big Data Fund 2, a follow-up to the equally large fund the venture capital firm started in November 2011. Rather than seeking products that target data scientists, it wants those targeting business users. More »
GE wants to use artificial intelligence to predict the future of hospitals / Derrick Harris / June 17 / GE Healthcare is pushing a system called Corvix for doing agent-based simulations on complex problems. In India, the technology simulated a population of 80 million people in order to determine the best places to build medical facilities. More »
How to Measure Organizational Accountability: Who Left the (Cloud) Lights On? / Cloudyn / June 17 / When it comes to your electric bill, there are clear indicators of who is racking up costs. Running the HVAC overnight when no one is around or leaving the lights on over the weekend will clearly flag ‘electricity violators’. But … Continue Reading > More »
A real-time bonanza: Facebook’s Wormhole and Yahoo’s streaming Hadoop / Derrick Harris / June 14 / This week, both Facebook and Yahoo detailed new efforts to manage real-time data flows within their myriad systems. Yahoo's work is an open source implementation of Storm designed to run on the same cluster as Hadoop and even share resources. More »
Stuff The Internet Says On Scalability For June 14, 2013 / High Scalability / June 14 / (Steve Gibson on Security Now with a plausible analysis of the tech behind PRISM)
27 billion: WhatsApp messages per day
Richard Feinman: If Bill Gates walks into a bar, on average, everybody in the bar is a millionaire.
@giltene: Financial Programmers get paid by the CPU cycle. Web developers get paid by the developer cycle.
@johndmitchell: “It’s the I/O, stupid.”
@PatrickMcFadin: More people registering at #cassandra13 No worries. Adding more nodes at the reg desk.
Google does it with science. Here's a list of Excellent Papers for 2012 from Googlers and friends. Most relevant for HS readers is a wildly inspiring Spanner: Google's Globally-Distributed Database. But you'll also see the influence of extracting knowledge from data to do subtle and interesting things. On that theme is Improving Photo Search: A Step Across the Sema More »
Busting 4 Modern Hardware Myths - Are Memory, HDDs, and SSDs Really Random Access? / High Scalability / June 13 / "It’s all a numbers game – the dirty little secret of scalable systems"
Martin Thompson is a High Performance Computing Specialist with a real mission to teach programmers how to understand the innards of modern computing systems. He has many talks and classes (listed below) on caches, buffers, memory controllers, processor architectures, cache lines, etc.
His thought is programmers do not put a proper value on understanding how the underpinnings of our systems work. We gravitate to the More »
Ex-Yahoo CTO launches Altiscale, hardcore Hadoop as a service / Derrick Harris / June 13 / Raymie Stata spent seven years working on the guts of Hadoop as a VP, chief architect and CTO at Yahoo. His new Hadoop startup, called Altiscale, has raised $12 million from some prominent investors. More »
DjangoCon 2013 call for papers open / Xaprb / June 13 / Are you a Django user? There’s an upcoming Django conference in Chicago in a few months, and I know they’re looking for speakers with MySQL experience in particular. One suggestion the organizers have floated is a talk on MySQL: I’m looking for someone to give at least one MySQL talk there. In particular, I would [...] More »
Why Google is the big data company that matters most / Derrick Harris / June 12 / Google Image Search just got a whole lot better, and the company's purpose-built machine learning system infrastructure is a big reason why. No surprise, Jeff Dean helped build it. More »
Sponsored Post: Apple, Two Sigma, Cendea, RAMP, Blurocket, Incapsula, Dow Jones, Surge, Rackspace, aiCache, Aerospike, Percona, ScaleOut, New Relic, LogicMonitor, AppDynamics, ManageEngine, Site24x7 / High Scalability / June 12 /
An exciting opportunity for a Software Engineer to join Apple's Messaging Services team. We build the cloud systems that power some of the busiest applications in the world. You'll have the opportunity to explore a wide range of technologies, developing the server software that is driving the future of messaging and mobile services. To apply please visit this URL.
Two Sigma is building our next generation research environment, and we're looking for a functional programmer with a passion for distributed computing. We're scaling machine learning and operations research to tens of thousands of CPUs. Please send qualifications to email@example.com.
Have strong LAMP skills and the ability to make systems really scale?   More »
In a cloud computing economy, the NSA is bad for business / Derrick Harris / June 11 / A lot of Americans might say they support NSA surveillance of their online activities, but many other people -- including folks overseas -- aren't so thrilled. Can these laws withstand pressure from a tech lobby concerned about lost profits from fleeing users? More »
SageCloud gets $10M to build Facebook cold storage for the rest of us / Derrick Harris / June 11 / A storage startup called SageCloud is looking to deliver low-cost backup storage to the masses who want Facebook-like cold storage without resorting to tape, cloud services or building their own gear. More »
Stealth-mode 28msec wants to build a Tower of Babel for databases / Derrick Harris / June 11 / 28msec is about to exit stealth mode and take the covers off its database platform that lets users query data from any source in real time. More »
WalmartLabs keeps getting smarter with Inkiru acquisition / Derrick Harris / June 10 / WalmartLabs has acquired a predictive analytics startup called Inkiru to bolster its ability to create better customer experiences through data. The division of Walmart was created in 2011 on a foundation of big data. More »
The 10 Deadly Sins Against Scalability / High Scalability / June 10 /
In the moral realm there may be 7 deadly sins, but scalability maven Sean Hull has come up with Five More Things Deadly to Scalability that, when added to his earlier 5 Things That are Toxic to Scalability, make for a numerologically satisfying 10 sins against scalability:
Slow Disk I/O – RAID 5 – Multi-tenant EBS. Use RAID 10, it provides good protection along with good read and write performance. The design of RAID 5 means poor performance and long repair times on failure. On AWS consider Provisioned IOPS as a way around IO bottlenecks.
Using the database for Queuing. The database may seem like the perfect place to keep work queues, but under load, locking and scanning overhead kill performance. Use specialized products like RabbitMQ and SQS to remove this bottleneck.
Using Database for full-text searching. Search seems like another perfect database feature. At scale search doesn More »
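The RAID 5 versus RAID 10 point in the first sin above can be made concrete with the usual rule-of-thumb write penalties (2 back-end I/Os per write for RAID 10, 4 for RAID 5). The following is only a rough sketch under that assumption: the function name, the 8-disk array and the 150 IOPS per disk figure are hypothetical, and the model ignores controller caches, stripe-aligned writes and rebuild times.

```python
# Rule-of-thumb write penalties: back-end disk I/Os needed per front-end write.
# RAID 10 writes each block to both mirrors; RAID 5 reads data and parity,
# then writes data and parity.
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


def effective_iops(raw_iops: float, write_fraction: float, level: str) -> float:
    """Estimate the front-end IOPS an array can sustain for a read/write mix."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    penalty = WRITE_PENALTY[level]
    read_fraction = 1.0 - write_fraction
    # Each read costs one back-end I/O; each write costs `penalty` back-end I/Os.
    return raw_iops / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    raw = 8 * 150  # hypothetical array: 8 disks at 150 IOPS each
    for level in ("raid10", "raid5"):
        print(level, round(effective_iops(raw, 0.5, level)))
```

With a 50/50 read/write mix, this simple model gives the hypothetical array roughly 800 IOPS on RAID 10 and 480 on RAID 5, which is the kind of gap the advice above is pointing at.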
Under the covers of the NSA’s big data effort / Derrick Harris / June 7 / There's much debate still to be had over the NSA's recently uncovered data-collection practices, but some of the technologies underlying them are out in the open. Here's what we know already. More »
Stuff The Internet Says On Scalability For June 7, 2013 / High Scalability / June 7 / Hey, it's HighScalability time:
(Ever feel like everyone has already climbed your Everest?)
Trillion Particles, 120,000 cores, and 350 TBs: Lessons Learned From a Hero I/O Run on Hopper
@PenLlawen: @spolsky In my time as a scalability engineer, I’ve seen plenty of cases where optimisation was left too late. Even harder to fix.
@davidlubar: Whoever said you can't fold a piece of paper in half more than 7 times probably forget to unfold it each time. I'm up to 6,000.
deno: A quick comparison of App Engine vs. Compute Engine prices shows that App Engine is at best 10x more expensive per unit of RAM.
Fred Wilson: strategy is figuring out what part of the market the company wants to play in, how it goes to market, and how it differentiates itself in the market it is about what you are going to do and importantly what you More »
Here’s how the NSA analyzes all that call data / Derrick Harris / June 6 / How does the NSA analyze all the data it's collecting from cell phone users? With a massive database system built with just such scale and workloads in mind. More »
Paper: Memory Barriers: a Hardware View for Software Hackers / High Scalability / June 6 /
It's not often you get so enthusiastic a recommendation for a paper as Sergio Bossa gives Memory Barriers: a Hardware View for Software Hackers: If you only want to read one piece about CPUs architecture, cache coherency and memory barriers, make it this one.
It is a clear and well written article. It even has a quiz. What's it about?
So what possessed CPU designers to cause them to inflict memory barriers on poor unsuspecting SMP software designers?
In short, because reordering memory references allows much better performance, and so memory barriers are needed to force ordering in things like synchronization primitives whose correct operation depends on ordered memory references.
Getting a more deta More »
What cities, data and Yahoo have in common: Interaction matters / Derrick Harris / June 6 / A recent study from MIT suggests the likelihood of face-to-face interactions within a city means more productivity. It seems to apply equally to companies and even data, which suggests engineers and architects of all types should take notice. More »
The moment I first held my newborn daughter in my arms / Xaprb / June 5 / This is a personal post, not a technical one. We tell ourselves a lot of lies that are not okay. I want to out one of them. It is important to be real, to be true to oneself. This matters. The lie starts something like this: the moment I held my newborn child in my [...] More »
A Simple 6 Step Transition Guide for Moving Away from X to AWS / High Scalability / June 5 /
If you just want to visit Rome and not go full on Cloud Native like Netflix, then Soundslice's Adrian Holovaty in Why I left Heroku, and notes on my new AWS setup provides a simple guide for helping make your first trip a good one.
First, let's dispose of why Soundslice left Heroku. In essence, because of various issues, "Heroku lost my trust." YMMV, but once that's a fact, what do you do?
After a consultation with Scott VanDenPlas, former director of dev ops for the Obama reelection tech team, they came up with a simple transition guide that I think is quite good and generally useful (full details in the original post):
IBM throws its weight behind MongoDB for mobile apps / Derrick Harris / June 4 / IBM and 10gen are collaborating on a standard that would make it easier to write applications that can access data from both MongoDB and relational systems such as IBM DB2. More »
Cloudera adds search to Hadoop distro and says it’s just getting started / Derrick Harris / June 4 / Cloudera's new search feature, based on the Apache Solr project, is the latest move by the company to expand the utility of its Hadoop distribution. It's also far from the last. More »
Google takes on Parse with new service for mobile-app backends / Derrick Harris / June 3 / Google has announced a new service called Mobile Backend Starter that lets Android developers create and launch mobile apps on Google's cloud with just a few clicks. More »
GOV.UK - Not Your Father's Stack / High Scalability / June 3 /
I'm not sure what I was expecting the stack GOV.UK used at launch to look like. Maybe some messenger owls and lots of cobwebs? But not so at all. So much so that I thought any organization looking at its own stack for ideas could learn something from the considered choices of others.
The diversity of technologies used was surprising. They use "at least five different programming languages, three separate database types, two versions of an operating system." Some may think of this as a weakness, but they think it a strength:
The reason we operate such a diverse ecosystem is that we are focused on solving real problems. Our first task is to understand the problem or need we are solving and then to choose the best tool for the job. If we restrict ourselves to moulding the need to the tools we already have, then we risk not solving the initial problem in the best way possible for the user. By restricting software diversity or enforcing rigid organisational standards on a project, there is a possibility of d More »
First, they gave us targeted ads. Now, data scientists think they can change the world / Derrick Harris / June 1 / Sure, a lot of data scientists spend their days trying to optimize ads or movie recommendations, but a growing number are spending their free time tackling bigger causes. More »
Stuff The Internet Says On Scalability For May 31, 2013 / High Scalability / May 31 / Hey, it's HighScalability time:
(resource consumption scales sub-linearly with population; economic output scales super-linearly)
4PB: Ancestry.com's House of Us
@kellabyte: XBOX Live scaling up from 15,000 servers to 300,000 servers. That's some scale right there.
@giuseppegurgone: [...] that could work for small projects. "why do we worry about scalability on day 1?" because tomorrow it could be too late :)
Max Boot: American troops found their tactics and technology, still designed to defeat an opponent like the now defunct Red Army, woefully inadequate to deal with these new threats. In this sort of war, there were no flanks to turn, few bastions to storm, no capitals to seize.
Steve Jobs on the powe More »
It’s a beautiful thing when free data meets free analytics / Derrick Harris / May 31 / Machine learning service BigML and open data service Quandl are collaborating to make it easy to build predictive models around economic data. More important, though, is how easy Quandl makes it to find and use data. More »
Google Finds NUMA Up to 20% Slower for Gmail and Websearch / High Scalability / May 30 /
When you have a large population of servers you have both the opportunity and the incentive to perform interesting studies. Google, in Optimizing Google’s Warehouse Scale Computers: The NUMA Experience, conducted such a study, taking a look at how jobs run on clusters of machines using a NUMA architecture. Since NUMA is common on server-class machines, it's a topic of general interest for those looking to maximize machine utilization across clusters.
Some of the results are surprising: