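IPython: a rich architecture for interactive computing; Kibana: visualizes logs and time-stamped data; Matplotlib: plotting with Python;
MetricsGraphics.js: a library built on top of D3, optimized for time-series data; NVD3: chart components for d3.js;
Peity: progressive SVG bar, line, and pie charts;
Plot.ly: an easy-to-use web service for quickly creating complex charts, from heatmaps to histograms; data is uploaded through Plotly's online spreadsheet, where charts are created and styled; Plotly.js: the open-source JavaScript graphing library behind Plotly;
Recline: a simple but powerful library for building data applications in pure JavaScript and HTML; Redash: an open-source platform for querying and visualizing data; Shiny: a web application framework for R; Sigma.js: a JavaScript library dedicated to graph drawing; Vega: a visualization grammar;
Zeppelin: a notebook-style collaborative data analysis tool; Zing Charts: a JavaScript charting library for big data.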
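Internet of Things and Sensors
TempoIQ: cloud-based sensor analytics; 2lemetry: an Internet of Things platform; Pubnub: a data stream network;
ThingWorx: a platform that lets enterprises quickly build and run connected applications;
IFTTT: an innovative internet service often described as a "web automation wonder tool"; its name stands for "If this then that";
Evrythng: a mass-market Internet of Things platform that makes many everyday products smart.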
Recommended Articles
NoSQL Comparison - a comparison of Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris;
Big Data Benchmark - a benchmark of Redshift, Hive, Shark, Impala and Stinger/Tez;
The big data successor of the spreadsheet - argues that the successor of the spreadsheet should be big data.
Papers
2015 – 2016
2015 – Facebook – One Trillion Edges: Graph Processing at Facebook-Scale.
2013 – 2014
2014 – Stanford – Mining of Massive Datasets.
2013 – AMPLab – Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
2013 – AMPLab – MLbase: A Distributed Machine-learning System.
2013 – AMPLab – Shark: SQL and Rich Analytics at Scale.
2013 – AMPLab – GraphX: A Resilient Distributed Graph System on Spark.
2013 – Google – HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.
2013 – Microsoft – Scalable Progressive Analytics on Big Data in the Cloud.
2013 – Metamarkets – Druid: A Real-time Analytical Data Store.
2013 – Google – Online, Asynchronous Schema Change in F1.
2013 – Google – F1: A Distributed SQL Database That Scales.
2013 – Google – MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
2013 – Facebook – Scuba: Diving into Data at Facebook.
2013 – Facebook – Unicorn: A System for Searching the Social Graph.
2013 – Facebook – Scaling Memcache at Facebook.
2011 – 2012
2012 – Twitter – The Unified Logging Infrastructure for Data Analytics at Twitter.
2012 – AMPLab – Blink and It’s Done: Interactive Queries on Very Large Data.
2012 – AMPLab – Fast and Interactive Analytics over Hadoop Data with Spark.
2012 – AMPLab – Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
2012 – Microsoft – Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
2012 – Microsoft – Paxos Made Parallel.
2012 – AMPLab – BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.
2012 – Google – Processing a trillion cells per mouse click.
2012 – Google – Spanner: Google’s Globally-Distributed Database.
2011 – AMPLab – Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters.
2011 – AMPLab – Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
2011 – Google – Megastore: Providing Scalable, Highly Available Storage for Interactive Services.
2001 – 2010
2010 – Facebook – Finding a needle in Haystack: Facebook’s photo storage.
2010 – AMPLab – Spark: Cluster Computing with Working Sets.
2010 – Google – Storage Architecture and Challenges.
2010 – Google – Pregel: A System for Large-Scale Graph Processing.
2010 – Google – Large-scale Incremental Processing Using Distributed Transactions and Notifications (the basis of Percolator and Caffeine).
2010 – Google – Dremel: Interactive Analysis of Web-Scale Datasets.
2010 – Yahoo – S4: Distributed Stream Computing Platform.
2009 – HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
2008 – AMPLab – Chukwa: A large-scale monitoring system.
2007 – Amazon – Dynamo: Amazon’s Highly Available Key-value Store.
2006 – Google – The Chubby lock service for loosely-coupled distributed systems.
2006 – Google – Bigtable: A Distributed Storage System for Structured Data.
2004 – Google – MapReduce: Simplified Data Processing on Large Clusters.
2003 – Google – The Google File System.
Videos
Data Visualization
The Beauty of Data Visualization
Designing Data Visualizations with Noah Iliinsky
Hans Rosling’s 200 Countries, 200 Years, 4 Minutes