Episode published on December 10, 2021

#60 Databricks and Snowflake bark, HAProxy moves on, and garbage collections get graphed

Hosted by Julien Durillon

Message à caractère informatique

In this landmark episode, albeit a hard one to number, we welcome Mathieu Ancelin and talk about: PlanetScale's fundraising, the war between Databricks and Snowflake, HAProxy's 20th anniversary, forecasting SQL query resource usage, the better performance of our old PS/2 keyboards, and an open-source Apple tool for analyzing garbage collection logs, before ending on a musical note… hint: it's not Mozart.

Timecodes & links:

00:00:00 Introducing the guests

00:02:00 PlanetScale is now generally available

  • $50M in Series C funding
  • Vitess’s maintainers (Vitess is a clustering system for MySQL)
    • Connection pooling
    • Query de-duping
    • Transaction rate manager
    • Virtually seamless dynamic re-sharding
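Query de-duping means that identical queries arriving while one copy is already running share that copy's result instead of hitting MySQL again. The following is a minimal Python sketch of the idea, assuming a thread pool and a hypothetical `QueryDeduper` class; it is not Vitess's or PlanetScale's implementation.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class QueryDeduper:
    """Share one execution among identical queries that are in flight at the same time.

    Hypothetical illustration of query de-duplication; not Vitess or PlanetScale code.
    """

    def __init__(self, run_query: Callable[[str], object], max_workers: int = 8):
        self._run_query = run_query
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._in_flight: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def submit(self, sql: str) -> Future:
        with self._lock:
            future = self._in_flight.get(sql)
            if future is not None:
                return future  # identical query already running: piggyback on it
            future = self._executor.submit(self._run_query, sql)
            self._in_flight[sql] = future
        # Once the query finishes, forget it so later calls hit the backend again.
        future.add_done_callback(lambda done, key=sql: self._forget(key, done))
        return future

    def _forget(self, sql: str, future: Future) -> None:
        with self._lock:
            if self._in_flight.get(sql) is future:
                del self._in_flight[sql]

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


if __name__ == "__main__":
    calls = []
    gate = threading.Event()

    def slow_backend(sql: str) -> str:
        calls.append(sql)     # record every real execution
        gate.wait(timeout=5)  # keep the query "running" until released
        return f"rows for {sql!r}"

    deduper = QueryDeduper(slow_backend)
    futures = [deduper.submit("SELECT * FROM users") for _ in range(5)]
    gate.set()
    results = [f.result() for f in futures]
    print(len(calls), results[0])  # five submissions, one backend execution
    deduper.shutdown()
```

The lock only guards the in-flight table; the query itself runs outside it, so unrelated queries are never serialized behind each other.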

00:06:29 The war between Databricks and Snowflake

  • Databricks, a Snowflake competitor (data platform)
  • TPC: Transaction Processing Performance Council
    • The 1980s were the Wild West era of database benchmarking
  • TPC-DS benchmark record for its data lakehouse technology
    • TPC-DS is a decision support benchmark with audited results.
    • 99 queries over 100 TB in 3,108 seconds
    • 2.7x faster and between 7x and 12x better in terms of price performance
    • Outperformed the previous record, held by Alibaba, by 2.2x
  • Removing the DeWitt Clause from its service terms
    • A license provision, introduced by Oracle, that prohibits people (researchers, scientists, or competitors) from publishing any benchmarks of Oracle’s database systems.
    • It’s a primary reason you often see benchmarks comparing anonymous systems, sometimes referred to as DBMS-X, in research papers and why many benchmarks are completely absent.
    • Benchmark clause @ Google
      • a) must seek permission before disclosing results
      • b) must provide repro details
      • c) must allow Google to test your services

  • Result fairly close to Databricks’
  • Price is more like 267 compared to 1791
  • Anyone can sign up and try it with an already-loaded dataset
  • Removed the DeWitt Clause

  • Snowflake’s new score relies on a self-published, prebaked dataset
  • Using the official TPC-DS dataset, the time to execute the 99 queries doubles

00:13:00 Willy Tarreau on HAProxy at Its 20-Year Anniversary

  • HAProxy turns 20
    • Happy birthday
    • Willy Tarreau, founder of HAProxy
  • Timeline:
    • 1999 – Zprox
      • Testing tool developed to gauge how an application would perform when facing lots of clients with 28 Kbps modems
    • 2000 – Zprox
      • Modified to include regex-based header rewriting, with a minimalistic config language.
      • Keywords introduced: listen, server
    • 2001 – HAProxy 1.0
      • Developed to offload traffic from hardware load balancers
    • 2002 – HAProxy 1.1
      • Simple round-robin scheduler
      • Simple health checks
      • Improved its logging capabilities
      • Cookie insertion
    • 2003 – HAProxy 1.2
      • IPv6 support on the client side
      • Replaced the wait-queue linked list with a red-black tree (rbtree)
      • Introduced maxconn setting
      • Keywords introduced: except, forwardfor
    • 2006 – HAProxy 1.3
      • Elastic Binary Trees within the internal scheduler
      • TCP scripting
      • Explicit source port ranges
      • Interface binding
    • 2009 – HAProxy 1.4
      • RDP protocol support with server stickiness and user filtering
      • Client-side Keep-Alive
      • HTTP authentication support
      • ACL-based persistence
    • 2010 – HAProxy 1.5
      • SSL and compression
      • Data sampling
      • Server-side keep-alive
      • DDoS protection
    • 2015 – HAProxy 1.6
      • Lua scripting
      • Server-side connection multiplexing
      • Dynamic buffer allocation
      • Replaced zlib with an in-house stateless implementation
    • 2016 – HAProxy 1.7
      • HAProxy Runtime API
      • Server hot reconfiguration
      • SPOE (Stream Processing Offload Engine)
      • Introduced content processing agents & multi-type certs
    • 2017 – HAProxy 1.8
      • Improved HAProxy Runtime API
      • Introduced multithreading
      • Dynamic Cookies
      • New mux layer
    • 2018 – HAProxy 1.9
      • HTX – internal HTTP representation
      • End-to-End HTTP/2 (enabling gRPC)
      • Improved queue priority control
      • Improved the scalability of the multithreading feature
    • 2019 – HAProxy 2.0 & 2.1
      • Cloud-native threading and logging
      • HAProxy Kubernetes Ingress Controller
      • HAProxy Data Plane API
      • Prometheus exporter
      • Dynamic SSL Certificate Updates
      • FastCGI
      • Improved debugging
      • Native Protocol Tracing
    • 2020 – HAProxy 2.2 & 2.3
      • Fully Dynamic SSL Certificate Storage
      • Improved idle connection management
      • Native Response Generator
      • Health Check System Overhaul
      • Syslog Protocol (UDP/TCP)
      • OpenTracing (SPOE)
      • SSL/TLS Environments
      • Improved Cache
    • 2021 – HAProxy 2.4
      • HTTP/2 WebSockets, FIX & MQTT protocols
      • Dynamic SSL Certificate Storage
      • Built-in OpenTracing
      • DNS TCP Resolution
  • Google Cloud Load Balancer outage
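Several early features in this timeline (round-robin scheduling, health checks and cookie insertion in 1.1, `maxconn` in 1.2) fit together naturally. The following toy Python sketch shows how a scheduler might combine them; the class and method names are hypothetical, and it says nothing about HAProxy's actual C implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    maxconn: int          # per-server concurrency cap (the idea behind HAProxy's maxconn)
    healthy: bool = True  # outcome of the most recent (simulated) health check
    active: int = 0       # requests currently being served


class RoundRobinBalancer:
    """Toy round-robin scheduler with health checks, maxconn and cookie stickiness.

    Illustrative only: this is not how HAProxy is implemented.
    """

    def __init__(self, servers: List[Server]):
        self.servers = servers
        self._next = 0  # index of the next server to try

    def _can_take(self, server: Server) -> bool:
        return server.healthy and server.active < server.maxconn

    def pick(self, cookie: Optional[str] = None) -> Optional[Server]:
        # Cookie persistence: reuse the server named in the client's cookie when possible.
        if cookie is not None:
            for server in self.servers:
                if server.name == cookie and self._can_take(server):
                    server.active += 1
                    return server
        # Plain round-robin, skipping servers that are down or already at maxconn.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self._can_take(server):
                server.active += 1
                return server
        return None  # nothing available; a real proxy would queue the request

    def release(self, server: Server) -> None:
        server.active -= 1


if __name__ == "__main__":
    s1 = Server("s1", maxconn=1)
    s2 = Server("s2", maxconn=2)
    s3 = Server("s3", maxconn=1, healthy=False)
    lb = RoundRobinBalancer([s1, s2, s3])
    print([lb.pick().name for _ in range(3)])  # ['s1', 's2', 's2']
```

In the demo, `s3` is skipped because its health check failed, and the third request lands on `s2` because `s1` has already reached its `maxconn` of 1.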

00:21:00 Forecasting SQL query resource usage with machine learning

  • SQL engine powered by Presto over Hadoop and Google Cloud Storage
  • Problems:
    • Avoid the system being overwhelmed by resource-consuming queries
    • Data system customers would like an estimate of their queries’ resource consumption.
    • Elastic scaling needs query resource usage forecasting.
  • Forecasting is typically done with query plans generated by SQL engines
  • The proposed system:
    • learns from plain SQL statements
    • builds machine learning models from historical query request logs without dependency on any SQL engines or query plans.
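The notes above don't say which features or models the system uses, so here is a minimal sketch of just the core idea: predicting a query's resource usage from its plain SQL text plus historical logs, with no query plan. It assumes a hypothetical log of `(sql, cpu_seconds)` pairs and swaps in a simple k-nearest-neighbour regressor over bag-of-tokens vectors; the real system may use entirely different features and models.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Identifiers/keywords, numbers, or any single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize(sql: str) -> Counter:
    """Bag-of-tokens features computed from the raw SQL text, with no query plan."""
    return Counter(token.lower() for token in TOKEN_RE.findall(sql))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnnResourceForecaster:
    """Hypothetical k-nearest-neighbour forecaster trained on (sql, cpu_seconds) log entries."""

    def __init__(self, history: List[Tuple[str, float]], k: int = 3):
        if not history:
            raise ValueError("need at least one historical query")
        self.k = k
        self.examples = [(tokenize(sql), cpu) for sql, cpu in history]

    def predict(self, sql: str) -> float:
        features = tokenize(sql)
        # Keep the k most similar historical queries.
        nearest = sorted(((cosine(features, f), cpu) for f, cpu in self.examples), reverse=True)[: self.k]
        weight = sum(sim for sim, _ in nearest)
        if weight == 0:
            # No token overlap with the history: fall back to the mean over all logged queries.
            return sum(cpu for _, cpu in self.examples) / len(self.examples)
        return sum(sim * cpu for sim, cpu in nearest) / weight


if __name__ == "__main__":
    # Made-up toy history, for illustration only.
    history = [
        ("SELECT * FROM events e JOIN users u ON e.uid = u.id", 120.0),
        ("SELECT count(*) FROM events e JOIN users u ON e.uid = u.id GROUP BY u.country", 150.0),
        ("SELECT name FROM users WHERE id = 42", 0.5),
        ("SELECT email FROM users WHERE id = 7", 0.4),
    ]
    model = KnnResourceForecaster(history, k=2)
    print(model.predict("SELECT name FROM users WHERE id = 99"))
```

The prediction is a similarity-weighted average of the k nearest logged queries, so it always stays within the range of their observed costs.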

Facebook PCI card:

Spending $5K to learn how database indexes work

00:34:00 PS/2 keyboards perform better than USB ones

  • PS/2 keyboards send interrupts directly to the processor.
  • USB relies on regular polling. If you hammer the right-arrow key between two polls (a few milliseconds, depending on whether your processor is loaded or not), only one press is registered.
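A toy model of the claim above, assuming the host only learns the key state at each USB poll while PS/2 reports every press through an interrupt. The 8 ms interval and the press timings are illustrative assumptions; real polling rates and keyboard firmware vary.

```python
from typing import List, Tuple

# A key press as a (down_ms, up_ms) interval on a shared timeline.
Press = Tuple[float, float]


def interrupt_count(presses: List[Press]) -> int:
    """Interrupt-driven input: every press raises an interrupt, so none are lost."""
    return len(presses)


def polled_count(presses: List[Press], poll_interval_ms: float, duration_ms: float) -> int:
    """Polling: sample 'is the key down?' at fixed intervals and count released-to-pressed transitions."""

    def is_down(t: float) -> bool:
        return any(down <= t < up for down, up in presses)

    count, was_down, t = 0, False, 0.0
    while t <= duration_ms:
        down_now = is_down(t)
        if down_now and not was_down:
            count += 1  # a press is only noticed if a sample catches the key down
        was_down = down_now
        t += poll_interval_ms
    return count


if __name__ == "__main__":
    taps = [(1.0, 3.0), (4.0, 9.0)]  # two quick taps inside one 8 ms polling window
    print(interrupt_count(taps), polled_count(taps, 8.0, 20.0))  # 2 1
```

The first tap starts and ends between the polls at 0 ms and 8 ms, so the poller never sees it; only the second tap, still held at 8 ms, is counted.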

00:41:15 GCGC : Garbage Collection Graph Collector by Apple

  • Jupyter notebook interface to analyze GC log files.
  • 17 generated plots, which analyze latency, concurrent and stop-the-world events, heap information, allocation rates, frequencies of events, and event summaries
  • Because the tool relies on Jupyter notebook data visualization, the provided plots are easy to customize.
  • Supports Shenandoah, G1, and ZGC (some known edge cases are not handled automatically)
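GCGC itself is a Jupyter notebook, and the sketch below is not its code. It only illustrates the kind of log parsing such a tool starts from, assuming JDK unified logging output (`-Xlog:gc`) where pause lines look like `[1.234s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms`. Other collectors and log configurations may format lines differently, so the regular expression is an assumption to adapt.

```python
import math
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Assumed format: "GC(<id>) Pause <kind> [<before>-><after>(<heap>)] <duration>ms"
PAUSE_RE = re.compile(r"GC\((\d+)\) (Pause .+?)(?: \S+->\S+)? (\d+(?:\.\d+)?)ms\s*$")


@dataclass
class PauseEvent:
    gc_id: int
    kind: str
    duration_ms: float


def parse_pauses(lines: Iterable[str]) -> List[PauseEvent]:
    """Extract stop-the-world pause events, ignoring every other log line."""
    events = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m:
            events.append(PauseEvent(int(m.group(1)), m.group(2), float(m.group(3))))
    return events


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(events: List[PauseEvent]) -> Dict[str, float]:
    durations = sorted(e.duration_ms for e in events)
    if not durations:
        return {"count": 0}
    return {
        "count": len(durations),
        "total_ms": sum(durations),
        "max_ms": durations[-1],
        "p50_ms": percentile(durations, 50),
        "p99_ms": percentile(durations, 99),
    }


if __name__ == "__main__":
    sample = [
        "[0.015s][info][gc] Using G1",
        "[1.234s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms",
        "[2.400s][info][gc] GC(1) Pause Remark 31M->31M(256M) 1.500ms",
    ]
    print(summarize(parse_pauses(sample)))
```

Latency summaries like these are one of the views the 17 GCGC plots cover; a notebook would typically chart them rather than print them.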

00:46:00 Gentle outro music: MESHUGGAH – Bleed

Julien Durillon

Dev & ops @ Clever Cloud since 2010. I'm responsible for the invoices you receive 😬
