Live Transcript Delivery
Scalable real-time text streaming
Grzegorz Giedrojc, Street Events Master
Grzegorz Kolpuc, Event Platform
Financial & Risk
IP & Science
Legal News
Tax & Accounting
Technology & OPS
60,000+ EMPLOYEES
10,000+ IN TECHNOLOGY
1,200+ EMPLOYEES IN GDYNIA
150+ IN TECHNOLOGY
Eikon
Transcripts
Core Abstractions
● Event
● Transcript
● Brief
● Live Transcript
Event
Earnings Release
Earnings Call
Earnings Presentation
Event
Transcript
Brief
Transcript delivery
1. Planning
2. Preparations
3. Streaming
4. Final version
5. Audio synchronization
Planning
Preparations
Theoretical:
➔ Company data
➔ Potential participants
➔ Historical experience
Technical:
➔ TranXP
➔ Dragon
TranXP
Dragon Training
Additional Equipment
Final Version
Audio Synchronization
External Service
Audio sync. Transcript / Brief
mp3
Transcript Brief
Movie demo
Architecture
Streaming - High Level View
TranXP
External Vendor
Blackbird
Event Platform
Customers
Blackbird
WS WS WS WS
APP APP APP APP
Blackbird
40 cores, 256 GB
12 cores, 12 GB
LT Entry Point
TranXP
Dragon
Company Info
WS WS WS WS
APP APP APP APP
Blackbird
Event Collection Tool
TranXP
Dragon
Company Info
WS WS WS WS
APP APP APP APP
Blackbird
ECT
Audio Sync
TranXP
Dragon
Company Info
ECT
WS WS WS WS
APP APP APP APP
Blackbird
Audio Sync.
A background process synchronizes audio with text
Transcript Distribution
TranXP
Audio Sync.
Dragon
Company Info
ECT
WS WS WS WS
APP APP APP APP
Blackbird
Internal LT Broker
Distributes Live Transcripts (LT) internally
Where we are so far...
● Transcripts
  ○ Created by TR
  ○ Delivered by vendors
● Internal distribution only (inside TR)
Our goal
● Deliver live transcripts to external clients
  ○ In a distributed manner
  ○ Scalable
  ○ Highly available (HA)
● Evolution, not revolution
Event Platform
● Main 'Events' provider across TR
● Aggregates various content and serves it as events
● Should serve Live Transcripts to internal and external clients
Transcript Receiver
Messaging?
● Base features:
  ○ Message order
  ○ Parallel consumers
● Distributed and scalable
● Fault tolerant
● No data loss (replication)
● May repeat messages (nice to have)
Solution?
● Publish-subscribe messaging
● Distributed and scalable
● Partitioned topics
  ○ Each partition may be consumed independently
● No server-side ACKs
  ○ Acknowledgement is the consumer's responsibility (increases throughput)
● Messages stored as a distributed commit log
  ○ A consumer can start reading from any point in time (based on offsets)
Apache Kafka
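The properties listed for the messaging layer can be illustrated with a toy in-memory model. This is a minimal sketch, not Kafka's client API: the class `PartitionedLog`, its methods, and the key `transcript-42` are hypothetical names invented for illustration. It shows how messages with the same key land in one partition (which preserves their order) and how a consumer that tracks its own offset can re-read from an earlier point.

```python
import zlib


class PartitionedLog:
    """Toy append-only log split into partitions (hypothetical, not the Kafka API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append(value)
        # The offset is the message's position within its partition.
        return p, len(self.partitions[p]) - 1

    def read(self, partition, offset, max_messages=10):
        # The log does not track consumers; the caller supplies the offset.
        return self.partitions[partition][offset:offset + max_messages]


log = PartitionedLog(num_partitions=3)
for line in ["Good morning.", "Welcome to the call.", "Let's begin."]:
    partition, offset = log.append("transcript-42", line)

# Consumer-side position: the consumer, not the log, remembers how far it got.
consumer_offset = 0
batch = log.read(partition, consumer_offset)
consumer_offset += len(batch)

# Replay: rewinding the offset re-delivers earlier messages in the same order.
replayed = log.read(partition, 1)
```

Because the log keeps no per-consumer state, adding another reader costs the log nothing, which is one way to read the "increases throughput" remark on the slide.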
Apache Kafka
Transcript Receiver
Kafka Cluster
Broker
Broker
Broker
What do we need now?
● Processing engine to consume the feed
  ○ Very low latency
  ○ Open source
  ○ Stream grouping (no race conditions)
● Distributed and scalable
  ○ Configurable parallelism
● Fault tolerant
● No data loss
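The "stream grouping" requirement can be sketched as routing every message of one transcript to the same consumer task, so that no two tasks update the same transcript concurrently. This is a minimal sketch under assumptions: `route`, `NUM_TASKS`, and the transcript ids are hypothetical, and the hashing shown is an illustration rather than any particular engine's implementation.

```python
import zlib

NUM_TASKS = 4  # hypothetical parallelism of the consuming component


def route(transcript_id, num_tasks=NUM_TASKS):
    """Pick a task index from the grouping key; same key -> same task."""
    return zlib.crc32(transcript_id.encode("utf-8")) % num_tasks


messages = [
    ("transcript-A", "line 1"),
    ("transcript-B", "line 1"),
    ("transcript-A", "line 2"),
    ("transcript-B", "line 2"),
    ("transcript-A", "line 3"),
]

# Each task receives its messages in arrival order.
per_task = {task: [] for task in range(NUM_TASKS)}
for transcript_id, text in messages:
    per_task[route(transcript_id)].append((transcript_id, text))

# All lines of one transcript are handled by a single task, in order.
tasks_for_a = {route(tid) for tid, _ in messages if tid == "transcript-A"}
lines_for_a = [
    text
    for tid, text in per_task[route("transcript-A")]
    if tid == "transcript-A"
]
```

Different transcripts may still share a task; the guarantee sketched here is only that a single transcript is never split across tasks.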
Solution?
➔ A popular real-time computation system
➔ Widely used in production by many companies
  ◆ e.g. Twitter, Yahoo, Spotify
➔ Distributed, scalable and fault-tolerant
➔ Open-sourced in 2011 by Nathan Marz
➔ Written in Clojure
Apache Storm
➔ Executes a topology (a Storm program) in a distributed manner
➔ A topology runs as a set of Spouts and Bolts
➔ A single message is represented as a Tuple
➔ An unbounded sequence of tuples is a Stream
Core Concepts
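The relationship between spouts, bolts, tuples and streams can be sketched in a single process with plain Python generators. This is not Storm's API: `TranscriptTuple`, `transcript_spout`, `normalize_bolt` and `deliver_bolt` are hypothetical names, and a real topology would run these components in parallel across a cluster rather than as one chained call.

```python
from typing import Iterable, Iterator, List, NamedTuple


class TranscriptTuple(NamedTuple):
    """One message flowing through the topology (hypothetical fields)."""
    transcript_id: str
    sequence: int
    text: str


def transcript_spout(lines: Iterable[str], transcript_id: str) -> Iterator[TranscriptTuple]:
    """Spout: the source of a stream; emits one tuple per incoming line."""
    for sequence, text in enumerate(lines):
        yield TranscriptTuple(transcript_id, sequence, text)


def normalize_bolt(stream: Iterable[TranscriptTuple]) -> Iterator[TranscriptTuple]:
    """Bolt: consumes a stream, transforms each tuple, emits a new stream."""
    for tup in stream:
        # Collapse runs of whitespace and trim the ends.
        yield tup._replace(text=" ".join(tup.text.split()))


def deliver_bolt(stream: Iterable[TranscriptTuple], outbox: List[str]) -> None:
    """Terminal bolt: hands each tuple to a sink (here, a plain list)."""
    for tup in stream:
        outbox.append(f"{tup.transcript_id}#{tup.sequence}: {tup.text}")


outbox: List[str] = []
source = ["Good   morning. ", " Welcome to  the call."]
deliver_bolt(normalize_bolt(transcript_spout(source, "transcript-42")), outbox)
```

Generators are a convenient stand-in for streams because they are lazy: each tuple passes through the whole chain before the next one is read from the source.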
Storm Daemons - Nimbus
[Diagram: Storm Cluster with Node 1 (Nimbus), Node 2, Node 3]
Distributes code around the cluster
Assigns tasks to machines
Monitors for failures
Storm Daemons - Supervisor
[Diagram: Storm Cluster with Node 1 (Nimbus, Supervisor), Node 2 (Supervisor), Node 3 (Supervisor)]
Starts and stops worker processes based on what Nimbus has assigned to it
Executes a subset of a topology: spouts and/or bolts
Fail Fast
[Diagram: Storm Cluster with Node 1 (Nimbus, Supervisor), Node 2 (Supervisor), Node 3 (Supervisor)]
● Runs under supervision
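The fail-fast idea can be sketched as a process that stops immediately on an error and relies on an outside supervisor to start it again. The snippet below is only an illustration of that pattern: `run_supervised` and `FlakyWorker` are hypothetical names, and it does not describe how Storm's daemons are actually supervised.

```python
def run_supervised(worker, max_restarts=3):
    """Call worker(); if it crashes, start it again, up to max_restarts times."""
    restarts = 0
    while True:
        try:
            return worker(), restarts
        except RuntimeError:
            if restarts == max_restarts:
                raise  # give up: the failure does not look transient
            restarts += 1


class FlakyWorker:
    """Hypothetical worker that fails fast a few times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success

    def __call__(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            # Fail fast: raise immediately instead of continuing in a bad state.
            raise RuntimeError("worker crashed")
        return "processed"


result, restarts = run_supervised(FlakyWorker(failures_before_success=2))
```

The worker itself contains no recovery logic; keeping recovery in the supervisor is what lets the worker stay simple.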
Workers
[Diagram: Storm Cluster with Node 1 (Nimbus, Supervisor, four workers), Node 2 (Supervisor, four workers), Node 3 (Supervisor, four workers)]
Processing units
Execute bolts and spouts as Executors and tasks
Storm Parallelism
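The three levels of parallelism (worker processes, executors, tasks) can be pictured with a small allocation sketch. The numbers and the helper `spread` are hypothetical, and the round-robin placement is an assumption made for illustration, not Storm's actual scheduler.

```python
def spread(items, num_buckets):
    """Round-robin items into num_buckets lists (illustrative placement only)."""
    buckets = [[] for _ in range(num_buckets)]
    for index, item in enumerate(items):
        buckets[index % num_buckets].append(item)
    return buckets


# Hypothetical sizing for one bolt: 6 tasks run by 3 executors in 2 workers.
tasks = [f"task-{i}" for i in range(6)]
executors = spread(tasks, num_buckets=3)    # each executor (thread) runs 2 tasks
workers = spread(executors, num_buckets=2)  # each worker (process) hosts executors

tasks_per_executor = [len(e) for e in executors]
executors_per_worker = [len(w) for w in workers]
```

In this picture, raising the number of executors or workers changes how the same tasks are spread out, which is the sense in which parallelism is configurable.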
Storm Topology
[Diagram: Transcript Receiver, Kafka Cluster (three Brokers), Storm Cluster (Nimbus and Supervisor on Node 1, Supervisors on Node 2 and Node 3, four workers per node)]
Client Delivery
[Diagram: Transcript Receiver, Kafka Cluster (three Brokers), Storm Cluster (Nimbus and Supervisor on Node 1, Supervisors on Node 2 and Node 3, four workers per node), External Clients]
Live Transcript Bridge
[Diagram: Transcript Receiver, Kafka Cluster (three Brokers), Storm Cluster (Nimbus and Supervisor on Node 1, Supervisors on Node 2 and Node 3, four workers per node), LTB, External Clients]
What we achieved
● High-speed live transcript delivery
  ○ Messages are sent with sub-second latency
● Processing in a distributed manner
  ○ Replication
  ○ Fault tolerance
● Ready for more customers and more transcripts
  ○ All components may be scaled horizontally
Questions?
Thank You