Career Summary

My name is Tao Yu. I am a master's student in the DBIIR Lab at Renmin University of China (RUC). During my first year in the lab, I did research on graph computing and built actordas, a distributed graph computing engine based on AKKA.

Project & Work Experience


2019 - Present

Entity-centric analysis requires log data to be stored at the finest granularity for later processing. At the same time, real-time analysis requires the log data to be loaded into the data warehouse as soon as possible.


Paraflow enables users to load data into a data warehouse (such as HDFS) as soon as possible, and provides real-time analysis over both the data being loaded and the data already in the warehouse.

  • Fast loading. Paraflow utilizes a well-designed pipeline for efficient data loading.
  • Lossless staging. Kafka is used in the system to stage data without loss.
  • Real-time analysis. Lightweight indices are used in Paraflow to speed up queries.

Technologies used:

  • Kafka
  • Hadoop
  • PostgreSQL
  • Zookeeper
  • Presto
  • Maven


2018 - 2019

Large-scale graph data is processed in a distributed manner, and sub-graph nodes communicate through the asynchronous communication mode of the Actor model to carry out specific graph algorithms. Compared with GraphX, the graph computing framework of the standard Spark ecosystem, computing performance improves by about five times under the same computing load.


Independently completed the design and development of the computation process.

Technologies used:

  • AKKA
  • Maven

IT Developer

2018 - 2019

Gained experience with iterative software product development. Participated in system compatibility adaptation, coding for dynamic memory usage monitoring and visual storage, and continuous integration of the billing system.

Technologies used:

  • Maven/Ant
  • JavaScript
  • Java Socket
  • Shell

Web Developer (Intern)

2017 - 2018

Built a data analysis platform based on WiFi probes. Hadoop underpins the platform, the Spark framework quickly analyzes the data collected by the probes, a vertically clustered Tomcat server receives the data and handles high-concurrency processing, and Echarts (v3.0) charts visualize the results in the Web interface. The platform supports trend analysis of shopping mall passenger flow, machine learning regression prediction, and decision support. Mainly responsible for writing the front-end and back-end code that implements the overall business logic. First inventor of the patent and software work "A Big Business Data Analysis System", which has been accepted and made public.

Technologies used:

  • Hadoop/Spark
  • Echarts
  • Bootstrap
  • HBase

Skills & Tools


  • Angular
  • JavaScript


  • Java/Struts2
  • Scala/AKKA
  • Python
  • Hadoop/Spark


  • Shell
  • Code Review
  • Git
  • Unit Testing
  • Maven
  • Photoshop
  • Kafka
  • Socket


  • MSc in Big Data Science and Engineering
    Renmin University of China
    2019 - 2022
  • BSc in IoT Engineering
    Hohai University
    2015 - 2019


  • Award for Software Design
    China Software Cup Undergraduate Software Design Competition, undergraduate group, second prize
  • National Scholarship
    for Undergraduates in 2016


  • Chinese (Native)
  • English (Professional)


  • Climbing
  • Singing
  • Cooking