My name is Tao Yu. I am a master's student in the DBIIR Lab at Renmin University of China (RUC). During my first year in the lab, I did research on graph computing and built actordas, a distributed graph computing engine based on Akka.
Entity-centric analysis requires log data to be stored at the finest granularity for later processing. At the same time, real-time analysis requires log data to be loaded into the data warehouse as soon as possible.
Paraflow enables users to load data into a data warehouse (such as HDFS) as soon as possible, and provides real-time analysis over both the data being loaded and the data already in the warehouse.
- Fast loading. Paraflow utilizes a well-designed pipeline for efficient data loading.
- Lossless staging. Kafka is used in the system to stage data without loss.
- Real-time analysis. Lightweight indices are used in Paraflow to speed up queries.
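To illustrate how a lightweight index can speed up queries, here is a minimal sketch of a min/max (zone-map style) index kept per data block. It is only an assumption about what "lightweight" could look like, not Paraflow's actual index; the names `Block` and `query` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """A non-empty batch of (timestamp, payload) records written as one unit."""
    records: List[Tuple[int, str]]
    min_ts: int = field(init=False)
    max_ts: int = field(init=False)

    def __post_init__(self) -> None:
        # The "index" is just the timestamp range covered by the block.
        timestamps = [ts for ts, _ in self.records]
        self.min_ts = min(timestamps)
        self.max_ts = max(timestamps)


def query(blocks: List[Block], start: int, end: int):
    """Return records with start <= timestamp <= end and the number of blocks scanned."""
    hits = []
    scanned = 0
    for block in blocks:
        # Skip blocks whose timestamp range cannot overlap the query range.
        if block.max_ts < start or block.min_ts > end:
            continue
        scanned += 1
        hits.extend(r for r in block.records if start <= r[0] <= end)
    return hits, scanned


blocks = [
    Block([(1, "a"), (2, "b"), (3, "c")]),
    Block([(4, "d"), (5, "e"), (6, "f")]),
    Block([(7, "g"), (8, "h"), (9, "i")]),
]
hits, scanned = query(blocks, 4, 5)
```

In this toy data set only one of the three blocks is scanned, because the other two are ruled out by their min/max range alone.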
Large-scale graph data is processed in a distributed manner, and sub-graph nodes communicate through the asynchronous messaging of the Actor model to carry out the specific graph algorithm. Compared with GraphX, the graph computation framework of the standard Spark ecosystem, computing performance under the same workload improves by about five times.
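The asynchronous, message-driven style described above can be sketched roughly as follows. This is a single-process simulation with one shared mailbox, not Akka and not the actordas implementation; the function name `propagate_min_labels` and the choice of connected components as the example algorithm are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List


def propagate_min_labels(edges: Dict[int, List[int]]) -> Dict[int, int]:
    """Label every vertex with the smallest vertex id in its connected component.

    `edges` is a symmetric adjacency list; every neighbour must also be a key.
    """
    labels = {v: v for v in edges}
    mailbox = deque()  # pending (target_vertex, label) messages

    # Every vertex starts by announcing its own id to its neighbours.
    for vertex, neighbours in edges.items():
        for n in neighbours:
            mailbox.append((n, vertex))

    # Messages are handled one at a time as they arrive; there is no global
    # superstep barrier, which is the point of the asynchronous style.
    while mailbox:
        target, label = mailbox.popleft()
        if label < labels[target]:
            labels[target] = label
            # Only vertices whose state changed send further messages.
            for n in edges[target]:
                mailbox.append((n, label))
    return labels


graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
components = propagate_min_labels(graph)
```

In a real actor system each vertex (or sub-graph) would own its mailbox and run concurrently; the shared queue here only keeps the sketch short and deterministic.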
Independently completed the design and development of the computation process.
Familiar with iterative software product development; participated in system compatibility adaptation, dynamic monitoring of memory usage, coding for storage visualization, and continuous integration of the billing system.
- Java Socket
A data analysis platform for data collected by WiFi probes, built on Hadoop. The Spark framework analyzes the probe data quickly, a vertically clustered Tomcat server receives the data and handles high concurrency, and ECharts (v3.0) charts visualize the results in the Web interface. The system supports trend analysis of shopping mall passenger flow, machine-learning regression prediction, and decision support. I was mainly responsible for writing the front-end and back-end code that implements the overall business logic. I am the first inventor of the patent and software copyright "A Big Business Data Analysis System", which has been accepted and published.
Skills & Tools
- Code Review
- Unit Testing
MSc in Big Data Science and Engineering, Renmin University of China, 2019 - 2022
BSc in IoT Engineering, Hohai University, 2015 - 2019
Award for Software Design: China Software Cup Undergraduate Software Design Competition, Second Prize (Undergraduate Group)
National Scholarship for Undergraduates, 2016
- Chinese (Native)
- English (Professional)