Built a column-oriented Database infrastructure to better handle queries in a cache-friendly manner. The program first receives queries, then scans the columns of the data, and find out the lines satisfying task requirements.

The data used is SingaporeWeather.csv, which stores the historical records for weather data in Singapore, measuring the weather information every 30 minutes from January 1, 2002 to December 31, 2021, spanning almost a million records.

The columns of the records are listed as follows:
• id: the increasing index of weather records.
• Station: “Changi” or “Paya Lebar”, represents the site of the observation.
• Timestamp: the timestamp of the observation, in format YYYY-MM-DD hh:mm of UTC+8 time zone.
• Temperature: air temperature in degrees celcius (°C).
• Humidity: relative humidity in %.

The program searches for the respective monthly maximum and minimum values of temperature and humidity columns with a given year and location condition using a columnar database.

The output should be in the following form:
Date: the corresponding date of the Value, in format YYYY-MM-DD of UTC+8 time zone.
• Station: “Changi” or “Paya Lebar”, represents the station of the Value.
• Category: “Max Temperature”, “Min Temperature”, “Max Humidity”, or “Min Humidity”, represents the meaning of the Value.
• Value: the value of temperature or humidity.

The following diagrams depict the inner workings of the program for a more efficient query.

Columnar DB implementation diagram 1
Columnar DB implementation diagram 2
Columnar DB implementation diagram 3
Columnar DB implementation diagram 4


Please contact me directly for a demonstration of the database code.

Back to Projects