Caltech Stem


Showing posts from February, 2020

Technical Journal - Dataset

February 26, 2020

Ramya Korlakai Vinayak, a Caltech graduate who studied under Prof. Babak Hassibi, graciously offered us a dataset that was labeled using MTurk (Amazon Mechanical Turk), a crowdsourcing marketplace. The folder contained four MATLAB files: Birds5EdgeQuery300workers, Birds5TriangleQuery285workers, Dogs3EdgeQuery300workers, and Dogs3TriangleQuery320workers. The naming convention is as follows: Animal Classified + Number of Breeds + Comparison Query Method + Number of Workers. We have not yet decided which one we will use for our project, but they are all very similar. Each file has 6 variables: Adj: adjacency matrix under standard rules (with -1 = unobserved); AdjWithMultiples: Adj with entries observed more than once due to random queries; CAdj: logical mapping matrix that tells whether an edge is observed (1) or not (0); count: number of edges observed (no multiples); groundtruth: actual breed of the animal; m: number of nodes in the graph. Above demonstrates how
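The naming convention described above can be sketched in a few lines of Python. This is only an illustration: the helper name parse_dataset_name, the DatasetName record, and the optional ".mat" extension are my own assumptions and are not part of the dataset itself.

```python
import re
from typing import NamedTuple


class DatasetName(NamedTuple):
    """Hypothetical record for the four parts of a file name."""
    animal: str
    num_breeds: int
    query_type: str
    num_workers: int


# Animal + number of breeds + query method + "Query" + number of workers + "workers"
_NAME_PATTERN = re.compile(r"^([A-Za-z]+?)(\d+)([A-Za-z]+)Query(\d+)workers$")


def parse_dataset_name(name: str) -> DatasetName:
    """Split a name such as 'Birds5EdgeQuery300workers' into its parts."""
    # Assumption: the files may carry a '.mat' extension; strip it if present.
    stem = name[:-4] if name.endswith(".mat") else name
    match = _NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    animal, breeds, query, workers = match.groups()
    return DatasetName(animal, int(breeds), query, int(workers))


if __name__ == "__main__":
    for name in (
        "Birds5EdgeQuery300workers",
        "Birds5TriangleQuery285workers",
        "Dogs3EdgeQuery300workers",
        "Dogs3TriangleQuery320workers",
    ):
        print(parse_dataset_name(name))
```

If the files are ordinary .mat files, scipy.io.loadmat should be able to read them into a Python dictionary keyed by the six variable names, although I have not checked which MATLAB version saved them.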

Journal - 2/3-18/20

February 19, 2020

This week has been interesting, to say the least. Something I achieved over the break was finishing my application for the BU RISE program. The summer research camp in Boston is very intriguing to me because I would get to work with professors on their research. This is something I might want to pursue in the near future, which is why I really want to attend the program. My application has five essays that took me quite some time to complete. I asked friends and teachers to look over my pieces. This caused my essays to go through several iterations, but in the end it was all worth it. Wish me luck! Above is a video describing what students do during BU RISE. I guess there's also the fact that I want to visit Boston during the summer. I moved from Massachusetts to California after 8th grade. It's been 3+ years and you bet I'm missing my home. I think I even have a plan for when I get there. First I want to meet up with my friends in Lexington and afterward go see family

Technical Journal - Research Papers

February 12, 2020

On the Efficiency of Data Collection for Crowdsourced Classification: The paper aims to explain the accuracy gaps between popular collection policies (state-action mappings) for crowdsourced data. The quality of crowdsourced data is often highly variable, so data quality is the natural suspect for those gaps. However, studies show that the policies used to collect such data also have a strong impact on the accuracy of the system. The paper gives the first theoretical explanation of the accuracy gaps between non-adaptive uniform allocation and two adaptive policies, uncertainty sampling and information gain maximization. By representing the collection process as a random walk in the log-odds domain (the log of the odds of guessing correctly versus incorrectly), the authors derive lower and upper bounds on the accuracy of each policy. The bounds quantify by how much the two adaptive policies outperform the non-adaptive one. It is believed that the techniques they use are applicable to additional scenarios su
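To get a feel for the log-odds random walk, here is a minimal sketch I put together. It is not the paper's model or code. It assumes binary labels in {+1, -1}, a uniform prior, and workers who all share one known accuracy p, so every answer moves a task's log-odds up or down by log(p / (1 - p)). The function names and the simple policy rules are hypothetical.

```python
import math
import random


def log_odds_step(p):
    """Step size of the walk for a worker who is correct with probability p."""
    return math.log(p / (1 - p))


def posterior_log_odds(labels, p):
    """Log-odds that the true label is +1 after seeing labels in {+1, -1}."""
    step = log_odds_step(p)
    return sum(step * y for y in labels)


def collect(true_labels, p, budget, policy, rng):
    """Spend `budget` worker answers across tasks and return final log-odds."""
    z = [0.0] * len(true_labels)
    step = log_odds_step(p)
    for t in range(budget):
        if policy == "uniform":
            # Non-adaptive: cycle through the tasks in a fixed order.
            i = t % len(z)
        elif policy == "uncertainty":
            # Adaptive: ask about the task whose log-odds is closest to zero.
            i = min(range(len(z)), key=lambda k: abs(z[k]))
        else:
            raise ValueError(f"unknown policy: {policy!r}")
        # Simulated worker: answers correctly with probability p.
        answer = true_labels[i] if rng.random() < p else -true_labels[i]
        z[i] += step * answer
    return z


def accuracy(z, true_labels):
    """Fraction of tasks whose log-odds sign matches the truth (ties count as wrong)."""
    correct = sum(1 for zi, y in zip(z, true_labels) if zi * y > 0)
    return correct / len(true_labels)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [rng.choice([1, -1]) for _ in range(200)]
    for policy in ("uniform", "uncertainty"):
        z = collect(truth, p=0.7, budget=1000, policy=policy, rng=rng)
        print(policy, accuracy(z, truth))
```

Running it a few times with different seeds gives a rough sense of how the two kinds of policy compare under these assumptions, but the paper's bounds are the rigorous statement.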

Sick Week - 1/20-31/20

February 03, 2020

This past week I was hit by a cold that knocked me out for nearly a week. I missed two and a half days of school because of it. I believe I caught it from Matthew Lee after he showed up to robotics practice following his bout of the flu. I suspect this because I felt absolutely horrible the day after. Catching a cold or flu at this time was especially bad because of recent news about the Coronavirus spreading to the U.S. I got worried, but my symptoms weren't matching up, so I was in the clear. I spent the next few days quarantining myself in my room so I would not get my peers and family sick. Below is a video detailing what the Coronavirus is in case you haven't seen the news. Also during this week, I learned about the legal dispute between Caltech and Apple, which resulted in a total victory for the educational institution. The case was about Apple infringing on Caltech's patents on Wi-Fi chip technology, which cover error-correction coding for transmitted data. Apple's lawyers stat

Crowd-source Image Labeling [Technical Journal]

February 01, 2020

The question at hand is how to increase the efficiency of crowdsourced image labeling, which has been a persistent issue in Machine Learning. Steps taken so far have revolved around addressing the highly variable accuracy of the data. The paper "Crowdsourcing – A Step Towards Advanced Machine Learning" briefly discusses how crowdsourcing is used in the real world. "On the Efficiency of Data Collection for Crowdsourced Classification" examines the most common data collection policies for crowdsourced classification and derives lower and upper bounds on each policy's accuracy, concluding that adaptive policies outperform non-adaptive ones. "Distance Metric Learning: A Comprehensive Survey" lists algorithms that use a distance metric, summarizing how each one works along with its strengths and weaknesses. "Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning" touches upon the issue that data clustering has trouble find