November 19, 2024
One of the biggest perks of working with Noah is learning coding tricks.
%timeit is an IPython magic command that measures the execution time of small code snippets in Python. Below, we can see that, for the same number of companies, searching for the company name 'Walgreens' in a list is faster than searching for it in a data frame.
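A minimal sketch of this kind of comparison, assuming pandas is installed. The company names, the list size, and the function names are made up for illustration, and the standard library's timeit module stands in for %timeit so the snippet also runs outside a notebook. Timings depend on the machine and on the size of the data, so no numbers are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical sample data: 10,000 made-up company names plus the target.
companies = [f"Company {i}" for i in range(10_000)] + ["Walgreens"]
df = pd.DataFrame({"name": companies})


def find_in_list():
    # Membership test on a plain Python list.
    return "Walgreens" in companies


def find_in_frame():
    # Element-wise comparison on a data frame column, then "is any True?".
    return bool((df["name"] == "Walgreens").any())


# Run each lookup 200 times and report the total elapsed seconds.
list_seconds = timeit.timeit(find_in_list, number=200)
frame_seconds = timeit.timeit(find_in_frame, number=200)
print(f"list: {list_seconds:.4f}s  data frame: {frame_seconds:.4f}s")
```

In an IPython or Jupyter session, the equivalent one-liner would be `%timeit "Walgreens" in companies`.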
String Matching
I think matching company names without identifiers is one of the important skills for empirical finance researchers. So far, I have had the best matches by doing what KPSS (2017) did: starting within the data we want to match, and checking whether people report the same firm's name differently.
I used to use the FuzzyWuzzy library for string matching. After working on a project with an engineer, I started to use RapidFuzz, which is much faster because it is implemented in C++. Currently, we are trying different matching measures. As a first step, though, if a company name appears x times (let's say 10 times), we can put it in a subgroup of frequent names and match the other company names in the data set to that subgroup first. That way, if one of the other names is a misspelling, it will be easy to match.
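The first step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name match_to_frequent_names, the sample names, and the 0.85 similarity cutoff are all hypothetical, and the standard library's difflib is used as a stand-in so the snippet runs without extra installs. In practice, RapidFuzz's process.extractOne would play the role of difflib.get_close_matches here.

```python
from collections import Counter
import difflib


def match_to_frequent_names(names, min_count=10, cutoff=0.85):
    """Map each distinct name to its closest frequent name, or None.

    A name is "frequent" if it appears at least min_count times.
    The cutoff is a difflib similarity ratio in [0, 1] (an assumed value).
    """
    counts = Counter(names)
    # Subgroup of names that appear often enough to be trusted as spellings.
    frequent = [name for name, n in counts.items() if n >= min_count]

    mapping = {}
    for name in counts:
        if counts[name] >= min_count:
            mapping[name] = name  # frequent names map to themselves
            continue
        # Best candidate among the frequent names, if any is close enough.
        close = difflib.get_close_matches(name, frequent, n=1, cutoff=cutoff)
        mapping[name] = close[0] if close else None
    return mapping


# Made-up sample: two frequent names, two misspellings, one unrelated name.
sample = ["Walgreens"] * 10 + ["Costco"] * 10 + ["Walgrens", "Cosctco", "Target"]
mapping = match_to_frequent_names(sample)
print(mapping)
```

Names that map to None have no close frequent spelling and can be passed on to whichever matching measure is tried next.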
Information Economics
Today's Brown Bag paper was about price discovery speed and market microstructure... My friend from the ODT department is married to a computer science professor who works on information theory. Once, at their place, we were talking about my shadow insider trading idea, and he mentioned things I had read about in 1990 information economics papers: "Entropic causal inference, mutual information, Rényi entropy."
Noah suggested that I should keep a list of things I would like to learn one day. I think information theory and information economics are some of the things I want to spend time on, even though they are very old now. By information economics I do not mean game theory, cheap talk, signaling, etc., but how information is reflected in prices, the asset pricing perspective, and economically connected companies. Just for fun.