There are several families of sources that we unify and standardize to construct a comprehensive view of the workforce: (i) online professional profiles: the full time series for each individual, including dates of employment, companies, titles, education, skills, and more; (ii) job postings: posting dates, job descriptions, skills, and salaries for each position; (iii) employee sentiment: both numerical scores and the full text of reviews, parsed into topics; (iv) freelance platform data: projects, pay rates, skills, and activities; (v) layoff notices: from state filings and crowdsourced layoff trackers; (vi) government data: published labor statistics, domestic and global, in addition to immigration filings, census data, Social Security Administration data, and voter registration data; and (vii) firmographic data: subsidiary-parent relationships of companies, industry classifications, and mappings to financial identifiers.
Our data covers all public and private companies, which amounts to roughly 4.5 million companies worldwide when taking into account subsidiaries and holding companies.
We typically deliver data on the 15th of every month, covering the previous month’s counts, inflows, outflows, and salaries. For example, by January 15th, we deliver the data for December. Some datasets, such as postings and sentiment data, can be delivered daily.
We deliver our workforce data in three ways: (i) API access, where data is delivered by request; (ii) a data feed, where large portions of the data are delivered monthly; and (iii) our self-service dashboard, where you can start getting insights immediately.
There are two ways to consider subsidiaries: a pro-forma approach or a point-in-time approach. In the pro-forma approach, which is our standard delivery, when a company acquires or merges with another company we include the subsidiary as part of the parent company, even retroactively, before the acquisition took place. For example, we include all of Whole Foods’ employees as part of Amazon during 2008-2016, even though Amazon only acquired Whole Foods in 2017. The advantage of the pro-forma approach is that we avoid seeing an artificial spike in headcounts when the acquisition or spinoff occurs. In the point-in-time approach, the subsidiary is counted as part of the parent only from the date of the corporate action, and we split inflows and outflows into organic flows and inorganic flows, the latter being the result of corporate actions. The benefit of this added complexity is that it is more reflective of corporate structures in the past and more suitable for back-testing. We can also present subsidiaries as separate entities, avoiding the issue altogether.
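The difference between the two approaches can be illustrated with a minimal sketch. The headcounts, the function names, and the acquisition year below are all hypothetical and chosen only for illustration; the split shown here works on net headcount change rather than on separate inflows and outflows, which is a simplification of what is described above.

```python
# Hypothetical year-end headcounts for a parent and a company it acquires in 2017.
# These figures are invented for illustration and do not describe any real company.
parent = {2015: 1000, 2016: 1100, 2017: 1200}
subsidiary = {2015: 200, 2016: 210, 2017: 220}
ACQUIRED_YEAR = 2017


def pro_forma(parent, subsidiary):
    """Count the subsidiary as part of the parent in every year, even before the deal."""
    return {year: parent[year] + subsidiary[year] for year in parent}


def point_in_time(parent, subsidiary, acquired_year):
    """Count the subsidiary as part of the parent only from the acquisition year on."""
    return {
        year: parent[year] + (subsidiary[year] if year >= acquired_year else 0)
        for year in parent
    }


def split_net_change(parent, subsidiary, acquired_year, year):
    """Split the point-in-time net headcount change into organic and inorganic parts.

    The inorganic part is the subsidiary's prior-year headcount, which joins the
    parent through the acquisition; everything else is treated as organic.
    """
    series = point_in_time(parent, subsidiary, acquired_year)
    total = series[year] - series[year - 1]
    inorganic = subsidiary[year - 1] if year == acquired_year else 0
    return {"organic": total - inorganic, "inorganic": inorganic}


if __name__ == "__main__":
    print(pro_forma(parent, subsidiary))
    print(point_in_time(parent, subsidiary, ACQUIRED_YEAR))
    print(split_net_change(parent, subsidiary, ACQUIRED_YEAR, 2017))
```

In this toy example the pro-forma series grows smoothly, while the point-in-time series jumps in the acquisition year, and the jump is attributed to the inorganic component.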
The fundamental challenge in estimating headcount is that no good source of ground truth exists. Companies’ 10-Ks typically report the W-2 employees of the company but omit contingent workers, who in some cases make up the majority of a company’s workforce. This makes the headcount from a 10-K filing a lower bound on the true headcount. Company pages on online professional sites also give a lower-bound estimate of the true headcount, because not everyone in a company has an online profile. Those counts are also subject to mapping errors, fake profiles, and other errors. We make an effort to track all members of a workforce (employees and contingent workers) to provide a more comprehensive view of company composition and trends, and we adjust for mapping issues and fake profiles. We also apply a sampling weight to each profile, to adjust for the likelihood of it appearing in our sample. For that reason, our headcounts are often (depending on industry) higher than a company’s reported counts and higher than what may be visible on company profile pages. We believe this is more accurate and more reflective of the true workforce of a company.
Because we collect data from online professional profiles, our data is drawn from a non-representative sample of the underlying population. To correct for this, we apply sampling weights that adjust for roles and locations that are underrepresented. For example, if 9 out of 10 engineers in San Francisco have an online profile, then when we see an engineer located in San Francisco, we count them as roughly 1.1 (10/9). Similarly, if 1 out of 3 nurses in Germany has an online profile, we count each one as 3. This allows us to approximate, as closely as possible, the true size of the underlying population.
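The weighting in the two examples above amounts to counting each observed profile as the inverse of its coverage rate. A minimal sketch follows, assuming the coverage rate for each role and location is already known; the function names and the observed profile counts are hypothetical.

```python
def sampling_weight(coverage_rate: float) -> float:
    """Return how many real workers one observed profile stands for.

    coverage_rate is the share of people in a role/location who have a profile.
    """
    if not 0 < coverage_rate <= 1:
        raise ValueError("coverage_rate must be in (0, 1]")
    return 1.0 / coverage_rate


# Coverage rates taken from the two examples in the text.
coverage = {
    ("engineer", "San Francisco"): 9 / 10,
    ("nurse", "Germany"): 1 / 3,
}


def estimate_headcount(observed_profiles: dict) -> float:
    """Sum the observed profile counts, each scaled by its sampling weight."""
    return sum(
        count * sampling_weight(coverage[group])
        for group, count in observed_profiles.items()
    )


if __name__ == "__main__":
    # Hypothetical observed profile counts for one company.
    observed = {("engineer", "San Francisco"): 90, ("nurse", "Germany"): 10}
    print(round(sampling_weight(9 / 10), 2))      # weight per San Francisco engineer
    print(round(estimate_headcount(observed), 1))  # weighted headcount estimate
```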
There is a lag between the time someone gets a new job and the time they report that change on their profile. To account for these lags, we use a nowcasting method called BSTS (Bayesian Structural Time Series), in which we look at multiple snapshots of a company’s history to estimate the distribution of the time it takes for updates to result in observed changes. We then use those lags to predict what is actually observed and strip away the reduction in inflows and outflows that comes from lags. The inflows and outflows predicted in the absence of the information loss caused by lags are an unbiased estimate of the inflows and outflows that will be retroactively revealed.
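The intuition behind the lag correction can be shown with a much simpler technique than BSTS: a completeness-ratio adjustment. The sketch below is not the BSTS model described above. It assumes the reporting-lag distribution has already been estimated from historical snapshots, and the function names and numbers are hypothetical.

```python
from itertools import accumulate


def completeness_by_age(lag_pmf):
    """Cumulative share of job changes reported within k months of occurring.

    lag_pmf[k] is the assumed probability that a change is reported k months late.
    """
    return list(accumulate(lag_pmf))


def adjust_for_lag(observed, lag_pmf):
    """Scale each month's observed count by 1 / completeness.

    observed is ordered oldest to newest, so the last entry has age 0.
    Months older than the lag distribution are treated as fully reported.
    """
    completeness = completeness_by_age(lag_pmf)
    if completeness[0] <= 0:
        raise ValueError("some changes must be reported in the month they occur")
    adjusted = []
    last_index = len(observed) - 1
    for i, count in enumerate(observed):
        age = last_index - i
        share = completeness[age] if age < len(completeness) else 1.0
        adjusted.append(count / share)
    return adjusted


if __name__ == "__main__":
    # Hypothetical: 50% of changes are reported in the same month,
    # 30% one month later, and 20% two months later.
    lag_pmf = [0.5, 0.3, 0.2]
    observed_inflows = [100, 100, 80, 50]  # the most recent months look artificially low
    print(adjust_for_lag(observed_inflows, lag_pmf))
```

Under these assumed lags, the apparent decline in the most recent months disappears after adjustment, which is the same effect the nowcast is meant to remove.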