This is the R version of Madhu's tutorial notebook for accessing MySQL and Hive from PAWS Internal using Python.
Assuming you've got production access and SSH configured (see Discovery/Analytics on Office wiki for more examples of SSH configs), you need to create an SSH tunnel like you would if you wanted to query Analytics-Store on your local machine:
ssh -N notebook1001.eqiad.wmnet -L 8000:127.0.0.1:8000
Then navigate to localhost:8000 in your favorite browser and login with your LDAP credentials (username/password that you use to login to Wikitech).
By the way, if you want a quick way to get into PAWS Internal, you can make an alias (in, say, ~/.bash_profile) that creates the SSH tunnel and opens the browser:
alias paws="ssh -N notebook1001.eqiad.wmnet -L 8000:127.0.0.1:8000"
alias PAWS="ssh -N notebook1001.eqiad.wmnet -L 8000:127.0.0.1:8000 & open http://localhost:8000/"
Then in terminal: PAWS
or paws
if you want to go to localhost:8000 yourself.
PAWS
will launch your default browser and output a numeric process ID. When you want to close the tunnel: kill [pid]
Sys.setenv("http_proxy" = "http://webproxy.eqiad.wmnet:8080")
Sys.setenv("https_proxy" = "http://webproxy.eqiad.wmnet:8080")
options(repos = c(CRAN = "https://cran.rstudio.com/"))
If you want, you can put those 3 lines in ~/.Rprofile and they will be executed every time you launch R.
We need to install the devtools and Discovery's wmf packages. If you want to work with user agents in R, I've included the command to install the uaparser package.
install.packages("devtools")
devtools::install_git("https://gerrit.wikimedia.org/r/wikimedia/discovery/wmf")
# uaparser:
devtools::install_github("ua-parser/uap-r", configure.args = "-I/usr/include/yaml-cpp -I/usr/include/boost")
The mysql_connect
function in wmf will try to look for some common MySQL config files (those vary between stat1002, stat1003, and notebook1001). It'll let you know if it encounters any problems but I doubt you'll have any. Try querying with mysql_read
:
log_tables <- wmf::mysql_read("SHOW TABLES;", "log")
head(log_tables)
Since hive
has been configured on notebook1001, we don't need to do anything extra to use the query_hive
function in wmf:
wmf_tables <- wmf::query_hive("USE wmf; SHOW TABLES;")
head(wmf_tables)