This is the R version of Madhu's tutorial notebook for accessing MySQL and Hive from PAWS Internal using Python.
Assuming you've got production access and SSH configured (see Discovery/Analytics on Office wiki for more examples of SSH configs), you need to create an SSH tunnel like you would if you wanted to query Analytics-Store on your local machine:
ssh -N notebook1001.eqiad.wmnet -L 8000:127.0.0.1:8000
Then navigate to localhost:8000 in your favorite browser and login with your LDAP credentials (username/password that you use to login to Wikitech).
By the way, if you want a quick way to get into PAWS Internal, you can make an alias (in, say, ~/.bash_profile) that creates the SSH tunnel and opens the browser:
alias paws="ssh -N notebook1001.eqiad.wmnet -L 8000:127.0.0.1:8000" alias PAWS="ssh -N notebook1001.eqiad.wmnet -L 8000:127.0.0.1:8000 & open http://localhost:8000/"
Then in terminal:
paws if you want to go to localhost:8000 yourself.
PAWS will launch your default browser and output a numeric process ID. When you want to close the tunnel:
Sys.setenv("http_proxy" = "http://webproxy.eqiad.wmnet:8080") Sys.setenv("https_proxy" = "http://webproxy.eqiad.wmnet:8080") options(repos = c(CRAN = "https://cran.rstudio.com/"))
If you want, you can put those 3 lines in ~/.Rprofile and they will be executed every time you launch R.
install.packages("devtools") devtools::install_git("https://gerrit.wikimedia.org/r/wikimedia/discovery/wmf") # uaparser: devtools::install_github("ua-parser/uap-r", configure.args = "-I/usr/include/yaml-cpp -I/usr/include/boost")
mysql_connect function in wmf will try to look for some common MySQL config files (those vary between stat1002, stat1003, and notebook1001). It'll let you know if it encounters any problems but I doubt you'll have any. Try querying with
log_tables <- wmf::mysql_read("SHOW TABLES;", "log") head(log_tables)
Fetched 375 rows and 1 columns.
hive has been configured on notebook1001, we don't need to do anything extra to use the
query_hive function in wmf:
wmf_tables <- wmf::query_hive("USE wmf; SHOW TABLES;") head(wmf_tables)